Procedural Rendering on Playstation 2

VU Dataflow Designs

In this section we’ll look at some of the design decisions we’ll need to make when translating the Lifeform algorithm to the PS2. The first decision is where to draw the line between CPU and VU calculations. There are several options, in order of difficulty:
For this tutorial I chose the second option as a halfway house between full optimization and simple triangle lists. Following this path is not far removed from ordinary engine programming, and it leaves the final program wide open for further optimization.

Keep Your Eye On The Prize

GIF Tags are the essence of PS2 graphics programming. They tell the GS how much data to expect and in what format, and they contain all the information an xgkick instruction needs to transfer data from VU1 to the GS automatically. Create the correct GIF Tag for your set of N vertices and everything else is pretty much automatic.

The complications in PS2 programming arise when you start to work out how to get the GIF Tags to the GS. There are three routes into the GS. One is direct and driven by DMA (Path 3). One is indirect and goes via VIF1 (Path 2), which is useful for redirecting part of a DMA stream to the GS. The third is very indirect (Path 1) and requires you to DMA data into VU1 memory and execute a program that ends in an xgkick.

Choosing a Data Flow Design

The diagrams are pretty abstract in that they only outline the data movement and control signals necessary, and don’t show areas for constants, precalculated data or uploading the VU program code. We’ll be covering all these details later when we go in-depth into the actual algorithm used to render the Lifeform primitives. The other point to note is that none of these diagrams take into consideration the effect of texture uploads on the rendering sequence. This is a whole other tutorial for another day...

Single Buffer

Benefits:
Process a large amount of data in one go.
First, the VIF unpacks a chunk of data into VU1 memory (unpack). Next, the VU program is called to process the data (mscal). When the data has been processed, the result is transferred from VU1 memory to the GS by an xgkick command. Because the GIF reads the transformed data from VU memory, we can’t upload more data until the xgkick has finished, hence the need for a flush. (There are three VIF flush commands: flush, flushe and flusha. Of these, flush waits for the end of both the VU program and the data transfer to the GS.) When the flush returns, the process loops.

Double Buffer

Benefits:
Uploading in parallel with rendering.
Although more parallel, VU calculation is still serialized.

Data is unpacked to area A. The DMA stream then waits for the VU to finish transferring buffer B to the GS with a flush (on the first iteration this should return immediately). The VU then processes buffer A into buffer B (mscal) while the DMA stream waits for the program to finish (flushe). When the program has finished processing buffer A, the DMA is free to upload more data into it while buffer B is simultaneously transferred to the GS via Path 1 (xgkick). The DMA stream then waits for buffer B to finish being transferred (flush) and the process loops back to the beginning.

Quad Buffer

Benefits:
Good use of parallelism: uploading, calculating and rendering all take place simultaneously, much like a RISC instruction pipeline. The best technique for out-of-place processing of vertices or data amplification.

Drawbacks:
Data can only be processed in <8KB chunks. There are three devices accessing the same area of memory at the same time: VU, VIF and GIF. The VU has read/write priority (at 300MHz) over the GIF (150MHz), which has priority over the VIF (150MHz). Higher priority devices cause lower priority devices to stall if there is any contention, meaning there are hidden wait-states in this technique.
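The double-buffer registers that quad buffering leans on can be modeled in a few lines of C. This is only a simplified sketch, not a hardware-accurate emulation: the `Vif1Model` struct, the `vif_*` function names and the 1024-quadword memory size are assumptions made for illustration, and the toggle rule is our reading of how BASE, OFFSET, TOP and TOPS interact.

```c
#include <assert.h>

/* Simplified, hypothetical model of the VIF1 double-buffer registers.
   All addresses are in quadwords. */
enum { VU1_MEM_QWORDS = 1024 };   /* assumed size of VU1 data memory */

typedef struct {
    unsigned base;  /* set by the BASE VIF code                        */
    unsigned ofst;  /* set by the OFFSET VIF code                      */
    unsigned tops;  /* where the next TOPS-relative unpack will land   */
    unsigned top;   /* where the running VU program finds its input    */
    int      dbf;   /* which half TOPS currently points at (0 or 1)    */
} Vif1Model;

/* BASE: remember the start of the first buffer. */
static void vif_base(Vif1Model *v, unsigned base)
{
    v->base = base;
}

/* OFFSET: remember the distance to the second buffer and reset the toggle. */
static void vif_offset(Vif1Model *v, unsigned ofst)
{
    v->ofst = ofst;
    v->dbf  = 0;
    v->tops = v->base;
}

/* MSCAL: hand the freshly unpacked buffer to the VU program (TOP) and
   point subsequent unpacks (TOPS) at the other buffer. */
static void vif_mscal(Vif1Model *v)
{
    v->top  = v->tops;
    v->dbf  = !v->dbf;
    v->tops = v->base + (v->dbf ? v->ofst : 0);
}

/* Destination of an unpack that uses the double-buffer offset. */
static unsigned vif_unpack_addr(const Vif1Model *v, unsigned addr)
{
    assert(v->tops + addr < VU1_MEM_QWORDS);
    return v->tops + addr;
}
```

With a base of 0 and an offset of 512, the first unpack in this model lands at quadword 0, and each mscal flips the unpack destination between 0 and 512 while the VU program works on the half that was just filled.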
First, the DMA stream sets the base and offset for double buffering. Usually the base is 0 and the offset is half of VU1 memory, 512 quadwords. The data is uploaded into buffer A (unpack), remembering to use the double buffer offset. The program is called (mscal), which swaps the TOP and TOPS registers, so any subsequent unpack instructions will be directed to buffer C. The DMA stream then immediately unpacks data to buffer C and attempts to execute another mscal. This instruction cannot be executed while the VU is already running a program, so the DMA stream stalls until the VU has finished processing buffer A into B. When the VU has finished processing, the mscal succeeds, causing the TOP and TOPS registers to be swapped again. The VU program then begins to process buffer C into D while simultaneously transferring buffer B to the GS. This process of stalls and buffer swaps continues until all VIF packets have been completed.

Triple Buffer

Benefits:
All the benefits of quad buffering with larger buffer sizes. Best technique for simple in-place transform and lighting of precalculated vertices.

Drawbacks:
Cannot use the TOP and TOPS registers: you must handle all offsets by hand and remember which buffer to use between VU programs. Three streams of reads and writes again introduce hidden wait-states.
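To give a feel for what handling the offsets by hand involves, here is a minimal sketch in C of the bookkeeping for a three-buffer rotation. Everything in it is an assumption for illustration: the 1024-quadword memory size, the equal three-way split, and the `BufferRoles`, `buffer_addr` and `roles_at_step` names are hypothetical and not taken from the Lifeform code.

```c
#include <assert.h>

/* Assumption for illustration: VU1 data memory holds 1024 quadwords and is
   split by hand into three equal buffers A, B and C (indices 0, 1, 2). */
enum {
    VU1_MEM_QWORDS = 1024,
    NUM_BUFFERS    = 3,
    BUFFER_QWORDS  = VU1_MEM_QWORDS / NUM_BUFFERS   /* 341 quadwords each */
};

typedef struct {
    int upload;   /* buffer the VIF is unpacking into              */
    int process;  /* buffer the VU program is working on, or -1   */
    int render;   /* buffer being sent to the GS by xgkick, or -1 */
} BufferRoles;

/* Absolute quadword address of a buffer, as the VIF codes would need it. */
static int buffer_addr(int index)
{
    assert(index >= 0 && index < NUM_BUFFERS);
    return index * BUFFER_QWORDS;
}

/* Which buffer plays which role at a given pipeline step.
   Each buffer is uploaded, then processed, then rendered on
   three consecutive steps. */
static BufferRoles roles_at_step(int step)
{
    BufferRoles r;
    assert(step >= 0);
    r.upload  = step % NUM_BUFFERS;
    r.process = (step >= 1) ? (step - 1) % NUM_BUFFERS : -1;
    r.render  = (step >= 2) ? (step - 2) % NUM_BUFFERS : -1;
    return r;
}
```

At step 2 in this sketch, buffer C is being uploaded, B is being processed and A is being rendered. Both the code that builds the VIF packets and the VU program have to agree on this schedule, since there are no TOP and TOPS registers to track it for them.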
Data is transferred directly to buffer A (all destination pointers must be handled directly by the VIF codes, as TOP and TOPS cannot be used) and processing is started on it. Simultaneously, data is transferred to buffer B and another mscal is attempted. This will stall until processing of buffer A is finished. Processing on buffer B is started while buffer A is being rendered (xgkick). Meanwhile buffer C is being uploaded. The three-buffer pipeline continues to rotate A->B->C until all VIF packets are completed.

Parallel Processing

Benefits:
All units are fully stressed. VU1 can be using any of the previous techniques for rendering.

Drawbacks:
Moving data from VU0 to Scratchpad efficiently is a complex issue. Large amounts of main memory are needed as buffers.
With VU1 running one of the previous techniques (e.g. quad buffering), the gaps in GS rendering are filled by a Path 3 DMA stream of GIF tags and data from main memory. Each of the GIF tags must be marked EOP=1 (end of packet), allowing VU1 to interrupt the GIF tag stream at the end of any primitive in the stream. Data is moved from Scratchpad (SPR) to the next-frame buffer using burst mode. Using slice mode introduces too many delays in bus transfer, as the DMAC has to arbitrate between three different streams. It is better to allow the SPR data to hog the bus for quick one-off transfers. Note in the diagram how both the VIF1 mscal and the VU1 xgkick instructions are subject to stalls if the receiving hardware is not ready for the new data.
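As a rough illustration of marking a GIF tag with EOP=1, the sketch below packs a tag into two 64-bit words. The bit positions follow the GIF tag layout as we understand it (NLOOP in bits 0-14, EOP in bit 15, PRE in bit 46, PRIM in bits 47-57, FLG in bits 58-59, NREG in bits 60-63, and four-bit register descriptors in the upper word), so check them against the GS documentation before relying on them. The `GifTag` struct, `make_gif_tag` function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit GIF tag held as two 64-bit halves (hypothetical helper type). */
typedef struct {
    uint64_t lo;  /* NLOOP, EOP, PRE, PRIM, FLG, NREG       */
    uint64_t hi;  /* up to 16 register descriptors, 4 bits each */
} GifTag;

enum { GIF_FLG_PACKED = 0, GIF_FLG_REGLIST = 1, GIF_FLG_IMAGE = 2 };
enum { GIF_REG_RGBAQ = 0x1, GIF_REG_ST = 0x2, GIF_REG_XYZ2 = 0x5 };

/* Pack the tag fields. nreg may be 1..16; 16 is encoded as 0. */
static GifTag make_gif_tag(uint32_t nloop, int eop, int pre, uint32_t prim,
                           uint32_t flg, const uint8_t *regs, uint32_t nreg)
{
    GifTag tag;
    uint32_t i;

    assert(nloop < (1u << 15));
    assert(nreg >= 1 && nreg <= 16);

    tag.lo = (uint64_t)nloop
           | ((uint64_t)(eop ? 1 : 0) << 15)
           | ((uint64_t)(pre ? 1 : 0) << 46)
           | ((uint64_t)(prim & 0x7FF) << 47)
           | ((uint64_t)(flg & 0x3) << 58)
           | ((uint64_t)(nreg & 0xF) << 60);

    tag.hi = 0;
    for (i = 0; i < nreg; ++i)
        tag.hi |= (uint64_t)(regs[i] & 0xF) << (4 * i);

    return tag;
}
```

For example, a tag for three vertices that each carry a colour and a position would pass `nloop = 3`, `eop = 1`, and the register list `{ GIF_REG_RGBAQ, GIF_REG_XYZ2 }`, which gives an upper word of 0x51.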