Gamasutra is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Gamasutra: The Art & Business of Making Gamesspacer
Procedural Rendering on Playstation 2
View All     RSS
May 17, 2021
arrowPress Releases
May 17, 2021
Games Press
View All     RSS







If you enjoy reading this site, you might also want to check out these UBM Tech sites:


 

Procedural Rendering on Playstation 2


September 26, 2001 Article Start Previous Page 5 of 7 Next
 

VU Dataflow Designs

In this section we’ll look behind some of the design decisions we’ll need to make when translating the Lifeform algorithm to PS2. The first decision is where to draw the line between CPU and VU calculations. There are several options, in order of difficulty:

  1. The main program calculates an abstract list of triangle strips for the VU to transform and light.
  2. The main program calculates transform and lighting matrices each primitive (sphere, torus). The VU is passed a static description of an example primitive in object space which the VU transforms into world space, lights and displays.
  3. The main program sends the VU a description of a single horn plus an example primitive in object space. The VU iterates along the horn, calculating the transform and lighting matrices for each rib and instances the primitives into world space using these matrices.
  4. The main program sends the VU a numerical description of the entire model, the VU does everything else.

For this tutorial I chose the second option as a halfway house between full optimization and simple triangle lists. Following down this path is not too far removed from ordinary engine programming and it leaves the final program wide open for further optimizations.

Keep Your Eye On The Prize
Before setting out it’s useful to remember what the ultimate aim of the design is - to send correct GS Tags and associated data to the GS. A GIF Tag contains quite a bit of information, but only a few of them are important.

GIF Tags are the essence of PS2 graphics programming. They tell the GS how much data to expect and in what format and they contain all the necessary information for an xgkick instruction to transfer data from VU1 to the GS automatically. Create the correct GIF Tag for your set of N vertices and everything else is pretty much automatic.

The complications in PS2 programming arise when you start to work out how to get the GIF Tags to the GS. There are three routes into the GS. One is direct and driven by DMA (Path 3), one is indirect and goes via VIF1 (Path 2) and is useful for redirecting part of a DMA stream to the GS and the third is very indirect (Path 1) and requires you to DMA data into VU1 memory and executed a program that ends in a xgkick.

Choosing a Data Flow Design
The first problem is to choose a dataflow design. The next few pages contain examples of basic data flows that can be used as starting points for your algorithms. Each dataflow is described two ways – once using it’s physical layout in memory showing where data moves to and from, and once as a time line showing how stall conditions and VIF instructions control the synchronization between the different pieces of hardware.

The diagrams are pretty abstract in that they only outline the data movement and control signals necessary but don’t show areas for constants, precalculated data or uploading the VU program code. We’ll be covering all these details later when we go in-depth into the actual algorithm used to render the Lifeform primitives.

The other point to note is that none of these diagrams take into consideration the effect of texture uploads on the rendering sequence. This is a whole other tutorial for another day...

Single Buffer
Single buffering takes one big buffer of source data and processes it in-place. After processing the result is transferred to the GS with an xgkick, during which the DMA stream has to wait for completion of rendering before uploading new data for the next pass.

Benefits: Process a large amount of data in one go.

Drawbacks: Processing is nearly serial – DMA, VU and GS are mostly left waiting.


Single Buffer Layout and Timing

First, the VIF unpacks a chunk of data into VU1 Memory (unpack).

Next the VU program is called to process the data (mscal). When the data has been processed the result is transferred from VU1 Memory to the GS by an xgkick command. Because the GIF is going to be reading the transformed data from the VU memory we can’t upload more data until the xgkick has finished, hence the need for a flush. (there are three VIF flush commands flush, flushe and flusha, where flush waits for the end of both the VU program and the data transfer to the GS.)

When the flush returns the process loops.

Double Buffer
Double Buffering speeds up the operation by allowing you to upload new data simultaneously with rendering the previous buffer.

Benefits: Uploading in parallel with rendering.

Works with Data Amplification where the VU generates more data than uploaded, e.g. 16 verts expanded into a Bezier Patch. Areas A and B do not need to be the same size.

Drawbacks:
Less data per chunk.


Double Buffer Layout and Timing

Although more parallel, VU calculation is still serialized. Data is unpacked to area A. DMA then waits for the VU to finish transferring buffer B to the GS with a flush (for the first iteration this should return immediately).

The VU then processes buffer A into buffer B (mscal) while the DMA stream waits for the program to finish (flushe). When the program has finished processing buffer A the DMA is free to upload more data into it, while simultaneously buffer B is being transferred to the GS via Path 1 (xgkick). The DMA stream then waits for buffer B to finish being transferred (flush) and the process loops back to the beginning.

Quad Buffer
Quad buffering is the default choice for most PS2 VU programs.The VU memory is split into two areas of equal size, each area double bhuffered. When set up correctly, the TOP and TOPS VU registers will automatically transfer data to the correct buffers.

Benefits: Good use of parallelism – uploading, calculating and rendering all take place simultaneously, much like a RISC instruction pipeline.

Works well with the double buffering registers TOP and TOPS, which may have caching advantages.

The best technique for out-of-place processing of vertices or data amplification.

Drawbacks: Data can only be processed in <8KB chunks.

There are three devices accessing the same area of memory at the same time – VU, VIF and GIF. The VU has read/write priority (at 300MHz) over the GIF (150MHz) which has priority over the VIF (150MHz). Higher priority devices cause lower priority devices to stall if there is any contention meaning there are hidden wait-states in this technique.


Quad Buffer Layout and Timing

First, the DMA stream sets the base and offset for double buffering – usually the base is 0 and the offset is half of VU1 memory, 512 quads.

The data is uploaded into buffer A (unpack), remembering to use the double buffer offset. The program is called (mscal) which swaps the TOP and TOPS registers, so any subsequent unpack instructions will be directed to buffer C.

The DMA stream then immediately unpacks data to buffer C and attempts to execute another mscal. This instruction cannot be executed as the VU is already running a program so the DMA stream will stall until the VU has finished processing buffer A into B.

When the VU has finished processing, the mscal will succeed causing the TOP and TOPS registers to again be swapped. The VU program will begin to process buffer C into D while simultaniously transferring buffer B to the GS.

This process of stalls and buffer swaps continues until all VIF packets have been completed.

Triple Buffer
Another pipeline technique for parallel uploading, calculation and rendering, this technique relies on in-place processing of vertices.

Benefits: All the benefits of quad buffering with larger buffer sizes.

Best technique for simple in-place transform and lighting of precalculated vertices.

Drawbacks: Cannot use TOP and TOPS registers – you must handle all offsets by hand and remember which buffer to use between VU programs.

Three streams of read/writes again introduce hidden wait states.


Triple Buffer Layout and Timing

Data is transferred directly to buffer A (all destination pointers must be handled directly by the VIF codes – TOP and TOPS cannot be used) and processing is started on it.

Simultaneously, data is transferred to buffer B and another mscal is attempted. This will stall until processing of buffer A is finished.

Processing on Buffer B is started while buffer A is being rendered (xgkick). Meanwhile buffer C is being uploaded. The three-buffer pipeline continues to rotate A->B->C until all VIF packets are completed.

Parallel Processing
This technique is the simplest demonstration of how to use the PS2 to it’s maximum – all units are fully stressed. All the previous techniques have used just one VU for processing and the GS has been left waiting for more data to render. In this example we use precalculated buffers of GIF tags and data to fill the gaps in GS processing, at the cost of large amounts of main memory. Many of the advanced techniques on PS2 are variations optimized to use less memory.

Benefits: All units are fully stressed.

VU1 can be using any of the previous techniques for rendering.

Drawbacks: Moving data from VU0 to Scratchpad efficiently is a complex issue.

Large amounts of main memory are needed as buffers.


Parallel Processing Data Flow and Timing

With VU1 running one of the previous techniques (e.g. quad buffering), the gaps in GS rendering are filled by a Path 3 DMA stream of GIF tags and data from main memory. Each of the GIF tags must be marked EOP=1 (end of primitive) allowing VU1 to interrupt the GIF tag stream at the end of any primitive in the stream.

Data is moved from Scratchpad (SPR) to the next-frame buffer using burst mode. Using slice mode introduces too many delays in bus transfer as the DMAC has to arbitrate between three different streams. Better to allow the SPR data to hog the bus for quick one-off transfers.

Note in the diagram how both the VIF1 mscal and the VU1 xgkick instructions are subject to stalls if the receiving hardware is not ready for the new data.


Article Start Previous Page 5 of 7 Next

Related Jobs

Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States
[05.17.21]

Senior Project Manager
Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States
[05.17.21]

Senior Producer
Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States
[05.17.21]

Project Manager
Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States
[05.17.21]

Producer





Loading Comments

loader image