Procedural Rendering on Playstation 2
The Design
The Lifeform program has to instance many copies of the same primitive, each one with a different object-to-world transformation matrix. To achieve this efficiently we can use a variation on the triple buffering system. This explanation will add a few more practical details than the earlier dataflow examples.
First, we unpack the data for an example primitive into buffer A. This data packet contains everything we need to render a single primitive (GIF tag and vertices), except that the vertices are all in object space. We will need to transform the verts to world space and calculate RGB values for the vertices for Gouraud shading.
Next we upload an object-to-screen transformation matrix, which is the concatenation of the object-world matrix, calculated by the Horn algorithm, with the matrices that take world space to screen space. Multiplying the object space vertices with this matrix will transform them directly to screen space ready for conversion to raster space.
We then execute the VU program, which transforms the object space verts in buffer A to buffer B and xgkicks them. Simultaneously we upload a new header to VU memory and attempt to start processing buffer B with an mscal, stalling until the VU has finished processing.
Here is a diagram of the data flow rendering two horns, the first a horn of three toruses and the second a horn of two spheres. Because the first torus, say, takes up a lot of screen space, it causes the xgkick of the second torus to wait until rendering is complete:
[Figure: data flow for rendering two horns]
Untransformed Primitives
We want the spheres and toruses to render as quickly as possible. Here we come to a dilemma. In PC graphics we are always told to minimize the bus bandwidth and use indexed primitives. It turns out that VU code is not well suited to indirection: it’s optimized for blasting through VU instructions linearly. The overhead of reordering indexed primitives far outweighs the benefits in bus upload speed so, at least for this program, the best solution is just to produce long triangle strips.
A torus can simply be represented as one single long strip if we are allowed to skip rendering certain triangles. The ADC bit can achieve this. We pass the ADC bit to the VU program in the w component of our surface normals, but you could just as easily pass it in the lowest bit of any 32-bit float, since the full precision is almost never required.
Spheres cannot be properly described by a single tristrip without duplicating a lot of vertices and edges. Here I opted to produce spheres as (in this order) two trifans (top and bottom) plus one tristrip for the inner “bands” of triangles. This is the reason we have two VU programs: the sphere must embed three GIF tags in its stream rather than just one for the torus.
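To make the "lowest bit of any 32-bit float" idea concrete, here is a minimal host-side sketch in C++. The helper names pack_adc_flag and unpack_adc_flag are hypothetical and are not taken from the Lifeform source; the sketch only shows how a one-bit flag can ride in the least significant mantissa bit of a float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a one-bit "skip this triangle" flag in the
// least significant mantissa bit of a 32-bit float.
inline float pack_adc_flag(float value, bool skip)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // copy out the raw bit pattern
    bits = (bits & ~1u) | (skip ? 1u : 0u);    // overwrite only the lowest bit
    std::memcpy(&value, &bits, sizeof bits);   // copy the pattern back
    return value;
}

// Hypothetical helper: read the flag back from the lowest bit.
inline bool unpack_adc_flag(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return (bits & 1u) != 0;
}
```

Overwriting the lowest mantissa bit moves the value by at most one unit in the last place, which is why the lost accuracy rarely matters for a normal or a texture coordinate.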
We are free to calculate the vertices as for an indexed primitive, but they must be stored as triangle strips in a DMA packet. Here is the code used to do the conversion for the torus and there is very similar code for the sphere. First we dynamically create a new DMA packet and the GIF tag:
Then we set up the pointers to our indexed vertices. Vertices are in vertex_data[], normals are in normal_data[] and indices are in strip_table[].
uint128 *vptr = (uint128 *)vertex_data;
Finally we loop over the vertex indices to produce a single packet of vertices and normals that can be used to instance toruses:
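The conversion loop can be sketched in portable C++ as follows. This is not the original listing: it uses std::vector in place of a DMA packet, the Vec4 type and the expand_strip function are hypothetical, and the interleaved vertex-then-normal ordering is an assumption made for illustration. Only the array names vertex_data, normal_data and strip_table come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec4 { float x, y, z, w; };   // one 128-bit quadword

// Expand an indexed mesh into a linear stream of (vertex, normal) pairs in
// strip order, so the consumer can walk it without any indirection.
// Assumption: each vertex is immediately followed by its normal.
std::vector<Vec4> expand_strip(const std::vector<Vec4>& vertex_data,
                               const std::vector<Vec4>& normal_data,
                               const std::vector<std::uint32_t>& strip_table)
{
    std::vector<Vec4> stream;
    stream.reserve(strip_table.size() * 2);
    for (std::uint32_t index : strip_table) {
        stream.push_back(vertex_data.at(index));   // .at() bounds-checks the index
        stream.push_back(normal_data.at(index));
    }
    return stream;
}
```

Shared vertices are duplicated in the output, which costs upload bandwidth but keeps the VU side a straight linear read.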
Here is the layout I used. The VU memory is broken into five areas – the constants, the header, the untransformed vertices with GIF tags (buffer A) and the two buffers for transformed vertices (buffers B & C).
Constants. The constants for rendering a horn are a 3x3 light direction matrix for parallel lighting, and a 3x4 matrix of light colors (three RGB lights plus ambient). The light direction matrix is a transposed matrix of unit vectors allowing lighting to be calculated as a single 3x3 matrix multiply. To light a vertex normal we have to calculate the dot product between the surface normal and direction to the light source. In math, the end result looks like this:
color = K_surface * ( I_light * N.L )
where N is the unit surface normal, I is illumination and K is a reflectance function (e.g. the surface color). Because the VUs don’t have a special dot product instruction we have to piece this together out of multiplies and adds. It turns out that doing three dot products takes the same time as doing one, so we may as well use three light sources:
color = K_surface * Sum(n, I_n * N.L_n)
So, first we calculate the dot products into a single vector. This only works because our light vectors are stored in a transposed matrix:
Then we multiply through by the light colors to get the final vertex color:
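The two steps can be written out as a scalar C++ reference, which is handy for checking VU output on the host. This is a sketch under assumptions, not the VU code: the LightConstants layout and the light_vertex function are hypothetical, and the clamp of negative dot products to zero is my addition so that lights behind the surface contribute nothing.

```cpp
#include <algorithm>
#include <array>

using Vec3 = std::array<float, 3>;

// Hypothetical layout of the lighting constants described in the text.
struct LightConstants {
    // Row r holds component r (x, y or z) of all three unit light vectors,
    // i.e. the transpose of a matrix whose rows are the light directions.
    std::array<Vec3, 3> direction_t;
    // color[i] is the RGB intensity of light i.
    std::array<Vec3, 3> color;
    // Ambient RGB term, added without any dot-product scaling.
    Vec3 ambient;
};

Vec3 light_vertex(const Vec3& n, const Vec3& k_surface, const LightConstants& lc)
{
    // Step 1: three dot products at once. Scaling each transposed row by one
    // component of the normal and summing gives (N.L0, N.L1, N.L2).
    Vec3 dots{};
    for (int i = 0; i < 3; ++i) {
        dots[i] = lc.direction_t[0][i] * n[0]
                + lc.direction_t[1][i] * n[1]
                + lc.direction_t[2][i] * n[2];
        dots[i] = std::max(dots[i], 0.0f);   // assumption: clamp back-facing lights
    }
    // Step 2: weight each light color by its dot product, add ambient,
    // then modulate by the surface color.
    Vec3 out{};
    for (int c = 0; c < 3; ++c) {
        float sum = lc.ambient[c];
        for (int i = 0; i < 3; ++i) {
            sum += lc.color[i][c] * dots[i];
        }
        out[c] = k_surface[c] * sum;
    }
    return out;
}
```

With the transposed layout, step 1 maps onto three multiply-accumulate operations on whole vectors, which is why three lights cost the same as one.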
All this information is calculated during the previous frame, embedded in a DMA packet and uploaded once per primitive at rendering time.
Untransformed Vertices. After the header, a GIF tag is stored (from which we can work out the number of vertices in the packet), followed by the untransformed vertices and normals.
The VU Program
VCL compiles the program, resulting in an inner loop of 22 cycles. This can be improved (see later) but it’s not bad for so little effort.
Running Order
The first job the program has is to upload the VU programs to VU Program Memory. There are two programs in the packet, one for transforming and lighting toruses and one for transforming and lighting spheres. A short script generates an object file that can be linked into your executable, and also defines four global variables for you to use as extern pointers. vu1_packet_begin and vu1_packet_end allow you to get the starting address and (should you want it) the length of the DMA packet. torus_start_here and sphere_start_here are the starting addresses of the two programs relative to the start of VU Program Memory. You can use these values for the mscal VIF instruction.
The program then enters its rendering loop. The job of the rendering loop is to render the previous frame and calculate the DMA packets for the next frame. To do this we define two global DMA lists in uncached accelerated main memory:
For the first iteration we fill the previous frame with an empty DMA tag so that it will do nothing.
last_packet->End();
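The ping-pong between the two lists can be sketched as below. PacketSketch is a hypothetical stand-in for the real DMA packet class (it records tag names instead of building DMA tags), and curr_packet, FrameLists and Swap are names I have assumed for illustration; only last_packet and End() appear in the text.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DMA packet: it just records tag names.
struct PacketSketch {
    std::vector<std::string> tags;
    void Reset() { tags.clear(); }
    void Append(const std::string& tag) { tags.push_back(tag); }
    void End() { tags.push_back("END"); }
};

// Two lists: one being sent (the previous frame), one being built.
struct FrameLists {
    PacketSketch storage[2];
    PacketSketch* last_packet = &storage[0];
    PacketSketch* curr_packet = &storage[1];

    // First iteration: the "previous frame" is an empty list that does nothing.
    FrameLists() { last_packet->End(); }
    FrameLists(const FrameLists&) = delete;             // the pointers refer to our own storage
    FrameLists& operator=(const FrameLists&) = delete;

    // Call once per frame after curr_packet has been filled and ended.
    // The list just built becomes the one to send; the old one is recycled.
    PacketSketch* Swap() {
        std::swap(last_packet, curr_packet);
        curr_packet->Reset();
        return last_packet;
    }
};
```

Because the build list and the send list never alias, the list for the next frame can be filled while the previous one is still in flight.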
From this point on, all data for the next frame gets appended to the current packet.
The next job is to upload the constants. This is done once per frame, just in case you want to animate the lighting for each render. Also in this packet we set up the double buffering base and offset.
void upload_light_constants(CVifSCDmaPacket *packet, mat_44 &direction, mat_44 &color)
After all this it’s time to actually render some primitives. First we have to upload the untransformed vertices into buffer A. These verts are calculated once, procedurally, at the beginning of the program and stored in a VIF RET packet, allowing the DMA stream to execute a call and return, something like a function call.
if(inform->type == torus) {
After the data has been uploaded to buffer A we can set about generating instances of the primitive. To do this, all we have to do is set the header information and call the program. Lather, rinse, repeat.
void Torus::add_header_packet(CVifSCDmaPacket *packet,
So we’ve generated all the horns and filled the DMA stream for the next frame. All that’s left to do is to flip the double buffered screen to show the previous render, swap the buffer pointers (making the current packet into the previous packet) and render the previous frame.
// wait for vsync
Further Optimizations and Tweaks
Results