Shading subsystem
To clarify the term shader, Figure 3 shows the data structure that defines
one. A shader is a data structure that describes “how to compute colors”
for rendering polygons. The term building a shader refers to the process
of transforming such a data structure into a stream of GX commands, which
configure the hardware to produce the desired output. Note that many
features are activated dynamically during the shader build. For instance,
if an object should be tinted, a color multiplication is added to the
final output color regardless of which shader was set up before. Similarly,
layered fog adds another stage at the end of the color computation that
blends between a fog color and the original pixel color, depending on the
pixel’s height in world space and its distance from the camera in eye space.
Of course, it would be nice if the complete shader subsystem were dynamic
like that. However, experience shows that many shaders can only be built
by hand into an optimal (i.e. least cycle usage) setup.
typedef struct tShader {
    char                  mName[16];       // shader name, as in Maya...
    tShaderLimit          mShaderLimit;    // shading limit, i.e. type...
    tShaderFlags          mShaderFlags;    // additional flags...
    tShaderColor          mColor;          // material color, if used...
    tShaderColor          mSpecularColor;  // specular color, if used...
    f32                   mReflectivity;   // reflectivity strength...
    f32                   mEmbossScale;    // emboss depth...
    f32                   mMovieFade;      // movie shader weight...
    tShaderDataDescriptor mDataInfo;       // texture info...
} tShader;
Figure 3: A ‘tShader’ data structure describing
how to render polygons.
Mostly, the structure’s members describe various properties of a shader
(like the material color, the specular color and cosine power for phong
shaders, the reflectivity for reflective phong shaders, etc.). The most
important member is mShaderLimit, which describes what kind of shading
is actually performed. The term limit indicates at what point the color
computation should stop. Roughly, the following shading limits
were implemented:
- diffuse
- mapped
- illuminated mapped
- phong
- phong mapped
- phong mapped gloss
- phong illuminated mapped
- phong illuminated mapped gloss
- reflective phong
- reflective phong mapped
- reflective phong mapped gloss
- reflective phong illuminated mapped
- reflective phong illuminated mapped gloss
- bump mapped
- bump illuminated mapped
- bump phong mapped
- bump phong mapped gloss
- bump phong illuminated mapped
- bump phong illuminated mapped gloss
- bump reflective phong mapped
- bump reflective phong mapped gloss
- bump reflective phong illuminated mapped
- bump reflective phong illuminated mapped gloss
- emboss mapped
- emboss illuminated mapped
This clearly shows that an automatic way of generating code for shader
setups would be very desirable, since all the shaders listed above need
to be maintained. However, as mentioned above, a couple of features, such
as self-shadowing and tinting, were already added automatically.
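The build step can be pictured as a dispatch on the shading limit, with the dynamic features appended afterwards. The sketch below illustrates this shape only; the enum values, function name, and stage counts are made up for illustration and are not from the actual engine.

```c
#include <assert.h>

/* Illustrative subset of the shading limits listed above. */
typedef enum {
    SHADER_LIMIT_DIFFUSE,
    SHADER_LIMIT_MAPPED,
    SHADER_LIMIT_PHONG_MAPPED,
    SHADER_LIMIT_BUMP_PHONG_MAPPED_GLOSS
    /* ...one value per shading limit... */
} tShaderLimit;

/* Rough count of TEV stages such a setup might consume; the base numbers
   are invented purely to show the dispatch-plus-append structure. */
static int buildShader_stageEstimate(tShaderLimit limit, int tinted, int layeredFog)
{
    int stages;
    switch (limit) {
    case SHADER_LIMIT_DIFFUSE:                  stages = 1; break;
    case SHADER_LIMIT_MAPPED:                   stages = 1; break;
    case SHADER_LIMIT_PHONG_MAPPED:             stages = 2; break;
    case SHADER_LIMIT_BUMP_PHONG_MAPPED_GLOSS:  stages = 4; break;
    default:                                    stages = 1; break;
    }
    /* Dynamic features appended during the build, as described above. */
    if (tinted)     stages += 1;  /* color multiply on the final output  */
    if (layeredFog) stages += 1;  /* height/distance based fog blend     */
    return stages;
}
```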
typedef struct tConfig {
    // resources currently allocated...
    GXTevStageID stage;   // current tevstage...
    GXTexCoordID coord;   // current texture coordinate...
    GXTexMapID   map;     // current texture map...
    // ...
} tConfig;

void shadingConfig_ResetAllocation(tConfig *pConfig);
void shadingConfig_Allocate(tConfig *pConfig, s32 stages, s32 coords, s32 maps);
void shadingConfig_Flush(tConfig *pConfig);
Figure 4: Data structure ‘tConfig’ describing
resource allocation.
Since a shader setup is built by various functions, one needs to keep
track of various hardware resources. For example, texture environment
stages, texture matrices, texture coordinates, texture maps and such need
to be allocated in a sequential manner. Some resources come with special
ordering requirements; texture coordinates for emboss mapping, for instance,
always need to be generated last. Therefore, infrastructure is needed to
deal with these allocation problems. The solution here is another tiny
data structure, tConfig (cf. figure 4).
This structure holds information about the resources used during setup.
Before any resources are used, the allocation information is reset by
calling shadingConfig_ResetAllocation();. For each tevstage, texture
coordinate, etc. used, a call to shadingConfig_Allocate(); is made, which
marks the corresponding resources as allocated. When a subroutine that
inserts additional GX commands is called, the tConfig structure is passed
along as a parameter. The called function can then inspect the structure’s
members and knows which tevstage, texture coordinate, etc. to use next.
When the shader construction is done, the function shadingConfig_Flush();
is called, which passes the number of resources used to the hardware.
Error checking can be performed here as well.
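A minimal, self-contained sketch of this allocation scheme is shown below. In the real code the members are GX enums (GXTevStageID and so on); here they are modeled as plain counters, and the hardware limits in the flush are the GameCube's (16 TEV stages, 8 texture coordinates, 8 texture maps).

```c
#include <assert.h>

/* Sketch of the allocation bookkeeping; plain ints stand in for GX enums. */
typedef struct tConfig {
    int stage;   /* next free TEV stage          */
    int coord;   /* next free texture coordinate */
    int map;     /* next free texture map        */
} tConfig;

static void shadingConfig_ResetAllocation(tConfig *pConfig)
{
    pConfig->stage = pConfig->coord = pConfig->map = 0;
}

static void shadingConfig_Allocate(tConfig *pConfig, int stages, int coords, int maps)
{
    /* Mark resources as used; callers read the members to know which
       stage/coord/map to configure next. */
    pConfig->stage += stages;
    pConfig->coord += coords;
    pConfig->map   += maps;
}

static void shadingConfig_Flush(const tConfig *pConfig)
{
    /* The real function would pass the totals to the hardware, e.g. via
       GXSetNumTevStages(), and could error-check against the limits. */
    assert(pConfig->stage <= 16 && pConfig->coord <= 8 && pConfig->map <= 8);
}
```

For example, a subroutine that appends a fog stage would call shadingConfig_Allocate(pConfig, 1, 1, 1) and configure the stage/coord/map indices it found in the structure beforehand.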

Figure 5: Shading subsystem client control flow.
The shading subsystem also needs to maintain a global GX state. This
is because the shading subsystem is not the only client of GX. Other parts
of the game program will issue GX commands and set up their own rendering
methods in specific ways. However, since the shading subsystem has to
initialize quite a bit of GX state to work properly, a protocol needs
to be introduced to reduce the number of redundant state changes (cf.
figure 5). The straightforward solution of initializing the complete GX
state as needed by the shading subsystem is of course way too slow. Instead,
a boolean variable keeps track of whether the shading subsystem has
initialized GX for its usage. Every time the subsystem is about to be used,
a function shading_PreRenderInit(); is called. This function checks the
flag. If it is false, GX is initialized and the flag is set to true. The
next time some shading needs to be done, the boolean is already true and
the setup can be skipped. On the other hand, when other parts of the game
program do some ‘hard coded’ GX usage, they need to reset that boolean by
calling shading_GxDirty();.
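The dirty-flag protocol can be sketched as follows; the variable names are illustrative, the actual GX initialization is stubbed out, and a counter is added only to make the skip behavior visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the dirty-flag protocol described above. */
static bool sGxInitializedForShading = false;
static int  sFullInitCount = 0;   /* counts expensive full GX setups */

static void shading_PreRenderInit(void)
{
    if (!sGxInitializedForShading) {
        /* ...issue the full block of GX state commands here... */
        ++sFullInitCount;
        sGxInitializedForShading = true;
    }
    /* Otherwise the expensive setup is skipped. */
}

static void shading_GxDirty(void)
{
    /* Called by code that changed GX state behind the subsystem's back. */
    sGxInitializedForShading = false;
}
```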
Finally, the shading subsystem keeps track of various default settings
for the texture environment. If subsequent shader setups share some
settings, a couple of GX commands can be skipped, since the hardware is
already in a known state. When another shader is set up, the function
shading_CleanupDirtyState(); cleans up the state marked dirty and leaves
GX behind in the expected state. In the end, those optimizations helped
quite a bit to maintain a reasonable frame rate.
Lighting Pipeline
The actual purpose of the so-called lighting pipeline is to deliver light
color values per shaded pixel. As mentioned before, all lights are classified
into either global or local lights and the methods of computing light
color values vary. The results of global lighting can be computed in three
different ways: per vertex, per pixel using emboss mapping, and per pixel
using bump mapping.
All three of these methods come in two variants: one with self-shadowing
and one without. When self-shadowing is enabled, the directional component
of the global light is not added to the output color value if the pixel
to be shaded falls in shadow. The ambient color is then the only term that
contributes to the global lighting. The conditional add is facilitated
using the two different color channels GX_COLOR0 and GX_COLOR1. The first
carries the directional component of the global light, whereas the second
is assigned to all local lights and the ambient term.
Local lights are always computed per vertex using the lighting hardware
and are fed into GX_COLOR1.
Note that both channels are combined when self-shadowing is not enabled.
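The channel arithmetic above can be modeled in software as follows. This is a sketch only: colors are reduced to single intensities, and the function name and clamping are assumptions standing in for the actual TEV configuration.

```c
#include <assert.h>

/* Software model of the two-channel scheme: GX_COLOR0 carries the
   directional component, GX_COLOR1 the ambient term plus local lights. */
typedef struct {
    float color0;   /* directional component of the global light */
    float color1;   /* ambient + all local lights                */
} tChannels;

static float lit_Combine(const tChannels *c, int selfShadowing, int inShadow)
{
    if (selfShadowing && inShadow) {
        /* Directional term suppressed; ambient/local only. */
        return c->color1;
    }
    /* Otherwise both channels are combined. */
    float sum = c->color0 + c->color1;
    return sum > 1.0f ? 1.0f : sum;   /* clamp as the hardware would */
}
```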
There is a tiny problem when color per vertex is used for painting and
two channels are in use. The hardware cannot feed one set of color per
vertex data into both channels without sending the same data twice to the
graphics processor. Therefore, one needs to decide whether the local
lights are computed unpainted (which only leads to visible artifacts if
local lights are contributing) or whether the color per vertex data is
sent twice, into both color channels, eating up vertex performance.
Experience showed that not painting the local lights was quite acceptable,
and nobody really noticed.
The control flow of the lighting pipeline is a bit tricky. The problem
here is that at an object’s rendering time, all local lights possibly
intersecting the object’s bounding sphere need to be known (cf. figure 6).

Figure 6: Lighting pipeline, control flow.
This requires the creation of a local light database that holds information
about all local lights influencing the visible geometry. The game program
needs to add all local lights to the database before any rendering takes
place. Care needs to be taken when it comes to culling lights for
visibility against the view frustum. The reason is that the distance
attenuation function as used by default by GX has no precise cutoff point;
setting up a point light with a 50m radius does not mean that the light
stops contributing to polygons at distances greater than 50m. Lights will
pop on and off if they are collected by software culling assuming a 50m
radius. A fudge factor of 2.0f proved to be quite successful here.
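A conservative sphere-vs-sphere version of that culling test might look like the sketch below. The function name and the sphere representation are assumptions; only the fudge factor of 2.0f comes from the text above.

```c
#include <assert.h>

#define kLightCullFudge 2.0f   /* the fudge factor mentioned above */

typedef struct { float x, y, z, radius; } tSphere;

/* A point light with nominal radius r is culled only if it is farther
   than fudge * r from the object's bounding sphere, because the GX
   distance attenuation has no hard cutoff at r. */
static int light_MayAffect(const tSphere *light, const tSphere *object)
{
    float dx = light->x - object->x;
    float dy = light->y - object->y;
    float dz = light->z - object->z;
    float reach = light->radius * kLightCullFudge + object->radius;
    return dx * dx + dy * dy + dz * dz <= reach * reach;
}
```

Note that with the nominal radius alone, an object 90m from a 50m light would be culled and its polygons would pop as the camera moves; the fudge factor keeps such lights in the collected set.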
Once all visible lights are added, rendering can begin. First, all logical
lights are transformed into an array of GXLightObj objects. This array is
double (or triple) buffered to let the graphics processor always have a
private copy to read from while the CPU is generating the array for the
next frame. An array is constructed since each object is likely to receive
its own set of local lights. Instead of storing many copies of GXLightObj
objects in the FIFO using GXLoadLightObjImm();, we instruct the graphics
processor to fetch lights indirectly from the constructed array using the
faster GXLoadLightObjIdx(); function.
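The double-buffering can be sketched as below. The structure, function names, and capacity are assumptions; tLightObj stands in for GXLightObj, and the hand-off to the graphics processor is only indicated in comments.

```c
#include <assert.h>

#define kMaxLights 64   /* assumed capacity */

typedef struct { float params[12]; } tLightObj;   /* placeholder payload */

typedef struct {
    tLightObj buffers[2][kMaxLights];
    int       write;    /* buffer the CPU fills this frame */
    int       count;    /* lights written this frame       */
} tLightArray;

/* Begin a new frame: flip buffers so the graphics processor keeps
   reading last frame's copy while the CPU fills the other one. */
static tLightObj *lightArray_BeginFrame(tLightArray *a)
{
    a->write ^= 1;
    a->count = 0;
    return a->buffers[a->write];
}

/* The buffer the graphics processor reads from; after filling, each
   object's lights would be loaded by index via GXLoadLightObjIdx(). */
static const tLightObj *lightArray_ReadBuffer(const tLightArray *a)
{
    return a->buffers[a->write ^ 1];
}
```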
As rendering of an object starts, all intersecting lights are collected
(up to a maximum of eight lights) and loaded from the array. Note that one
should remember which lights are loaded to avoid unnecessary loads. The
light mask is then computed (an eight-bit value as sent to GXSetChanCtrl();)
that describes which lights actually contribute to the lighting calculation
in hardware. This mask value is the interface between the shading subsystem
and the lighting pipeline, because GXSetChanCtrl(); not only specifies
which lights are enabled but also how color per vertex is used and what
kind of light calculation is performed. Therefore, parameters to this GX
call come in from the two different subsystems at different times. The
problem can be solved by storing the light mask value in an easily
accessible variable and making sure that no bogus values are loaded during
the first shader setup (i.e. when no lights have been collected yet).
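Building that eight-bit mask might look like the following sketch. The function name and the slot-array interface are assumptions; the bits correspond to the eight hardware lights, and an empty collection yields a zero mask, which covers the first-shader-setup case mentioned above.

```c
#include <assert.h>

typedef unsigned char u8;

/* Build the eight-bit light mask handed to GXSetChanCtrl(); each entry
   of loadedSlots is a hardware light index in 0..7. An empty collection
   (numLights == 0) yields 0, so no bogus lights are enabled before any
   lights have been collected. */
static u8 lighting_ComputeMask(const int *loadedSlots, int numLights)
{
    u8 mask = 0;
    int i;
    for (i = 0; i < numLights && i < 8; ++i) {
        mask |= (u8)(1u << loadedSlots[i]);
    }
    return mask;
}
```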