Gamasutra is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Gamasutra: The Art & Business of Making Gamesspacer
Shader Integration: Merging Shading Technologies on the Nintendo Gamecube
View All     RSS
September 25, 2020
arrowPress Releases
September 25, 2020
Games Press
View All     RSS

If you enjoy reading this site, you might also want to check out these UBM Tech sites:


Shader Integration: Merging Shading Technologies on the Nintendo Gamecube

October 2, 2002 Article Start Previous Page 4 of 5 Next

Method 8: Projected Shadows

A similar technique is projecting shadows. In this case, the shadow is not cast on the object in question itself but onto receiver geometry. Once again, the object is pre-rendered from the imaginary point of light using an orthogonal projection (same as in method 7). However, now one does not grab Z values but the outline of the object in the color buffer. This outline is rendered in a second pass on the receiver geometry (using the receiver geometry itself and compute texture coordinates from its vertices). Strictly speaking, with the texturing capabilities there is no need to render the receiver geometry in a second pass, however, allowing multiple shadows to be cast on one piece of receiving geometry would add much to the complexity of the algorithm. Nevertheless, if the number of shadows is fixed, one should render them in a single go to save on transform time (c.f. figure 15 for more details).

This approach results in a big requirement for the geometry subsystem. It needs to be able to return chunks of geometry. Of course, one just needs to re-render those polygons that are affected by the shadow outline. Fast methods of constructing these chunks are an important requirement.

During the second render pass, one has two options to shade the fragments: darkening the pixels that fall in shadow, or re-rendering the pixels as described by the receivers material properties omitting global directional light.

It is worth noting that the pre-render passes for the projected shadows can be easily combined with the pre-rendering passes required for self-shadowing. This is because both render the object from the same virtual point of light. Instead of rendering one depth map and another outline just the depth map is rendered and grabbed. Nothing changes for the self-shadowing technique. However, the during rendering the projected shadows onto the receiver geometry the actual Z values are fetched and compared against 0xff, which represents the far clipping plane during the pre-render pass for self-shadowing. If the fetched depth is < 0xff the receiver geometry falls into shadow otherwise it is exposed to light.

Figure 15: Projected shadows.

Method 9: Projected Reflections

Projected Reflections are very similar to projected shadows (c.f. figure 16). A receiving piece of geometry is re-rendered here as well. In this case, the object is pre-rendered from an imaginary viewpoint, which the object would be looked at if the camera would be mirrored at the reflective plane. In this case, however, it is important that a perspective projection is used to project the geometry onto the receiving geometry.

If the reflective geometry is intersecting the mirroring plane (as it could be the case with water being the reflective element in a scene), pixels falling underneath that plane must be carefully cut-off. Outputting an alpha of zero for those pixels (using the same techniques as described in method 10, layered fog) and configuring GX to skip alpha = 0 pixels does the job.

Figure 16: Projected reflections.

Method 10: Layered Fog

To compute layered fog (e.g. fog, which changes its intensity not only based on the distance to the camera but also on the height in world space) one needs to compute/lookup an intensity value, which describes how much a pixel is fogged. To do so, a texture coordinate generation is setup that transforms vertices back from eye into world space using the dual transform feature. It’s nice that one can actually use exactly the same matrix (e.g. GX_PNMTX0) to do the first part of the texture coordinate transformation. The second matrix multiply maps the Y component onto a ramp texture. Scaling and transformation have to be carefully adjusted to map the wanted gradient of world Y coordinates onto the U [0, …, 1] range. In the same manner eye Z is mapped into V [0, …, 1], c.f. figure 17 and code fraction 4 for more details.

Figure 17: Layered Fog.

Note that one could use a color texture instead of an intensity texture to get results that are even more advanced.

There is a drawback generating layered fog like this. It only works right as long as the camera is reasonably outside the fog volume. As soon as it dives deeply into it and looks up, the un-fogged polygons higher up are still visible. However, that can be compensated with figuring out how much the camera is in fog (height wise) and then fog everything, i.e. dynamically adjusting the fog ceiling.

Method 11: Custom Dithering

For some surfaces banding is a problem. Especially sky textures with their subtle color gradients suffer when the frame buffer is configured using a destination alpha buffer that is required by so many rendering methods (only six bits are stored per component to allow for an additional six bit alpha channel). The built in hardware dither helps already, however, the results could be better. Adding a repetitive pattern to the outputted pixels fools the human eye into not recognizing the banding as much as before. The pattern just needs to be a “four x four” pixel texture that contains biased positive and negative offsets that are added to the outputted pixels. Additional control can be given, when the dither pattern is multiplied by a factor before adding and therefore adjustment of the dither strength is possible. The only problem here is that the dither pattern must be screen-space aligned. Each vertex must be transformed into screen space aligned texture coordinates. A trick similar to the one used in method 10 (layered fog) helps here (c.f. figure 18). The incoming vertices are transformed into eye space using the regular model to eye matrix and using the dual transform feature the dither pattern is aligned onto the screen.

Figure 18: Custom dithering.


Many of the sketched shading methods require pre-rendering objects. This suggests that pre-rendering has to become an integral component of the game program and measures need to be taken to ensure proper resource usage for both processing time and texture storage. The first thing is that pre-render passes should be combined when possible (c.f. method 7 and method 8). This is quite an obvious gain both in time and in storage.

In addition, storage should be organized in pools that provide a fixed number of slots to be rendered into. Before pre-rendering starts, all objects that require it have to be gathered, sorted by distance from the camera and then get slots assigned (the first couple of slots can even have a slightly bigger texture size). If the pool runs out of slots, pre-rendering stops and some object will loose their respective properties (like self shadowing).

The last and most important thing is that shading and geometric information have to be strictly separated. It must be possible to render an object fully lit, with all texturing features, etc. and to the same extent it must be easy to render just raw, fully lit polygons of the same geometry. This is because many pre-render passes just need the outline or depth values. Sending more data into the graphics processor, like texture coordinates or colors will just slow the process down. It also must be possible to render objects from different viewpoints. Unfortunately, this is only possible with the construction of different sets of matrices that all need to be computed by the CPU. Storage organization of these matrices is important. In addition, one does not need to forget that rendering the same geometry with the same set of matrices does not mean that those matrices need to be recomputed.

Figure 19: Texture coordinate generation.

Texture Coordinate Generation

A couple of shading methods are using the fragments orientation (i.e. normal data) as a basis and it is a common step to transform the model space normals into eye space first. Therefore, it is a good idea to split all texture coordinate generations into two passes. The first pass is shared between all methods and the math needs to be revised to operate on the proper eye space normals (c.f. figure 19).

Indirect texturing

A unique and very interesting feature of the Nintendo Gamecube is the indirect texture unit. It is capable of modifying/generating texture coordinates per pixel and therefore allows for a wide variation of effects. Rippled decals, heat effects and shockwaves are common uses. When it is used together with grabbing the frame buffer, the results are impressive. Figure 20 illustrates the control flow when rendering shockwaves. The problem here is that texture coordinates for the shockwave geometry needs to be computed. One more time, the dual transform feature of the texturing hardware helps. The model coordinates are transformed into eye space and then projected onto the screen using the same matrix as loaded with GXSetProjection(); . The output is rendered back into the frame buffer. The point of grabbing the frame buffer must be determined carefully since it can not be truly the very last thing since this would affect all overlays and score displays as well.

Figure 20: Indirect texturing example.

Merging different shading algorithms does not come without any effort, however, the balanced architecture of the Nintendo Gamecube supports a wide variety of methods that combined together make the visual difference.

To summarize: Care must be taken while combining algorithms, shader should be constructed out of components or parts, and separation of global and local lights helps a great deal. Consistent lighting during runtime is a big step forward. Make all geometry lit dynamically during runtime and use color per vertex just for painting, not pre-lighting. Per pixel methods usually give better results, however, they are more expensive. Geometry and shading should be strictly separated.

Landscape Shading

The landscape in Rogue Leader is height-map based, and uniformly divided into smaller render-units called meta-tiles (c.f. figure 21). One meta-tile covers 128x128 meters, and all triangles belonging to a meta-tile must use the same shader. This restriction is enforced to simplify and improve the triangle stripping of the landscape (on the Nintendo GameCube™, as most other modern graphics pipelines, efficient triangle stripping is important to obtain high performance). When deciding the size of the meta-tiles, local lighting also has to be considered. Since the Nintendo Gamecube has a limited number of hardware lights (eight), large meta-tiles imply “fewer lights per area”. In addition, the larger the meta-tiles, the greater the chance to draw too much geometry with lighting enabled when it’s not necessary (which is not good – enabling hardware lights certainly do not come for free).

Figure 21: Landscape split up in metatiles and their LOD values.

When the landscape engine was programmed, the geometry part was implemented before the landscape shaders (c.f. figure 22). This meant that during the working phase for the landscape geometry, it was hard to see how high polygon-count was needed to make the results look sufficiently detailed and complex. As the shaders started to get in place, it became clear that the complexity of the geometry could be reduced without sacrificing the complexity in visual appearance (in the final game, a typical landscape view uses about 14k triangles). This experience shows the importance of a balance between the complexity in geometry, and the shaders applied to that geometry.

Figure 22: Geometry vs. shading.

Landscape Texturing

A texture layer is defined to be an affinely transformed repeated texture image. This implies that one texture image can give rise to several texture layers. Texture layers are applied to the landscape by vertically projecting them onto the surface; which is ok since the surface is a height-map, so any vertical line only intersects the surface once. Besides being easy to implement on both the tools and engine side, this approach is also memory efficient, since the texture coordinates do not need to be stored/loaded, but are derived directly from the position of the vertices (c.f. figure 23).

Figure 23: Texture coordinate generation.

The actual texturing of the landscape is done by blending/mixing several texture layers together across the surface. This multi-texturing approach eliminates the need for transition textures, and gives a varied look from a relatively few high-detail texture images. Since the scaling of a texture layer can be non-uniform, this approach can also help to combat the texture stretching which is common in height-map based landscapes in areas with steep slopes. According to the gradient of the height-map surface, one can apply texture layers that are scaled down in roughly that direction.

The work of specifying how the texture layers should be blended together was done in the in-house level design program called L3DEdit. This tool has features for managing texture layers, as well as blending them together. More technically, for each texture layer, L3DEdit maintains a corresponding gray-scale image which says how much of that texture layer should be present. These gray-scale images are called mix-maps, and the sum of all corresponding pixels from all mix-maps should always be one (or 255, if you like). L3DEdit can preview the blended landscape on the texture artist’s workstation, and allows for interactive changes to the mix-maps.

Figure 24: Mixing three texture layers.

For performance and memory reasons, on the Nintendo GameCube™ a meta-tile is not allowed to use more than three different texture layers blended together. In the data conversion each meta-tile that has non-trivial blending, is assigned a 32x32 pixels texture image that contains the mix-map information for that meta-tile. For meta-tiles that blend two or three texture layers, we store this information in four bits (I4) and eight bits (IA4) mix-map textures, accordingly. Duplicate mix-map texture tiles are ignored to preserve memory. To avoid seams between adjacent meta-tiles due to bi-linear texture filtering, only 31x31 unique pixels are used for each meta-tile – the last pixel rows are copied from the next, adjacent meta-tiles. To implement the texture layer blending on the Nintendo GameCube™, the texture environment is then set up to compute

mIT0 + (1-mI)T1

for blending two texture layers, and

mIT0 + mAT1 + (1-mI-mA)T2

for blending three texture layers (c.f. figure 24 for the most complicated case).

Landscape Self-Shadowing

The self-shadowing of the height-map surface is pre-computed as a shadow table in the data conversion. It turned out that doing ground-sun line segment intersections with the landscape in real-time was too expensive; even when the results were sparsely updated and cached. The shadow table stores the ground-sun intersection result for 256 different sun positions for each height-map vertex (in the game in-between values are interpolated). On a meta-tile basis, series of 256 intersection results is efficiently encoded for in a state-change array, and the number of state-changes is stored using a simple form of static Huffman encoding. The size of the shadow table is of course very dependent on how the sun is moving. In most levels, the shadow table is small (150-300k), but in some level where the sun is raising the table is >800k. In the game engine, meta-tile shadow information is decoded when it’s needed, and cached together with other vertex data. To get softer shadows a nine point/tap filter is applied to the shadow values (c.f. figure 25).

Figure 25: Landscape self shadowing.

This filter smoothes the shadow values, and can be efficiently implemented using only four additions and two shifts (by reusing previously computed columns for adjacent vertices). These shadow values are then sent to the texture environment as a color per vertex. The polygon interpolated shadow value is then multiplied with the global light color.

Together with the texture layers, additional effect maps are used to further enhance the detail/depth impression: emboss style bump-mapping, far-distance detail map, and a cloud map. The emboss style bump mapping is only used for up-close meta-tiles. A color per vertex value describes how to fade the map in/out over distance (these values are the level of detail morphing values from the height-map geometry computation). The far-distance detail map is used to break up repeated texturing far away, and it’s faded out close to the camera. This fade is done similar to how the emboss map is fading. The cloud map is used to give the impression of clouds casting moving shadows on the ground. It’s also just a vertically projected map, but this time with an animated translation.

Landscape Shader Optimizations

At first the height-map tiles were drawn in front to back order, but since this resulted in a lot of shader changes, it turned out that it was far more efficient to first sort the meta-tiles in shader order (and within one set of meta-tiles using the same shader, sort front to back).

A trick that is worth mentioning is how to avoid sending the same bi-normals and tangents for emboss mapping repeatedly to the transform unit (XF) of the graphics processor. It turns out that if these vectors are not present in the vertex format, XF will provide the previously transformed bi-normal and tangent, which reside in internal registers. Thus, if a dummy triangle is drawn with the bi-normal and tangent immediately before the landscape is drawn, then there is no need to send the same vectors over again for the rest of the height-map triangles. This means that only one vertex format is needed for the entire landscape, and it saves memory, transfer bandwidth and most importantly transform performance.

The landscape in Rogue Leader uses a multi-texturing approach that is implemented with a specialized set of shaders for blending/mixing texture layers. For efficient processing, the landscape if divided into manageable render-units. This is also important for utilizing the local lighting hardware support. To obtain high performance, it’s important to achieve efficient triangle stripping, and minimize the number of shader changes.

Article Start Previous Page 4 of 5 Next

Related Jobs

Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States

Senior Engine Programmer
Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States

Senior Technical Designer
Random42 — London, England, United Kingdom

UE4 Technical Artist
Evil Empire
Evil Empire — Bordeaux, France

Senior Technical Developer

Loading Comments

loader image