Rendering polygons is not as easy as it used to be, because of the vast
number of rendering techniques, methods, and algorithms available.
Choosing the right ones is only part of the problem: all that are selected
must also work together. Algorithms therefore cannot be selected on their
visible results on screen alone; the resources they need must be
considered as well. Input and output formats and values need to match when
one rendering method provides the input for another. Different techniques
must therefore be integrated, i.e. made compatible, not only in their
input/output values but also in their resource usage and their general
environment. Sometimes, however, one particular rendering method comes
with such a heavy resource penalty (e.g. fill rate, transform rate,
storage space, CPU time) that other, already selected and implemented
methods need to be simplified and/or revised so that the added method
fits into the general framework. At some point it becomes clear that some
algorithms simply do not work together. In that case only one may be used,
and the other needs to be replaced by a more compatible method.
All those problems suggest that a general approach to shading is valuable
and helps in making the right decisions. This feature describes the
general approach of Star Wars: Rogue Leader on the Nintendo Gamecube and
outlines the decisions based on it. This includes sketching most of the
shading algorithms and their specific implementations, noting some
technical details, and describing various clever bits and pieces.
The second part of this feature applies the principles introduced here
to landscape shading/texturing.
Potential Algorithms
Technically, many shading methods work on current-generation hardware
such as the Nintendo Gamecube. These methods include (but are not
limited to):
- dynamically lit polygons with global and local lights, specular highlights
- illumination maps
- reflection mapping
- emboss mapping (shift and subtract of a height field)
- bump mapping (per pixel calculations for diffuse, specular and reflective components)
- projected shadows
- self-shadowing
- shadow volumes
- projected reflections
- layered fog
- polynomial texture mapping
- displacement maps
- multiple texture mapping
- custom dithering
For each of those methods, one is guaranteed to find a couple of
implementations on various platforms and many references in books and
scientific papers. The problem, however, is that all those methods are
examined in isolation. One has to find a common environment in which to
integrate them. This process is (naturally) not hassle-free and leads to
certain limitations.
Limitations
The problems that can arise when integrating (or just using/implementing)
different shading methods fall into four classes: algorithm
compatibility, hardware limitations, performance metrics, and memory limitations.
Some shading methods are not compatible. It can be as simple as a self-shadowing
pre-render pass producing eight-bit z-depth values while a shadow projection
method requires eight-bit alpha values. In addition, one algorithm sometimes
needs the geometry data preprocessed in a very specific way to be feasible
on the target, while another needs a completely different representation
of the same data. The worst case here would be allocating twice the
storage, which is hardly a good solution.
All hardware has its limits. They can be as minor as the maximum number
of constant color registers in the texture environment (TEV) or as major
as the fact that there is just one destination-alpha (stencil) buffer.
If one has five shading methods that each use one constant color
register to combine colors and the hardware supports just four, one has
a problem. This can only be resolved by cutting a feature or by clever
reuse of the color registers (e.g. reducing some colors to plain
intensities and thereby ending up with three color registers plus four
intensity registers, since one RGBA register holds four intensity
values). The flexible design of the Nintendo Gamecube hardware allows for
many tricks like that, where one limited resource is substituted by
another at almost no additional cost. However, if you want to use two
shading methods that are both based on a stencil buffer approach, no such
tradeoff is available, since there is just one such buffer. The rendering
needs to be done in two passes, or some alternative method needs to be used.
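
The register-reuse trick described above can be sketched as simple bookkeeping. This is only an illustration, not the game's code: the function names are hypothetical, the register count of four is taken from the example in the text, and the 8-bit packing is an assumption made for the sketch.

```python
NUM_CONST_REGISTERS = 4  # count used in the example above, not a hardware spec


def registers_needed(num_colors, num_intensities):
    """Registers used when each full color occupies one RGBA register and
    every group of up to four scalar intensities shares one register."""
    packed = -(-num_intensities // 4)  # ceiling division
    return num_colors + packed


def pack_intensities(values):
    """Pack up to four intensities in [0, 1] into one 8-bit RGBA tuple
    (hypothetical layout: one intensity per component, unused ones zero)."""
    if len(values) > 4:
        raise ValueError("one register holds at most four intensities")
    padded = list(values) + [0.0] * (4 - len(values))
    return tuple(round(max(0.0, min(1.0, v)) * 255) for v in padded)


# Five shaders with one full color each do not fit into four registers ...
too_many = registers_needed(num_colors=5, num_intensities=0)
# ... but if two of them only need an intensity, everything fits.
fits = registers_needed(num_colors=3, num_intensities=2)
```

Under these assumptions, the second configuration stays within `NUM_CONST_REGISTERS`, while the first does not.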
Each system has a specific performance profile (and it will never change).
Determining which operations are relatively cheap and which are
expensive is a basic requirement for figuring out what can be done and
what will hurt in the large-scale application of a specific method. Luckily,
the Nintendo Gamecube is powerful and therefore allows for many
state-of-the-art techniques. Fill rate is not a problem and, because eight
textures can be applied in one go (i.e. in one render pass), the demands
on the transform unit are not as high as on other architectures.
As another advantage, the PowerPC CPU performs well and memory access
is very fast. Moreover, when it really comes down to hand-written assembly,
plenty of literature and examples are available thanks to the wide use
of the CPU family in other areas. However, even the tightest assembly
loop can’t process infinite amounts of data. If a shading method requires
elaborate pre-render passes and per-frame pre-computation on the CPU, the
number of polygons is limited. Also, should complicated shading require
more than one pre-render pass per object, things can quickly get slow.
In that case, it can be beneficial to merge two of those passes into one,
if the selected algorithms allow it (which is indeed possible when
pre-rendering for self-shadowing and projected shadows).
Much the same holds for memory. It is another inherently limited
resource. Shading methods that require many additional texture channels
and/or pre-render buffers lose out against methods that have no such
requirements.
General Approach
As outlined before, a common environment can provide the structure needed
to host the different shading methods that are to be used together.
Of course, it’s quite difficult to design such an environment beforehand
without knowing exactly where the problems are. For the novice, this is
therefore an iterative process. The two major principles that guided Rogue
Leader’s shading subsystem are consistent lighting for all geometry and
a consistent illumination model.
The fact that all geometry was lit in the same consistent way helped
tremendously in achieving the required look. It left no room for the
lighting errors that might have been introduced if things were partly
pre-lit and partly dynamically lit. Once things go different ways, it’s
always very hard to keep them consistent. In addition, one directional
and one ambient light on the Nintendo Gamecube are guaranteed to be
computationally free. Therefore, that decision does not impose a performance penalty
(strictly speaking, as soon as one starts to use more complex shader setups,
even more hardware lights come at no performance penalty, because the
graphics processor computes light values in parallel with other work).
Because of that approach, color per vertex is used purely as paint.
This means that a model may be textured entirely with intensity
(grayscale) textures, with color applied by painting vertex colors.
To compute the material color, both values are multiplied together. The
result is then passed on to the lighting calculations.
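
As a minimal sketch of the multiplication just described: the material color is the grayscale texture intensity times the painted vertex color. The function names are hypothetical, colors are assumed to be RGB tuples in the range [0, 1], and the lighting step is reduced to a per-component multiply purely for illustration; the actual lighting calculations are more involved.

```python
def material_color(texture_intensity, vertex_color):
    """Multiply a grayscale texel by the painted vertex color (RGB in [0, 1])."""
    return tuple(texture_intensity * c for c in vertex_color)


def lit_color(material, light):
    """Illustrative lighting step (an assumption): modulate the material
    color by an RGB light value and clamp to 1.0."""
    return tuple(min(1.0, m * l) for m, l in zip(material, light))


# A mid-gray texel painted orange, then lit by a white light.
material = material_color(0.5, (1.0, 0.5, 0.0))
final = lit_color(material, (1.0, 1.0, 1.0))
```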
Local lights are all computed per vertex and are added ‘on demand’, i.e.
if an object intersects a local light’s bounding sphere, the appropriate
lights are fetched and the lighting calculations are enabled.
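
The on-demand selection can be sketched as an intersection test against each light's bounding sphere. The text only states that the light has a bounding sphere; representing the object by a bounding sphere as well, and all names below, are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sphere:
    center: tuple  # (x, y, z) in world space
    radius: float


def spheres_intersect(a, b):
    """Two spheres touch or overlap when the distance between their
    centers does not exceed the sum of their radii."""
    return math.dist(a.center, b.center) <= a.radius + b.radius


def collect_local_lights(object_bounds, lights):
    """Return the names of all local lights whose bounding sphere
    intersects the object's bounds. `lights` is a list of (name, Sphere)."""
    return [name for name, bounds in lights
            if spheres_intersect(object_bounds, bounds)]


ship = Sphere((0.0, 0.0, 0.0), 1.0)
scene_lights = [
    ("engine_glow", Sphere((1.5, 0.0, 0.0), 1.0)),   # overlaps the ship
    ("distant_lamp", Sphere((10.0, 0.0, 0.0), 2.0)), # too far away
]
active = collect_local_lights(ship, scene_lights)
```

An empty result would mean that local lighting calculations can stay disabled for that object.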
Just as important as the consistent lighting of all geometry is the use
of a consistent illumination model. Computations for different shaders
need to be done in the same manner, so that the results are comparable
and do not depend on whether specific features (e.g. bump mapping,
illumination maps, specularity) are enabled.
Specifically, the classification of lights into global (directional and
ambient) and local (point and spot) lights helps with specific shadowing
problems. The hardware supports this quite nicely by having two color
channels (GX_COLOR0 and GX_COLOR1) that can be routed around independently
in the texture environment.
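
The classification can be sketched as routing each light to one of two channels by its type. Note the assumption: the text names the two channels but does not say which class goes to which, so sending global lights to GX_COLOR0 and local lights to GX_COLOR1 is a guess made for this illustration, and the function name is hypothetical.

```python
GLOBAL_KINDS = {"directional", "ambient"}
LOCAL_KINDS = {"point", "spot"}


def route_lights(lights):
    """Split (name, kind) pairs into two channels.
    Assumed mapping: global -> GX_COLOR0, local -> GX_COLOR1."""
    channels = {"GX_COLOR0": [], "GX_COLOR1": []}
    for name, kind in lights:
        if kind in GLOBAL_KINDS:
            channels["GX_COLOR0"].append(name)
        elif kind in LOCAL_KINDS:
            channels["GX_COLOR1"].append(name)
        else:
            raise ValueError(f"unknown light kind: {kind}")
    return channels


routed = route_lights([
    ("sun", "directional"),
    ("sky", "ambient"),
    ("laser_flash", "point"),
    ("searchlight", "spot"),
])
```

Keeping the two classes in separate channels means a later stage can treat them differently, for example when a shadowing technique should affect only one of them.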
Another strict distinction is helpful: color computations are separated
into material color and light color computations. The first takes place
in the model’s own domain, whereas the second relies on the game’s light
database and the shadowing techniques used. In fact, the shading subsystem
uses many different methods to compute light values (cf. Lighting Pipeline).

Figure 1: Basic illumination model.
Figure 1 illustrates the basic flow of color values through the texture
environment. This flow is the same for all shaders used (e.g.
diffuse shader, phong shader, lambert shader, reflective shader, etc.).

Figure 2: Control flow during rendering.
Shading is a Two-Fold Problem
Technically speaking, shading polygons has to deal with two different
problem domains, which are solved at different times during runtime: configuration
of the texture environment (shading subsystem) and light collection and
selection (lighting pipeline). The reason for that distinction and the
problem that comes with it are illustrated in Figure 2. It’s quite possible
that a specific shader is used for several objects that are rendered
sequentially (e.g. a large number of objects of the same kind). The shader
is then translated only once into a sequence of commands for GX,
Nintendo’s graphics API, but the local lights can of course be different
for each object. The objects will very likely be at different
world positions and therefore be exposed to different local lights, or
to no local lights at all. During rendering, the lighting pipeline
has to make sure that GX knows about the correct lights and must issue
the required sequence of commands.
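
The split between the two domains can be sketched as a render loop that sets up the shader once per batch and updates the lights once per object. The command tuples below are placeholders invented for this illustration and do not correspond to actual GX calls; `lights_for` is a hypothetical lookup into the lighting pipeline.

```python
def render_batch(shader, objects, lights_for):
    """Build a command list for objects sharing one shader.
    The shader setup is emitted once; light setup is emitted per object."""
    commands = [("setup_shader", shader)]
    for obj in objects:
        lights = tuple(lights_for(obj))  # may be empty: no local lights
        commands.append(("set_lights", obj, lights))
        commands.append(("draw", obj))
    return commands


# Hypothetical light lookup: only one of the three objects is near a light.
nearby = {"tie_fighter_2": ["explosion"]}
commands = render_batch(
    "reflective",
    ["tie_fighter_1", "tie_fighter_2", "tie_fighter_3"],
    lambda obj: nearby.get(obj, []),
)
```

The design point is that the comparatively expensive shader translation is shared across the batch, while the light state changes with each object's world position.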