
Real-Time Glow
Rendering Glows: Step by Step
Specifying and Rendering the Sources of Glow.
The first step in rendering glows is to specify which objects
or parts of objects are the sources of glow. The color and brightness
of these glow sources translate directly into the color and
brightness of the final glow, so we can easily control
the look of the glow by varying the brightness of the glow sources.
These sources could be whole pieces of geometry designated by some
object property or flag, or the sources could be restricted to some
small part of an object by using texture data. In the latter case,
the texture data masks out the parts of an object that do not glow,
turning them black in the glow source rendering. The remaining glow
source areas can have any desired color and brightness. Using texture
data is convenient and artist friendly, and it is our preferred
approach.
The glow source mask could be contained in its
own separate texture, but it is convenient to use the alpha channel
of the ordinary diffuse color texture to hold the mask values. In
this case, the texture RGB color is used to render the object normally
without alpha blending. When rendering the glow sources, the RGB
color is multiplied by the texture alpha channel. Where the alpha
value is zero, there will be no source of glow, and as the alpha
ramps up to full value, the intensity of the glow sources increases.
Figure 4 illustrates the texture RGB and alpha channels used to
designate sources of glow for a UFO. Additionally, per-vertex or
per-object values can be multiplied into the glow source (RGB×alpha)
value to animate the glow over time.
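As a minimal CPU-side sketch of the masking arithmetic, the glow-source value of one texel is its RGB color times its alpha mask times an optional animation factor. The function name `glow_source` and the `scale` parameter are hypothetical illustrations, not names from the demo:

```python
def glow_source(rgb, alpha, scale=1.0):
    """Glow-source color for one texel.

    rgb   -- (r, g, b) diffuse texture color, each in [0, 1]
    alpha -- glow mask from the texture alpha channel, in [0, 1]
    scale -- hypothetical per-vertex or per-object value that animates the glow
    """
    mask = alpha * scale
    return tuple(c * mask for c in rgb)


# The normal scene render uses rgb alone; the glow-source render masks it.
texel_rgb = (1.0, 0.5, 0.25)
print(glow_source(texel_rgb, 0.0))        # alpha 0: this texel does not glow
print(glow_source(texel_rgb, 1.0, 0.5))   # full mask, animated to half intensity
```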
Once the glow sources have been
specified, they need to be written to an off-screen
render-target texture that we can process to create
the soft glow. This is a texture created with the
Direct3D 9 D3DUSAGE_RENDERTARGET
flag. There are two approaches for getting the glow
sources into the texture render target: (1) the entire
scene can be rendered to the texture using a method
to render the glow sources in color and all nonglowing
objects in black, or (2) the scene can be copied from
the ordinary back buffer into the texture using the
IDirect3DDevice9::StretchRect(..)
function.
Rendering the entire 3D scene again
to the texture can be costly for complex scenes, and
it requires an additional depth buffer dedicated to
the texture render target, so the StretchRect(..)
2D image copy method is preferred. StretchRect(..)
also allows us to resize and filter the back-buffer
image in the process of copying it to the texture.
This can be used to reduce the resolution and gain
performance in processing the texture to create the
glows.
For the StretchRect(..)
method, the alpha value acting as the glow source
mask can be rendered to the destination alpha value
of the ordinary back buffer. This will have no effect
on the ordinary scene, but in the StretchRect(..)
operation it will be copied to the alpha channel of
the texture. The alpha channel can then be multiplied
by the RGB color to mask out scene objects and leave
only the glow sources. After this step, the glow source
texture is blurred to create the soft look of glow.
The blur operation smooths high-frequency or point-like
features in the source texture, and the result has
only broad, low-frequency features. Because of this,
the glow and glow sources can be rendered at low resolution,
and doing so will not reduce the quality. The glow
can be created at one-third or one-quarter of the
full-screen resolution in each axis, which will greatly
improve the speed of rendering the effect.
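The savings compound because the texel count falls with the square of the per-axis reduction. A quick sketch of the arithmetic, assuming a hypothetical 1024×768 back buffer and rounding the glow texture size up so that it still covers the whole screen:

```python
def glow_texture_size(width, height, factor):
    """Size of the reduced-resolution glow texture, rounded up on each axis."""
    return (-(-width // factor), -(-height // factor))


def texel_reduction(width, height, factor):
    """How many times fewer texels the glow texture has than the screen."""
    w, h = glow_texture_size(width, height, factor)
    return (width * height) / (w * h)


for factor in (3, 4):
    size = glow_texture_size(1024, 768, factor)
    print(factor, size, round(texel_reduction(1024, 768, factor), 2))
```

At one-quarter resolution per axis, the glow texture holds 1/16 as many texels as the screen, and every blur pass over it touches correspondingly fewer texels.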
Rendering the glow sources at low
resolution does, however, increase aliasing in the final glow.
As the resolution of the glow source texture is reduced,
aliasing of the glow source texture increases, and
the source texels become more prone to flicker as
objects move around the scene. A single texel of glow
source may represent several pixels in the full-resolution
image, and this single glow source texel is spread
out into a large pattern of glow. This increases the
effect of the aliasing, causing the glow to flicker
and shimmer as objects move. The degree to which the
resolution can be reduced depends on how much flicker
is acceptable in the final image. This flickering
can be decreased by improving the quality of filtering
used when reducing resolution. For example, hardware-accelerated
bilinear texture filtering can be used while down-sampling
a high-resolution glow source image, and this will
greatly diminish the flickering.
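The sketch below illustrates why the filtering matters. It compares keeping one texel per 2×2 block (point sampling) with averaging the block; for a 2:1 reduction, the average is what a single bilinear fetch placed at the shared corner of the four source texels returns. The test frame and helper names are hypothetical:

```python
def downsample_point(img, f):
    """Keep only the top-left texel of each f x f block."""
    return [row[::f] for row in img[::f]]


def downsample_box(img, f):
    """Average each f x f block (a 2x2 average equals one bilinear fetch
    at the corner shared by the four texels)."""
    h, w = len(img) // f, len(img[0]) // f
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            block = [img[y * f + j][x * f + i] for j in range(f) for i in range(f)]
            out_row.append(sum(block) / (f * f))
        out.append(out_row)
    return out


def frame_with_dot(x, size=8):
    """A full-resolution frame with one bright glow-source pixel at column x, row 0."""
    img = [[0.0] * size for _ in range(size)]
    img[0][x] = 1.0
    return img


# Move the bright pixel one full-resolution pixel to the right between frames.
for x in (0, 1):
    frame = frame_with_dot(x)
    print(x, sum(map(sum, downsample_point(frame, 2))),
          sum(map(sum, downsample_box(frame, 2))))
```

A bright pixel that moves by one full-resolution pixel pops in and out of the point-sampled image, while the averaged image keeps a steady contribution, so the glow spread out from it stays stable.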
Blurring the Glow Sources. Blurring the
glow sources spreads them out into a smooth, natural pattern of
glow. The blurring is accomplished in hardware using a two-dimensional
image-processing filter. The speed at which the glow effect can
be created depends largely on how efficiently the blur can be performed.
The time required to perform the blur depends on the size, in texels,
of the filter used. As the blur filter increases in size to cover
more texels, we have to read and write more texels in proportion
to the area of the 2D blur. The area is proportional to the blur
diameter squared, or d². Doubling the diameter
of the glow would require processing four times the number of texels.
For a blur shape covering 50×50 texels, we'd have to read
2,500 texels for every single pixel of glow that we create! This
would make large-area glows very impractical, but fortunately, the
nasty diameter-squared cost can be avoided by doing the blur in
a two-step operation called a separable convolution. The
separable convolution reduces the cost from d²
to 2d, so it will cost only 100 texel reads at each pixel
to create a 50×50 glow. This calculation can be done quickly
on modern graphics hardware.
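The cost comparison is simple arithmetic. This sketch counts texel reads per output pixel for a blur of diameter d texels:

```python
def reads_full_2d(d):
    """Texel reads per pixel for a direct d x d convolution."""
    return d * d


def reads_separable(d):
    """Texel reads per pixel for a d-tap horizontal pass plus a d-tap vertical pass."""
    return 2 * d


for d in (25, 50, 100):
    print(d, reads_full_2d(d), reads_separable(d))
```

Doubling the diameter quadruples the direct cost but only doubles the separable cost. The separable version also writes one intermediate texture, which this count ignores.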
Adapting the Separable Convolution. The
technique of separable convolution was designed to save computation
in certain special cases, namely, when the convolution kernel can
be separated into the product of terms that are independent in each
axis. In this case, a two-dimensional convolution of n×m
elements can be reduced to two separate one-dimensional convolutions
of n and m elements, respectively. This greatly reduces
the computation cost of the convolution. Instead of calculating
and summing n×m samples at each point, the convolution
is reduced to a two-step process requiring only n + m
samples. First, an intermediate result image is created by sampling
and summing n elements along one axis for each point in the
result. Then, a neighborhood of m elements of the intermediate result
is sampled along the other axis to create each point in the final
result. The weighting factors for each of the n or m
samples are the profiles of the convolution along each axis. The
key concept, as far as we're concerned, is that the two-step
approach can be used with any set of one-dimensional convolution
profiles. Even though a particular 2D blur shape may not be mathematically
separable, we can use two 1D profiles to approximate the shape.
We can create a wide variety of 2D blur shapes by doing only the
work of two 1D blurs.
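A minimal CPU sketch of the two-step process, written in plain Python for clarity rather than speed. It blurs along x, then blurs that intermediate result along y, and checks the result against a direct 2D convolution whose kernel is the outer product of the two profiles, which is the 2D shape the two passes actually produce. Clamp-to-edge reads and the helper names are assumptions made for the sketch:

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def blur_axis(img, weights, horizontal):
    """One 1D pass: weights[k] is applied to the neighbor at offset k - r,
    where r = len(weights) // 2. Reads past the edge clamp to the border."""
    h, w = len(img), len(img[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, wt in enumerate(weights):
                o = k - r
                if horizontal:
                    acc += wt * img[y][clamp(x + o, 0, w - 1)]
                else:
                    acc += wt * img[clamp(y + o, 0, h - 1)][x]
            out[y][x] = acc
    return out


def separable_blur(img, h_weights, v_weights):
    """Horizontal pass into an intermediate image, then a vertical pass on it."""
    intermediate = blur_axis(img, h_weights, horizontal=True)
    return blur_axis(intermediate, v_weights, horizontal=False)


def direct_blur_2d(img, h_weights, v_weights):
    """Reference: one 2D pass with kernel K[j][i] = v_weights[j] * h_weights[i],
    reading n * m neighbors per output texel."""
    h, w = len(img), len(img[0])
    rh, rv = len(h_weights) // 2, len(v_weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, vw in enumerate(v_weights):
                yy = clamp(y + j - rv, 0, h - 1)
                for i, hw in enumerate(h_weights):
                    xx = clamp(x + i - rh, 0, w - 1)
                    acc += vw * hw * img[yy][xx]
            out[y][x] = acc
    return out


image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0                      # a single bright glow-source texel
profile = [0.25, 0.5, 0.25]
two_pass = separable_blur(image, profile, profile)
reference = direct_blur_2d(image, profile, profile)
max_diff = max(abs(a - b) for ra, rb in zip(two_pass, reference) for a, b in zip(ra, rb))
print(max_diff)
```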
In-depth information about separable convolutions
can be found on the Web at OpenGL.org.
For our purposes, the mathematical derivation might seem to restrict
the shape of the blurs. This is because the derivation is typically
based on only one or two separable functions, such as the two-dimensional
Gaussian. Rather than work from the perspective of the derivation
where a 2D profile is broken into two separate 1D profiles, we can
instead specify any pair of 1D profiles we like. It doesn't matter
what the shape of each profile is, as long as they produce some
interesting 2D blur result. For the images shown here, we have used
two Gaussian curves added together. One curve provides a smooth,
broad base, and the other produces a bright spike in the center.
Our Direct3D "Glow" demo (NVIDIA 2002) uses various other
profiles. Among them is a periodic sawtooth profile that produces
an interesting diffraction-like multiple-image effect. The demo
and full source code are included on the book's CD and Web site.
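One way to build a profile like the two-Gaussian shape described above, as a sketch. The radius, widths, and spike gain below are made-up illustration values, not the ones used for the images, and normalizing the weights to sum to 1 is a choice made here so that the blur preserves the total glow energy:

```python
import math


def gaussian(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2)


def two_gaussian_profile(radius, broad_sigma, spike_sigma, spike_gain):
    """1D blur weights at texel offsets -radius..radius: a broad Gaussian for
    the soft base plus a narrow, scaled Gaussian for the bright center spike.
    The weights are normalized to sum to 1."""
    raw = [gaussian(k, broad_sigma) + spike_gain * gaussian(k, spike_sigma)
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [v / total for v in raw]


# Illustrative parameters only.
weights = two_gaussian_profile(radius=8, broad_sigma=4.0, spike_sigma=1.0, spike_gain=2.0)
print([round(w, 4) for w in weights])
```

Using the same list for both the horizontal and the vertical profile makes the 2D glow the outer product of the curve with itself; passing two different lists gives a non-circular shape.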
Convolution on the GPU. To blur along one
axis and then blur that result along the other axis, we use render-to-texture
operations on the GPU. The rendering fetches a local neighborhood
of texels around each rendered pixel and applies the convolution
kernel weights to the neighborhood samples. A convolution can be
performed in a single rendering pass if the GPU can read all of
the neighbors in one pass, or the result can be built up over several
rendering passes using additive blending to accumulate a few neighbor
samples at a time.
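The multipass option relies on additive blending acting as a running sum. This 1D sketch (hypothetical helper names, clamp-to-edge reads) splits a list of (offset, weight) taps into groups that each fit in one pass and accumulates the partial results as additive blending would. It ignores the per-pass 8-bit quantization and clamping that a fixed-point render target would apply:

```python
def sample_taps(src, x, taps):
    """Weighted sum of the neighbors of texel x; taps are (offset, weight) pairs.
    Reads past either end clamp to the border texel."""
    n = len(src)
    return sum(w * src[max(0, min(x + o, n - 1))] for o, w in taps)


def single_pass(src, taps):
    """All taps read in one pass."""
    return [sample_taps(src, x, taps) for x in range(len(src))]


def multi_pass(src, taps, samples_per_pass):
    """At most samples_per_pass taps per pass; each pass's output is added to
    the render target, as additive blending (dest = dest + src) would do."""
    target = [0.0] * len(src)
    for start in range(0, len(taps), samples_per_pass):
        partial = single_pass(src, taps[start:start + samples_per_pass])
        target = [d + p for d, p in zip(target, partial)]
    return target


taps = [(k, 1.0 / 9.0) for k in range(-4, 5)]   # 9-tap box profile
row = [0.0] * 12
row[6] = 1.0
one_pass = single_pass(row, taps)
three_passes = multi_pass(row, taps, 4)          # 4 + 4 + 1 taps
```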
Rendering is driven by a simple piece of screen-aligned
geometry: a rectangle, usually covering the entire render target,
composed of two triangles. Each triangle's
vertices have texture coordinates that determine the location at
which texels are sampled from the source texture. The coordinates
could also be computed in a vertex or pixel shader. If the coordinates
are set to range from 0.0 to 1.0 across the render target, then
rendering would copy the source texture into the destination. Each
pixel rendered would read a texel from its own location in the source
texture, resulting in an exact copy. Instead, the texture coordinates
for each texture sampler can be offset from each other by one or
more texels. In this case, each rendered pixel will sample a local
area of neighbors from the source texture. The same pattern of neighbors
will be sampled around each rendered pixel. This method is illustrated
in Figure 6, and it provides a convenient way to perform convolution
on the GPU. More information about the technique of neighbor sampling
and image processing on the GPU can be found in James 2001 and on
the Web at developer.nvidia.com
and gpgpu.org.
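As a sketch of the vertex data involved, the function below builds the four corners of a full-screen quad (drawn as a two-triangle strip) in clip space, with one texture-coordinate set per sampler. Each set is the base 0-to-1 coordinate shifted by a whole number of texels, where one texel is 1/width in u and 1/height in v. The vertex layout and names are hypothetical, and the Direct3D half-texel adjustment discussed later is left out:

```python
def neighbor_quad(width, height, texel_offsets):
    """Four vertices of a full-screen quad with per-sampler texture coordinates.

    texel_offsets -- list of (dx, dy) neighbor offsets in texels, one per sampler
    Returns (clip_x, clip_y, [(u, v) per sampler]) tuples in triangle-strip order.
    """
    # Clip-space corner and its base texture coordinate (v grows downward).
    corners = [(-1.0, 1.0, 0.0, 0.0), (1.0, 1.0, 1.0, 0.0),
               (-1.0, -1.0, 0.0, 1.0), (1.0, -1.0, 1.0, 1.0)]
    quad = []
    for x, y, u, v in corners:
        coords = [(u + dx / width, v + dy / height) for dx, dy in texel_offsets]
        quad.append((x, y, coords))
    return quad


# Three horizontal neighbors (left, center, right) on a 256x256 texture.
quad = neighbor_quad(256, 256, [(-1, 0), (0, 0), (1, 0)])
```

Because every pixel interpolates the same per-sampler shifts, each rendered pixel reads the same neighbor pattern around its own location.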
To perform the blurring convolution
operation on the graphics processor, the glow source
texture is bound to one or more texture sampler units,
and texture coordinates are computed to provide the
desired pattern of neighbor sampling. The render target
is set to a render-target texture that will hold the
result of blurring along the first axis. Call this
texture the horizontal blur texture. As each
pixel is rendered, several texture samples (neighbor
samples) are delivered to the pixel fragment processing
hardware, where they are multiplied by the weight
factors of the first 1D convolution kernel. Once the
horizontal blur has been rendered using one or more
passes of rendering to texture, the render target
is switched to another texture render target that
will hold the final blur. The horizontal blur texture
is bound to the input texture samplers, and the texture
coordinates and pixel shader weights for the second
1D convolution kernel (the vertical blur) are
applied.
After the last blur operation, the
glow is ready to be blended into the scene. The render
target is switched back to the ordinary back buffer,
and the glow texture is added to the scene by rendering
a simple rectangle with additive alpha blending.
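Per channel, the final composite is a saturating add, as with source and destination blend factors of D3DBLEND_ONE. This sketch assumes an 8-bit-per-channel back buffer, where the blended result is clamped to 1.0; the function name is hypothetical:

```python
def composite_glow(scene_rgb, glow_rgb):
    """Additive blend (dest = dest + src), clamped as in a fixed-point back buffer."""
    return tuple(min(1.0, s + g) for s, g in zip(scene_rgb, glow_rgb))


pixel = composite_glow((0.25, 0.5, 0.75), (0.25, 0.25, 0.5))
print(pixel)
```

Bright glow over an already bright scene saturates, which is why the brightness of the glow sources is the main control over the final look.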
Hardware-Specific Implementations
Direct3D 9. With Direct3D 9 ps.2.0-capable
hardware, all of the neighbor samples can be read and convolved
in a single, complex pixel shader pass. The neighbor-sampling texture
coordinate offsets can be computed in a vertex shader program, but
the vs.2.0 and ps.2.0 models support only eight iterated texture
coordinates. Additional texture coordinates could be computed in
the pixel shader, but this may or may not be faster than a multipass
approach where only the eight hardware-iterated coordinates are
used in each pass. Sample vs.2.0 and ps.2.0 shaders are shown in
Listings 1 and 2. The vertex shader is designed to accept simple,
full-screen coverage geometry with vertex coordinates in homogeneous
clip space (screen space), where x and y each range over
[-1, 1] to cover the full screen. These shaders
are used for both the horizontal and vertical blur steps, where
only the input constant values change between steps. The constants
specify the neighbor-sample placement and kernel weights.
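On the CPU side, the two blur steps differ only in these constants. A sketch of how the eight offset constants (c0 through c7 in the vertex shader) and the eight weight constants (c0 through c7 in the pixel shader) might be filled, with offsets along u for the horizontal pass and along v for the vertical pass. The function name, the example tap placement, and the flat weights are illustrative assumptions, and the half-texel adjustment described after the listings is not applied here:

```python
def blur_constants(offsets, weights, width, height, horizontal):
    """Return (vs_consts, ps_consts): lists of float4 tuples for c0..c7.

    offsets -- eight neighbor offsets, in texels, along the blur axis
    weights -- eight kernel weights, replicated to all four channels
    """
    if len(offsets) != 8 or len(weights) != 8:
        raise ValueError("the listings use exactly eight samples")
    vs_consts = []
    for k in offsets:
        du, dv = (k / width, 0.0) if horizontal else (0.0, k / height)
        vs_consts.append((du, dv, 0.0, 0.0))
    ps_consts = [(w, w, w, w) for w in weights]
    return vs_consts, ps_consts


offsets = [-3, -2, -1, 0, 1, 2, 3, 4]      # illustrative tap placement
weights = [1.0 / 8.0] * 8                  # illustrative flat kernel
h_vs, h_ps = blur_constants(offsets, weights, 256, 256, horizontal=True)
v_vs, v_ps = blur_constants(offsets, weights, 256, 256, horizontal=False)
```

In a Direct3D 9 application, tuples like these would be uploaded with IDirect3DDevice9::SetVertexShaderConstantF and IDirect3DDevice9::SetPixelShaderConstantF starting at register 0.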

vs.2.0
dcl_position v0
dcl_normal v1
dcl_color v2
dcl_texcoord v3
mov oPos, v0       // output the vertex position in screen space
// Create neighbor-sampling texture coordinates by
// offsetting a single input texture coordinate according
// to several constants.
add oT0, v3, c0
add oT1, v3, c1
add oT2, v3, c2
add oT3, v3, c3
add oT4, v3, c4
add oT5, v3, c5
add oT6, v3, c6
add oT7, v3, c7

Listing 1. Direct3D Vertex Shader to Set Texture Coordinates for Sampling Eight Neighbors

ps.2.0
// Take 8 neighbor samples and apply 8 convolution kernel
// weights to them.
dcl t0.xyzw        // declare texture coords
dcl t1.xyzw
dcl t2.xyzw
dcl t3.xyzw
dcl t4.xyzw
dcl t5.xyzw
dcl t6.xyzw
dcl t7.xyzw
dcl_2d s0          // declare texture sampler
// Constants c0..c7 are the convolution kernel weights
// corresponding to each neighbor sample.
texld r0, t0, s0
texld r1, t1, s0
mul r0, r0, c0
mad r0, r1, c1, r0
texld r1, t2, s0
texld r2, t3, s0
mad r0, r1, c2, r0
mad r0, r2, c3, r0
texld r1, t4, s0
texld r2, t5, s0
mad r0, r1, c4, r0
mad r0, r2, c5, r0
texld r1, t6, s0
texld r2, t7, s0
mad r0, r1, c6, r0
mad r0, r2, c7, r0
mov oC0, r0

Listing 2. Direct3D Pixel Shader to Sum Eight Weighted Texture Samples

Note that in order to sample the texture at
the exact texel centers, a texture coordinate offset of half the
size of one texel must be added to the texture coordinates. This
must be done for Direct3D but is not required for OpenGL, because
the Direct3D convention is for coordinates to start from the texel
corner, while the OpenGL convention is to start from the texel center.
This is a simple adjustment to put into practice. For the vertex
shader in Listing 1, it requires adding the half-texel offset to
each of the constants c0
through c7. This
should be done on the CPU.
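A sketch of that CPU-side adjustment, assuming the offset constants are stored as (u, v, z, w) tuples and that the source texture is width × height texels; the function name is hypothetical:

```python
def add_half_texel(offset_consts, width, height):
    """Shift each texture-coordinate offset by half a texel in u and v so that
    Direct3D samples land on texel centers."""
    hu, hv = 0.5 / width, 0.5 / height
    return [(u + hu, v + hv, z, w) for u, v, z, w in offset_consts]


consts = [(-1 / 256, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
adjusted = add_half_texel(consts, 256, 128)
```

The same half-texel shift is added to every constant, so the relative spacing of the neighbor samples is unchanged.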
Direct3D 8. With hardware that supports
at most Direct3D 8 vertex and pixel shaders, we are limited to taking
only four neighbor samples per pass. Although this limitation will
require more rendering passes to build up a convolution of any given
size, each pass can be performed very quickly, typically at a rate
of several hundred passes per second for render-target textures
containing a few hundred thousand texels (textures sized from 256×256
to 512×512). Sample vs.1.1
and ps.1.3 shaders
are shown in Listings 3 and 4.
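The pass count follows directly from the per-pass sample limit. This sketch assumes that every tap needs its own texture sample and uses a hypothetical 17-tap profile on each axis:

```python
def passes_per_axis(taps, samples_per_pass):
    """Render-to-texture passes needed for one 1D blur (ceiling division)."""
    return -(-taps // samples_per_pass)


def total_passes(h_taps, v_taps, samples_per_pass):
    """Passes for the horizontal blur plus passes for the vertical blur."""
    return passes_per_axis(h_taps, samples_per_pass) + passes_per_axis(v_taps, samples_per_pass)


# 8: ps.2.0 using only hardware-iterated coordinates; 4: Direct3D 8; 2: Direct3D 7.
for samples in (8, 4, 2):
    print(samples, total_passes(17, 17, samples))
```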

vs.1.1
dcl_position v0
dcl_texcoord v3
mov oPos, v0       // output the vertex position in screen space
// Create neighbor-sampling texture coordinates by
// offsetting a single input texture coordinate according
// to several constants.
add oT0, v3, c0
add oT1, v3, c1
add oT2, v3, c2
add oT3, v3, c3

Listing 3. Direct3D Vertex Shader Program to Establish Neighbor Sampling

ps.1.3
tex t0             // sample 4 local neighbors
tex t1
tex t2
tex t3
// multiply each by kernel weight and output the sum
mul r0, t0, c0
mad r0, t1, c1, r0
mad r0, t2, c2, r0
mad r0, t3, c3, r0

Listing 4. Direct3D Pixel Shader Program to Sum Four Weighted Texture Samples

Direct3D 7. Direct3D 7-class hardware
lacks the convenient vertex and pixel shading capabilities of modern
graphics hardware. It is also typically limited to only two texture
samples per pass and will have a much lower fill rate. Still, the
blurring convolution can be performed using several overlapping
triangles of full-screen coverage geometry. Each pair of triangles,
arranged to form a full-screen quad, has the same vertex positions
but different vertex texture coordinates. For multitexturing hardware
that can sample two textures per pass, each quad carries two texture coordinates, and each coordinate
is set to sample a different neighbor. Each quad's vertex color
attributes are set to the kernel weight for the particular neighbor-sample
location, and this vertex color is multiplied by the texture sample
value using the fixed-function SetTextureStageState(..)
API calls. A stack of these quads can be rendered with additive
blending in a single DrawPrimitive(..)
call to build up the convolution result.
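One practical detail of this approach: vertex colors are D3DCOLOR values with 8 bits per channel, so each kernel weight is quantized to a multiple of 1/255. The sketch below (hypothetical names and example weights) shows the quantized weights and the number of two-sample quads; how the per-quad weights are routed through the texture stages is left out:

```python
def quantize_weight(w):
    """Kernel weight as an 8-bit vertex-color channel value (0..255)."""
    return round(min(max(w, 0.0), 1.0) * 255)


def quads_needed(taps, samples_per_quad=2):
    """Full-screen quads in the stack when each quad samples two neighbors."""
    return -(-taps // samples_per_quad)


weights = [0.125, 0.25, 0.25, 0.25, 0.125]    # example 5-tap kernel, sums to 1
stored = [quantize_weight(w) for w in weights]
effective_total = sum(stored) / 255.0
print(stored, quads_needed(len(weights)), effective_total)
```

Here the stored weights sum to 256/255, so the blur brightens the glow by about 0.4 percent; lowering one stored value by a single step would restore a sum of exactly 1.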