It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

By Greg James
[Author's Bio]
and John O’Rorke
[Author's Bio]

Gamasutra
May 20, 2004

Introduction

Rendering Glows: Step by Step

Other Uses for Blur

Printer Friendly Version
   


This article is excerpted from the recently published book, GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics edited by Randima Fernando.

 

Change Login/Pwd
Post A Job
Post A Project
Post Resume
Post An Event
Post A Contractor
Post A Product
Write An Article
Get In Art Gallery
Submit News

 


 


Latest Letters to the Editor:
Perpetual Layoffs by Alexander Brandon [09.21.2007]

Casual friendliness in MMO's by Colby Poulson [09.20.2007]

Scrum deals and 'What is Scrum?' by Tom Plunket [08.29.2007]


[Submit Letter]

[View All...]
  



Upcoming Events:
Video Game Expo (VGXPO)
Philadelphia, United States
11.21.08

DIG London Game Conference
London, Canada
11.27.08

5th Australasian Conference on Interactive Entertainment
Brisbane, Australia
12.03.08

IEEE Symposium on Computational Intelligence and Games
Perth, Australia
12.15.08

2K Bot Prize
Perth, Australia
12.15.08

[Submit Event]
[View All...]

 


[Enter Forums...]

Note: Discussion forums for Gamasutra are hosted by the IGDA, which is free to join.
 


Features

Real-Time Glow

Rendering Glows: Step by Step

Specifying and Rendering the Sources of Glow. The first step in rendering glows is to specify which objects or parts of objects are the sources of glow. The color and brightness of these glow sources will translate directly into the color and brightness of the final glow, so this means we can easily control the look of the glow by varying the brightness of the glow sources. These sources could be whole pieces of geometry designated by some object property or flag, or the sources could be restricted to some small part of an object by using texture data. In the latter case, the texture data masks out the parts of an object that do not glow, turning them black in the glow source rendering. The remaining glow source areas can have any desired color and brightness. Using texture data is convenient and artist friendly, and it is our preferred approach.

The glow source mask could be contained in its own separate texture, but it is convenient to use the alpha channel of the ordinary diffuse color texture to hold the mask values. In this case, the texture RGB color is used to render the object normally without alpha blending. When rendering the glow sources, the RGB color is multiplied by the texture alpha channel. Where the alpha value is zero, there will be no source of glow, and as the alpha ramps up to full value, the intensity of the glow sources increases. Figure 4 illustrates the texture RGB and alpha channels used to designate sources of glow for a UFO. Additionally, per-vertex or per-object values can be multiplied into the glow source (RGB×alpha) value to animate the glow over time.

Once the glow sources have been specified, they need to be written to an off-screen render-target texture that we can process to create the soft glow. This is a texture created with the Direct3D 9 D3DUSAGE_RENDERTARGET flag. There are two approaches for getting the glow sources into the texture render target: (1) the entire scene can be rendered to the texture using a method to render the glow sources in color and all nonglowing objects in black, or (2) the scene can be copied from the ordinary back buffer into the texture using the IDirect3DDevice9::StretchRect(..) function.

Rendering the entire 3D scene again to the texture can be costly for complex scenes, and it requires an additional depth buffer dedicated to the texture render target, so the StretchRect(..) 2D image copy method is preferred. StretchRect(..) also allows us to resize and filter the back-buffer image in the process of copying it to the texture. This can be used to reduce the resolution and gain performance in processing the texture to create the glows.

For the StretchRect(..) method, the alpha value acting as the glow source mask can be rendered to the destination alpha value of the ordinary back buffer. This will have no effect on the ordinary scene, but in the StretchRect(..) operation it will be copied to the alpha channel of the texture. The alpha channel can then be multiplied by the RGB color to mask out scene objects and leave only the glow sources. After this step, the glow source texture is blurred to create the soft look of glow. The blur operation smooths high-frequency or point-like features in the source texture, and the result has only broad, low-frequency features. Because of this, the glow and glow sources can be rendered at low resolution, and doing so will not reduce the quality. The glow can be created at one-third or one-quarter of the full-screen resolution in each axis, which will greatly improve the speed of rendering the effect.


Figure 4. Designating Glow Sources. The alpha channel of an object's texture can be used to specify glow sources and the glow source brightness.

Rendering the glow sources at low resolution does affect aliasing on the final glow. As the resolution of the glow source texture is reduced, aliasing of the glow source texture increases, and the source texels become more prone to flicker as objects move around the scene. A single texel of glow source may represent several pixels in the full-resolution image, and this single glow source texel is spread out into a large pattern of glow. This increases the effect of the aliasing, causing the glow to flicker and shimmer as objects move. The degree to which the resolution can be reduced depends on how much flicker is acceptable in the final image. This flickering can be decreased by improving the quality of filtering used when reducing resolution. For example, hardware-accelerated bilinear texture filtering can be used while down-sampling a high-resolution glow source image, and this will greatly diminish the flickering.

Blurring the Glow Sources. Blurring the glow sources spreads them out into a smooth, natural pattern of glow. The blurring is accomplished in hardware using a two-dimensional image-processing filter. The speed at which the glow effect can be created depends largely on how efficiently the blur can be performed. The time required to perform the blur depends on the size, in texels, of the filter used. As the blur filter increases in size to cover more texels, we have to read and write more texels in proportion to the area of the 2D blur. The area is proportional to the blur diameter squared, or d2. Doubling the diameter of the glow would require processing four times the number of texels. For a blur shape covering 50×50 texels, we'd have to read 2,500 texels for every single pixel of glow that we create! This would make large-area glows very impractical, but fortunately, the nasty diameter-squared cost can be avoided by doing the blur in a two-step operation called a separable convolution. The separable convolution reduces the cost from d2 to 2d, so it will cost only 100 texel reads at each pixel to create a 50×50 glow. This calculation can be done quickly on modern graphics hardware.

Adapting the Separable Convolution. The technique of separable convolution was designed to save computation in certain special cases, namely, when the convolution kernel can be separated into the product of terms that are independent in each axis. In this case, a two-dimensional convolution of n×m elements can be reduced to two separate one-dimensional convolutions of n and m elements, respectively. This greatly reduces the computation cost of the convolution. Instead of calculating and summing n×m samples at each point, the convolution is reduced to a two-step process requiring only n + m samples. First, an intermediate result image is created by sampling and summing n elements along one axis for each point in the result. Then, a neighborhood of m elements of the intermediate result is sampled along the other axis to create each point in the final result. The weighting factors for each of the n or m samples are the profiles of the convolution along each axis. The key concept, as far as we’re concerned, is that the two-step approach can be used with any set of one-dimensional convolution profiles. Even though a particular 2D blur shape may not be mathematically separable, we can use two 1D profiles to approximate the shape. We can create a wide variety of 2D blur shapes by doing only the work of two 1D blurs.

In-depth information about separable convolutions can be found on the Web at OpenGL.org. For our purposes, the mathematical derivation might seem to restrict the shape of the blurs. This is because the derivation is typically based on only one or two separable functions, such as the two-dimensional Gaussian. Rather than work from the perspective of the derivation where a 2D profile is broken into two separate 1D profiles, we can instead specify any pair of 1D profiles we like. It doesn't matter what the shape of each profile is, as long as they produce some interesting 2D blur result. For the images shown here, we have used two Gaussian curves added together. One curve provides a smooth, broad base, and the other produces a bright spike in the center. Our Direct3D "Glow" demo (NVIDIA 2002) uses various other profiles. Among them is a periodic sawtooth profile that produces an interesting diffraction-like multiple-image effect. The demo and full source code are included on the book's CD and Web site.


Figure 5. The Two-Step Separable Approach for Creating Blurs Efficiently. First, the glow source points in (a) are blurred in one axis to produce the temporary result in (b). This result is then blurred in the other axis to produce the final blur shown in (c).

Convolution on the GPU. To blur in one axis and then blur that blur in the other axis, we use render-to-texture operations on the GPU. The rendering fetches a local neighborhood of texels around each rendered pixel and applies the convolution kernel weights to the neighborhood samples. A convolution can be performed in a single rendering pass if the GPU can read all of the neighbors in one pass, or the result can be built up over several rendering passes using additive blending to accumulate a few neighbor samples at a time.

Rendering is driven by a simple piece of screen-aligned geometry. The geometry is a simple rectangle usually covering the entire render target and composed of two triangles. Each triangle's vertices have texture coordinates that determine the location at which texels are sampled from the source texture. The coordinates could also be computed in a vertex or pixel shader. If the coordinates are set to range from 0.0 to 1.0 across the render target, then rendering would copy the source texture into the destination. Each pixel rendered would read a texel from its own location in the source texture, resulting in an exact copy. Instead, the texture coordinates for each texture sampler can be offset from each other by one or more texels. In this case, each rendered pixel will sample a local area of neighbors from the source texture. The same pattern of neighbors will be sampled around each rendered pixel. This method is illustrated in Figure 6, and it provides a convenient way to perform convolution on the GPU. More information about the technique of neighbor sampling and image processing on the GPU can be found in James 2001 and on the Web at developer.nvidia.com and gpgpu.org.

To perform the blurring convolution operation on the graphics processor, the glow source texture is bound to one or more texture sampler units, and texture coordinates are computed to provide the desired pattern of neighbor sampling. The render target is set to a render-target texture that will hold the result of blurring along the first axis. Call this texture the horizontal blur texture. As each pixel is rendered, several texture samples (neighbor samples) are delivered to the pixel fragment processing hardware, where they are multiplied by the weight factors of the first 1D convolution kernel. Once the horizontal blur has been rendered using one or more passes of rendering to texture, the render target is switched to another texture render target that will hold the final blur. The horizontal blur texture is bound to the input texture samplers, and the texture coordinates and pixel shader weights for the second 1D convolution kernel (the vertical blur) are applied.

After the last blur operation, the glow is ready to be blended into the scene. The render target is switched back to the ordinary back buffer, and the glow texture is added to the scene by rendering a simple rectangle with additive alpha blending.

Hardware-Specific Implementations

Direct3D 9. With Direct3D 9 ps.2.0–capable hardware, all of the neighbor samples can be read and convolved in a single, complex pixel shader pass. The neighbor-sampling texture coordinate offsets can be computed in a vertex shader program, but the vs.2.0 and ps.2.0 models support only eight iterated texture coordinates. Additional texture coordinates could be computed in the pixel shader, but this may or may not be faster than a multipass approach where only the eight hardware-iterated coordinates are used in each pass. Sample vs.2.0 and ps.2.0 shaders are shown in Listings 1 and 2. The vertex shader is designed to accept simple, full-screen coverage geometry with vertex coordinates in homogeneous clip space (screen space), where coordinates range from (x, y) = ([-1, 1], [-1, 1]) to cover the full screen. These shaders are used for both the horizontal and vertical blur steps, where only the input constant values change between steps. The constants specify the neighbor-sample placement and kernel weights.


Figure 6. Convolution and Image Processing. (a) Rendering a simple rectangle geometry across the screen drives the processing of textures at each pixel. (b) By offsetting texture coordinates, each pixel rendered can sample a local area of neighbors. (c) The same neighbor-sampling pattern in (b) is shown with all samples coming from a single texture source.



vs.2.0
dcl_position v0
dcl_normal v1
dcl_color v2
dcl_texcoord v3

mov oPos, v0 // output the vertex position in screen space

// Create neighbor-sampling texture coordinates by
// offsetting a single input texture coordinate according
// to several constants.

add oT0, v3, c0
add oT1, v3, c1
add oT2, v3, c2
add oT3, v3, c3
add oT4, v3, c4
add oT5, v3, c5
add oT6, v3, c6
add oT7, v3, c7

 

Listing 1. Direct3D Vertex Shader to Set Texture Coordinates for Sampling Eight Neighbors



ps.2.0
// Take 8 neighbor samples, apply 8 conv. kernel weights to
// them

dcl t0.xyzw // declare texture coords
dcl t1.xyzw
dcl t2.xyzw
dcl t3.xyzw
dcl t4.xyzw
dcl t5.xyzw
dcl t6.xyzw
dcl t7.xyzw
dcl_2d s0 // declare texture sampler
// Constants c0..c7 are the convolution kernel weights
// corresponding to each neighbor sample.
texld r0, t0, s0
texld r1, t1, s0
mul   r0, r0, c0
mad   r0, r1, c1, r0
texld r1, t2, s0
texld r2, t3, s0
mad   r0, r1, c2, r0
mad   r0, r2, c3, r0
texld r1, t4, s0
texld r2, t5, s0
mad   r0, r1, c4, r0
mad   r0, r2, c5, r0
texld r1, t6, s0
texld r2, t7, s0
mad   r0, r1, c6, r0
mad   r0, r2, c7, r0
mov oC0, r0

 

Listing 2. Direct3D Pixel Shader to Sum Eight Weighted Texture Samples

Note that in order to sample the texture at the exact texel centers, a texture coordinate offset of half the size of one texel must be added to the texture coordinates. This must be done for Direct3D but is not required for OpenGL, because the Direct3D convention is for coordinates to start from the texel corner, while the OpenGL convention is to start from the texel center. This is a simple adjustment to put into practice. For the vertex shader in Listing 1, it requires adding the half-texel offset to each of the constants c0 through c7. This should be done on the CPU.

Direct3D 8. With hardware that supports at most Direct3D 8 vertex and pixel shaders, we are limited to taking only four neighbor samples per pass. Although this limitation will require more rendering passes to build up a convolution of any given size, each pass can be performed very quickly, typically at a rate of several hundred passes per second for render-target textures containing a few hundred thousand texels (textures sized from 256×256 to 512×512). Sample vs.1.1 and ps.1.3 shaders are shown in Listings 3 and 4.


vs.1.1
dcl_position v0
dcl_texcoord v3
mov oPos, v0 // output the vertex position in screen space
// Create neighbor-sampling texture coordinates by
// offsetting a single input texture coordinate according
// to several constants.

add oT0, v3, c0
add oT1, v3, c1
add oT2, v3, c2
add oT3, v3, c3

 

Listing 3. Direct3D Vertex Shader Program to Establish Neighbor Sampling



ps.1.3
tex t0   // sample 4 local neighbors
tex t1
tex t2
tex t3
// multiply each by kernel weight and output the sum
mul r0, t0, c0
mad r0, t1, c1, r0
mad r0, t2, c2, r0
mad r0, t3, c3, r0

 

Listing 4. Direct3D Pixel Shader Program to Sum Four Weighted Texture Samples

Direct3D 7. Direct3D 7–class hardware lacks the convenient vertex and pixel shading capabilities of modern graphics hardware. It is also typically limited to only two texture samples per pass and will have a much lower fill rate. Still, the blurring convolution can be performed using several overlapping triangles of full-screen coverage geometry. Each pair of triangles, arranged to form a full-screen quad, has the same vertex positions but different vertex texture coordinates. For two-texture multisampling hardware, each quad carries two texture coordinates, and each coordinate is set to sample a different neighbor. Each quad's vertex color attributes are set to the kernel weight for the particular neighbor-sample location, and this vertex color is multiplied by the texture sample value using the fixed-function SetTextureStageState(..) API calls. A stack of these quads can be rendered with additive blending in a single DrawPrimitive(..) call to build up the convolution result.

______________________________________________________


join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2004 CMP Media LLC

privacy policy
| terms of service