
Hardware Accelerated Spherical Environment Mapping using Texture Matrices
By Rob Wyatt
Gamasutra
August 11, 2000
URL: http://www.gamasutra.com/features/20000811/wyatt_01.htm
Because of the problems with creating the texture maps and the computational costs during runtime, real-time spherical environment mapping is not often used in games. As a result, when the technique is used, the spherical maps are usually pre-calculated and therefore they don't reflect changes in a scene as they happen. Fortunately, some DirectX 7-capable video cards support cubic environment maps, which don't exhibit any of the problems associated with spherical maps, and thus they're suitable for reflecting dynamic scenes. Despite their limitations, though, spherical environment maps are still useful. Using sphere maps, you can create cheap, very high-performance static reflections, which in most cases are good enough for game reflections. Another very useful application is creating realistic specular highlights from an infinite light source.
This article shows a hardware T&L-accelerated method of using sphere maps. It is assumed that your game will have some level of geometry hardware acceleration in addition to Direct3D support. If geometry acceleration is not present, applying these techniques may actually slow down a game (especially if the standard Direct3D software pipeline is used).
The Spheremap Demo
To begin our look at spherical mapping let's look at the Spheremap demo, one
of the samples that comes with DirectX 7 (to find this demo, search the DirectX
7 CD-ROM for SPHEREMAP.EXE). This application displays a spherically environment-mapped
teapot. Figure 1 shows a screenshot from this application.
Figure 1. A screenshot from SPHEREMAP.EXE
The Spheremap demo implements what I call "normal" spherical mapping, where the normal vector at a vertex is used in place of the eye-to-vertex reflection vector. The code that performs the mapping within this demo is shown in Listing 1 (in the DirectX SDK source file it can be found in a function named ApplySphereMapToObject()).
Unfortunately, the Spheremap demo has many shortcomings and doesn't implement spherical reflection mapping the way OpenGL does (when the automatic texture address generation is set to GL_SPHERE_MAP). In fact, Direct3D has no sphere map support at all - you have to calculate the texture coordinates yourself. To do so, you could create a system that cycles through all of the vertices and has the CPU calculate the texture coordinates. This is what the DirectX 7 Spheremap demo does, but it is far from efficient.
A closer look
The DirectX 7 documentation and various pieces of literature from the graphics
vendors stress over and over that correctly using and managing vertex buffers
is the key to getting high performance - especially with hardware T&L-based
cards. Static vertex buffers are the most efficient, as they can be kept in
local video memory and never updated (i.e. optimized), but that means that all
geometry processing has to be performed with the standard hardware pipeline,
limiting the effects that you can create. Even so, it is surprising what can
be done when all the resources of the pipeline are used.
If you must have dynamic geometry, a carefully managed CPU-modifiable vertex buffer is still
better than no vertex buffer (as in the SPHEREMAP.EXE example). However,
the Spheremap sample code is one of those pathological cases where vertex buffers
are actually slower - if you converted that code to use video memory vertex
buffers, it would most certainly slow down since the normal is read back from
the vertex buffer (which is taboo, as both video memory and AGP memory are uncached).
If the vertex buffer happens to be in local video memory, then it's being fetched
back over the AGP bus, which is painfully slow. In this case, keeping a second
copy of the normal vectors in system memory would be best.
Also, note that there's a glaring mistake in the DirectX algorithm, which I
am compelled to point out. It is the line commented, "Check the z-component,
to skip any vertices that face backwards". Vertices do not face backwards,
polygons do; it is perfectly legal for a polygon to have a vertex normal that
points away from the viewer while still having a face normal pointing towards
the viewer:

The results of the erroneous z-component check can be seen in the DirectX 7 example when the bottom of the teapot comes into view. For a few frames, a number of triangles are not textured properly. This check is not only an error, it causes the loop to run slower (well, it certainly doesn't speed it up). Without the check, there would be 2N dot products (where N is the number of vertices). With the check in place, and assuming half of the vertices face away from the viewer, there are N dot products for the check itself plus two for each of the N/2 vertices that pass it: N + 2(N/2) = N + N = 2N dot products, so the same amount of work is done. The difference is that now there is a jump in the middle of the loop that the CPU has to predict, and will sometimes mispredict. On a Pentium II or III, a mispredicted jump is far more expensive than a couple of dot products.
When you have removed the z-component check, all that's left to do in the main loop is generate texture coordinates. The vector [m11, m21, m31] is the local space +X direction in camera space and the vector [m12, m22, m32] is the local space +Y direction in camera space. Recall that all normal vectors are points on a unit sphere, so the code generating the texture coordinates is effectively calculating the longitude and latitude coordinates of the normal vector's position on that sphere (or the cosines of them) by taking the dot product of the unit normal with the unit axes (see Figures 2a and 2b). The output of that calculation is scaled and biased so that the center of the sphere map is the origin:
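As a rough illustration of the loop with the check removed and the normals kept in a separate system-memory array, here is a minimal sketch. It uses plain stand-in types so it compiles on its own; Vec3, TexCoord, Mat3 and ComputeSphereMapUVs are names invented for this example and are not part of the SDK sample.

```cpp
#include <cstddef>

// Plain stand-ins for the SDK types (hypothetical names for this sketch).
struct Vec3     { float x, y, z; };
struct TexCoord { float u, v; };

// Upper-left 3x3 block of the world*view matrix, indexed like D3DMATRIX:
// m[row][col], so m[0][0] is _11 and m[2][1] is _32.
struct Mat3 { float m[3][3]; };

// Compute "normal" sphere-map coordinates for every vertex.
// `normals` is a system-memory copy, so nothing is read back from a
// vertex buffer, and there is no per-vertex branch to mispredict.
void ComputeSphereMapUVs(const Vec3* normals, TexCoord* out,
                         std::size_t count, const Mat3& wv)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const Vec3& n = normals[i];

        // Two dot products per vertex: the normal against [m11, m21, m31]
        // and against [m12, m22, m32].
        const float dx = n.x * wv.m[0][0] + n.y * wv.m[1][0] + n.z * wv.m[2][0];
        const float dy = n.x * wv.m[0][1] + n.y * wv.m[1][1] + n.z * wv.m[2][1];

        // Scale and bias from [-1, 1] into [0, 1]; v is flipped.
        out[i].u = 0.5f * (1.0f + dx);
        out[i].v = 0.5f * (1.0f - dy);
    }
}
```

The output array would then be copied into the texture coordinates of a write-only vertex buffer, which avoids reading from uncached memory.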
Figure 2a: Normal vector
If we consider that the sphere map UV coordinate calculation requires two dot products, and that a matrix*vector multiply performs four dot products, we should be able to perform the same calculation using a texture matrix. Direct3D supports 4x4 texture matrices at every texture stage, so all we have to do is make a texture matrix that performs the same dot products as discussed above. By carefully creating the texture matrix, the scale and bias are also performed automatically, so the origin is in the center of the texture map. The required texture matrix looks like the following:


NOTE: In the above math, the vectors lsx and lsy are used in place of [m11, m21, m31] and [m12, m22, m32] to represent the local space x and y axis in camera space - in other words, the local space [1,0,0] and [0,1,0] vectors respectively, transformed by the local*world matrices.
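For reference, the product can be written out as follows. This is a reconstruction from the per-vertex formula in Listing 1, assuming Direct3D's row-vector convention and writing n for the vertex normal; it is not a copy of the original figure.

```latex
\begin{bmatrix} n_x & n_y & n_z & 1 \end{bmatrix}
\begin{bmatrix}
 \tfrac{1}{2}\,lsx_x & -\tfrac{1}{2}\,lsy_x & 0 & 0 \\
 \tfrac{1}{2}\,lsx_y & -\tfrac{1}{2}\,lsy_y & 0 & 0 \\
 \tfrac{1}{2}\,lsx_z & -\tfrac{1}{2}\,lsy_z & 1 & 0 \\
 \tfrac{1}{2}        &  \tfrac{1}{2}        & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
 \tfrac{1}{2}\left(1 + n \cdot lsx\right) &
 \tfrac{1}{2}\left(1 - n \cdot lsy\right) &
 n_z & 1
\end{bmatrix}
```

The first two elements of the result are the tu and tv values computed per vertex in Listing 1.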
Next, specify the vertex normal as the first three elements of the input texture coordinate vector; the fourth element will automatically be set to its default of 1. The specified texture matrix will be applied to the texture coordinates (the normal vector), and the resulting texture coordinate vector is identical to that in the DirectX example.
Note: DirectX has no specific naming convention for the elements of a 4D texture coordinate, so I will use [r, s, t, q]. When performing standard 2D texture mapping, the 'r' component is equivalent to 'u' and the 's' component to 'v'; elements 't' and 'q' are unused.
The following code sets the above texture matrix at stage 0. This operation needs to be done any time either the world or local matrices change, as LocalToEyeMat = Local*World:
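// Start from identity so the elements not set below stay valid.
D3DMATRIX tex_mat = IdentityMatrix;
// First column: u = 0.5 + 0.5*(n . lsx)
tex_mat._11 = 0.5f*LocalToEyeMat._11;
tex_mat._21 = 0.5f*LocalToEyeMat._21;
tex_mat._31 = 0.5f*LocalToEyeMat._31;
tex_mat._41 = 0.5f;
// Second column: v = 0.5 - 0.5*(n . lsy), matching tv in Listing 1
tex_mat._12 = -0.5f*LocalToEyeMat._12;
tex_mat._22 = -0.5f*LocalToEyeMat._22;
tex_mat._32 = -0.5f*LocalToEyeMat._32;
tex_mat._42 = 0.5f;
// D3DDevice is the application's IDirect3DDevice7 pointer
D3DDevice->SetTransform(D3DTRANSFORMSTATE_TEXTURE0, &tex_mat);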
There is one additional texture stage state that needs to be set. You must tell Direct3D to apply the texture
matrix and to use just the first two elements of the result:
D3DDevice->SetTextureStageState( 0, D3DTSS_TEXTURETRANSFORMFLAGS, D3DTTFF_COUNT2 );
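To check that such a texture matrix really does reproduce the per-vertex calculation, the transform can be emulated on the CPU. The sketch below is a stand-alone approximation of what the pipeline does: Mat4, Vec4, MakeSphereMapTexMatrix and Transform are hypothetical names, and the row-vector multiply is an assumption based on Direct3D's usual convention.

```cpp
// 4x4 matrix indexed like D3DMATRIX: m[r][c] corresponds to _(r+1)(c+1).
struct Mat4 { float m[4][4]; };

// Texture coordinate in the article's [r, s, t, q] naming.
struct Vec4 { float r, s, t, q; };

// Build the sphere-map texture matrix from the local-to-eye matrix.
Mat4 MakeSphereMapTexMatrix(const Mat4& localToEye)
{
    Mat4 tex = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    for (int row = 0; row < 3; ++row)
    {
        tex.m[row][0] =  0.5f * localToEye.m[row][0];  // scaled first column
        tex.m[row][1] = -0.5f * localToEye.m[row][1];  // scaled, flipped second column
    }
    tex.m[3][0] = 0.5f;  // bias for u
    tex.m[3][1] = 0.5f;  // bias for v
    return tex;
}

// Row-vector transform: [r s t q] * M.
Vec4 Transform(const Vec4& v, const Mat4& M)
{
    const float in[4] = { v.r, v.s, v.t, v.q };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * M.m[row][col];
    return { out[0], out[1], out[2], out[3] };
}
```

Feeding a unit normal with q = 1 through Transform should give the same first two elements as the tu and tv expressions in Listing 1, up to floating-point rounding.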
Direct3D has no way of specifying that the untransformed normal should be used as input into the texture matrix. The quick fix for this is to create a flexible vertex format that has a position, a normal and a three-element texture coordinate; when the buffer is filled, you copy each normal vector into the texture coordinate. Unfortunately, this increases the size of each vertex by 12 bytes and consumes more bandwidth when processing the buffer. (In a basic vertex case, these extra 12 bytes increase the vertex size by 50%.) But the cost is worth it: you can perform the "normal" spherical environment mapping (as used in the Direct3D sample) with a static vertex buffer, using nothing more than the standard Direct3D pipeline. This is a big win with hardware, since cards like nVidia's GeForce and GeForce2 process the texture matrix in hardware without CPU intervention, allowing the vertex buffer to be stored in local video memory.
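A minimal sketch of such a vertex layout is shown below, using plain structs so the size arithmetic can be checked at compile time. The struct and function names are invented for this example. In Direct3D 7 the layout would presumably be described with the FVF flags D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1 | D3DFVF_TEXCOORDSIZE3(0), but that part is not exercised here.

```cpp
#include <cstddef>

struct Vec3 { float x, y, z; };

// Basic vertex: position + normal = 24 bytes.
struct BasicVertex
{
    Vec3 position;
    Vec3 normal;
};

// Same vertex plus a three-element texture coordinate that duplicates
// the normal, so the texture matrix can see it: 36 bytes.
struct SphereMapVertex
{
    Vec3 position;
    Vec3 normal;
    Vec3 texNormal;
};

static_assert(sizeof(BasicVertex) == 24, "expected tightly packed floats");
static_assert(sizeof(SphereMapVertex) == 36, "12 extra bytes, i.e. 50% larger");

// Fill the extended vertices once, when the static buffer is created.
void FillSphereMapVertices(const BasicVertex* src, SphereMapVertex* dst,
                           std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        dst[i].position  = src[i].position;
        dst[i].normal    = src[i].normal;
        dst[i].texNormal = src[i].normal;  // the normal doubles as the texture coordinate
    }
}
```

Because the copy happens only when the buffer is filled, the per-frame cost is limited to the extra bandwidth of the larger vertex.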
Note that both the Direct3D and texture matrix examples expect a unit scale in the local-to-camera space transform (local*world). If this isn't the case, the texture matrix must be scaled by the inverse of the scale factor. Additionally, the normal-vector texture coordinates are expected to be of unit length. If this technique is applied to dynamic geometry, then every time a normal is modified, the associated texture coordinate needs to be updated.
Another shortcoming of the method discussed above is that only the original input normal vectors are considered when calculating the reflection. For most meshes this is fine, but when mesh skinning is applied there is a problem. When skinning a mesh in hardware, each vertex (position and normal) is multiplied by a pair of world transforms, and the final position and normal are calculated from a weighting applied to the results of these transforms. The skinned normal and position are not available outside of the graphics pipeline, yet to obtain a correct reflection we need to know what the skinned normal vector was. One solution would be to use the CPU to reskin the mesh, but this is expensive.
Fortunately there is a better solution. Direct3D can be told to use the camera-space normal directly as a texture coordinate by setting the texture coordinate index for the required texture stage to include the D3DTSS_TCI_CAMERASPACENORMAL flag. With this 'Tex-Gen' mode set, any texture coordinates within the vertex buffer that are bound to the specified texture stage are ignored and the camera-space normal [n'x,n'y,n'z,1] is used instead. This automatically generated texture coordinate includes any skinning operations that may have been performed. Contrary to popular belief, this flag is not just for use in cubic environment mapping; it can be used any time you want the camera-space normal to be used as a texture coordinate.
Referring back to the texture matrix, it is easy to create a texture matrix to take advantage of this new flag, as shown below:



where [n'x n'y n'z 1] is the transformed and skinned normal vector.
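Written out, the constant matrix only has to apply the same scale and bias as before, since the normal has already been transformed into camera space. The following is a reconstruction under that assumption (row-vector convention), not a copy of the original figure:

```latex
\begin{bmatrix} n'_x & n'_y & n'_z & 1 \end{bmatrix}
\begin{bmatrix}
 \tfrac{1}{2} & 0             & 0 & 0 \\
 0            & -\tfrac{1}{2} & 0 & 0 \\
 0            & 0             & 1 & 0 \\
 \tfrac{1}{2} & \tfrac{1}{2}  & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
 \tfrac{1}{2}\left(1 + n'_x\right) &
 \tfrac{1}{2}\left(1 - n'_y\right) &
 n'_z & 1
\end{bmatrix}
```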
As you can see, all that needs to be done is multiply the transformed automatic
normal-vector texture coordinate by a constant matrix. The first advantage of
this method is that you do not have to update the texture matrix when the local-to-camera
matrix changes (since this is already taken care of with the transformed normal).
Another advantage is that you do not need to keep a second copy of the normal
vectors within the mesh - the normal seen by the texture matrix is the same
one seen by the geometry. That means that automatic normal normalization and
skinning can be utilized. When automatic normalization is enabled, you can even
set non-uniform scale factors in the world matrix and everything works fine.
The other big gain is that it is possible to perform reflective spherical environment mapping, since Direct3D can pass the camera-space reflection vector in place of the normal by setting the texture coordinate index state to include D3DTSS_TCI_CAMERASPACEREFLECTIONVECTOR. This provides environment mapping similar to that produced by GL_SPHERE_MAP, which looks significantly better than normal-based spherical environment mapping. (Unfortunately, the reflection method is more sensitive to distortion due to bad normal vectors, non-unit normal vectors and low tessellation.)
There are two methods that Direct3D can use to calculate the eye-to-vertex reflection vector. The method used depends on the local-viewer render state (D3DRENDERSTATE_LOCALVIEWER). When this render state is set to true, the per-vertex reflection is calculated using this formula:
R = 2(E·N)N - E
(Where N is the vertex normal and E is the camera to vertex unit vector.)
When the local-viewer render state is set to false, the reflection vector is orthogonal, and is calculated from an infinite viewpoint using this simpler formula:
R = 2NzN - I
(Where Nz is the world space Z component of the vertex normal and I is the vector [0,0,1].)
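The two formulas can be compared on the CPU with a small sketch like the one below. It implements them exactly as written above, taking E as the article defines it (the unit vector from the camera, at the camera-space origin, to the vertex) and treating the normal as already in the same space as I. The type and function names are invented for the example, and if your documentation defines E with the opposite sign, the local-viewer result flips accordingly.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Local viewer: R = 2(E.N)N - E.
// `vertexCam` is the vertex position in camera space (must not be the origin),
// `n` is the unit vertex normal in the same space.
Vec3 ReflectLocalViewer(const Vec3& vertexCam, const Vec3& n)
{
    const float len = std::sqrt(Dot(vertexCam, vertexCam));
    const Vec3 e = { vertexCam.x / len, vertexCam.y / len, vertexCam.z / len };
    const float k = 2.0f * Dot(e, n);
    return { k * n.x - e.x, k * n.y - e.y, k * n.z - e.z };
}

// Infinite viewer: R = 2*Nz*N - I, with I = [0, 0, 1].
// No normalize and no per-vertex E, which is why it is cheaper.
Vec3 ReflectInfiniteViewer(const Vec3& n)
{
    const float k = 2.0f * n.z;
    return { k * n.x, k * n.y, k * n.z - 1.0f };
}
```

For a vertex on the view axis the two functions return the same vector; they diverge as the vertex moves toward the edge of the view, which is where the local-viewer model earns its extra cost.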
Better reflections are obtained with the local viewer model, but it is a more complex calculation and performance can be affected by the level of hardware acceleration; try using both reflection models and see which one works the best for you. By default within Direct3D, local viewer reflections (and specular calculations) are enabled, and they should be disabled if the orthogonal non-local version is required. The screen shot below shows the test application associated with this article performing spherical reflection mapping on a torus:

Finally, depending on the quality of your sphere maps, you might want to pull the boundary in a little (but not too much, as it creates other distortions). By not using the last few pixels around the edge of the sphere map, you can significantly change the look of an object. Another way to look at the texture matrix is as follows:

Where Scale and Offset are both in u,v coordinates, in the range 0.0 to 1.0. By carefully adjusting this matrix you can not only trim off the outer edge of a sphere map, you can also pack more than one sphere map into a larger texture or even use elliptical sphere maps.
The code associated with this article is available as a Visual C++ 6.0 project called TGReflect. Included in the archive is a ready-to-use, pre-built release mode executable. This project was built using the Shrike 7.00 DirectX framework. The framework is only required if you wish to rebuild the project and is available from this link; see Shrike7.doc within the archive for more information.
Spherical environment mapping is useful in a lot of circumstances, and if it is going to be used it may as well be hardware accelerated: the less graphics work that is done by the CPU, the smaller the chance that the rendering hardware will be stalled. By carefully utilizing the resources of the standard DirectX graphics pipeline a lot can be achieved. As a developer trying to obtain maximum performance for your game, you should carefully examine what the hardware is capable of before delegating an operation to software. Current video cards such as the GeForce 2 have significantly more computational power and bandwidth available to them than even the fastest CPUs, so take advantage of it whenever you can. Applications of spherical environment mapping are all over the Internet; start by trying the nVidia (http://www.nvidia.com) or OpenGL (http://www.opengl.org) sites.
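One plausible way to parameterize this is sketched below. The exact layout of the original matrix is not recoverable here, so treat the helper as an assumption: scale is taken to be the fraction of the texture covered by the sphere map, and offset the u,v position of its center, which gives the standard matrix for a scale of 1.0 and an offset of 0.5. Mat4, MakeSubRectSphereMapMatrix and SphereMapUV are hypothetical names.

```cpp
// 4x4 matrix indexed like D3DMATRIX: m[r][c] corresponds to _(r+1)(c+1).
struct Mat4 { float m[4][4]; };

// Texture matrix for a camera-space normal that maps the sphere map into
// a sub-rectangle of the texture (assumed parameterization, see above).
Mat4 MakeSubRectSphereMapMatrix(float scaleU, float scaleV,
                                float offsetU, float offsetV)
{
    Mat4 tex = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    tex.m[0][0] =  0.5f * scaleU;  // half-width of the region in u
    tex.m[1][1] = -0.5f * scaleV;  // half-height in v, flipped
    tex.m[3][0] = offsetU;         // center of the region in u
    tex.m[3][1] = offsetV;         // center of the region in v
    return tex;
}

// Evaluate the first two elements of [nx ny nz 1] * tex.
void SphereMapUV(const Mat4& tex, float nx, float ny, float& u, float& v)
{
    u = nx * tex.m[0][0] + tex.m[3][0];
    v = ny * tex.m[1][1] + tex.m[3][1];
}
```

With this layout, two sphere maps packed side by side would use (0.5, 1.0, 0.25, 0.5) for the left half and (0.5, 1.0, 0.75, 0.5) for the right half, and a scale slightly below 1.0 trims the outer edge.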
Listing 1. Mapping
// Get the current world-view matrix
D3DMATRIX matWorld, matView, matWV;
m_pd3dDevice->GetTransform( D3DTRANSFORMSTATE_VIEW, &matView );
m_pd3dDevice->GetTransform( D3DTRANSFORMSTATE_WORLD, &matWorld );
D3DMath_MatrixMultiply( matWV, matWorld, matView );
// Extract world-view matrix elements for speed
FLOAT m11 = matWV._11, m21 = matWV._21, m31 = matWV._31;
FLOAT m12 = matWV._12, m22 = matWV._22, m32 = matWV._32;
FLOAT m13 = matWV._13, m23 = matWV._23, m33 = matWV._33;
// Loop through the vertices, transforming each one and calculating
// the correct texture coordinates.
for( WORD i = 0; i < dwNumVertices; i++ )
{
    FLOAT nx = pvVertices[i].nx;
    FLOAT ny = pvVertices[i].ny;
    FLOAT nz = pvVertices[i].nz;
    // Check the z-component, to skip any vertices that face backwards
    if( nx*m13 + ny*m23 + nz*m33 > 0.0f )
        continue;
    // Assign the spheremap's texture coordinates
    pvVertices[i].tu = 0.5f * ( 1.0f + ( nx*m11 + ny*m21 + nz*m31 ) );
    pvVertices[i].tv = 0.5f * ( 1.0f - ( nx*m12 + ny*m22 + nz*m32 ) );
}
Rob Wyatt has been involved in games and graphics for more than a decade and was one of the architects of the X-Box game console. He recently left Microsoft and headed to Southern California, where he can be found flying around the skies of Los Angeles in his plane. He is currently looking at various technologies for the Internet.
Copyright © 2003 CMP Media Inc. All rights reserved.