Raising a variable to a non-integer power is an operation often encountered in computer graphics. It is required, for example, to compute the specular term in various shading schemes. It can also be used to produce a smooth conditional function useful in many pixel shaders. In most cases, the input variable falls between 0 and 1 while the exponent is greater than 1 and often quite large.
Like many other techniques, a quality gain can be achieved by computing the function per pixel rather than per vertex. This gain is very noticeable when using large exponents since the function varies a lot and sampling it at each vertex is bound to miss visually important details (see Figure 1).
Therefore, we are particularly interested in finding a way to compute such a function on the pixel shader. Like any pixel shader trick, it is important to minimize the number of textures and blending stages since these are very limited resources. This text presents a simple shader trick that performs a good per pixel approximation of a non-integer power function. The technique works for input values between 0 and 1 and supports large exponents. The presented shader does not require any texture look-up and is scalable, making it possible to spend more instructions in order to decrease the error or to reach greater exponents.
We first consider and analyze two typical techniques used to compute a power function on the pixel shader. We then expose some mathematical background used throughout the text. Finally, we show how the algorithm can be used to perform smooth conditional functions and complex bump-mapped Phong shading. The actual implementation of the approximation as a pixel shader program is discussed in detail.
When confronted with the problem of computing a power function on the pixel shader, two simple techniques come to mind. First, it seems possible to proceed through a 1D texture look-up, and second, applying successive multiplications looks promising.
Linearly interpolated textures can be thought of as piecewise linear functions. In particular, 1D textures with a linear filter are really a function taking a value between 0 and 1 and mapping it onto another value in the same range . This looks promising for our problem since an input between 0 and 1 raised to any power greater than 0 yields a result between 0 and 1.
Listing 1 shows a piece of code that builds a 16-bit monochrome 1D texture of resolution Resi to compute xn:
void ComputePowerTexture( int Resi, double n, unsigned short* Texture )
for( i=0; i
Texture[i] = (unsigned short)( pow( (double)i/Resi, n ) * USHRT_MAX );
Listing 1. C Function to create a 1D power texture
Once this texture has been constructed, a simple texture look-up pixel shader can be used to perform the computation, provided that the value to raise to power n is placed in an interpolated texture coordinate. The pixel shader at Listing 2 shows how to apply the power function on the result of a dot product, like it is often required. Note that this code only works for pixel shader versions 1.2 and 1.3. The code for version 1.4 is presented in Listing 3.
tex t0 ; Texture #0 look-up a vector
texdp3tex t1, t0 ; Use texture coordinates #1 as a 3D vector,
; performs dot product between this 3D vector and t0,
; using the result, look-up power function in 1D texture #1
mov r0, t1 ; emit the result
Listing 2. Code computing a power function using a 1D texture
(pixel shader versions 1.2 and 1.3)
texld r0, t0 ; Texture #0 look-up a vector
texcrd r1.rgb, t1 ; Load texture coordinates #1 as a 3D vector
dp3 r0, r0, r1 ; Performs dot product between this 3D vector and r0
texld r1, r0.x ; using the result, look-up power function in 1D texture #1
mov r0, r1 ; emit the result
Listing 3. Code computing a power function using a 1D texture
(pixel shader versions 1.4)
Various problems exist with that particular technique:
- it uses up one texture stage, which may make it unfit to algorithms requiring many textures,
- changing the value of the exponent n requires regenerating the texture, unless a 2D texture is used in which case a limited number of predefined exponents can be used,
- for pixel shaders versions less than 1.4, a 1D texture look-up cannot be applied to intermediate computation results unless multiple passes are used.
This last limitation is often a major drawback since, in usual cases, the power function must be preceded by a vector renormalization (such as done with a cube map) and a dot product. With large exponents, the vector renormalization is especially important. This is due to the fact that the maximum value of a dot product is the product of the length of the two vectors. If one of these is not normalized, the dot product can never reach 1. When raised to a large exponent, a value smaller than 1 will rapidly move toward 0. This translates to visual details being washed out. Figure 2 shows a vector interpolated with and without normalization, followed by a dot product, and then raised to a high power. It is obvious that the detail (for example a specular spot) is washed out in the second version.
Since raising a value to an integer power simply requires multiplying a value with itself a number of times, it seems possible to approximate a non-integer power function through successive multiplication steps.
For example, the pixel shader at Listing 4 shows how to raise t0 to power 16. Analyzing this scheme indicates that log2 n multiplications are required to raise a variable to the power n, when n is a power of 2:
mul r0, t0, t0
mul r0, r0, r0
mul r0, r0, r0
mul r0, r0, r0
; r0 = t0 *t0 = t0^2
; r0 = t0^2*t0^2 = t0^4
; r0 = t0^4*t0^4 = t0^8
; r0 = t0^8*t0^8 = t0^16
Listing 4. Power function using successive multiplications
Listing 5 shows a pixel shader that raises t0 to power 31. Analyzing this shows that, in general, for n in the range [2a,2a+1), the algorithm can require 2a multiplications and a temporary variables2.
mul t1, t0, t0
mul t2, t1, t1
mul t3, t2, t2
mul r0, t3, t3
mul r0, r0, t3
mul r0, r0, t2
mul r0, r0, t1
mul r0, r0, t0
; t1 = t0^2
; t2 = t0^4
; t3 = t0^8
; r0 = t0^16
; r0 = t0^16 * t0^8 = t0^24
; r0 = t0^24 * t0^4 = t0^28
; r0 = t0^28 * t0^2 = t0^30
; r0 = t0^30 * t0 = t0^31
Listing 5. Power function with a non-power-of-2 exponent
using successive multiplications
The limitations of this technique are the following:
- only supports discrete changes in the exponent, making it impossible to change the value of n in a continuous fashion,
- requires a lot of instructions for a large exponent,
- requires a lot of instructions and temporary variables for non power of 2 exponents.
These last two problems often limit the usefulness of successive multiplications, since practical exponents have a tendency to be large and are usually not powers of 2.
The Need for a New Trick
Although the 1D texture look-up and the successive multiplications techniques have no limitations in common, both are too restrictive to be really useful in the general case. In particular, none of them is suited to large exponents. In the case of 1D textures, the impossibility to perform a per pixel renormalization before the look-up makes it unsuitable, and for successive multiplications the number of instructions required for large exponents is prohibitive.
The power function approximation technique presented in the rest of the text addresses all of the preceding issues. We therefore think it can often be used as an efficient alternative in pixel shaders that require a power function.