It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

By Philippe Beaudoin
[Author's Bio]
and Juan Guardado
[Author's Bio]
Gamasutra
August 1, 2002

Traditional Techniques

Power Function on the Pixel Shader

Applications

The Pixel Shader

Printer Friendly Version
   



This feature is an excerpt from Direct3D ShaderX: Vertex and Pixel Shader Tips and Tricks, edited by Wolfgang Engel.

[Purchase Book]

Letters to the Editor:
Write a letter
View all letters


Features

A Non-Integer Power Function on the Pixel Shader

Raising a variable to a non-integer power is an operation often encountered in computer graphics. It is required, for example, to compute the specular term in various shading schemes. It can also be used to produce a smooth conditional function useful in many pixel shaders. In most cases, the input variable falls between 0 and 1 while the exponent is greater than 1 and often quite large.

Like many other techniques, a quality gain can be achieved by computing the function per pixel rather than per vertex. This gain is very noticeable when using large exponents since the function varies a lot and sampling it at each vertex is bound to miss visually important details (see Figure 1).

Therefore, we are particularly interested in finding a way to compute such a function on the pixel shader. Like any pixel shader trick, it is important to minimize the number of textures and blending stages since these are very limited resources. This text presents a simple shader trick that performs a good per pixel approximation of a non-integer power function. The technique works for input values between 0 and 1 and supports large exponents. The presented shader does not require any texture look-up and is scalable, making it possible to spend more instructions in order to decrease the error or to reach greater exponents.

We first consider and analyze two typical techniques used to compute a power function on the pixel shader. We then expose some mathematical background used throughout the text. Finally, we show how the algorithm can be used to perform smooth conditional functions and complex bump-mapped Phong shading. The actual implementation of the approximation as a pixel shader program is discussed in detail.



Figure 1. Gouraud shading (left) and Phong shading (right)

Traditional Techniques

When confronted with the problem of computing a power function on the pixel shader, two simple techniques come to mind. First, it seems possible to proceed through a 1D texture look-up, and second, applying successive multiplications looks promising.

Texture Look-Up
Linearly interpolated textures can be thought of as piecewise linear functions. In particular, 1D textures with a linear filter are really a function taking a value between 0 and 1 and mapping it onto another value in the same range . This looks promising for our problem since an input between 0 and 1 raised to any power greater than 0 yields a result between 0 and 1.

Listing 1 shows a piece of code that builds a 16-bit monochrome 1D texture of resolution Resi to compute xn:

void ComputePowerTexture( int Resi, double n, unsigned short* Texture )
{
   int i;
   for( i=0; i<Resi; ++i )
      Texture[i] = (unsigned short)( pow( (double)i/Resi, n ) * USHRT_MAX );
}

Listing 1. C Function to create a 1D power texture

Once this texture has been constructed, a simple texture look-up pixel shader can be used to perform the computation, provided that the value to raise to power n is placed in an interpolated texture coordinate. The pixel shader at Listing 2 shows how to apply the power function on the result of a dot product, like it is often required. Note that this code only works for pixel shader versions 1.2 and 1.3. The code for version 1.4 is presented in Listing 3.

ps.1.2

tex t0             ; Texture #0 look-up a vector
texdp3tex t1, t0   ; Use texture coordinates #1 as a 3D vector,
                   ; performs dot product between this 3D vector and t0,
                   ; using the result, look-up power function in 1D texture #1

mov r0, t1         ; emit the result

Listing 2. Code computing a power function using a 1D texture
(pixel shader versions 1.2 and 1.3)

ps.1.4

texld r0, t0        ; Texture #0 look-up a vector
texcrd r1.rgb, t1   ; Load texture coordinates #1 as a 3D vector

dp3 r0, r0, r1      ; Performs dot product between this 3D vector and r0

phase

texld r1, r0.x      ; using the result, look-up power function in 1D texture #1

mov r0, r1          ; emit the result

Listing 3. Code computing a power function using a 1D texture
(pixel shader versions 1.4)

Various problems exist with that particular technique:

  • it uses up one texture stage, which may make it unfit to algorithms requiring many textures,
  • changing the value of the exponent n requires regenerating the texture, unless a 2D texture is used in which case a limited number of predefined exponents can be used,
  • for pixel shaders versions less than 1.4, a 1D texture look-up cannot be applied to intermediate computation results unless multiple passes are used.

This last limitation is often a major drawback since, in usual cases, the power function must be preceded by a vector renormalization (such as done with a cube map) and a dot product. With large exponents, the vector renormalization is especially important. This is due to the fact that the maximum value of a dot product is the product of the length of the two vectors. If one of these is not normalized, the dot product can never reach 1. When raised to a large exponent, a value smaller than 1 will rapidly move toward 0. This translates to visual details being washed out. Figure 2 shows a vector interpolated with and without normalization, followed by a dot product, and then raised to a high power. It is obvious that the detail (for example a specular spot) is washed out in the second version.



Figure 2. Result of not normalizing vectors before applying a power function.

Successive Multiplications
Since raising a value to an integer power simply requires multiplying a value with itself a number of times, it seems possible to approximate a non-integer power function through successive multiplication steps.

For example, the pixel shader at Listing 4 shows how to raise t0 to power 16. Analyzing this scheme indicates that log2 n multiplications are required to raise a variable to the power n, when n is a power of 2:

ps.1.0
tex t0

mul r0, t0, t0
mul r0, r0, r0
mul r0, r0, r0
mul r0, r0, r0




; r0 = t0 *t0 = t0^2
; r0 = t0^2*t0^2 = t0^4
; r0 = t0^4*t0^4 = t0^8
; r0 = t0^8*t0^8 = t0^16

Listing 4. Power function using successive multiplications

Listing 5 shows a pixel shader that raises t0 to power 31. Analyzing this shows that, in general, for n in the range [2a,2a+1), the algorithm can require 2a multiplications and a temporary variables2.

ps.1.1
tex t0

mul t1, t0, t0
mul t2, t1, t1

mul t3, t2, t2
mul r0, t3, t3

mul r0, r0, t3
mul r0, r0, t2
mul r0, r0, t1
mul r0, r0, t0




; t1 = t0^2
; t2 = t0^4
; t3 = t0^8
; r0 = t0^16
; r0 = t0^16 * t0^8 = t0^24
; r0 = t0^24 * t0^4 = t0^28
; r0 = t0^28 * t0^2 = t0^30
; r0 = t0^30 * t0 = t0^31

Listing 5. Power function with a non-power-of-2 exponent
using successive multiplications

The limitations of this technique are the following:

  • only supports discrete changes in the exponent, making it impossible to change the value of n in a continuous fashion,
  • requires a lot of instructions for a large exponent,
  • requires a lot of instructions and temporary variables for non power of 2 exponents.

These last two problems often limit the usefulness of successive multiplications, since practical exponents have a tendency to be large and are usually not powers of 2.

The Need for a New Trick
Although the 1D texture look-up and the successive multiplications techniques have no limitations in common, both are too restrictive to be really useful in the general case. In particular, none of them is suited to large exponents. In the case of 1D textures, the impossibility to perform a per pixel renormalization before the look-up makes it unsuitable, and for successive multiplications the number of instructions required for large exponents is prohibitive.

The power function approximation technique presented in the rest of the text addresses all of the preceding issues. We therefore think it can often be used as an efficient alternative in pixel shaders that require a power function.

Mathematical Details

If we take xn in the range from 0 to 1 then, for a large enough n, the function is very close to 0 on most of its domain and abruptly goes to 1 when x approaches 1. This is shown in Figure 3 for increasing values of n. This particularity will be the basic concept used for developing our approximation.



Figure 3. Shape of a power function with increasing exponents n = 2, 4, 8, 16, 64.

What is interesting to note is that the result xn is greater than 1/256 only for values of x greater than 256-1/n. For example, if we take n = 16, the function will be greater than 1/256 only if x is over 0.707, with n = 64 this value becomes 0.917. Since 1/256 is the smallest value displayable on 8 bits per channel hardware, approximating xn with 0 will yield no perceptual error for values of x between 0 and 256-1/n.

Now, if we look at xn for input values greater than 256-1/n, we can see that it looks pretty much like a scaled and offset power function of a lower degree. Figure 4 shows the function x16 being approximated by x4 scaled and offset horizontally to reach 0 when x = 256-1/16.



Figure 4. Approximating the right part of a power function with a lower degree exponent m.

Therefore, using the null function from 0 to 256-1/n and correctly scaling a lower degree power function for the rest of the range seems to constitute an adequate approximation. Say our approximating function uses an exponent m, a scaled and offset version of the function can be written as (Ax + B)m, where A is the scale and B is the offset. Now, naturally, we want this approximating function to be equal to 1 when x is 1. Also, we want it to reach 0 when x = 256-1/n. We can therefore solve for A and B. We find that A = 1 / ( 1 - 256-1/n ) and B = 1 - A. The approximating function can be written as:

     

By noting that x < 256-1/n if and only if Ax + B < 0, the above function can be rewritten more concisely as:

         f(x) = max(Ax + B, 0)m

Note that we will always consider m £ n in the rest of the text. This is because we want the approximating function to have a lower degree than the original.

This technique can now be used with real examples. Figure 5 shows the result of approximating xn with max( Ax + B, 0 )m for n = 16 and m = 4. In this case, A and B are computed as described earlier and their value is A = 3.4142 and B = -2.4142. The graph displays both the original and approximated curves. The normalized error function is also plotted in order to show how the error is distributed to the left and right of the point where both curve crosses.



Figure 5. Approximation of a power function and normalized error.

A first analysis shows that the approximation gives good results for these values of A, B, n and m. However, if we look more closely at the graph we can notice that the error is not distributed equally on each side of the crossing point. This leads us to think that the maximal approximation error could be lowered by adjusting A and B in order to move the crossing point.

In fact, for arbitrary values of n and m, the technique we've described to select A and B doesn't give any guarantee on the maximal approximation error. In practice, however, it may be suited to many applications.

In order to optimize the approximation error, one should seek values of A and B for which the maximal error on the left and right side are equal. We solved this problem using the numerical approach presented in Listing 6 and described next.

First we need a function that, given approximate values for A and B, is able to compute the maximal error on the left and right side of the crossing point. To do so, we simply find the zero of the derivative of the error function on the left and right sides and evaluate the error function at these points. This is accomplished by the function EvaluateMaxError.

Then, we perform a binary search, changing our guess values for A and B in order to move the crossing point. When the left error is greater than the right error, the crossing point is moved to the left, otherwise it is moved to the right. This is accomplished by the function FindAB.

This algorithm guarantees that we will pick values of A and B that minimize the maximal approximation error for any n and m, provided that 1£ m £ n.

Listing 6. C code to determine optimal scale and offset for some approximation

Using this algorithm with the previous example, where n = 16 and m = 4, yields A = 3.525 and B = -2.525. This is illustrated in Figure 6. It can be seen that the error is equally distributed to the left and right of the crossing point, indicating that the maximal error has been minimized.



Figure 6. Approximation of a power function using optimized scale and offset.

It should be noted that, by selecting A and B through the algorithm of Listing 6, we lose the property that the max( Ax+B, 0 )m is equal to 0 only for values of xn < 1/256. However, this doesn't hurt our approximation since the maximal error has been lowered.

Table 1 shows the maximal approximation error for typical values of n and m. It also shows the optimal values of A and B for these values.

Table 1. Scale, offset and maximal error
for typical approximations.

______________________________________________________

Power Function on the Pixel Shader


join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2003 CMP Media LLC

privacy policy
| terms of service