| |
|
|
||||
![]() |
||||||
| |
|
|||||
|
A Non-Integer Power Function on the Pixel Shader Raising a variable to a non-integer power is an operation often encountered in computer graphics. It is required, for example, to compute the specular term in various shading schemes. It can also be used to produce a smooth conditional function useful in many pixel shaders. In most cases, the input variable falls between 0 and 1 while the exponent is greater than 1 and often quite large. Like many other techniques, a quality gain can be achieved by computing the function per pixel rather than per vertex. This gain is very noticeable when using large exponents since the function varies a lot and sampling it at each vertex is bound to miss visually important details (see Figure 1). Therefore, we are particularly interested in finding a way to compute such a function on the pixel shader. Like any pixel shader trick, it is important to minimize the number of textures and blending stages since these are very limited resources. This text presents a simple shader trick that performs a good per pixel approximation of a non-integer power function. The technique works for input values between 0 and 1 and supports large exponents. The presented shader does not require any texture look-up and is scalable, making it possible to spend more instructions in order to decrease the error or to reach greater exponents. We first consider and analyze two typical techniques used to compute a power function on the pixel shader. We then expose some mathematical background used throughout the text. Finally, we show how the algorithm can be used to perform smooth conditional functions and complex bump-mapped Phong shading. The actual implementation of the approximation as a pixel shader program is discussed in detail.
Traditional Techniques Listing 1 shows a piece of code that builds a 16-bit monochrome 1D texture of resolution Resi to compute xn:
Listing 1. C Function to create a 1D power texture Once this texture has been constructed, a simple texture look-up pixel shader can be used to perform the computation, provided that the value to raise to power n is placed in an interpolated texture coordinate. The pixel shader at Listing 2 shows how to apply the power function on the result of a dot product, like it is often required. Note that this code only works for pixel shader versions 1.2 and 1.3. The code for version 1.4 is presented in Listing 3.
Listing
2. Code computing a power function using a 1D texture
Listing
3. Code computing a power function using a 1D texture Various problems exist with that particular technique:
This last limitation is often a major drawback since, in usual cases, the power function must be preceded by a vector renormalization (such as done with a cube map) and a dot product. With large exponents, the vector renormalization is especially important. This is due to the fact that the maximum value of a dot product is the product of the length of the two vectors. If one of these is not normalized, the dot product can never reach 1. When raised to a large exponent, a value smaller than 1 will rapidly move toward 0. This translates to visual details being washed out. Figure 2 shows a vector interpolated with and without normalization, followed by a dot product, and then raised to a high power. It is obvious that the detail (for example a specular spot) is washed out in the second version.
Successive
Multiplications For example, the pixel shader at Listing 4 shows how to raise t0 to power 16. Analyzing this scheme indicates that log2 n multiplications are required to raise a variable to the power n, when n is a power of 2:
Listing 4. Power function using successive multiplications Listing 5 shows a pixel shader that raises t0 to power 31. Analyzing this shows that, in general, for n in the range [2a,2a+1), the algorithm can require 2a multiplications and a temporary variables2.
Listing
5. Power function with a non-power-of-2 exponent The limitations of this technique are the following:
These last two problems often limit the usefulness of successive multiplications, since practical exponents have a tendency to be large and are usually not powers of 2. The Need
for a New Trick The power function approximation technique presented in the rest of the text addresses all of the preceding issues. We therefore think it can often be used as an efficient alternative in pixel shaders that require a power function. Mathematical Details
What is interesting to note is that the result xn is greater than 1/256 only for values of x greater than 256-1/n. For example, if we take n = 16, the function will be greater than 1/256 only if x is over 0.707, with n = 64 this value becomes 0.917. Since 1/256 is the smallest value displayable on 8 bits per channel hardware, approximating xn with 0 will yield no perceptual error for values of x between 0 and 256-1/n. Now, if we look at xn for input values greater than 256-1/n, we can see that it looks pretty much like a scaled and offset power function of a lower degree. Figure 4 shows the function x16 being approximated by x4 scaled and offset horizontally to reach 0 when x = 256-1/16.
Therefore, using the null function from 0 to 256-1/n and correctly scaling a lower degree power function for the rest of the range seems to constitute an adequate approximation. Say our approximating function uses an exponent m, a scaled and offset version of the function can be written as (Ax + B)m, where A is the scale and B is the offset. Now, naturally, we want this approximating function to be equal to 1 when x is 1. Also, we want it to reach 0 when x = 256-1/n. We can therefore solve for A and B. We find that A = 1 / ( 1 - 256-1/n ) and B = 1 - A. The approximating function can be written as:
By noting that x < 256-1/n if and only if Ax + B < 0, the above function can be rewritten more concisely as: f(x) = max(Ax + B, 0)m Note that we will always consider m £ n in the rest of the text. This is because we want the approximating function to have a lower degree than the original. This technique can now be used with real examples. Figure 5 shows the result of approximating xn with max( Ax + B, 0 )m for n = 16 and m = 4. In this case, A and B are computed as described earlier and their value is A = 3.4142 and B = -2.4142. The graph displays both the original and approximated curves. The normalized error function is also plotted in order to show how the error is distributed to the left and right of the point where both curve crosses.
A first analysis shows that the approximation gives good results for these values of A, B, n and m. However, if we look more closely at the graph we can notice that the error is not distributed equally on each side of the crossing point. This leads us to think that the maximal approximation error could be lowered by adjusting A and B in order to move the crossing point. In fact, for arbitrary values of n and m, the technique we've described to select A and B doesn't give any guarantee on the maximal approximation error. In practice, however, it may be suited to many applications. In order to optimize the approximation error, one should seek values of A and B for which the maximal error on the left and right side are equal. We solved this problem using the numerical approach presented in Listing 6 and described next. First we need a function that, given approximate values for A and B, is able to compute the maximal error on the left and right side of the crossing point. To do so, we simply find the zero of the derivative of the error function on the left and right sides and evaluate the error function at these points. This is accomplished by the function EvaluateMaxError. Then, we perform a binary search, changing our guess values for A and B in order to move the crossing point. When the left error is greater than the right error, the crossing point is moved to the left, otherwise it is moved to the right. This is accomplished by the function FindAB. This algorithm guarantees that we will pick values of A and B that minimize the maximal approximation error for any n and m, provided that 1£ m £ n. Listing 6. C code to determine optimal scale and offset for some approximation Using this algorithm with the previous example, where n = 16 and m = 4, yields A = 3.525 and B = -2.525. This is illustrated in Figure 6. It can be seen that the error is equally distributed to the left and right of the crossing point, indicating that the maximal error has been minimized.
It should be noted that, by selecting A and B through the algorithm of Listing 6, we lose the property that the max( Ax+B, 0 )m is equal to 0 only for values of xn < 1/256. However, this doesn't hurt our approximation since the maximal error has been lowered. Table 1 shows the maximal approximation error for typical values of n and m. It also shows the optimal values of A and B for these values.
Table
1. Scale, offset and maximal error
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|