A Non-Integer Power Function on the Pixel Shader

Power Function on the Pixel Shader

Approximating a power function on the pixel shader requires us to translate the preceding mathematical reasoning into the pixel shader assembly language. Doing so requires us to compute the function $\max(Ax+B, 0)^m$ using the set of available microcode instructions. We also need a way to specify the variables present in this equation, namely A, B, m and x. Two of these are easily dealt with: the input variable x will simply be stored in a general-purpose register, and the exponent m will be decided in advance. For variables A and B we will consider two scenarios. In the first, they are fixed ahead of time and stored in constant registers. In the second, we show how A and B can be modified dynamically on a per-pixel basis.

Constant Exponent

Now we need to compute $\max(Ax+B, 0)^m$. First, the max( , 0) function is taken care of using the _sat modifier available on the pixel shader. Then, we pick m as a power of 2, selected to approximate the target exponent n with enough precision. We then perform a bias and scaling with mad, followed by $\log_2 m$ self-multiplications with mul. The result is the pixel shader of Listing 7, where the power function is applied to each element of the vector r0. It should be noted that $\log_2 m + 1$ is equal to the number of pixel shader instructions required. Therefore, in an actual shader, the number of free instructions could limit m.
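As a CPU-side illustration of the sequence just described, the following minimal C sketch emulates one mad_sat followed by $\log_2 m$ self-multiplications. The names saturate and approx_pow are ours and purely illustrative; the sketch assumes that _sat clamps to [0, 1], which coincides with max( , 0) whenever Ax+B does not exceed 1.

```c
#include <assert.h>

/* Clamp to [0, 1], mimicking the _sat instruction modifier (assumption:
 * the upper clamp is harmless because Ax + B <= 1 on the input range). */
static double saturate(double v)
{
    if (v < 0.0) return 0.0;
    if (v > 1.0) return 1.0;
    return v;
}

/* Hypothetical helper: evaluates max(Ax + B, 0)^m with m = 2^log2_m,
 * using one multiply-add and log2_m self-multiplications. */
double approx_pow(double a, double b, double x, int log2_m)
{
    assert(log2_m >= 0);
    double r = saturate(a * x + b);   /* mad_sat r0, c0, r0, c1 */
    for (int i = 0; i < log2_m; ++i)
        r *= r;                       /* mul r0, r0, r0 */
    return r;
}
```

For instance, with A = 2, B = -1 and m = 4 (log2_m = 2), an input of 0.75 gives (2 * 0.75 - 1)^4 = 0.5^4, and any input below 0.5 is clamped to zero.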
    mad_sat r0, c0, r0, c1 ; r0 = max( Ax + B, 0 )

Listing 7. Basic code for approximating a power function

There is an important problem with the previous pixel shader. Since all constant registers must hold values between -1 and 1, we have a very limited range of values for A and B. Table 1 shows that, for typical values of n and m, A is always greater than 1. Therefore, in practice, the proposed pixel shader is invalid. To limit ourselves to scale and offset values in the allowed range, we first rewrite the approximation function as follows:
$$\max(Ax+B, 0)^m = k \max\left((Ax+B)\,k^{-1/m}, 0\right)^m = k \max(A'x+B', 0)^m$$

where we introduced variables A' and B' defined as:

$$A' = A\,k^{-1/m} \qquad B' = B\,k^{-1/m}$$

Therefore, A and B can be written:
$$A = A'\,k^{1/m} \qquad B = B'\,k^{1/m}$$

From these last two relations we can see that, with A' and B' between -1 and 1, we can obtain values of A and B between $-k^{1/m}$ and $k^{1/m}$. Given that k is greater than 1, this translates to an increased range for A and B.

The pixel shader lets us compute $k \max(A'x+B', 0)^m$ with k greater than 1 through its multiply instruction modifiers _x2 and _x4. If we take the program of Listing 7 and apply such modifiers to some or all of the mad and mul instructions, we get a k greater than 1.

It is possible to compute k given a sequence of multiply instruction modifiers. This is performed by the function ComputeK in Listing 8. Before calling any function in this listing, make sure to initialize the global array Multiplier so that it contains the multiply instruction modifier for each instruction (either 1, 2 or 4). If we want to know which values of A' and B' correspond to some values of A and B, we can use the computed k and the equations presented earlier; this is done by the function ComputeAB. We can also go in the opposite direction and find the maximal values for A and B given k and m, as performed by the function MaxAB. This result can then be converted into a maximal value for n, as computed by MaxN.
Listing 8. C code for computing corrected scale and offset A' and B'

It can be seen that the maximum value of n is obtained when the modifier _x4 is used for each instruction in the code. Given that the values of A' and B' are stored in constant registers c0 and c1 respectively, the pixel shader in Listing 9 performs this approximation.

    ps.1.0

Listing 9. Corrected code for approximating a power function

Table 2 shows the maximal range for A and B and the maximal exponents n that can be obtained with the previous shader for various values of m.
Naturally, if for a given m we want to limit ourselves to exponents smaller than the maximal n listed in this table, we can remove some _x4 modifiers or replace them with _x2. When doing this, we find the new maximal n by using the MaxN function with an updated Multiplier array. Not using _x4 modifiers on every instruction is often a good idea, since it can help reduce the numerical imprecision often present in pixel shaders. This lack of precision is mostly noticeable for small values of n, since they translate to values of c0 and c1 close to zero. Such small numbers may suffer from an internal fixed-point representation and yield visual artifacts.

Per Pixel Exponent

Once we have picked these values, we simply take a texture image of the desired n and translate each texel to its corresponding value of A. The texels are then translated to A' through a multiplication by $k^{-1/m}$. This time, however, instead of storing the result in a constant register, we update the texture. The program in Listing 10 executes the process.
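The second half of this process can be sketched in C as follows. This is only an illustration and not the article's Listing 10: the function name make_approx_texture, the parameter inv_root_k (standing for $k^{-1/m}$) and the single-channel 8-bit texel format are all assumptions of ours. The translation from n to A is assumed to have been done already, so the input holds one value of A per texel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts per-texel scale values A into unsigned
 * 8-bit texels holding A' = A * k^(-1/m), rounded to the nearest level.
 * inv_root_k is k^(-1/m), computed by the caller for the chosen modifiers. */
void make_approx_texture(const double *a_values, unsigned char *texels,
                         size_t count, double inv_root_k)
{
    for (size_t i = 0; i < count; ++i) {
        double a_prime = a_values[i] * inv_root_k;
        /* A' must fit in the [0, 1] range of the texture format. */
        assert(a_prime >= 0.0 && a_prime <= 1.0);
        texels[i] = (unsigned char)(a_prime * 255.0 + 0.5);
    }
}
```

With $k^{-1/m} = 1/8$, for example, values of A equal to 0, 4 and 8 would be stored as texels 0, 128 and 255. The coarse 8-bit quantization is one place where the precision issues mentioned above may show up.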
Listing 10. C code to generate an approximation texture

Once such a texture has been generated and placed in texture stage 0, we can easily extract A' inside a pixel shader. We can also extract B' by recalling that B = 1 - A. Since B' is the result of multiplying B by $k^{-1/m}$, we can write $B' = (1 - A)\,k^{-1/m} = k^{-1/m} - A'$. Since m is fixed, we store $k^{-1/m}$ in constant register c0 and perform a simple subtraction to extract B'. The pixel shader of Listing 11 approximates a power function with n varying per pixel. This shader uses the _x4 modifier on each instruction; the texture and the constant register c0 should therefore be generated accordingly.
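The subtraction trick can be checked on the CPU with a small C sketch. It does not reproduce Listing 11; the function name approx_pow_per_pixel and the exact instruction sequence it mimics (one subtraction, one mad_x4_sat, then mul_x4 repeated $\log_2 m$ times) are our assumptions, as is applying the multiplier before the saturation.

```c
#include <assert.h>

/* Clamp to [0, 1], mimicking the _sat instruction modifier. */
static double saturate(double v)
{
    if (v < 0.0) return 0.0;
    if (v > 1.0) return 1.0;
    return v;
}

/* Hypothetical emulation of the per-pixel variant with _x4 everywhere.
 * a_prime is the texel value A', c0 holds k^(-1/m), m = 2^log2_m. */
double approx_pow_per_pixel(double a_prime, double c0, double x, int log2_m)
{
    assert(log2_m >= 0);
    double b_prime = c0 - a_prime;                      /* B' = k^(-1/m) - A' */
    double r = saturate(4.0 * (a_prime * x + b_prime)); /* mad_x4_sat         */
    for (int i = 0; i < log2_m; ++i)
        r = 4.0 * r * r;                                /* mul_x4             */
    return r;
}
```

For m = 2 with _x4 on both instructions, k = 64 and c0 = 1/8. A texel A' = 0.5 then corresponds to A = 4 and B = -3, and the emulation returns (4x - 3)^2 for inputs above 0.75, which is the uncorrected approximation evaluated with the original A and B.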
Listing 11. Code for approximating a power function with n varying per pixel

This shader seems to show that one extra instruction is required to handle an exponent n varying per pixel. However, this is only true if we cannot spare additional texture space. If a texture or a texture component is still available, we could precompute the value of B' and store it there. However, we believe that textures are often a more limited resource than pixel shader instructions, which is why we suggest the approach presented above.