
By Philippe Beaudoin
and Juan Guardado
Gamasutra
August 1, 2002


This feature is an excerpt from Direct3D ShaderX: Vertex and Pixel Shader Tips and Tricks, edited by Wolfgang Engel.


A Non-Integer Power Function on the Pixel Shader

Power Function on the Pixel Shader

Approximating a power function on the pixel shader requires us to translate the preceding mathematical reasoning into the pixel shader assembly language. Doing so requires us to compute the function max( Ax+B, 0 )^m through the set of available microcode instructions. We also need a way to specify the variables present in this equation, namely A, B, m and x.

We can settle two of these variables easily: the input variable x will simply be stored in a general-purpose register and the exponent m will be decided in advance. For variables A and B we will consider two scenarios. In the first, they are fixed ahead of time and their content is stored in constant registers. In the second, we will show how A and B can be modified dynamically on a per-pixel basis.

Constant Exponent
Let's first study the case where A and B do not change per pixel. In this scenario, A is placed in the constant register c0 while B is placed in c1. This means that the exponent n being approximated is constant as long as c0 and c1 remain unchanged.

Now we need to compute max( Ax+B, 0 )^m. First, the max( ..., 0 ) function is taken care of by the _sat modifier available on the pixel shader. Then, we pick m as a power of 2 selected to approximate the target exponent n with enough precision. We then perform a bias and scaling with mad, followed by log2(m) self-multiplications with mul. The result is the pixel shader of Listing 7, where the power function is applied to each element of the vector r0. It should be noted that the shader requires log2(m) + 1 instructions. Therefore, in an actual shader, the number of free instructions could limit m.


ps.1.0

...                        ; Place input value x into r0

    mad_sat   r0, c0, r0, c1   ; r0 = max( Ax + B, 0 )
    mul       r0, r0, r0       ; r0 = max( Ax + B, 0 )^2
    mul       r0, r0, r0       ; r0 = max( Ax + B, 0 )^4

    .
    .                          ; repeat (log2 m) times the mul instruction
    .
    mul       r0, r0, r0       ; r0 = max( Ax + B, 0 )^m

Listing 7. Basic code for approximating a power function
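
To make the instruction sequence of Listing 7 easier to check on the CPU, here is a minimal sketch that emulates it for a single scalar channel. The function name EmulateListing7 is a hypothetical helper, not part of the article, and the sketch assumes that _sat clamps the result to the range [0, 1].

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU-side emulation of Listing 7 for one scalar channel.
// x    : input value (the content of r0)
// A, B : scale and offset (the contents of c0 and c1)
// logM : log2 of m, i.e. the number of mul instructions
double EmulateListing7( double x, double A, double B, int logM )
{
    assert( logM >= 0 );

    // mad_sat r0, c0, r0, c1 : A*x + B, clamped to [0, 1] (assumed _sat behavior)
    double r0 = std::min( std::max( A * x + B, 0.0 ), 1.0 );

    // mul r0, r0, r0 repeated logM times : each pass squares r0
    for( int i = 0; i < logM; ++i )
        r0 = r0 * r0;

    return r0;   // max( Ax + B, 0 )^m with m = 2^logM
}
```

For example, EmulateListing7( 0.5, 1.0, 0.0, 3 ) squares 0.5 three times and therefore evaluates 0.5^8.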

There is an important problem with the previous pixel shader. In fact, since all constant registers must be between -1 and 1, we have a very limited range of values for A and B. Table 1 shows that, for typical values of n and m, A is always greater than 1. Therefore, in practice, the proposed pixel shader is invalid.

To limit ourselves to scale and offset values in the allowed range, we first rewrite the approximation function as follows:

max( Ax+B, 0 )^m = k max( (Ax+B) k^(-1/m), 0 )^m = k max( A'x+B', 0 )^m

where we introduced variables A' and B' defined as:

A' = A k^(-1/m)        B' = B k^(-1/m)

Therefore, A and B can be written:

A = A' k^(1/m)        B = B' k^(1/m)

From these last two relations we can see that, with A' and B' between -1 and 1, we can obtain values of A and B between -k^(1/m) and k^(1/m). Given that k is greater than 1, this translates to an increased range for A and B.
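
As a quick check of the rewriting step, the identity can be verified in one line of algebra. This is only a sketch that relies on the definitions of A' and B' given above and on k being positive, so that the factor k^(-1/m) can be pulled out of the max:

```latex
\begin{aligned}
k\,\max(A'x + B',\, 0)^m
  &= k\,\max\!\big(k^{-1/m}(Ax + B),\, 0\big)^m \\
  &= k\,\big(k^{-1/m}\big)^m \max(Ax + B,\, 0)^m \\
  &= \max(Ax + B,\, 0)^m .
\end{aligned}
```

As an illustrative case with made-up numbers, k = 16 and m = 2 give k^(1/m) = 4, so A and B may then range over [-4, 4] while A' and B' stay within [-1, 1].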

The pixel shader lets us compute k max( A'x+B', 0 )^m with k greater than 1 through its multiply instruction modifiers _x2 and _x4. If we take the program of Listing 7 and apply such modifiers to some or each of the mad or mul instructions, we will get a k greater than 1.

It is possible to compute k given a sequence of multiply instruction modifiers. This is performed by the function ComputeK in Listing 8. Before calling any function in this listing, make sure to correctly initialize the global array Multiplier so that it contains the correct multiply instruction modifier for each instruction (either 1, 2 or 4). If we want to know which values of A' and B' correspond to some values of A and B, we can use the computed k and the equation presented earlier; this is done by the function ComputeApBp.

We can also go in the opposite direction and find the maximal values for A and B given k and m, as performed by the function MaxAB. This result can then be converted into a maximal value for n, as computed by MaxN.

// Table containing multiply instruction modifier for each instruction (1, 2 or 4)
int Multiplier[] = { 4, 4, 4, 4, 4, 4, 4, 4 };

// Compute value of k at instruction i given a table of multiply instruction modifiers
double ComputeK( int i )
{
   if( i == 0 )
      return Multiplier[i];

   double Temp = ComputeK( i-1 );
   return Multiplier[i] * Temp * Temp;
}

// Forward declaration: MaxAB is defined below but used by ComputeApBp
double MaxAB( int LogM );

// Compute values of A' and B' given A and B and a multiplier table
// LogM: log of m in base 2 (number of instructions - 1)
void ComputeApBp( int LogM, double A, double B, double &APrime, double &BPrime )
{
   double Temp = 1.0/MaxAB( LogM );     // Note that k^(-1/m) = 1/MaxAB

   APrime = A * Temp;
   BPrime = B * Temp;
}

// Compute maximum absolute values for A and B given some m and a multiplier table
// LogM: log of m in base 2 (number of instructions - 1)
double MaxAB( int LogM )
{
   double m = pow( 2.0, LogM );           // Find the value of m
   double K = ComputeK( LogM );         // Compute K
   return pow( K, 1.0/m );
}

// Compute maximum possible exponent given some m and a multiplier table
// LogM: log of m in base 2 (number of instructions - 1)
double MaxN( int LogM )
{
   double m = pow( 2.0, LogM );           // Find the value of m
   double Max = MaxAB( LogM );

   double A;
   double B;

   double n0 = m;                                // Lower bound for maximal exponent
   double n1 = 5000;                           // Upper bound for maximal exponent
   double n;

   do
   {
     n = (n0 + n1)/2.0;

     FindAB( n, m, A, B );

     if( fabs(A) > Max || fabs(B) > Max )
        n1 = n;
     else
        n0 = n;

   } while( fabs( n0 - n1 ) > Epsilon );

   return n;
}

Listing 8. C code for computing corrected scale and offset A' and B'
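
To get a feel for the numbers produced by ComputeK and MaxAB, the following sketch restates both with the multiplier table passed as a parameter rather than read from a global array. The names ComputeKFor and MaxABFor are hypothetical and are not part of Listing 8; the recurrence itself is the one used by ComputeK.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// k accumulated after instruction i (0 = the mad, 1 and up = the muls).
// Iterative form of the recurrence k_i = multipliers[i] * k_(i-1)^2.
double ComputeKFor( const std::vector<int>& multipliers, int i )
{
    assert( i >= 0 && i < static_cast<int>( multipliers.size() ) );

    double k = multipliers[0];
    for( int j = 1; j <= i; ++j )
        k = multipliers[j] * k * k;
    return k;
}

// Largest |A| or |B| that can be reached: k^(1/m) with m = 2^logM.
double MaxABFor( const std::vector<int>& multipliers, int logM )
{
    double m = std::pow( 2.0, logM );
    return std::pow( ComputeKFor( multipliers, logM ), 1.0 / m );
}
```

With the table { 4, 4, 4 } and logM = 2 (m = 4), k is 4 * (4 * 4^2)^2 = 16384 and the range for A and B is roughly 11.31. Replacing every modifier with _x2, that is the table { 2, 2, 2 }, gives k = 128 and a range of roughly 3.36.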

It can be seen that the maximum value of n is obtained when the modifier _x4 is used for each instruction in the code. Given that the values of A' and B' are stored in constant registers c0 and c1 respectively, the pixel shader in Listing 9 performs this approximation.

ps.1.0

... ; Place input value x into r0

mad_x4_sat  r0, c0, r0, c1    ; r0 = 4 * max( A'*x + B', 0 )
mul_x4      r0, r0, r0        ; r0 = 4 * (4*max( A'*x + B', 0 ))^2
mul_x4      r0, r0, r0        ; r0 = 4 * (4*(4*max( A'*x + B', 0 ))^2)^2
.
.                             ; repeat (log2 m) times the mul
.
mul_x4      r0, r0, r0        ; r0 = 4^(2m-1) * max( A'*x + B', 0 )^m

Listing 9. Corrected code for approximating a power function
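
The scaling can be checked numerically with a short CPU-side sketch of Listing 9. EmulateListing9 is a hypothetical helper, not code from the article. It follows the comments of the listing, so it assumes that intermediate register values are not clamped at the upper end, and it assumes the _x4 modifier on every instruction, which gives k = 4^(2m-1).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical CPU-side emulation of Listing 9 for one scalar channel.
// x    : input value (the content of r0)
// A, B : unscaled scale and offset
// logM : log2 of m, i.e. the number of mul_x4 instructions
double EmulateListing9( double x, double A, double B, int logM )
{
    assert( logM >= 0 );

    double m     = std::pow( 2.0, logM );
    double k     = std::pow( 4.0, 2.0 * m - 1.0 );   // _x4 on every instruction
    double scale = std::pow( k, -1.0 / m );          // k^(-1/m)

    double aPrime = A * scale;   // value that would be loaded into c0
    double bPrime = B * scale;   // value that would be loaded into c1
    assert( std::fabs( aPrime ) <= 1.0 && std::fabs( bPrime ) <= 1.0 );

    // mad_x4_sat r0, c0, r0, c1 (upper clamp not modelled, see lead-in)
    double r0 = 4.0 * std::max( aPrime * x + bPrime, 0.0 );

    // mul_x4 r0, r0, r0 repeated logM times
    for( int i = 0; i < logM; ++i )
        r0 = 4.0 * r0 * r0;

    return r0;   // k * max( A'x + B', 0 )^m = max( Ax + B, 0 )^m
}
```

With A = 2, B = -1, x = 0.9 and logM = 2, the unscaled expression is 0.8^4, and the emulated shader returns the same value up to floating-point rounding even though A itself lies outside the [-1, 1] range of a constant register.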

Table 2 shows the maximal range for A and B and the maximal exponents n that can be obtained with the previous shader for various values of m.



Table 2. Maximal values of A, B and n available depending on approximation exponent m.

Naturally, if for a given m we want to limit ourselves to exponents smaller than the maximal n listed in this table, we can remove some _x4 modifiers or replace them with _x2. When doing this, we obtain the new maximal n by calling the MaxN function with an updated Multiplier array.

Not using _x4 modifiers at each instruction is often a good idea since it can help reduce the numerical imprecision often present in pixel shaders. This lack of precision is mostly noticeable for small values of n since they translate to values of c0 and c1 close to zero. Such small numbers may suffer from an internal fixed-point representation and yield visual artifacts.

Per Pixel Exponent
Until now we have assumed that the exponent n to approximate was the same for every pixel, so that A and B could be stored in constant registers. However, being able to vary the exponent based on the result of a texture look-up is sometimes required. The proposed pixel shader trick can be extended to support that. To do so we must decide ahead of time on a value for m and a sequence of multiplication modifiers. These must be chosen to cover all possible values of n, for n > m.

Once we have picked these values, we simply take a texture image of the desired n and translate each texel to its corresponding value of A. The texels are then translated to A' through a multiplication by k^(-1/m). This time, however, instead of storing the result in a constant register, we update the texture. The program in Listing 10 executes the process.


// Translates a texture of exponents n into values that
// can be directly used by the pixel shader
// Texture: monochrome texture in custom format
// ResX: X resolution of the texture
// ResY: Y resolution of the texture
// LogM: log of m in base 2 (number of instructions - 1)
void TranslateTexture( double** Texture, int ResX, int ResY, int LogM )
{
   double A;
   double APrime;
   double Dummy;
   double m = pow( 2.0, LogM ); // Find the value of m

   for( int i=0; i<ResX; ++i )
      for( int j=0; j<ResY; ++j )
      {
        FindAB( Texture[i][j], m, A, Dummy );
        APrime = A/MaxAB( LogM );     // Compute A'. Note that k^(-1/m) = 1/MaxAB
        assert( fabs(APrime) <= 1 );  // If assert fails select another m
                                      // or change the multipliers

        Texture[i][j] = APrime; // Update the texture
     }

}

Listing 10. C code to generate an approximation texture based on an exponent texture

Once such a texture has been generated and placed in texture stage 0, we can easily extract A' inside a pixel shader. We can also extract B' by recalling that B = 1 - A. Since B' is the result of multiplying B by k^(-1/m), we can write B' = (1 - A) k^(-1/m) = k^(-1/m) - A'. Since m is fixed, we store k^(-1/m) in constant register c0 and perform a simple subtraction to extract B'.

The pixel shader of Listing 11 approximates a power function with n varying per pixel. This shader uses the _x4 modifier on each instruction, so the texture and the constant register c0 should be generated accordingly.

ps.1.0

tex t0                        ; Sample A' from the texture in stage 0

...                           ; Place input value x into r0

sub         r1, c0, t0        ; Compute B' = 1/MaxAB(LogM) - A'
mad_x4_sat  r0, t0, r0, r1    ; r0 = 4 * max( A'*x + B', 0 )
mul_x4      r0, r0, r0        ; r0 = 4 * 16 * max( A'*x + B', 0 )^2
mul_x4      r0, r0, r0        ; r0 = 4 * 16 * 256 * max( A'*x + B', 0 )^4
.
.                             ; repeat (log2 m) times the mul
.
mul_x4      r0, r0, r0        ; r0 = 4^(2m-1) * max( A'*x + B', 0 )^m

Listing 11. Code for approximating a power function with an exponent n varying per pixel
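
The per-pixel variant can be sketched on the CPU in the same spirit. The helpers InvMaxABAllX4 and EmulateListing11 below are hypothetical names, not code from the article. The sketch assumes the _x4 modifier on every instruction, relies on the relation B = 1 - A recalled above, and does not model the upper clamp of hardware registers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// k^(-1/m) for _x4 on every instruction, with k = 4^(2m-1).
// This is the value that would be stored in constant register c0.
double InvMaxABAllX4( int logM )
{
    double m = std::pow( 2.0, logM );
    return std::pow( 4.0, -( 2.0 * m - 1.0 ) / m );
}

// Hypothetical CPU-side emulation of Listing 11 for one scalar channel.
// x      : input value (the content of r0)
// aPrime : texel fetched from stage 0 (t0), i.e. A * k^(-1/m)
// logM   : log2 of m, i.e. the number of mul_x4 instructions
double EmulateListing11( double x, double aPrime, int logM )
{
    assert( logM >= 0 );

    double c0 = InvMaxABAllX4( logM );
    double r1 = c0 - aPrime;                              // sub r1, c0, t0
    double r0 = 4.0 * std::max( aPrime * x + r1, 0.0 );   // mad_x4_sat r0, t0, r0, r1

    for( int i = 0; i < logM; ++i )                       // mul_x4 r0, r0, r0
        r0 = 4.0 * r0 * r0;

    return r0;
}
```

A texel encoding A = 2 for m = 4 would hold 2 * InvMaxABAllX4( 2 ). Feeding that value with x = 0.9 reproduces max( 2 * 0.9 - 1, 0 )^4 = 0.8^4 up to rounding, and x = 1 always maps to 1, as expected when B = 1 - A.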

This shader seems to show that one extra instruction is required to handle an exponent n varying per pixel. However, this is only true if we cannot spare additional texture space. In the case where a texture or a texture component is still available, we could precompute the value of B' and store it there. However, we believe that textures are often a more limited resource than pixel shader instructions, which is why we suggest the approach presented above.

As a last remark, we can note that the power function often only needs to be computed on a scalar. Therefore, provided that the input value x is placed in the alpha channel, the pixel shader can co-issue instructions. In such a case, we can also limit our usage of constant registers by using only the alpha channel of the constants. This technique is applied in the pixel shaders presented in the rest of the text.



Copyright © 2003 CMP Media LLC
