One of the largest obstacles to getting shaders into a game seems to be the learning curve. Simply stated, shaders are not something that your lead graphics programmer can implement over the weekend. There are two main issues with getting shaders implemented in your game:
1. Understanding what shaders can do and how they replace the existing graphics pipeline.
2. Getting the supporting code implemented into your game so that you can use shaders as a resource.
In this article we're going to continue the series of Gamasutra articles about shaders by examining how to make shaders work. The actual integration of shader support is a subject for a future article. (Note: You don't need a high-end video card to try your hand at writing shaders. All you need is the DirectX 9.0 SDK installed. With that you can select the reference device (REF). While this software driver will be slow, it'll still give you the same results as a DirectX 9-capable video card.) RenderMonkey works on any hardware that supports shaders, not just ATI's hardware.
If you have already read Wolfgang Engel's article, Mark Kilgard's and Randy Fernando's Cg article or you've perused the DirectX 9 SDK documentation, then you've got a fairly good idea of the capabilities of the High-Level Shader Language (HLSL) that's supported by DirectX 9. HLSL, Cg, and the forthcoming OpenGL shading language are all attempts to make it as easy to write shaders as possible. You no longer have to worry (as much) about allocating registers, using scratch variables, or learning a new form of assembly language. Instead, once you've set up your stream data format and associated your constant input registers with more user-friendly labels, using shaders in a program is no more difficult than using a texture.
Rather than go through the tedious setup required to use shaders in your program, I'll refer you to the DirectX 9 documentation. Instead, I'm going to focus on a tool ATI created called RenderMonkey. RenderMonkey currently works with DirectX's high-level and low-level shader languages, and ATI and 3Dlabs are working on support for OpenGL 2.0's shader language, which we should see in the next few months. The advantage of a tool like RenderMonkey is that it lets you focus on writing shaders rather than worrying about infrastructure. It has a nice hierarchical structure that lets you set up a default rendering environment and make changes at lower levels as necessary. Perhaps the biggest potential advantage of using RenderMonkey is that the RenderMonkey files are XML files. Thus, by adding a RenderMonkey XML importer to your code, or an exporter plug-in to RenderMonkey, you can use RenderMonkey files in your rendering loop to set effects for individual passes. This gives RenderMonkey an advantage over DirectX's FX files, because you can use RenderMonkey as an effects editor. RenderMonkey even supports an "artist's mode" where only selected items in a pass are editable.
While HLSL is very C-like in its semantics, there is the challenge of relating the input and output of the shaders to what is provided and expected by the pipeline. Shaders can have constants set prior to their execution, but when a primitive is rendered (i.e., when some form of a DrawPrimitive call is made), the input for each vertex shader call is the set of vertex values provided in the selected vertex streams. Once the vertex shader has run on a primitive's vertices, the pipeline breaks the primitive into individual pixels and uses the (typically) interpolated vertex shader outputs as input to the pixel shader, which then calculates the resulting color(s) as its output. This is shown in Figure 1, which traces the path from application space, through vertex processing, and finally to a rendered pixel. The application space shows where shaders and constants are set in blue text. The blue boxes show where vertex and pixel shaders live in the pipeline.
Figure 1. How shaders fit into the graphics pipeline.
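The flow described above can be mimicked on the CPU. What follows is only a minimal C++ sketch of the idea, not how Direct3D is implemented: the names Vertex, ShaderConstants, vertexShader, pixelShader, and drawLine are all hypothetical, and the "primitive" is a one-dimensional line so that the interpolation step is easy to follow. Constants are set once by the application, the vertex function runs once per vertex, and the pixel function runs once per pixel on interpolated values.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-vertex data: a 1D position and a single color channel.
struct Vertex {
    float position;
    float color;
};

// Hypothetical constants the application sets before the draw call.
struct ShaderConstants {
    float offset;  // added to every vertex position
    float tint;    // multiplied into every pixel color
};

// Stand-in for a vertex shader: called once per vertex.
Vertex vertexShader(const Vertex& in, const ShaderConstants& constants) {
    return { in.position + constants.offset, in.color };
}

// Stand-in for a pixel shader: called once per pixel with interpolated input.
float pixelShader(float interpolatedColor, const ShaderConstants& constants) {
    return interpolatedColor * constants.tint;
}

// Stand-in for a draw call on a line primitive. It runs the vertex stage on
// both end points, walks the covered pixels, linearly interpolates the vertex
// outputs, and hands each interpolated value to the pixel stage.
std::vector<float> drawLine(const Vertex& a, const Vertex& b,
                            const ShaderConstants& constants) {
    const Vertex va = vertexShader(a, constants);
    const Vertex vb = vertexShader(b, constants);

    const int firstPixel = static_cast<int>(va.position);
    const int lastPixel = static_cast<int>(vb.position);
    assert(lastPixel > firstPixel);  // sketch only handles left-to-right lines

    std::vector<float> pixels;
    for (int x = firstPixel; x <= lastPixel; ++x) {
        const float t = static_cast<float>(x - firstPixel) /
                        static_cast<float>(lastPixel - firstPixel);
        const float color = va.color + (vb.color - va.color) * t;
        pixels.push_back(pixelShader(color, constants));
    }
    return pixels;
}
```

Drawing a line from Vertex{0, 0} to Vertex{4, 1} with an offset of 2 and a tint of 0.5 calls the vertex function twice and the pixel function five times, which is the same fan-out from vertices to pixels that the figure describes.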
The inputs to the vertex shader function contain the things you'd expect, like position, normals, colors, etc. HLSL can also use things like blend weights and indices (used for things like skinning), and tangents and binormals (used for various shading effects). The following tables show the inputs and outputs for vertex and pixel shaders. The [n] notation indicates an optional index.
The output of vertex shaders hasn't changed from the DirectX 8.1 days. You can have up to two output colors, eight output texture coordinates, the transformed vertex position, a fog value, and a point size value.
The output from the vertex shader is used to calculate the input for the pixel shaders. Note there is nothing preventing you from placing any kind of data into the vertex shader's color or texture coordinate output registers and using them for some other calculations in the pixel shader. Just keep in mind that the output registers might be clamped and range limited, particularly on hardware that doesn't support 2.0 shaders.
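One common way to live with clamped outputs is to scale and bias signed data into the 0 to 1 range before writing it, then undo the mapping in the pixel shader. That workaround isn't described above, so treat the following as an assumption-laden C++ sketch rather than a recipe: Float3, clampLikeColorRegister, packSigned, and unpackSigned are hypothetical names, and the clamp simply imitates what a range-limited color register might do.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical three-element vector, e.g. a normal or light direction.
struct Float3 {
    float x, y, z;
};

// Clamp a single value to the 0..1 range.
float saturate(float v) {
    return std::min(1.0f, std::max(0.0f, v));
}

// Imitates a color output register that clamps every element to 0..1.
Float3 clampLikeColorRegister(const Float3& v) {
    return { saturate(v.x), saturate(v.y), saturate(v.z) };
}

// "Vertex shader" side: map each element from -1..1 into 0..1.
Float3 packSigned(const Float3& v) {
    assert(v.x >= -1.0f && v.x <= 1.0f);
    assert(v.y >= -1.0f && v.y <= 1.0f);
    assert(v.z >= -1.0f && v.z <= 1.0f);
    return { v.x * 0.5f + 0.5f, v.y * 0.5f + 0.5f, v.z * 0.5f + 0.5f };
}

// "Pixel shader" side: map each element from 0..1 back to -1..1.
Float3 unpackSigned(const Float3& c) {
    return { c.x * 2.0f - 1.0f, c.y * 2.0f - 1.0f, c.z * 2.0f - 1.0f };
}
```

Passing {-1, 0, 0.5} straight through the clamp loses the negative element, while packing first and unpacking afterwards gets the original values back. On real hardware the register's precision also matters, which this sketch ignores.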
DirectX 8 pixel shaders supported only a single color register to specify the final color of a pixel. DirectX 9 has support for multiple render targets (for example, the back buffer and a texture surface simultaneously) and multi-element textures (typically used to generate intermediate textures used in a later pass). However, you'll need to check the CAPS bits to see what's supported by your particular hardware. For more information, check the DirectX 9 documentation. While RenderMonkey supports rendering to a texture on one pass and reading it in another, I'm going to keep the pixel shader simple in the following examples.
Aside from the semantics of the input and output mapping, HLSL gives you a great deal of freedom to create shader code. In fact, HLSL looks a lot like a version of C written for graphics (which is why NVIDIA calls its C-like shader language Cg, as in "C for Graphics"). If you're familiar with C (or pretty much any procedural programming language), you can pick up HLSL pretty quickly. What can be a bit intimidating, if you're not expecting it, is the graphics-oriented nature of the language itself. Not only are there the expected variable types of boolean, integer, and float, but there's also native support for vectors, matrices, and texture samplers, as well as swizzles and write masks that allow you to selectively read, write, or replicate individual elements of vectors and matrices.
This is due to the single-instruction multiple-data (SIMD) nature of the graphics hardware. An operation such as a = b * c, where a, b, and c are all of type vector, results in an element-by-element multiplication, since type vector is an array of four floats. This is the same as computing a.x = b.x * c.x, a.y = b.y * c.y, a.z = b.z * c.z, and a.w = b.w * c.w, where I've used the element selection swizzle and write masks to show the individual operations. Since the hardware is designed to operate on vectors, performing an operation on a vector is just as expensive as performing one on a single float. A ps_1_x pixel shader can actually perform one operation on the red-green-blue elements of a vector while simultaneously performing a different operation on the alpha element.
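To make the element-by-element behavior concrete, here's a small C++ sketch that mimics it on the CPU. It is an illustration only: Float4 is a hypothetical stand-in for HLSL's four-float vector type, and replicateX and writeXYZ are made-up helpers that imitate a replicate swizzle (such as .xxxx) and a write mask (such as .xyz). Unlike the hardware, the C++ version does its four multiplies one after another.

```cpp
#include <cassert>

// Hypothetical stand-in for a four-float HLSL vector.
struct Float4 {
    float x, y, z, w;
};

// Whole-vector multiply: each element is multiplied by its counterpart.
Float4 operator*(const Float4& b, const Float4& c) {
    return { b.x * c.x, b.y * c.y, b.z * c.z, b.w * c.w };
}

// The same multiply written one element at a time, the way it would look
// with an element selection on each side of the assignment.
Float4 multiplyPerElement(const Float4& b, const Float4& c) {
    Float4 a{};
    a.x = b.x * c.x;
    a.y = b.y * c.y;
    a.z = b.z * c.z;
    a.w = b.w * c.w;
    return a;
}

// Imitates a replicate swizzle: copy the x element into all four slots.
Float4 replicateX(const Float4& v) {
    return { v.x, v.x, v.x, v.x };
}

// Imitates a write mask: update x, y, and z but leave w untouched.
void writeXYZ(Float4& dst, const Float4& src) {
    dst.x = src.x;
    dst.y = src.y;
    dst.z = src.z;
}

// Confirms that both spellings of the multiply agree for one sample input.
void checkEquivalence() {
    const Float4 b{1.0f, 2.0f, 3.0f, 4.0f};
    const Float4 c{2.0f, 2.0f, 0.5f, 0.25f};
    const Float4 whole = b * c;
    const Float4 parts = multiplyPerElement(b, c);
    assert(whole.x == parts.x && whole.y == parts.y);
    assert(whole.z == parts.z && whole.w == parts.w);
}
```

In a shader you would just write the whole-vector form and let the hardware process all four elements at once. The per-element version is there only to show what that single line stands for.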
In addition to graphics-oriented data types, there is also a collection of intrinsic functions that are oriented to graphics, such as dot product, cross product, vector length and normalization functions, etc. The language also supports things like multiplication of vectors by matrices and the like. Talking about it is one thing, but it's much easier to comprehend when you have an example in front of you, so let's start programming.
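As a warm-up, here's a rough C++ sketch of what a few of the intrinsics mentioned above compute. In HLSL these are built in, so you'd never write them yourself; the Float3 and Float3x3 types and the function bodies below are my own stand-ins, written only to show the arithmetic. The mul function assumes the vector is treated as a row vector on the left of the matrix.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for a three-float vector and a 3x3 matrix.
struct Float3 {
    float x, y, z;
};

struct Float3x3 {
    float m[3][3];  // m[row][column]
};

// Dot product: sum of the element-by-element products.
float dot(const Float3& a, const Float3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cross product: a vector perpendicular to both inputs.
Float3 cross(const Float3& a, const Float3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Vector length: square root of the vector dotted with itself.
float length(const Float3& v) {
    return std::sqrt(dot(v, v));
}

// Normalization: scale the vector so its length becomes 1.
Float3 normalize(const Float3& v) {
    const float len = length(v);
    assert(len > 0.0f);  // a zero-length vector has no direction
    return { v.x / len, v.y / len, v.z / len };
}

// Row vector times matrix: element j of the result is v dotted with column j.
Float3 mul(const Float3& v, const Float3x3& m) {
    return { v.x * m.m[0][0] + v.y * m.m[1][0] + v.z * m.m[2][0],
             v.x * m.m[0][1] + v.y * m.m[1][1] + v.z * m.m[2][1],
             v.x * m.m[0][2] + v.y * m.m[1][2] + v.z * m.m[2][2] };
}
```

For example, the cross product of the x and y axes gives the z axis, and normalizing {0, 3, 4} divides each element by a length of 5.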