Gamasutra: The Art & Business of Making Games
Exploring Sh: A New GPU Metaprogramming Toolkit

July 16, 2004

Sh is a free, open-source system developed at the University of Waterloo that lets you program graphics processors (GPUs) in C++. Basically, it consists of a C++ library that supports run-time specification and compilation of GPU shader programs. This library uses operator overloading to build a clean, high-level API, so that defining Sh programs is as straightforward as defining functions in C++, and as expressive as writing shaders in a specialized shading language. In addition, Sh integrates with the scope rules of C++ in such a way that all the capabilities of C++ can be used to manipulate and modularize GPU code, including classes, templates, functions, and user-defined types. No additional glue code is required to bind Sh programs to the host application: they act like an extension of it. For instance, shader parameters and textures can just be declared as variables, then used inside shader definitions, and Sh will do the rest. Sh can be used as a shading language, for complex multipass rendering algorithms, or to implement general-purpose stream computations (such as simulation).

Why should you, as a game developer, be interested in Sh? First of all, Sh is a much more powerful, modular, and complete programming system than other available real-time shading languages. It's more than a shading language: it also tracks textures and shader parameters, and the associations between these and shaders. Using the object-oriented features of C++, you can share code between shaders, and encapsulate complex algorithms and data representations so that they can be more easily reused. You can also easily use Sh to build custom compilers to convert your own data into shaders (metaprogramming). In general, you can write more sophisticated shaders far more efficiently in Sh than in other systems. Second, Sh can improve your productivity by eliminating a lot of the annoyances and glue code requirements of other shading languages. Third, you can use Sh to accelerate other game engine computations, such as simulations and AI. Since Sh compiles to both the GPU and the CPU, writing these components in Sh does not commit you to running them on the GPU. You can even defer that decision to runtime. For instance, you could profile the GPU and CPU at install time and decide to run simulation components on whichever processor leads to a load-balanced system. In general, Sh makes all the computational capabilities of a system available to you with a common interface.

In the future, we hope that Sh's ability to encapsulate data representations and algorithms will lead to a large set of implementations of advanced algorithms being made available as Sh classes and functions by researchers. Sh is also a useful platform for shader compiler research since it is completely open source. Finally, we plan to extend Sh to a number of other compilation targets, including parallel machines and game platforms. Sh's conceptual model is platform, vendor, and API independent. Ideally, this will ease porting between different platforms and allow greater reuse of code.

The Sh Architecture

Let's begin by looking at how Sh is structured. The library is built around a set of classes, such as ShPoint3f, ShVector3f, or ShMatrix4x4f, that can be used directly as a graphics utility library. A number of useful operators and functions are defined that act on objects of these classes. You can add or subtract vectors, take dot or cross products, do matrix/vector and matrix/point transformations, normalize vectors to unit length, and so forth. Sh also supports swizzling (extraction and rearrangement of elements of a tuple or matrix) and writemasking (assignment to only some elements of a tuple or matrix).
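To make the two terms concrete, here is a minimal sketch in plain C++. It does not use Sh: Tuple3, swizzle, and write_masked are hypothetical names invented for this illustration, and Sh's own syntax for these operations is more compact (the listings later in the article write a swizzle as pv(0,1,2)).

```cpp
#include <array>
#include <cassert>

// Hypothetical 3-tuple used only for illustration; not an Sh class.
struct Tuple3 {
    std::array<float, 3> v;

    Tuple3(float a, float b, float c) : v{{a, b, c}} {}

    // Swizzle: build a new tuple from elements i, j, k (repeats allowed).
    Tuple3 swizzle(int i, int j, int k) const {
        assert(0 <= i && i < 3 && 0 <= j && j < 3 && 0 <= k && k < 3);
        return Tuple3(v[i], v[j], v[k]);
    }

    // Writemask: overwrite only elements i and j; the third is untouched.
    void write_masked(int i, int j, float a, float b) {
        assert(0 <= i && i < 3 && 0 <= j && j < 3 && i != j);
        v[i] = a;
        v[j] = b;
    }
};
```

With t holding (1, 2, 3), t.swizzle(2, 1, 0) yields (3, 2, 1), and t.write_masked(0, 2, 9, 8) changes t to (9, 2, 8) while leaving the middle element alone.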

You can specify operations on Sh objects in two modes. In immediate mode, which is the default, operations take place as soon as they are specified. In retained mode, rather than executing a sequence of operations, Sh records them in a program object. Retained mode is indicated by wrapping a section of code in the keywords SH_BEGIN_PROGRAM and SH_END. Recorded operation sequences can then be compiled for a specified target (usually the GPU, although Sh can also dynamically generate code for the host CPU). Program objects can be loaded into the vertex and fragment shader units of GPUs, in which case they affect rendering with standard graphics APIs. Alternatively, they can be used directly as stream functions for general-purpose computation, without any need to invoke a graphics API.
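The difference between the two modes can be sketched with ordinary operator overloading. The toy below is not Sh's implementation, and every name in it (Program, RecordScope, Scalar, g_recording) is an assumption made for illustration. It shows only the core idea: the same overloaded operator either computes a value or appends an operation to a program object, depending on whether a recording is active.

```cpp
#include <cassert>
#include <string>
#include <vector>

// All names here are hypothetical; this is not Sh's implementation.
struct Program {
    std::vector<std::string> ops;  // recorded operation names, in order
};

Program* g_recording = nullptr;    // non-null only while recording

// RAII stand-in for the SH_BEGIN_PROGRAM ... SH_END bracket.
struct RecordScope {
    explicit RecordScope(Program& p) {
        assert(g_recording == nullptr);  // no nested recording in this sketch
        g_recording = &p;
    }
    ~RecordScope() { g_recording = nullptr; }
};

struct Scalar {
    float x;
};

Scalar operator+(Scalar a, Scalar b) {
    if (g_recording != nullptr) {        // retained mode: record, do not compute
        g_recording->ops.push_back("add");
        return Scalar{0.0f};             // placeholder value
    }
    return Scalar{a.x + b.x};            // immediate mode: compute now
}

Scalar operator*(Scalar a, Scalar b) {
    if (g_recording != nullptr) {
        g_recording->ops.push_back("mul");
        return Scalar{0.0f};
    }
    return Scalar{a.x * b.x};
}

// Records a * b + a into a program object instead of evaluating it.
Program record_example() {
    Program p;
    {
        RecordScope scope(p);
        Scalar a{2.0f}, b{3.0f};
        Scalar c = a * b + a;
        (void)c;
    }
    return p;
}
```

Outside a RecordScope, Scalar{2} * Scalar{3} + Scalar{2} evaluates to 8 at once. Inside one, the same expression leaves the two entries "mul" and "add" in the program. A real system would record operands and result registers as well, so that the program could later be compiled for a target.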

In addition to supporting the dynamic generation of code, Sh also manages textures and streams. Textures act like arrays, and like other parameters are bound to Sh programs using the scope rules of C++. This means that data abstractions can be built around textures. For instance, suppose you want to build a special compressed texture type that is decompressed by a particular sequence of shader code. You can declare a class that encapsulates a built-in texture class to store the compressed data, but redefines the access operators to insert the necessary code into the calling shader. If your new class supports the same interface as one of the built-in textures, it can be used anywhere they can be used.
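As a rough CPU-side analogy of this encapsulation pattern, the sketch below wraps a byte texture in a class whose access operator performs the decompression. ByteTexture, QuantizedTexture, and row_sum are hypothetical names, and the "compression" is simple 8-bit quantization chosen for brevity. In Sh the redefined operator would insert shader code into the calling shader rather than compute a value directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a built-in texture: a res x res grid of bytes.
class ByteTexture {
public:
    explicit ByteTexture(int res)
        : res_(res),
          data_(static_cast<std::size_t>(res) * static_cast<std::size_t>(res)) {
        assert(res > 0);
    }
    std::uint8_t& at(int x, int y) { return data_[index(x, y)]; }
    std::uint8_t operator()(int x, int y) const { return data_[index(x, y)]; }

private:
    std::size_t index(int x, int y) const {
        assert(0 <= x && x < res_ && 0 <= y && y < res_);
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(res_) +
               static_cast<std::size_t>(x);
    }
    int res_;
    std::vector<std::uint8_t> data_;
};

// Stores floats quantized to 8 bits over [lo, hi]. The access operator
// decompresses, so callers see plain floats.
class QuantizedTexture {
public:
    QuantizedTexture(int res, float lo, float hi) : tex_(res), lo_(lo), hi_(hi) {
        assert(hi > lo);
    }
    void store(int x, int y, float value) {
        float t = (value - lo_) / (hi_ - lo_);
        t = std::min(1.0f, std::max(0.0f, t));  // clamp to the encodable range
        tex_.at(x, y) = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
    }
    float operator()(int x, int y) const {
        return lo_ + (hi_ - lo_) * (tex_(x, y) / 255.0f);
    }

private:
    ByteTexture tex_;
    float lo_, hi_;
};

// Generic code needs only operator()(x, y), so either class can be passed.
template <typename Tex>
float row_sum(const Tex& tex, int y, int width) {
    float sum = 0.0f;
    for (int x = 0; x < width; ++x) sum += static_cast<float>(tex(x, y));
    return sum;
}
```

Because row_sum depends only on the access operator, it works unchanged with the raw and the wrapped texture, which is the property the paragraph above describes.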

Streams are used to support a general-purpose computational model on GPUs. A stream program is like a shader: it is a function that maps a certain number of inputs to a certain number of outputs. Stream objects in Sh are similar to textures. They refer to a sequence of data in memory that can be acted upon or generated by stream programs. Stream programs can be applied to streams with a simple operator or function call syntax. Streams can also be decomposed into or constructed from individual channels of data. A sophisticated stream syntax is provided that supports many advanced features, such as shared substreams, conversion of parameters to inputs and the reverse, program composition, and currying.
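The stream model can be illustrated on the CPU with standard C++ function objects. This is only a sketch of the concepts (application, currying, composition). The names Stream, Kernel1, Kernel2, curry, compose, and map_stream are assumptions for this example and are not Sh's stream syntax.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical CPU-side stand-ins for streams and stream programs.
using Stream  = std::vector<float>;
using Kernel1 = std::function<float(float)>;         // one input, one output
using Kernel2 = std::function<float(float, float)>;  // two inputs, one output

// Currying: fix the first input of a two-input kernel, which turns a
// per-element input into something like a constant parameter.
Kernel1 curry(Kernel2 k, float first) {
    assert(k != nullptr);
    return [k, first](float second) { return k(first, second); };
}

// Composition: feed the output of `inner` into `outer`.
Kernel1 compose(Kernel1 outer, Kernel1 inner) {
    assert(outer != nullptr && inner != nullptr);
    return [outer, inner](float x) { return outer(inner(x)); };
}

// Application: run the kernel independently on every stream element.
Stream map_stream(const Kernel1& k, const Stream& in) {
    assert(k != nullptr);
    Stream out;
    out.reserve(in.size());
    for (float x : in) out.push_back(k(x));
    return out;
}
```

For example, currying a two-input multiply kernel with 2 and composing it with an add-one kernel maps the stream (1, 2, 3) to (3, 5, 7). Each element is processed independently, which is what makes the model a good fit for parallel hardware.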

Example: Blinn-Phong Shader

The simplest way to introduce Sh is with some examples. The following code defines a Blinn-Phong shader for a single point source (the shader equivalent of "Hello World") by defining a vertex shader and a fragment shader. This shader will also transform vertices into view space for lighting and into device space for rendering. A rendering produced with this shader is given in Figure 1.

Figure 1: Blinn-Phong lighting model, simple and with texture maps.

First we will define a number of global variables giving the transformation matrices and the parameters of the lighting model:

ShMatrix4x4f modelview; // MCS to VCS transformation
ShMatrix4x4f perspective; // VCS to DCS transformation

ShColor3f phong_kd; // diffuse color
ShColor3f phong_ks; // specular color
ShAttrib1f phong_spec_exp; // specular exponent
ShPoint3f phong_light_position; // VCS light position
ShColor3f phong_light_color; // light source color

ShProgram phong_vert, phong_frag;


We will build the vertex and fragment shaders themselves in an initialization function as follows:

void phong_init () {
  // Create vertex shader
  phong_vert = SH_BEGIN_PROGRAM("gpu:vertex") {
    // Declare shader inputs
    ShInputNormal3f nm; // normal vector (MCS)
    ShInputPosition3f pm; // position (MCS)

    // Declare shader outputs
    ShOutputNormal3f nv; // normal (VCS)
    ShOutputVector3f lv; // light-vector (VCS)
    ShOutputVector3f vv; // view vector (VCS)
    ShOutputColor3f ec; // irradiance
    ShOutputPosition4f pd; // position (HDCS)

    // Specify shader computations
    ShPoint3f pv = (modelview | pm)(0,1,2);
    vv = normalize(-pv);
    lv = normalize(phong_light_position - pv);
    nv = normalize(modelview | nm);
    ec = phong_light_color * pos(nv|lv);
    pd = perspective | pv;
  } SH_END; // End of vertex shader

  // Create fragment shader
  phong_frag = SH_BEGIN_PROGRAM("gpu:fragment") {
    // Declare shader inputs
    ShInputNormal3f nv; // normal (VCS)
    ShInputVector3f lv; // light-vector (VCS)
    ShInputVector3f vv; // view vector (VCS)
    ShInputColor3f ec; // irradiance

    // Declare shader outputs
    ShOutputColor3f fc; // fragment color

    // Specify shader computations
    vv = normalize(vv);
    lv = normalize(lv);
    nv = normalize(nv);
    ShVector3f hv = normalize(lv + vv);
    fc = phong_kd * ec;
    fc += phong_ks * pow(pos(hv|nv), phong_spec_exp);
  } SH_END; // End of fragment shader
} // End of phong_init


We have wrapped the Sh shader program definitions in the SH_BEGIN_PROGRAM and SH_END keywords. SH_BEGIN_PROGRAM returns a program object that represents the recorded sequence of operations. Inputs and outputs of a program object are indicated by Input and Output prefixes on instances of Sh types. The "|" operator is used for dot product and matrix multiplication, although you can also use a dot function for the former and "*" for the latter.
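For readers who want to check the arithmetic on the CPU, here is a small stand-alone sketch of the specular term from the fragment shader, with "|" overloaded as a dot product in the same spirit. Vec3 and specular are hypothetical stand-ins and do not use Sh. The sketch also assumes that pos clamps negative values to zero, which the listing implies but does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a 3-component Sh vector.
struct Vec3 {
    float x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Dot product spelled with "|", mirroring the notation in the listing.
float operator|(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(v | v);
    assert(len > 0.0f);  // the zero vector has no direction
    return {v.x / len, v.y / len, v.z / len};
}

// Assumed meaning of pos: clamp negative values to zero.
float pos(float s) { return std::max(s, 0.0f); }

// Scalar Blinn-Phong specular term, mirroring pow(pos(hv|nv), spec_exp).
float specular(Vec3 lv, Vec3 vv, Vec3 nv, float spec_exp) {
    const Vec3 hv = normalize(normalize(lv) + normalize(vv));
    return std::pow(pos(hv | normalize(nv)), spec_exp);
}
```

One practical point when writing expressions this way: in C++ the "|" operator binds more loosely than arithmetic and comparison operators, so a dot product that is combined with other operators generally needs its own parentheses, as in (a | b) == 32.0f.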

Once defined, the program objects phong_vert and phong_frag can be loaded into the vertex and fragment shading units of the GPU using the shBind API call. You can now use a normal graphics API to specify geometry, and the shaders will be applied to that geometry. Right now, Sh only supports OpenGL, although we are working on a DirectX binding and it should be available soon. In your graphics API, you need to set up the correct vertex attributes for the shaders you have loaded, and fragment and vertex shader pairs need to be consistent in their inputs and outputs. A set of rules based on type and order of declaration defines how shader inputs map onto vertex attributes. You can also ask program objects for a human-readable string describing the interface binding.

The "uniform" parameters of these shaders, that is, the values that are the same for all shaded vertices or fragments such as phong_kd, are simply referenced directly by the shader definitions. No additional glue code is required to set up these parameters, and a simple assignment (outside of a shader definition) is all that is needed to modify one. Which parameters get bound to each shader program is controlled by the scope rules of C++. For instance, we could have made the parameters data members of a class and defined the shader program objects in a member function. Then the member function creating the shader programs would have picked up the data members and an encapsulated shader would have been created. In general, Sh is designed to integrate with C++ cleanly, and most C++ modularity constructs can be used with Sh programs. Many other programming techniques are enabled by this integration, and by the fact that C++ can manipulate Sh programs in arbitrary ways at runtime.
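A loose analogy in plain C++ is a closure that refers to a data member. The sketch below is not Sh code: DiffuseMaterial is a hypothetical class, and the std::function member merely stands in for a program object. It shows the two properties described above: the definition picks up the parameter through ordinary scoping, and a later assignment changes the result without rebuilding anything.

```cpp
#include <cassert>
#include <functional>

// Hypothetical encapsulated "shader": the parameter is a data member and
// the program is defined in the constructor.
class DiffuseMaterial {
public:
    float kd;                           // plays the role of a uniform parameter
    std::function<float(float)> shade;  // stands in for a program object

    explicit DiffuseMaterial(float kd_init) : kd(kd_init) {
        assert(kd_init >= 0.0f);
        // The definition refers to the member directly; there is no
        // separate binding call.
        shade = [this](float irradiance) { return kd * irradiance; };
    }

    // The closure captures `this`, so copying would leave it pointing at
    // the wrong object. Copying is disabled in this sketch.
    DiffuseMaterial(const DiffuseMaterial&) = delete;
    DiffuseMaterial& operator=(const DiffuseMaterial&) = delete;
};
```

After constructing a material with kd = 0.5, shade(2) returns 1. Assigning kd = 0.25 makes the same call return 0.5.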

If we wanted to texture map this shader, instead of ShColor3f for phong_kd we could have used ShTexture2D<ShColor3f>. Then we would have to modify the shader definitions to pass in a texture coordinate, and then index the texture object. The bindings of textures work in exactly the same way as uniform parameters, so as with parameters, we can create data abstractions using the object-oriented features of C++.

The following code example goes further in several ways. It encapsulates parameters as data members in a class, and uses template arguments and construction-time arguments to parameterize the shader. It uses a template class to coordinate the vertex shader outputs and fragment shader inputs; incidentally, this also demonstrates the more generic, template-based mechanism for declaring Sh types, which can also be used to declare tuples of arbitrary length. Finally, it uses C++ control constructs to manipulate shader code: in this case, a loop is unrolled to support multiple light sources, with C++ arrays holding the light source properties. We can also use ordinary C++ functions to implement functions in shader code. This is roughly how the standard library functions such as normalize (and, in fact, the operators) are implemented. A rendering produced with this shader is also given in Figure 1.

template <int NLIGHTS>
class BlinnPhong {
public:
  // Declare parameters and textures as data members
  ShTexture2D<ShColor3f> kd;
  ShTexture2D<ShColor3f> ks;
  ShAttrib1f spec_exp;
  ShPoint3f light_position[NLIGHTS];
  ShColor3f light_color[NLIGHTS];

  // Declare I/O type to coordinate vertex and fragment shaders
  template <ShBindingType IO> struct VertFrag {
    ShPoint<4,IO,float> pv; // position (VCS)
    ShTexCoord<2,IO,float> u; // texture coordinate
    ShNormal<3,IO,float> nv; // normal (VCS)
    ShColor<3,IO,float> ec; // total irradiance
  };

  // Declare program objects for shaders
  ShProgram vert, frag;

  // Constructor: parameterized by texture resolution
  BlinnPhong (int res) : kd(res,res), ks(res,res) {

    // Create vertex shader
    vert = SH_BEGIN_PROGRAM("gpu:vertex") {
      // Declare shader inputs
      ShInputNormal3f nm; // normal vector (MCS)
      ShInputTexCoord2f u; // texture coordinate
      ShInputPosition3f pm; // position (MCS)

      // Declare shader outputs
      VertFrag<SH_OUTPUT> vf;
      ShOutputPosition4f pd; // position (HDCS)

      // Specify shader computations
      vf.pv = modelview | pm;
      vf.u = u;
      vf.nv = normalize(modelview | nm);
      pd = perspective | vf.pv;
      // Unrolled at definition time: one copy of the body per light
      for (int i=0; i<NLIGHTS; i++) {
        ShVector3f lv =
          normalize(light_position[i] - vf.pv(0,1,2));
        vf.ec += light_color[i] * pos(vf.nv|lv);
      }
    } SH_END; // End of vertex shader

    // Create fragment shader
    frag = SH_BEGIN_PROGRAM("gpu:fragment") {
      // Declare shader inputs
      VertFrag<SH_INPUT> vf;

      // Declare shader outputs
      ShOutputColor3f fc; // fragment color

      // Specify shader computations
      ShVector3f vv = normalize(-vf.pv(0,1,2));
      ShNormal3f nv = normalize(vf.nv);
      fc = kd(vf.u) * vf.ec;
      ShColor3f kst = ks(vf.u);
      for (int i=0; i<NLIGHTS; i++) {
        ShVector3f lv =
          normalize(light_position[i] - vf.pv(0,1,2));
        ShVector3f hv = normalize(lv + vv);
        fc += kst * pow(pos(hv|nv),spec_exp)
          * light_color[i];
      }
    } SH_END; // End of fragment shader
  } // End of constructor
}; // End of BlinnPhong class


In the fragment shader, notice that a texture read is indicated with the "()" operator on a texture object, as in kd(vf.u) and ks(vf.u). This operator treats the texture as a tabulated function with a normalized texture coordinate range of 0 to 1 in each coordinate. Sh also supports the "[]" operator for texture lookups, which works the same but places a texel at each integer. The "[]" lookup operator is useful when textures are being used as arrays to hold data structures (for instance, a ray-tracer accelerator). Sh also supports several additional texture types for rectangular textures, 1D and 3D textures, and cube textures.
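The two lookup styles can be sketched on the CPU with a one-dimensional texture. Texture1D is a hypothetical class, not an Sh type. The sketch assumes nearest-texel lookup with clamping at the edges, and it simplifies the "[]" form to a plain integer index, so it does not reproduce Sh's exact texel placement or filtering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1D texture with both lookup styles.
class Texture1D {
public:
    explicit Texture1D(std::vector<float> texels) : t_(std::move(texels)) {
        assert(!t_.empty());
    }

    // Array-style lookup: integer texel index, as when a texture holds a
    // data structure.
    float operator[](std::size_t i) const {
        assert(i < t_.size());
        return t_[i];
    }

    // Function-style lookup: u in [0, 1] spans the whole texture. This
    // sketch picks the nearest texel and clamps out-of-range coordinates.
    float operator()(float u) const {
        const float scaled = u * static_cast<float>(t_.size());
        long i = static_cast<long>(std::floor(scaled));
        const long last = static_cast<long>(t_.size()) - 1;
        i = std::max(0L, std::min(i, last));
        return t_[static_cast<std::size_t>(i)];
    }

private:
    std::vector<float> t_;
};
```

With the texels (10, 20, 30, 40), t[2] and t(0.6f) both return 30. The first addresses the texel by index and the second by its position within the normalized range.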

Sh programs can write to inputs and read from outputs. Writing to an input does not change the original data; Sh inputs are passed by value. When mapping to a backend that does not support these operations, Sh will introduce an additional temporary automatically. Temporaries (including automatically introduced temporaries) are also always initialized to zero if their value is used before they are assigned to. These transformations are included as conveniences to simplify the input code. For instance, the ability to use += on a zero-initialized output is very useful if you want to accumulate several sources of light in an output color but don't want to keep track of which source is "first".
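The interaction between host-side loops and zero-initialized accumulation can be sketched with a toy recorded program. Nothing here is Sh: Accumulate, Program, build, and run are hypothetical names. The sketch assumes only what the text states, namely that the C++ loop executes while the program is being defined and that an output read before assignment starts at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction: out += color[light] * weight[light].
struct Accumulate {
    std::size_t light;
};

struct Program {
    std::vector<Accumulate> code;
};

// The host-side loop runs once, at definition time, and emits one
// instruction per light. The recorded program itself contains no loop.
Program build(std::size_t nlights) {
    Program p;
    for (std::size_t i = 0; i < nlights; ++i) p.code.push_back(Accumulate{i});
    return p;
}

// Executing the program: the output starts at zero, so every light uses
// the same += form and none has to be treated as "first".
float run(const Program& p, const std::vector<float>& color,
          const std::vector<float>& weight) {
    assert(color.size() == weight.size());
    float out = 0.0f;
    for (const Accumulate& ins : p.code) {
        assert(ins.light < color.size());
        out += color[ins.light] * weight[ins.light];
    }
    return out;
}
```

Calling build(3) produces a program of three straight-line instructions. Running it with colors (1, 2, 3) and weights (0.5, 0.5, 1) gives 4.5, and a program built for zero lights simply returns 0.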

Several additional shader examples are given in Figure 2. Many of these examples use the Perlin and Worley noise functions built into Sh. Wood, for example, adds some Perlin noise to a quadratic function and then feeds the result through a periodic sawtooth function stored in a texture map. The Worley noise functions are based on the kth nearest distance to a set of procedurally generated feature points (we use a jittered grid). The library function computing the Worley noise functions can be parameterized with different distance functions. The giraffe shader shown here uses a Manhattan distance metric with the Worley function and passes the distance to the closest feature point through a threshold function.
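A CPU sketch of the nearest-feature-point form of this idea follows. It is not the Sh library function: hash01, worley_nearest, and spots are hypothetical names, the hash constants are arbitrary, and only the distance to the closest point is computed (the k = 1 case) in two dimensions. Searching only the 3x3 block of neighbouring cells is a common simplification that can occasionally miss a closer point two cells away.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <limits>

// Distance as a function of the offset (dx, dy) to a feature point.
using Metric = std::function<float(float, float)>;

float manhattan(float dx, float dy) { return std::fabs(dx) + std::fabs(dy); }
float euclidean(float dx, float dy) { return std::sqrt(dx * dx + dy * dy); }

// Hypothetical integer hash mapped to [0, 1); any decent hash would do.
float hash01(std::int32_t x, std::int32_t y, std::uint32_t salt) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 374761393u +
                      static_cast<std::uint32_t>(y) * 668265263u +
                      salt * 2246822519u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h >> 8) / 16777216.0f;  // keep the top 24 bits
}

// Distance from (x, y) to the nearest feature point on a jittered grid
// with one feature point per unit cell.
float worley_nearest(float x, float y, const Metric& metric) {
    assert(metric != nullptr);
    const std::int32_t cx = static_cast<std::int32_t>(std::floor(x));
    const std::int32_t cy = static_cast<std::int32_t>(std::floor(y));
    float best = std::numeric_limits<float>::max();
    for (std::int32_t j = cy - 1; j <= cy + 1; ++j) {
        for (std::int32_t i = cx - 1; i <= cx + 1; ++i) {
            // Jitter the feature point inside cell (i, j).
            const float fx = static_cast<float>(i) + hash01(i, j, 0u);
            const float fy = static_cast<float>(j) + hash01(i, j, 1u);
            best = std::min(best, metric(fx - x, fy - y));
        }
    }
    return best;
}

// Threshold the Manhattan-metric distance to get a two-tone pattern.
float spots(float x, float y, float threshold) {
    return worley_nearest(x, y, manhattan) < threshold ? 1.0f : 0.0f;
}
```

Passing the metric as a parameter mirrors the point made above: swapping euclidean for manhattan changes the shape of the cells without touching the search itself.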

Figure 2: Some more example shaders. Many of these examples use the Perlin and Worley noise functions built into Sh.

