

By Ronen Zohar & Haim Barad
Gamasutra
April 16, 1999




Contents

Introduction

An Introduction to Streaming SIMD Extensions

The 3D Pipeline Structure and Body

Simple Lighting

Ideas for Future Improvements

Simple Lighting

There are two main issues to consider when converting a lighting pipeline to SIMD. The first, which pops up with every SIMD algorithm, is "What do I do if I have a branch in my code?" There is no simple answer to that question: the solution depends on the nature of the branch. Solutions range from "execute all the code paths in the branch and then glue the SIMD results together according to the branch condition" to "manipulate the data in such a manner that the branch is no longer needed".
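A minimal sketch of the first technique, written with raw SSE intrinsics from xmmintrin.h rather than the article's SIMD classes (so it is x86 only). The helpers simd_select and both_paths_then_select are hypothetical names for illustration. The compare produces an all-ones or all-zeros mask in each lane, and and/andnot/or merge the two precomputed results, so no individual lane ever branches.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* "Execute both paths, then glue": for every lane, return a where the
   mask lane is all ones and b where it is all zeros. */
static __m128 simd_select(__m128 mask, __m128 a, __m128 b)
{
    /* andnot computes (~mask) & b */
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Branch-free form of, per lane:  if (x > 0) y = x * 2; else y = -x; */
static __m128 both_paths_then_select(__m128 x)
{
    __m128 zero      = _mm_setzero_ps();
    __m128 taken     = _mm_mul_ps(x, _mm_set1_ps(2.0f)); /* path 1, all lanes */
    __m128 not_taken = _mm_sub_ps(zero, x);              /* path 2, all lanes */
    __m128 cond      = _mm_cmpgt_ps(x, zero);            /* all ones where x > 0 */
    return simd_select(cond, taken, not_taken);
}
```

The cost of this approach is that every path is always executed, so it pays off mainly when each path is short.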

The lighting code shown previously contains this branch: if d (the dot product) is greater than zero, multiply it by the color constants and add the results to the accumulated color. Let’s handle this branch using the second technique, changing things so that the branch is no longer needed.

Let’s clamp all of the negative values of d to zero. The positive values (the ones I’m interested in) are left unchanged, while the negative values become zero and therefore don’t affect the overall result. The clamp is simple: it’s the maximum (simd_max, which uses the new maxps instruction) of each of the four SIMD elements and zero. We can still use a scalar branch to handle the case in which all four elements of the SIMD data are less than zero. The move_mask function (which generates an integer bit mask from the sign bit of each of the four SIMD elements) tells us whether all of the elements are negative: the mask equals 0xf exactly when all four sign bits are set.
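The clamp-and-skip step can be sketched with raw intrinsics: _mm_max_ps emits maxps, the instruction the article names for simd_max, and we assume move_mask corresponds to _mm_movemask_ps (movmskps). The function name clamp_dots is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns 1 if the light can be skipped for all four vertices (every dot
   product has its sign bit set). Otherwise stores max(dot, 0) per lane
   in *clamped and returns 0. */
static int clamp_dots(__m128 dot, __m128 *clamped)
{
    if (_mm_movemask_ps(dot) == 0xf)                 /* scalar branch: all four negative */
        return 1;
    *clamped = _mm_max_ps(dot, _mm_setzero_ps());    /* maxps: negative lanes -> 0 */
    return 0;
}
```

Because movmskps looks only at sign bits, a lane holding -0.0 also counts as negative; that is harmless here, since such a lane would contribute nothing anyway.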

There’s another issue to consider when using this solution. To normalize a vector we have to calculate the square root of its squared magnitude (the sum of the squares of its components), and then divide each of the vector’s elements by that root. The Pentium III introduces new SIMD and scalar instructions that estimate the reciprocal of the square root of a given value (rsqrtps and rsqrtss), accurate to roughly 12 bits, which lets us replace the division with a multiplication. These instructions are already encapsulated in the classes as the functions rsqrt and rsqrt_nr (the latter refines the estimate with one Newton-Raphson iteration). Since accuracy in lighting is much less important than accuracy when calculating a vertex position, I’ll use the plain estimate, rsqrt, in the lighting code.
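The difference between rsqrt and rsqrt_nr can be sketched as follows, assuming the classes use the standard Newton-Raphson update for 1/sqrt(x), y1 = 0.5 * y0 * (3 - x * y0 * y0). The function names rsqrt_est and rsqrt_nr_sketch are hypothetical; _mm_rsqrt_ps is the intrinsic that emits rsqrtps.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Raw hardware estimate (rsqrtps), roughly 12 bits of precision. */
static __m128 rsqrt_est(__m128 x)
{
    return _mm_rsqrt_ps(x);
}

/* Estimate refined by one Newton-Raphson step:
   y1 = 0.5 * y0 * (3 - x * y0 * y0) */
static __m128 rsqrt_nr_sketch(__m128 x)
{
    __m128 y0  = _mm_rsqrt_ps(x);
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y0), y0);
    return _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), y0),
                      _mm_sub_ps(_mm_set1_ps(3.0f), xyy));
}
```

One iteration roughly doubles the number of correct bits, at the cost of three extra multiplies and a subtract; that extra work is what the lighting code skips.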

The lighting portion of the pipeline looks like this:

switch (light->light_type) {
case DIRECTIONAL:
    // assume that light direction is already normalized
    // and points from the surface toward the light.
    dot = light->dir.x * norm->x + light->dir.y * norm->y + light->dir.z * norm->z;
    break;
case POINT:
    // vector from the vertex toward the light, so that dot > 0 means lit.
    dir.x = light->pos.x - pos->x;
    dir.y = light->pos.y - pos->y;
    dir.z = light->pos.z - pos->z;
    magnitude = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z; // squared length
    len = rsqrt(magnitude); // estimate of 1/sqrt(magnitude), no Newton-Raphson step
    dot = (dir.x * norm->x + dir.y * norm->y + dir.z * norm->z) * len;
    break;
default:
    continue; // unsupported light type: skip this light.
}
if (move_mask(dot) == 0xf) // if all dot elements are less than zero
    continue; // skip this light.
dot = simd_max(dot,_ZERO_); // zero all simd elements below zero.
diffuse.r += light_color.r * mat_color.r * dot;
diffuse.g += light_color.g * mat_color.g * dot;
diffuse.b += light_color.b * mat_color.b * dot;
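For readers who want to run the lighting step, here is a self-contained sketch of the point-light case for four SOA vertices using raw SSE intrinsics instead of the SIMD classes. The types vec3x4 and color3x4, the function add_point_light, and its parameters are hypothetical; the light position and colors are assumed to be plain floats broadcast to all four lanes.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Four vertices in SOA form: lane i of x, y and z belongs to vertex i. */
typedef struct { __m128 x, y, z; } vec3x4;
typedef struct { __m128 r, g, b; } color3x4;

/* Adds one point light's diffuse term to four vertices without a per-lane
   branch. light_pos, light_col and mat_col are broadcast to all lanes. */
static void add_point_light(const vec3x4 *pos, const vec3x4 *norm,
                            const float light_pos[3], const float light_col[3],
                            const float mat_col[3], color3x4 *diffuse)
{
    /* Vector from each vertex toward the light. */
    __m128 dx = _mm_sub_ps(_mm_set1_ps(light_pos[0]), pos->x);
    __m128 dy = _mm_sub_ps(_mm_set1_ps(light_pos[1]), pos->y);
    __m128 dz = _mm_sub_ps(_mm_set1_ps(light_pos[2]), pos->z);

    /* Squared length, then its reciprocal square root (estimate only). */
    __m128 len2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy)),
                             _mm_mul_ps(dz, dz));
    __m128 inv_len = _mm_rsqrt_ps(len2);

    /* N . L, normalized by multiplying with the reciprocal length. */
    __m128 dot = _mm_add_ps(_mm_add_ps(_mm_mul_ps(dx, norm->x),
                                       _mm_mul_ps(dy, norm->y)),
                            _mm_mul_ps(dz, norm->z));
    dot = _mm_mul_ps(dot, inv_len);

    /* Scalar branch: skip the light if all four dot products are negative. */
    if (_mm_movemask_ps(dot) == 0xf)
        return;

    /* Clamp negative lanes to zero so they add nothing. */
    dot = _mm_max_ps(dot, _mm_setzero_ps());

    diffuse->r = _mm_add_ps(diffuse->r,
                            _mm_mul_ps(_mm_set1_ps(light_col[0] * mat_col[0]), dot));
    diffuse->g = _mm_add_ps(diffuse->g,
                            _mm_mul_ps(_mm_set1_ps(light_col[1] * mat_col[1]), dot));
    diffuse->b = _mm_add_ps(diffuse->b,
                            _mm_mul_ps(_mm_set1_ps(light_col[2] * mat_col[2]), dot));
}
```

In a full pipeline this function would be called inside the loop over lights, with its early return playing the role of continue.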

 

Converting the Data for a Serial Rasterizer

Most rasterizers still need their data to be submitted in AOS format. This means we have to transpose the data from our internal SOA data structure to AOS (a process also known as "de-swizzling"). With Direct3D, you also have to pack the colors from floats in the [0..1] range into a DWORD in ARGB format, with each channel holding a value between 0 and 255. Here’s how to convert the data.

The first step is to pack the colors. Most of the instructions that deal with data arrangement and conversion inside the SIMD registers are not encapsulated in the SIMD classes, but they are available as intrinsics. Color packing comprises three parts: clamping values above one down to one (the maximum color value); scaling the colors from [0..1] to [0..255]; and converting the values to integers and arranging them in ARGB order.
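A scalar reference for the three packing parts is handy for checking the SIMD version one color at a time. The function pack_argb is hypothetical; like the MMX code in this section, it truncates rather than rounds and leaves the alpha byte at zero.

```c
#include <stdint.h>

/* Clamp to at most 1, scale to [0, 255], truncate, and place the channels
   as 0x00RRGGBB. Inputs are assumed to be >= 0, as the lighting
   code guarantees after its max() clamp. */
static uint32_t pack_argb(float r, float g, float b)
{
    /* Part 1: clamp values above one down to one. */
    r = r > 1.0f ? 1.0f : r;
    g = g > 1.0f ? 1.0f : g;
    b = b > 1.0f ? 1.0f : b;

    /* Part 2 and 3: scale, then truncate toward zero like cvttps2pi. */
    uint32_t ri = (uint32_t)(r * 255.0f);
    uint32_t gi = (uint32_t)(g * 255.0f);
    uint32_t bi = (uint32_t)(b * 255.0f);

    /* Arrange in ARGB order; alpha (bits 24..31) stays 0. */
    return (ri << 16) | (gi << 8) | bi;
}
```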

Here is how to clamp the values above one:

diffuse.r = simd_min(diffuse.r,_ONE_);
diffuse.g = simd_min(diffuse.g,_ONE_);
diffuse.b = simd_min(diffuse.b,_ONE_);

Here’s how to change their scale:

diffuse.r *= _255_;
diffuse.g *= _255_;
diffuse.b *= _255_;

The conversion instruction cvttps2pi operates only on the lower half of the SIMD data (two elements). How do we convert all four elements? We convert and pack the lower two elements, then use movhlps to move the upper two elements into the lower half and convert them. The packing is done with MMX bitwise shifts and ORs:

// conversion of lower two
r = _mm_cvtt_ps2pi(diffuse.r);
g = _mm_cvtt_ps2pi(diffuse.g);
b = _mm_cvtt_ps2pi(diffuse.b);
// packing (psllq shifts the whole 64-bit register; this is safe
// because each 32-bit value is at most 255)
r = _m_psllqi(r,16);
g = _m_psllqi(g,8);
out[0] = _m_por(r,_m_por(g,b));
// conversion of upper two: movhlps copies elements 2 and 3 into the lower half
r = _mm_cvtt_ps2pi(_mm_movehl_ps(diffuse.r,diffuse.r));
g = _mm_cvtt_ps2pi(_mm_movehl_ps(diffuse.g,diffuse.g));
b = _mm_cvtt_ps2pi(_mm_movehl_ps(diffuse.b,diffuse.b));
// packing
r = _m_psllqi(r,16);
g = _m_psllqi(g,8);
out[1] = _m_por(r,_m_por(g,b));
// clear the MMX state (emms) before any x87 floating-point code runs
_m_empty();
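On processors with SSE2 (the Pentium 4 and later, not the Pentium III this article targets), cvttps2dq converts all four elements at once, so neither the movhlps split nor the MMX registers, and therefore no emms, are needed. The following is a sketch of that alternative, not the article’s method; pack4_argb_sse2 is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in SSE) */
#include <stdint.h>

/* Clamps, scales and packs four SOA colors into four 0x00RRGGBB DWORDs.
   Inputs are assumed to be >= 0; alpha is left at 0. */
static void pack4_argb_sse2(__m128 r, __m128 g, __m128 b, uint32_t out[4])
{
    __m128 one   = _mm_set1_ps(1.0f);
    __m128 scale = _mm_set1_ps(255.0f);

    /* Clamp to 1, scale to [0, 255], truncate all four lanes at once. */
    __m128i ri = _mm_cvttps_epi32(_mm_mul_ps(_mm_min_ps(r, one), scale));
    __m128i gi = _mm_cvttps_epi32(_mm_mul_ps(_mm_min_ps(g, one), scale));
    __m128i bi = _mm_cvttps_epi32(_mm_mul_ps(_mm_min_ps(b, one), scale));

    /* Per-32-bit-lane shifts and ORs build the ARGB DWORDs. */
    __m128i argb = _mm_or_si128(_mm_or_si128(_mm_slli_epi32(ri, 16),
                                             _mm_slli_epi32(gi, 8)), bi);
    _mm_storeu_si128((__m128i *)out, argb);
}
```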

The second step is to transpose the four SOA vertex positions into four AOS positions. We can perform this transposition using the _MM_TRANSPOSE4_PS macro defined in xmmintrin.h. This macro transposes, in place, a 4x4 matrix whose four rows are passed as its arguments. After the macro runs, x holds the first vertex (x,y,z,w), y holds the second vertex, and so on. The syntax looks like this:

_MM_TRANSPOSE4_PS(x,y,z,w);
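A minimal sketch of the de-swizzle step, assuming the four SOA registers hold x, y, z and w for four vertices and the rasterizer wants them as consecutive (x,y,z,w) floats. The function soa_to_aos is hypothetical; _MM_TRANSPOSE4_PS is the xmmintrin.h macro named above, and it overwrites its four arguments.

```c
#include <xmmintrin.h>  /* SSE intrinsics and _MM_TRANSPOSE4_PS */

/* De-swizzles four SOA vertices (lane i of each register is vertex i)
   into AOS order: x0 y0 z0 w0 x1 y1 z1 w1 ... x3 y3 z3 w3. */
static void soa_to_aos(__m128 x, __m128 y, __m128 z, __m128 w, float out[16])
{
    _MM_TRANSPOSE4_PS(x, y, z, w);   /* now x = vertex 0, y = vertex 1, ... */
    _mm_storeu_ps(out + 0,  x);
    _mm_storeu_ps(out + 4,  y);
    _mm_storeu_ps(out + 8,  z);
    _mm_storeu_ps(out + 12, w);
}
```

Because the macro rewrites its arguments in place, pass copies if the SOA data is still needed afterwards; here the function parameters already are copies.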







Copyright © 2003 CMP Media LLC
