Editor’s Note: Welcome to a new column about game programming, written by Rob Wyatt, a programmer at DreamWorks Interactive. In Wyatt’s World, Rob will look at the details of programming games, in a format that revolves around you, the reader. It’s going to require your input and ideas as readers, so if you have programming questions or topics you’d like to see addressed here in Wyatt’s World, let him know at [email protected].
And since it’s a new column, Rob’s decided that he may as well cover a new processor, Intel’s Pentium III. He’ll attempt to answer some common questions, and provide some background information and working examples.
What is all the fuss about with the Pentium III?
This new processor contains 70 new multimedia instructions, which Intel calls the "Streaming SIMD Extensions." Some of these new SIMD (Single Instruction Multiple Data) instructions extend MMX, and like the existing MMX instructions, they are integer-based. However, of real interest to game developers who work with 3D graphics and physics are the SIMD floating-point instructions. Gamasutra has already published a couple of articles on the subject, including "Implementing a 3D SIMD Geometry and Lighting Pipeline" (http://www.gamasutra.com/view/feature/3331/implementing_a_3d_simd_geometry_.php) and "Optimizing Games for the Pentium III Processor" (http://www.gamasutra.com/view/feature/3323/optimizing_games_for_the_pentium_.php).
The new Pentium III Streaming SIMD instructions are functionally similar to the instructions AMD added to the K6-2 with its 3DNow! instruction set, but the Pentium III instructions are implemented substantially differently. Although the K6-2 processor is a SIMD device, it operates on only two floating-point numbers at once, whereas the Pentium III operates on four. On the K6-2, the pair of 32-bit floating-point values is held in one of the 64-bit MMX registers, which are aliased onto the floating-point registers. To use the 3DNow! instructions (which are in effect an extension to MMX), the processor has to operate in MMX mode, and with MMX come all of the associated restrictions: the FPU is unavailable, and the EMMS instruction adds overhead whenever the FPU is required again. With the Pentium III, Intel has solved the problem of register aliasing and made room for wider registers by adding eight new registers, called XMM0 to XMM7. Each register is 128 bits wide and holds four IEEE 32-bit floats. Fortunately, the SIMD registers can be used while the processor is in either floating-point or MMX mode, although MMX mode is the better choice. You may experience scheduling problems in the processor if you try to interleave floating-point instructions with SIMD instructions.
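To make the four-floats-per-register idea concrete, here is a minimal sketch in C. It assumes a compiler that ships the `<xmmintrin.h>` intrinsics header, which is not something the column itself covers. The function name `add_packed` is hypothetical, and the restriction to lengths that are a multiple of four is an assumption made only to keep the example short. Each `__m128` value corresponds to one 128-bit XMM register, and `_mm_add_ps` is expected to compile to a single packed add.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i], processed four floats at a time.
   Assumption: n is a multiple of 4, so no leftover elements need handling. */
void add_packed(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Each load fills one 128-bit register with four 32-bit floats.
           The unaligned load is used so the caller need not align the data. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        /* One packed instruction adds all four lanes independently. */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
#else
        /* Scalar fallback for targets without SSE: same result, one lane at a time. */
        for (size_t k = 0; k < 4; ++k) {
            out[i + k] = a[i + k] + b[i + k];
        }
#endif
    }
}
```

Calling `add_packed(a, b, out, 8)` on two eight-element arrays performs two packed adds on an SSE target instead of eight scalar ones. The scalar branch is there only so the sketch still builds where SSE is not enabled.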
The execution speed of the new SIMD instructions is good. For example, the MULPS instruction (Multiply Packed Single-precision, which means it independently multiplies all four elements of the register) has a latency of five cycles and a throughput of one instruction every two cycles. That is four multiplies every two cycles, or two floating-point multiplies per clock cycle (see Figure 1). This throughput is typical of most of the floating-point SIMD instructions; the only real exceptions are the full-precision divide and square root, which take a whopping 36 and 58 cycles, respectively! Fortunately, there are instructions that approximate the reciprocal (RCPPS) and the reciprocal square root (RSQRTPS). Each is accurate to 12 bits of mantissa and takes only two cycles. This makes normalizing vectors a little faster.
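As an illustration of the vector-normalization point, the following sketch normalizes four 3D vectors at once using the approximate reciprocal square root. It is one possible arrangement, not a method taken from the column. The function name `normalize4` and the structure-of-arrays layout (separate x, y, and z arrays) are assumptions chosen so that each register holds the same component of four different vectors. The intrinsics come from the compiler's `<xmmintrin.h>` header, and `_mm_rsqrt_ps` is the one that maps to the approximate reciprocal square root instruction.

```c
#include <assert.h>
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_rsqrt_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: normalize four 3D vectors in place.
   Vector k is (x[k], y[k], z[k]).
   Assumption: every vector has non-zero length. */
void normalize4(float x[4], float y[4], float z[4])
{
    /* Debug-only precondition check; compiled out when NDEBUG is defined. */
    for (int k = 0; k < 4; ++k) {
        assert(x[k] * x[k] + y[k] * y[k] + z[k] * z[k] > 0.0f);
    }
#if defined(__SSE__)
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 vz = _mm_loadu_ps(z);

    /* Squared length of all four vectors: x*x + y*y + z*z in each lane. */
    __m128 len2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(vx, vx), _mm_mul_ps(vy, vy)),
                             _mm_mul_ps(vz, vz));

    /* Approximate 1/sqrt(len2); good to roughly 12 bits of mantissa,
       so the results are close to unit length but not exact. */
    __m128 inv = _mm_rsqrt_ps(len2);

    _mm_storeu_ps(x, _mm_mul_ps(vx, inv));
    _mm_storeu_ps(y, _mm_mul_ps(vy, inv));
    _mm_storeu_ps(z, _mm_mul_ps(vz, inv));
#else
    /* Scalar fallback for targets without SSE, using the full-precision square root. */
    for (int k = 0; k < 4; ++k) {
        float inv = 1.0f / sqrtf(x[k] * x[k] + y[k] * y[k] + z[k] * z[k]);
        x[k] *= inv;
        y[k] *= inv;
        z[k] *= inv;
    }
#endif
}
```

Because the approximation carries only about 12 bits of mantissa, the normalized components can differ from the exact values in the fourth decimal place. Whether that is acceptable depends on what the vectors are used for. The full-precision square root and divide remain available when it is not.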
Apart from these new instructions, the Pentium III is functionally identical to the Pentium II, and at the same clock speed there is no difference in performance. It stands to reason, then, that only applications that take advantage of the new instructions will benefit from a Pentium III. With the right algorithms, however, the benefit can be huge, and modern games use many such algorithms.