Gamasutra: The Art & Business of Making Games
Designing Fast Cross-Platform SIMD Vector Libraries

January 20, 2010, Page 5 of 5

Intel Compiler, the "Black Magic"

So far, all the tests have been performed with the Microsoft compiler that ships with Visual Studio. Now it's time to switch gears and compile the same code with the Intel Compiler.

I had heard of the Intel Compiler and its reputation for producing the fastest code. When I built the sample code with it, I was really impressed. Intel has indeed produced a phenomenal compiler, one that gave me results that were not only faster in general but also unexpected.

Let's start by checking out the assembler dump of the three-band equalizer compared to the code compiled using the Microsoft compiler.

Please refer to Table 7 in the downloadable table document.

The Intel compiler was able to shrink the code by about 20 percent. The results for the "Verlet Integrator" and the "Constraint Solver" were about 7 percent and 6 percent smaller, respectively.

But the real "Black Magic" was that the generic VClass made the code even smaller. From my inspection of the generated code, the Intel compiler does better with overloaded operators than with procedural calls. Another interesting result is that encapsulating the data inside a class bloats the code very little, and for the cloth demo not at all. As a result, the statistics for all the demos more or less flattened out.
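To make the two interface styles being compared concrete, here is a minimal sketch of the same Verlet step written once with procedural calls and once with a class that encapsulates its data and overloads operators. Every name in it (Vec4, VAdd, VSub, VScale, VClassSketch, verletProcedural, verletOperators) is hypothetical and is not taken from the sample code, and Vec4 is a plain four-float struct standing in for a native SIMD type so that the sketch builds on any platform.

```cpp
// Stand-in for a native SIMD register type (assumption: a real library
// would typedef the platform type here, for example __m128 on SSE).
struct Vec4 { float x, y, z, w; };

// Style 1: procedural interface. Pure data type, inline free functions,
// results returned by value.
inline Vec4 VAdd(Vec4 a, Vec4 b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
inline Vec4 VSub(Vec4 a, Vec4 b) {
    return Vec4{a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w};
}
inline Vec4 VScale(Vec4 a, float s) {
    return Vec4{a.x * s, a.y * s, a.z * s, a.w * s};
}

// Style 2: the data lives inside a class and the math is exposed through
// overloaded operators (hypothetical stand-in for a VClass-style wrapper).
class VClassSketch {
public:
    explicit VClassSketch(Vec4 v) : v_(v) {}
    VClassSketch operator+(const VClassSketch& o) const { return VClassSketch(VAdd(v_, o.v_)); }
    VClassSketch operator-(const VClassSketch& o) const { return VClassSketch(VSub(v_, o.v_)); }
    VClassSketch operator*(float s) const { return VClassSketch(VScale(v_, s)); }
    Vec4 raw() const { return v_; }
private:
    Vec4 v_;
};

// Verlet position update: next = 2 * pos - prev + accel * dt^2.
inline Vec4 verletProcedural(Vec4 pos, Vec4 prev, Vec4 accel, float dt) {
    return VAdd(VSub(VScale(pos, 2.0f), prev), VScale(accel, dt * dt));
}

// The same update written with overloaded operators.
inline VClassSketch verletOperators(VClassSketch pos, VClassSketch prev,
                                    VClassSketch accel, float dt) {
    return pos * 2.0f - prev + accel * (dt * dt);
}
```

Both functions perform the same arithmetic in the same order, so any difference in the generated assembly comes from how a given compiler handles the wrapper class and its operators, not from the math itself.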

Here are screenshots of the cloth demo and the three-band EQ built with the Intel Compiler.

Figure 3 - Cloth Demo Using Intel Compiler

Figure 4 - Three Band EQ Using Intel Compiler

Final Statistics

Now all the statistics together for comparison:


Although the Intel compiler has solved the problem of code bloat caused by data inside a class and overloaded operators, I still do not endorse this approach for a cross-platform SIMD library. I have done similar tests with six different compilers (GCC and SN Systems for PS2, PSP, and PS3), and they all produce worse results than the Microsoft compiler, meaning they bloat the code even more. So, unless you are specifically writing a library that will only run on Windows, your best bet is still to follow the key points of this article for your SIMD vector library.
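As a rough illustration of what a procedural, cross-platform layer can look like, the sketch below selects a native type with the preprocessor and exposes only inline free functions on top of it. It is a minimal example under stated assumptions rather than the library from the sample code: the names Vec4, VLoad, VStore, VAdd, VMul, VReplicate, and VMulAdd are hypothetical, only an SSE path and a scalar fallback are shown, and a console target would add its own branch with its own intrinsics.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  // SSE path: the vector type is a plain typedef of the native register type.
  #include <xmmintrin.h>
  typedef __m128 Vec4;
  inline Vec4 VLoad(const float* p)     { return _mm_loadu_ps(p); }
  inline void VStore(float* p, Vec4 v)  { _mm_storeu_ps(p, v); }
  inline Vec4 VAdd(Vec4 a, Vec4 b)      { return _mm_add_ps(a, b); }
  inline Vec4 VMul(Vec4 a, Vec4 b)      { return _mm_mul_ps(a, b); }
  inline Vec4 VReplicate(float s)       { return _mm_set1_ps(s); }
#else
  // Scalar fallback (assumption: used only where no SIMD branch exists yet).
  struct Vec4 { float f[4]; };
  inline Vec4 VLoad(const float* p) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.f[i] = p[i]; return r;
  }
  inline void VStore(float* p, Vec4 v) {
    for (int i = 0; i < 4; ++i) p[i] = v.f[i];
  }
  inline Vec4 VAdd(Vec4 a, Vec4 b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] + b.f[i]; return r;
  }
  inline Vec4 VMul(Vec4 a, Vec4 b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * b.f[i]; return r;
  }
  inline Vec4 VReplicate(float s) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.f[i] = s; return r;
  }
#endif

// Platform-independent code is written only against the functions above.
inline Vec4 VMulAdd(Vec4 a, Vec4 b, Vec4 c) {
    return VAdd(VMul(a, b), c);
}
```

Code built on top of this layer, such as VMulAdd, never names a platform type or intrinsic directly, so porting to another compiler means adding one more branch to the preprocessor block.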

Sample Code

The sample code of this article along with all the comparison tables can be downloaded from the URL below:


References

[1] Microsoft, XNA Math Library Reference

[2] Intel, SHUFPS -- Shuffle Single-Precision Floating-Point Values

[3] Jakobsen, Thomas, Advanced Character Physics

[4] C., Neil, 3 Band Equaliser

[5] Wikipedia, Taylor Series
