This month, the Optimizations Corner gets "down and dirty" and starts examining
many of the pitfalls that you might stumble into if you're not careful
about microarchitecture. After all, a good way to slow down a fast
computer is to write software that ignores the underlying microarchitecture.
So, let's get those "glass jaws" in your code out of the way and let
the raw speed of the machine shine through.
Koby Gottlieb is the Group Leader of the Media/VTune Group in Intel's
Israel Design Center. Aside from being my supervisor (which explains
the glowing remark I'm about to make), he's an expert in low-level
code tuning for specific microarchitectures. In this article, he explains
partial stalls and how they affect performance on the dynamic execution
microarchitecture of the Pentium Pro, Pentium II, and Pentium III
processors.
Maximize Your Application's Potential
Understanding the microarchitecture can help you develop high-performance applications
for Intel architecture processors. Even though
the application you developed for the Intel386 and Intel486 processors
executes on the Pentium Pro, Pentium II, and Pentium III processors
without requiring any code modification, optimization techniques combined
with knowledge of the newest processor can help you tune your application
to its greatest potential. In theory, a good compiler knows all about
code generation strategy, but in practice, compilers are not
perfect. In some cases the way we write our code forces the compiler
to generate slow code, and in other cases the compiler is simply not
as good as it could be. When we write assembly code, we have no one
to blame but ourselves, so we need to be aware of microarchitectural
limitations. In this article I would like to highlight two kinds of
performance problems: partial stalls and memory order buffer (MOB)
stalls. Detecting and fixing these problems in your code may help your
application run faster.