month, the Optimizations Corner gets "down and dirty" and starts examining
many of the pitfalls that you might stumble into if you're not careful
about microarchitecture. After all, a good way to slow down a fast
computer is to write software that ignores the underlying microarchitecture.
So, let's get those "glass jaws" in your code out of the way and let
the raw speed of the machine shine through.
Koby Gottlieb is the Group Leader of the Media/VTune Group in Intel's Israel Design Center. Aside from being my supervisor (which explains the glowing remark I'm about to make), he's an expert in low-level code tuning for specific microarchitectures. In this article, he explains the concept of partial stalls and how they affect performance on the dynamic execution microarchitecture of the Pentium Pro, Pentium II, and Pentium III processors.
Maximize Your Application's Potential
Understanding the microarchitecture can help you develop high-performance applications for Intel architecture processors. An application you developed for the Intel386 and Intel486 processors executes on the Pentium Pro, Pentium II, and Pentium III processors without any code modification, but optimization techniques combined with knowledge of the newest processors can help you tune it to its greatest potential. In theory, a good compiler knows all about code generation strategy; in practice, compilers are not perfect. In some cases the way we write our code forces the compiler to generate slow code, and in other cases the compiler is simply not as good as it could be. When writing assembly code we have no one to blame but ourselves, so we should be aware of the microarchitectural limitations. In this article I would like to highlight two kinds of performance problems: partial stalls and memory order buffer (MOB) stalls. Detecting and fixing these problems in your code may help your application run faster.