Myth 6: Optimization And Assembly
For many programmers, optimization and
assembly are synonymous. This is much less true today than it was in
the past. If this inflames you, don’t think that I believe there
is NO place for assembly in optimization; it’s just a
smaller piece.
My justifications?
First, the use of APIs is much more
prevalent today than in the past. In many programs, the application
code written by the developer consumes a small percentage of the
total runtime. It is very common for the drivers, graphics bus, or
the graphics pipeline to be the 20% of the Pareto principle: the small
part of the system that accounts for most of the runtime.
When the code you use, not the code you
write, is the hotspot or bottleneck, coding with optimized assembly
(excluding shader assembly) will not be your most efficient use of
time. We have sacrificed a lot of control for code reuse through
APIs.
Second, optimizing with assembly is a
classic example of a micro-optimization. A micro-optimization is
more likely to produce different results across PC configurations than
system-level or application-level optimizations. Everyone is familiar
with the office phrase, “It doesn’t crash on my machine”.
Micro-optimizations sometimes prompt the phrase, “It runs fast on my
machine”.
Finally, when you write assembly, you
get exactly what you write. To some, this is great. To those who
are not experts in assembly, it’s an opportunity to shoot yourself
in the foot. The term “optimizing compiler” is antiquated. Now,
even standard compilers perform optimizations. Writing assembly
bypasses the optimizations built into compilers. Even textbook
examples of unrolling loops with no data dependencies can yield slower
performance than the untouched loop. The risk vs. reward of
second-guessing your compiler is not as justified as it was in the
past.
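To make the loop unrolling point concrete, here is a minimal sketch of the two versions being compared. The function names (scale_plain, scale_unrolled) and the unroll factor of four are assumptions made for illustration. The sketch makes no claim about which version is faster; that depends on the compiler, its flags, and the machine, which is exactly why it has to be measured.

```cpp
#include <cstddef>
#include <vector>

// Plain loop: each iteration is independent of the others, so the
// compiler is free to unroll or vectorize it as it sees fit.
void scale_plain(std::vector<int>& values, int factor) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] *= factor;
    }
}

// Hand-unrolled by four (a hypothetical "textbook" version), followed
// by a cleanup loop for the elements left over when the size is not a
// multiple of four.
void scale_unrolled(std::vector<int>& values, int factor) {
    const std::size_t n = values.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        values[i]     *= factor;
        values[i + 1] *= factor;
        values[i + 2] *= factor;
        values[i + 3] *= factor;
    }
    for (; i < n; ++i) {
        values[i] *= factor;
    }
}
```

Both functions produce the same result. Time them with your own compiler and optimization settings before deciding that the hand-unrolled version is worth keeping.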
In closing this myth, I will repeat
that, under the correct circumstances, optimizing with assembly is
still the correct choice. For those who prefer a higher-level
language, it’s good to know that compilers are doing more of our
work.
Myth 7: A Ratio Of 1 To 1, Thread To Logical Core, Is Optimal
OK, so maybe I’m getting a bit
nit-picky. This statement is common, but it needs more detail
to avoid confusion. If you are using Microsoft Windows XP, you will
find that on start-up your machine is running anywhere between 400 and
800 threads. Does this mean we will never achieve the ratio
until we have an 800-core machine? Of course not.
A more accurate phrase is, “a ratio
of 1 to 1, intensive threads to logical cores, is optimal”. Two
threads, each busy 50% of the time, can share one core efficiently.
The difficulty in this example is ensuring that the two threads’ busy
periods do not occur at the same time.
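As a rough sketch of sizing the intensive threads to the logical cores, the example below asks the standard library for the core count and gives each compute-bound worker its own slice of the data. The names intensive_worker_count and parallel_sum are hypothetical, and the summation stands in for any CPU-heavy task. Note that std::thread::hardware_concurrency() is only a hint and may return 0, so the sketch falls back to a single worker.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// One compute-bound worker per logical core. hardware_concurrency()
// may return 0 when the value is unknown, so fall back to 1.
unsigned intensive_worker_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Sum `data` with `workers` threads; each thread handles one slice and
// writes its result to its own slot, so no locking is needed.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    workers = std::max(1u, workers);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&, w, begin, end] {
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[w] = local;  // one write per thread, after the loop
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A call such as parallel_sum(data, intensive_worker_count()) follows the 1 to 1 rule for intensive threads. Threads that spend most of their time waiting do not need a core each.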
Myth 8: Multi-threading With Efficient Synchronization Will Always Increase Performance
This myth is related to myth 5 and is
likely to go away as the memory architectures of multi-core systems
evolve. The root of this myth is that multi-threading does not
increase memory performance; it complicates it.
If a given algorithm is bound by memory
performance, then dividing the task across threads will not increase
the performance. Worse, by opening the door to false sharing and
increased cache eviction, threading can actually decrease
performance.
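False sharing is easier to see in code. The sketch below is illustrative only: the 64-byte cache line is an assumption (common on x86, not universal), and the type and function names are hypothetical. Each thread updates only its own counter, yet with the packed layout neighbouring counters are likely to sit on the same cache line, so the cores keep invalidating each other’s copies. The sketch assumes C++17, which honours the alignment of over-aligned types stored in std::vector.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; check the target hardware before relying on it.
constexpr std::size_t kCacheLine = 64;

// Packed layout: adjacent counters are likely to share a cache line.
struct PackedCounter {
    long value = 0;
};

// Padded layout: each counter is aligned to its own (assumed) cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Each thread increments only its own counter; no data is logically
// shared, so any slowdown with PackedCounter comes from the layout.
template <typename Counter>
long count_in_parallel(unsigned threads, long increments) {
    std::vector<Counter> counters(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, increments] {
            for (long i = 0; i < increments; ++i) {
                counters[t].value += 1;
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    long total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

Both layouts return the same total; only the memory traffic differs. An optimizing compiler may collapse the inner loop into a single addition, so treat this as a picture of the data layout and not as a benchmark.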