Myth 3: Game Developers Don’t “Do” Multi-core Well
Let’s face it: there are many areas
in which we as game programmers lag behind the mainstream. One area is
multi-core CPU programming [multi-core CPU pictured below]. We are, however, ahead of the pack as
multi-core programmers.
Any PC with at least one CPU core and
one GPU core is a multi-core machine. The rules for optimizing
multi-core machines (which also include next-gen consoles such as the PlayStation 3 and Xbox 360) are very different from those for optimizing a single
core. Not realizing that a machine is multi-core makes the process
of optimization difficult and inefficient. This leads us to myth
number 4…
Myth 4: Every Optimization Yields Some Performance Gain
Because every hardware-accelerated game
uses a minimum of two cores (see Myth 3), it is possible
for a successful optimization to yield no frame rate increase.
Consider the following example:
A dealer splits a deck of cards in half
and hands one half each to Jack and Jill. The dealer then asks them
to sort their cards into red and black. Assume for our
purposes that Jill is much faster than Jack and finishes her half of
the deck in 45 seconds. Jack, who is slower, finishes in 60 seconds.
Since Jack and Jill work in parallel, the entire process takes
as long as the slowest participant, in this example Jack. Therefore,
the entire process takes 60 seconds.
Now assume we optimize Jill’s
performance so that she sorts her half 15 seconds
faster, in 30 seconds. If we run the experiment again, we can clearly see that our
bottleneck, Jack, still causes the experiment to take 60
seconds. We have optimized Jill by 15 seconds but gained no
increase in overall performance.
Any time we fail to optimize the
slowest core or parallel GPU kernel, we have the potential for a zero
percent frame rate increase. This sort of optimization, especially
if it requires two or more weeks of work, does not impress
management.
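The Jack and Jill experiment can be sketched as a toy model in which the total time of parallel workers is simply the time of the slowest one. The function name `total_time` is an assumption made for illustration; the timings come from the example above.

```python
def total_time(*worker_seconds: float) -> float:
    """Workers run in parallel, so the slowest one sets the total."""
    return max(worker_seconds)


jack, jill = 60.0, 45.0
baseline = total_time(jack, jill)

# Speed Jill up by 15 seconds: Jack is still the bottleneck.
after_jill = total_time(jack, jill - 15.0)

# Speed Jack up by 15 seconds instead: only now does the total drop.
after_jack = total_time(jack - 15.0, jill)

print(baseline, after_jill, after_jack)  # prints: 60.0 60.0 45.0
```

The same reasoning applies when the two "workers" are the CPU and the GPU: time saved on the side that is not the bottleneck does not show up in the frame rate.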
It is possible to increase performance
by optimizing a core other than the slowest one. This occurs when we indirectly
optimize the slowest core. For example, if our game is limited by
fragment processing on the GPU, then optimizing AI will do little,
and probably nothing, to increase our overall frame rate.
If we were instead to optimize CPU work that reduces what the GPU must draw, such as a faster and better culling
system, we would indirectly be optimizing pixel processing. In this
case, we targeted a CPU optimization that led to a GPU optimization
in our limiting kernel (fragment processing).
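A rough sketch of this indirect effect, assuming a frame takes as long as the slower of the CPU and the GPU, and that GPU fragment cost grows with the number of objects the CPU submits. Every name and number here is hypothetical and chosen only to make the arithmetic easy to follow; none of it is measured.

```python
def frame_us(cpu_us: int, gpu_us: int) -> int:
    """CPU and GPU run in parallel; the slower one sets the frame time."""
    return max(cpu_us, gpu_us)


def gpu_fragment_us(objects_drawn: int, us_per_object: int) -> int:
    """Toy model: fragment cost scales with the objects submitted."""
    return objects_drawn * us_per_object


US_PER_OBJECT = 20               # hypothetical fragment cost per object
AI_US, CULL_US = 4000, 2000      # hypothetical CPU costs per frame

# Baseline: GPU-bound (20000 us of fragment work vs 6000 us of CPU work).
before = frame_us(AI_US + CULL_US, gpu_fragment_us(1000, US_PER_OBJECT))

# Halving the AI cost leaves the frame time unchanged.
ai_optimized = frame_us(AI_US // 2 + CULL_US,
                        gpu_fragment_us(1000, US_PER_OBJECT))

# Better culling submits 600 objects instead of 1000, so the GPU does less.
cull_optimized = frame_us(AI_US + CULL_US,
                          gpu_fragment_us(600, US_PER_OBJECT))

print(before, ai_optimized, cull_optimized)  # prints: 20000 20000 12000
```

In this model the culling change costs nothing extra on the CPU; in a real engine a more aggressive culling pass may add CPU time, which is acceptable as long as the CPU does not become the new bottleneck.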
Myth 5: Reducing Instruction Processing Is Our Primary Goal In CPU Optimization

When comparing the growth rate of
instructions retired over the past five years, the GPU is the winner.
The CPU, by means of increased instruction-level parallelism and
multiple cores, is in second place. The slowest growth (among resources
commonly used at game runtime) is in the memory system.
The reason is simple: when used
correctly, memory is very fast. The problem is that games, which are
getting close to the 32-bit OS limit of 4 GB, frequently abuse our
fragile memory architecture.
Many traditional optimizations, made
famous before tiered cache systems became necessary, can be
harmful on modern architectures. For example, a look-up table trades
memory for instruction processing. If this increase in memory causes
a cache miss that requires a fetch from system memory, you have done
little to increase your performance. A cache miss that causes a
fetch to system memory is many times slower than the slowest
instruction. In attempting to save instructions, you have created
latency and a data dependency.
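The trade-off can be made concrete with a simple expected-cost model. The cycle counts below are illustrative assumptions, not measurements of any particular CPU, and the function names are hypothetical.

```python
# Illustrative, assumed costs in CPU cycles (not measured on real hardware).
HIT_CYCLES = 4.0       # table entry already in cache
MISS_CYCLES = 200.0    # table entry fetched from system memory
COMPUTE_CYCLES = 40.0  # computing the value directly, no table


def expected_lookup_cycles(miss_rate: float) -> float:
    """Average cost of one table lookup for a given cache miss rate."""
    return (1.0 - miss_rate) * HIT_CYCLES + miss_rate * MISS_CYCLES


def table_is_faster(miss_rate: float) -> bool:
    """True if the look-up table beats direct computation on average."""
    return expected_lookup_cycles(miss_rate) < COMPUTE_CYCLES


# Miss rate at which the table and direct computation cost the same.
break_even_miss_rate = (COMPUTE_CYCLES - HIT_CYCLES) / (MISS_CYCLES - HIT_CYCLES)

print(table_is_faster(0.05))  # prints: True
print(table_is_faster(0.25))  # prints: False
```

Under these assumed numbers the table stops paying off once roughly one lookup in five misses the cache. The exact threshold depends entirely on the hardware and the access pattern, so it has to be measured rather than guessed.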
When optimizing the CPU, we have a
tendency to seek out the slowest instruction loops in our engine.
The usual suspects are AI, culling, and physics. If you are not
optimizing your engine for cache efficiency, you are doing yourself a
disservice. If you are reducing instructions while increasing cache
misses, you are committing a sin.