“With enough inside information and a
million dollars you can go broke in a year” – Warren Buffett on
stock tips
Myths, tips, inside information: however
you state it, dangerous blanket statements and
generalizations steepen the learning curve for those practicing Video
Game Optimization (VGO). Optimization myths thrive on complex
hardware, legacy implementations, faith, and e-mail lists. The
following is a description of some of the most popular myths that
I’ve heard over the past seven years of optimizing games.
Myth 1: Premature Optimization Is The Root Of All Evil
Before accusing me of disagreeing with
Stanford’s legend Donald Knuth and a knighted programmer (Sir
C.A.R. Hoare, the inventor of quicksort), realize that I am planning to
agree with what has become the mantra for the reactive optimizer.
Let’s examine the quote in its entirety.
“We should forget about small efficiencies, say
about 97% of the time: premature optimization is the root of all
evil.” – Donald Knuth
A statement from a respected professional that
exposes “the root of all evil” carries a lot of weight, but when you
examine the entire statement, it’s easier to see the forest and not
just the tree. I’m not ordained to interpret programming scripture, but
from this famous statement I infer the following: prematurely
optimizing small efficiencies is usually the root of all evil;
prematurely optimizing large efficiencies is a necessity.
There are three levels of VGO: the
system level, the application level, and the micro level. The system
level of optimization is where we as programmers examine our
architecture and compare it to our system specs. System level
questions include “How many cores do we support?” and “Do we
require a minimum level of shader support?” The application level
of optimization is usually implemented at the class level. Examples
include quad trees, occlusion culling, and instancing. Micro level
optimizations, the most tangible and most argued-over level, are easily
recognized since they live within a single line, or a few lines,
of code.
There are more flavors of PC
configurations than there are of Linux operating systems. System
level and application level optimizations are more likely to “raise
the tide” of frame rates across combinations of AMD, Intel, and Nvidia
CPUs and GPUs. The payoff of a micro optimization tends to vary across
configurations more than that of a system or application level one.
Optimization is a part of design!
System and application levels of optimizations are best implemented
during design. If we miss these opportunities because we feel we are
acting prematurely, then only an abundance of flexibility, of a kind
that few engines provide, will afford us the opportunity to
integrate the optimization before shipping.
Premature optimization at the system and
application levels is not the root of all evil.
Premature optimization at the micro level is.
Myth 2: Less Code Is Faster Than More
Two polar opinions
dominate game programming personalities. On the left is a personality
I call the LEAN_AND_MEAN programmer. On the right is the
class-heavy abstractionist.
Which one is correct with regard to
performance? I’m afraid there is no clear winner, and in fact,
they may both be correct. The amount of code you write before you
compile does not always answer the more important questions
about runtime performance.
The argument against the abstractionist
is that the overhead of their design is burdensome; however, a
well-designed class hierarchy does not need to travel through many lines
of code during execution. A poorly designed class hierarchy will be
the victim of its verbose design.
The argument against the LEAN_AND_MEAN
programmer is that their code lacks the flexibility needed to reduce
superfluous lines of code and to refactor rapidly.
The bottom line: sometimes writing more
code can reduce superfluous CPU and memory system work and maximize
parallelism. In that sense, both are taking the correct approach as long
as the class-heavy abstractionist uses a good design and the
LEAN_AND_MEAN programmer manages superfluous execution and
flexibility.
Unintuitive hardware can also propagate
this myth. A good example of how more lines of code can be faster
than fewer is evident when using write-combined buffers.
Dynamic vertex buffers, when locked,
sometimes use write-combined buffers, a memory type whose writes do
not travel through the cache system. This is done to reduce the
management of memory coherency between the GPU and the CPU. When we
use a write-combined buffer, it is important to update all 64 bytes
of a write-combined buffer line. If the entire 64-byte line is not
updated, the write-combined buffer writes to system memory in 8-byte
increments. When all 64 bytes are updated, the entire line writes in
a single burst.
What does this mean to a game
programmer? When considering the memory performance of a
write-combined buffer, we should update every byte of a vertex, even
if position is the only value that changed. In this example, writing
more code, which appears slower at the C++ level, unlocks a latent
hardware optimization.
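
As a rough illustration of the point above, the sketch below contrasts
the two update loops. The `Vertex` layout, the function names, and the
plain-memory `dst` pointer are all assumptions made for this example.
In a real engine, `dst` would be whatever pointer the graphics API
hands back when the dynamic vertex buffer is locked, and the actual
write-combining behavior depends on the hardware and the driver.

```cpp
#include <cstddef>

// Hypothetical 32-byte vertex: two of them fill one 64-byte line.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};
static_assert(sizeof(Vertex) == 32, "layout assumed by this sketch");

// Fewer lines of code: only the positions are written, so every
// 64-byte line of the destination is left with unwritten gaps.
void updatePositionsOnly(Vertex* dst, const Vertex* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i].position[0] = src[i].position[0];
        dst[i].position[1] = src[i].position[1];
        dst[i].position[2] = src[i].position[2];
    }
}

// More lines of code: every field is written, in ascending address
// order, so each 64-byte line of the destination is filled completely.
// (A compiler may reorder or merge these stores; the order shown here
// expresses intent, not a guarantee.)
void updateWholeVertices(Vertex* dst, const Vertex* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i].position[0] = src[i].position[0];
        dst[i].position[1] = src[i].position[1];
        dst[i].position[2] = src[i].position[2];
        dst[i].normal[0]   = src[i].normal[0];
        dst[i].normal[1]   = src[i].normal[1];
        dst[i].normal[2]   = src[i].normal[2];
        dst[i].uv[0]       = src[i].uv[0];
        dst[i].uv[1]       = src[i].uv[1];
    }
}
```

On ordinary cached memory the second function simply does more work.
The claim in the text is only that it can win when `dst` points at
write-combined memory, so measure on the target hardware before
committing to either version.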
The “lines of code” myth survives
on the belief, held by some, that more code and a larger design mean
less performance. I’d bet a programmer with this belief wouldn’t
give up their quad tree as a strategy to reduce the number of lines
of code.
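
To make the quad tree remark concrete, here is a minimal sketch that
counts the points inside a query rectangle two ways. Everything in it
(`Point`, `Rect`, `QuadTree`, `countBruteForce`, the leaf capacity, and
the depth limit) is a hypothetical name or value chosen for this
example, not taken from any particular engine.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y; };

// Axis-aligned rectangle, half-open: [minX, maxX) x [minY, maxY).
struct Rect {
    float minX, minY, maxX, maxY;
    bool contains(Point p) const {
        return p.x >= minX && p.x < maxX && p.y >= minY && p.y < maxY;
    }
    bool overlaps(const Rect& o) const {
        return minX < o.maxX && o.minX < maxX && minY < o.maxY && o.minY < maxY;
    }
};

// The short version: a few lines, but every query touches every point.
std::size_t countBruteForce(const std::vector<Point>& points, const Rect& query) {
    std::size_t n = 0;
    for (Point p : points) {
        if (query.contains(p)) ++n;
    }
    return n;
}

// The long version: more code, but a query skips whole regions that
// cannot overlap the query rectangle.
class QuadTree {
public:
    explicit QuadTree(const Rect& bounds, int depth = 0)
        : bounds_(bounds), depth_(depth) {}

    // Returns false if the point lies outside this node's bounds.
    bool insert(Point p) {
        if (!bounds_.contains(p)) return false;
        if (!children_[0]) {
            if (points_.size() < kCapacity || depth_ >= kMaxDepth) {
                points_.push_back(p);
                return true;
            }
            split();
        }
        for (auto& child : children_) {
            if (child->insert(p)) return true;
        }
        return false;  // Not reached: the four children tile bounds_.
    }

    std::size_t count(const Rect& query) const {
        if (!bounds_.overlaps(query)) return 0;  // Prune this subtree.
        std::size_t n = 0;
        for (Point p : points_) {
            if (query.contains(p)) ++n;
        }
        if (children_[0]) {
            for (const auto& child : children_) n += child->count(query);
        }
        return n;
    }

private:
    static constexpr std::size_t kCapacity = 8;  // Points per leaf.
    static constexpr int kMaxDepth = 8;          // Stops endless splitting.

    // Creates four child quadrants and moves this node's points into them.
    void split() {
        const float midX = 0.5f * (bounds_.minX + bounds_.maxX);
        const float midY = 0.5f * (bounds_.minY + bounds_.maxY);
        const int d = depth_ + 1;
        children_[0] = std::make_unique<QuadTree>(Rect{bounds_.minX, bounds_.minY, midX, midY}, d);
        children_[1] = std::make_unique<QuadTree>(Rect{midX, bounds_.minY, bounds_.maxX, midY}, d);
        children_[2] = std::make_unique<QuadTree>(Rect{bounds_.minX, midY, midX, bounds_.maxY}, d);
        children_[3] = std::make_unique<QuadTree>(Rect{midX, midY, bounds_.maxX, bounds_.maxY}, d);
        for (Point p : points_) {
            for (auto& child : children_) {
                if (child->insert(p)) break;
            }
        }
        points_.clear();
    }

    Rect bounds_;
    int depth_;
    std::vector<Point> points_;
    std::array<std::unique_ptr<QuadTree>, 4> children_;
};
```

Both functions return the same count for the same points and query.
The tree is several times longer on the page, yet a small query over a
large point set visits only the nodes whose bounds overlap the query.
Whether that wins in practice still depends on the point count and the
memory layout, so profile before assuming.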