“With enough inside information and a million dollars you can go broke in a year” – Warren Buffett on stock tips
Myths, tips, inside information: however you state it, dangerous blanket statements and generalizations steepen the learning curve for those practicing Video Game Optimization (VGO). Optimization myths thrive on complex hardware, legacy implementations, faith, and e-mail lists. What follows is a description of some of the most popular myths that I’ve heard over the past 7 years of optimizing games.
Before accusing me of disagreeing with Stanford legend Donald Knuth and a knighted programmer (Sir C.A.R. Hoare, the inventor of quicksort), realize that I am planning to agree with what has become the mantra of the reactive optimizer. Let’s examine the quote in its entirety.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth
A statement from a professional that exposes “the root of all evil” is momentous, but when you examine the entire statement, it’s easier to see the forest, not the tree. I’m not ordained to interpret programming scripture, but from this famous statement I infer the following: prematurely optimizing small efficiencies is usually the root of all evil; prematurely optimizing large efficiencies is a necessity.
There are three levels of VGO: the system level, the application level, and the micro level. The system level of optimization is where we as programmers examine our architecture and compare it to our system specs. System level questions include “How many cores do we support?” and “Do we require a minimum level of shader support?”. The application level of optimization is usually implemented at the class level. Examples include quad trees, occlusion culling, and instancing. Micro level optimizations, the most tangible and arguable level, are easily recognized since they exist within the domain of a single line, or a few lines, of code.
There are more flavors of PC configurations than there are of Linux operating systems. System level and application level optimizations are more likely to “raise the tide” of frame rates across combinations of AMD, Intel, and Nvidia CPUs and GPUs. The payoff of micro optimizations tends to vary across different configurations more than that of system or application level optimizations.
Optimization is a part of design! System and application level optimizations are best implemented during design. If we miss these opportunities because we feel we are acting prematurely, then only an abundance of flexibility, a level that few engines provide, will afford us the opportunity to integrate the optimization before shipping.
Premature optimization at the system and application levels is not the root of all evil. Premature optimization at the micro level is.
There are two polar opinions that dominate game programming personalities. On the left is a personality I call the LEAN_AND_MEAN programmer. On the right is the class heavy abstractionist.
Which one is correct with regard to performance? I’m afraid there is no clear winner, and in fact, they may both be correct. The amount of code you write before you compile does not always answer the more important questions about runtime performance.
The argument against the abstractionist is that the overhead of their design is burdensome; however, a well-designed class hierarchy does not need to travel through many lines of code during execution. A poorly designed class hierarchy will be the victim of its own verbose design.
The argument against the LEAN_AND_MEAN programmer is that their code lacks the flexibility needed to reduce superfluous lines of code and to refactor rapidly.
The bottom line: sometimes writing more code can reduce superfluous CPU and memory system work and maximize parallelism. In this sense, both are taking the correct approach as long as the class heavy abstractionist uses a good design and the LEAN_AND_MEAN programmer manages superfluous execution and flexibility.
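As a rough illustration of more code doing less work, the sketch below compares a short brute-force proximity query with a longer one backed by a uniform grid. The grid is a simpler stand-in for the quad tree mentioned earlier, not a quad tree itself, and every name here (Point, Grid, countNearBrute, the tests counter) is hypothetical and exists only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Short version: tests every point against the query circle.
// `tests` accumulates the number of distance checks performed.
inline int countNearBrute(const std::vector<Point>& pts, Point q,
                          float radius, std::size_t& tests) {
    int hits = 0;
    for (const Point& p : pts) {
        ++tests;
        const float dx = p.x - q.x, dy = p.y - q.y;
        if (dx * dx + dy * dy <= radius * radius) ++hits;
    }
    return hits;
}

// Longer version: a uniform grid over [0, worldSize) x [0, worldSize).
// Points outside that range are clamped into the border cells.
class Grid {
public:
    Grid(float worldSize, int cellsPerAxis)
        : cellSize_(worldSize / static_cast<float>(cellsPerAxis)),
          cells_(cellsPerAxis),
          buckets_(static_cast<std::size_t>(cellsPerAxis) * cellsPerAxis) {
        assert(worldSize > 0.0f && cellsPerAxis > 0);
    }

    void insert(Point p) { bucket(cellOf(p.x), cellOf(p.y)).push_back(p); }

    // Only visits the cells overlapped by the query circle's bounding box.
    int countNear(Point q, float radius, std::size_t& tests) const {
        int hits = 0;
        const int x0 = cellOf(q.x - radius), x1 = cellOf(q.x + radius);
        const int y0 = cellOf(q.y - radius), y1 = cellOf(q.y + radius);
        for (int cy = y0; cy <= y1; ++cy) {
            for (int cx = x0; cx <= x1; ++cx) {
                for (const Point& p : bucket(cx, cy)) {
                    ++tests;
                    const float dx = p.x - q.x, dy = p.y - q.y;
                    if (dx * dx + dy * dy <= radius * radius) ++hits;
                }
            }
        }
        return hits;
    }

private:
    // Maps a coordinate to a cell index, clamped to the grid.
    int cellOf(float v) const {
        const int c = static_cast<int>(v / cellSize_);
        return std::max(0, std::min(c, cells_ - 1));
    }
    std::vector<Point>& bucket(int cx, int cy) {
        return buckets_[static_cast<std::size_t>(cy) * cells_ + cx];
    }
    const std::vector<Point>& bucket(int cx, int cy) const {
        return buckets_[static_cast<std::size_t>(cy) * cells_ + cx];
    }

    float cellSize_;
    int cells_;
    std::vector<std::vector<Point>> buckets_;
};
```

Both functions return the same count, but for a small query radius the grid version performs far fewer distance checks. Whether that translates into a real frame rate win depends on the data set and the hardware, so it is worth measuring rather than assuming.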
Unintuitive hardware can also propagate this myth. A good example of how more lines of code can be faster than fewer is evident when using write-combined buffers.
Dynamic vertex buffers, when locked, sometimes use write-combined buffers, a memory type that does not travel through the cache system. This is done to reduce the cost of managing memory coherency between the GPU and the CPU. When we use a write-combined buffer, it is important to update all 64 bytes of a write-combined buffer line. If the entire 64-byte line is not updated, the write-combined buffer writes to system memory in 8-byte increments. When all 64 bytes are updated, the entire line writes in a single burst.
What does this mean to a game programmer? When considering the memory performance of a write-combined buffer, we should update every byte of a vertex, even if position is the only value that changed. In this example, writing more code, which appears slower at the C++ level, unlocks a latent hardware optimization.
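A minimal sketch of that full-vertex update follows. It assumes a hypothetical 64-byte Vertex layout so that one vertex fills one write-combined line, and it takes the destination as a plain pointer standing in for whatever a graphics API's lock call returns. The names Vertex, Position, and streamFullVertices are inventions for this example, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: 4 x 16 bytes = 64 bytes, so one vertex fills one
// write-combined line if the destination is 64-byte aligned.
struct Vertex {
    float position[4];
    float normal[4];
    float color[4];
    float uv[4];
};
static_assert(sizeof(Vertex) == 64, "one vertex per 64-byte line");

struct Position { float x, y, z; };

// Applies the new positions to a CPU-side copy of the vertices, then writes
// every byte of every vertex to `dst` in ascending address order, even though
// only the position changed. `dst` is assumed to point at the locked buffer;
// it is only written, never read.
inline void streamFullVertices(Vertex* dst,
                               std::vector<Vertex>& cpuCopy,
                               const std::vector<Position>& newPositions) {
    assert(newPositions.size() == cpuCopy.size());
    for (std::size_t i = 0; i < cpuCopy.size(); ++i) {
        Vertex& v = cpuCopy[i];
        v.position[0] = newPositions[i].x;
        v.position[1] = newPositions[i].y;
        v.position[2] = newPositions[i].z;
        dst[i] = v;  // whole-struct copy: all 64 bytes of the line are written
    }
}
```

The compiler is free to choose the store instructions for the struct copy, so this only expresses the intent of touching the whole line. A profiler on the target hardware is the place to confirm that the writes actually combine.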
The “lines of code” myth survives on the belief, held by some, that more code and a larger design mean less performance. I’d bet a programmer with this belief wouldn’t give up their quad tree as a strategy to reduce the number of lines of code.