GAME JOBS
Contents
Leveraging the Power of Cache Memory
 
 
Printer-Friendly VersionPrinter-Friendly Version
 
Latest Jobs
spacer View All     Post a Job     RSS spacer
 
June 19, 2013
 
Trendy Entertainment
Community Manager
 
Trendy Entertainment
UI Artist
 
Subatomic Studios
Art Director
 
Airtight Games
Senior Software Engineeer
 
Edge of Reality
LEVEL ARTIST
 
Sojo Studios
Generalist Game Engineer
spacer
Latest Blogs
spacer View All     Post     RSS spacer
 
June 19, 2013
 
Pitfalls In Automation [1]
 
Some Advice for the Aspiring Sound Designer
 
How to Ace a Games Industry Job Interview [2]
 
Supporting Gamification User Types
 
Making Ear Monsters: Developing a 3D Audio Game [1]
spacer
About
spacer Editor-In-Chief:
Kris Graft
Blog Director:
Christian Nutt
Senior Contributing Editor:
Brandon Sheffield
News Editors:
Mike Rose, Kris Ligman
Editors-At-Large:
Leigh Alexander, Chris Morris
Advertising:
Jennifer Sulik
Recruitment:
Gina Gross
Education:
Gillian Crowley
 
Contact Gamasutra
 
Report a Problem
 
Submit News
 
Comment Guidelines
 
Blogging Guidelines
Sponsor
Features
  Leveraging the Power of Cache Memory
by Alexandre Macris, Pascal Urro [Programming]
Post A Comment Share on Twitter Share on Facebook RSS
 
 
April 9, 1999 Article Start Page 1 of 3 Next
 

When hardware engineers presented us software developers with cache memory for the first time, essentially they gave us a black box. "Don’t worry about what’s inside of it," they assured us, "just trust us that without changing your programming habits, your programs are going to run much faster!" That was the goal: free software performance enhancement. While it worked fine for a while, over the years the rift between processor speed and memory access time became greater, and it became obvious that by opening the black box of cache memory and exploring the contents within, we’d find solutions to the widening performance gap.

The first steps into demystifying cache memory were modest, and the first tips that resulted from these explorations were vague. For instance, some of the first tips about making the best use of cache memory were non-specific suggestions such as, "avoid random memory access" and "keep your data local". With the appearance of dissociated data and code caches, new suggestions arose, like "keep your data and code strictly separated." The "allocate on write" technique came with its own set of advice. These days, the manual prefetch is in vogue, which takes advantage of dedicated instructions introduced with the new generation of Intel, AMD, PowerPC, and Mips processors. Unfortunately, these basic tips aren’t really satisfactory if you are serious about trying to optimize your game. As a result, the major new code optimization challenge facing game developers today has cache memory optimization.



Simply blindly applying vague suggestions like those to your game, without knowing their basis for operation, will not give you fully optimized code. In fact, it’s easy to find instances where strict code optimization and cache memory optimization conflict. Let’s look at an example.

Assume that you are computing a true-color image. You have the choice of representing your image using either a 24-bit RGB data structure or a 32-bit structure using an idle byte. Your first reflex would probably be to select the second solution for one simple reason: all your memory access will be 4-byte aligned, avoiding the three (at least) processor cycles needed to handle misaligned access. Usually, that’s the right choice, but if you’re using cache memory, it’s not always the way to go.

The technique you should use is implementation dependant. If you’re only going to look at the data once, a 24-bit structure may be more efficient than the byte-aligned 32-bit structure, but the opposite is true if the data is being accessed multiple times. Why? Because on the first pass, the image data is absent from the cache (assuming that we’re working on a processor without a prefetch instruction). Accessing the data from RAM and fetching that data into cache are expensive enough operations as to make the misalignment penalty insignificant. What we’re worried about during the first pass is not the shape of the data per se, but the raw amount of data that has to be moved. So to reduce the amount of data we need to fetch, we should use a 24-bit data structure as opposed to the 32-bit alternative.

At this point let me clarify something about optimizing your game for cache memory. The main handicap of cache memory is its lack of determinism and the difficulty we encounter when trying to tune its performance. Of course, Pentium event counters provide some useful help in this regard, but not enough to strictly optimize your code. Therefore the only real solution is, as usual, hard work. To that end I recommend purchasing a good book on the subject, such as The Cache Memory Book by Jim Handy (Academic Press, 1998).

This hard work will be rewarded, however. To convince you that the results of properly handling cache issues justify the added programming complications, let’s take a look at two detailed, real-life examples. The first example will convince you of one simple thing: accessing uncached data is expensive. The second (and more interesting) example will show you that sometimes changing a very small aspect of your game can solve a huge performance problem, provided you look deeply enough into the cache structure. There isn’t much actual code in these examples – they are primarily meant to illustrate that understanding and optimizing your data structures for cache can result in significant performance improvements.

 
Article Start Page 1 of 3 Next
 
Top Stories

image
Automated testing the BioWare way
image
The creation of Ubisoft Singapore: Building a ship as it sails
image
Puzzle design in Valve's Portal 2 alternate reality game
image
Why people cheat, and what to do about it
Comments


none
 
Comment:
 




UBM Tech