Debugging Memory Corruption in Game Development
 
 
by Mick West
 
 
October 16, 2008
 

Small Integers

Small integers (in the range 0 to 10000) are usually counters or enums. If you see the value incrementing or decrementing evenly, then that indicates a counter.

If you see it oscillate between a few fixed values, then it is probably some kind of state variable.
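As a rough illustration of the two patterns above, here is a minimal Python sketch that takes the values you noted down on successive runs or frames and guesses which pattern they fit. The function name and the threshold of four distinct values are assumptions made for this example, not part of any debugger.

```python
def classify_samples(samples, max_states=4):
    """Guess what a series of small integer readings might represent.

    samples: values of the corrupted word noted at successive moments.
    max_states: hypothetical cut-off for "a few fixed values".
    """
    if len(samples) < 3:
        return "unknown"  # too few readings to see a pattern
    distinct = set(samples)
    if len(distinct) == 1:
        return "constant"
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    # A counter moves by the same non-zero step every time.
    if all(d == deltas[0] for d in deltas):
        return "counter"
    # A state variable keeps revisiting a small set of fixed values.
    if len(distinct) <= max_states and len(distinct) < len(samples):
        return "state"
    return "unknown"
```

For example, readings of 10, 11, 12, 13 would be reported as a counter, while 2, 0, 1, 0, 2, 1 would be reported as a state variable.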

Does this small integer seem to match anything in the game at the time of corruption? Some possibilities:

  • Score
  • Health
  • Lives
  • Level number
  • Weapon number
  • Button Pressed

Try to find some correlation between what is going on in the game and the corrupted value.

Large Integers

As numbers get larger, the number of uses for them decreases. It's unlikely that you will be managing groups of over 100,000 items. If you have large integers that look like they are counting, then you should consider what they might be counting.

Consider then if it might actually be a pointer, or a code address, and not an integer value at all.

Negative Integers

Example:

FFFFF3A2 or A2 F3 FF FF

Small negative integers (stored in two's complement) start with F's rather than 0's.

Integers are generally used for counting things. If you have a negative integer, then that greatly narrows down the range of things it might be used for.

Some code uses the negative form of an integer as a simple kind of flag to change the behavior of the code, avoiding the need for an additional flag.

Negative numbers are also sometimes used as error codes. Some functions take a pointer as a parameter, and then return the error code in the location pointed to by the pointer. If the pointer is incorrect, that will lead to memory corruption with a negative number.
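To see what negative value a word like the example represents, you can reinterpret the bytes from the memory window as a signed 32-bit integer. This is a small sketch using Python's standard struct module; the helper name is made up for the example, and it assumes a little-endian platform such as the PC or PS2.

```python
import struct

def as_signed32(memory_bytes):
    """Interpret four bytes, in memory-window (little-endian) order,
    as a signed 32-bit integer."""
    (value,) = struct.unpack("<i", memory_bytes)
    return value

# The example word FFFFF3A2, as it appears byte by byte in memory.
raw = bytes.fromhex("A2 F3 FF FF")
print(as_signed32(raw))  # prints -3166
```

A value like -3166 is worth comparing against the error codes your own functions return, or against any quantity in the game that is allowed to go negative.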

Magic Hex Numbers

Example:

DEADBEEF or EF BE AD DE

A magic hex number in the context of debugging is a hex value that has been specifically chosen by the programmer to be visible in the debugger.

The numbers are also chosen so that using one inadvertently will maximize the chances of that use causing an error, hence alerting the programmer to the illegal usage.

The most common use is in initializing a block of memory to certain values both when it is allocated and when it is freed. This both makes the block visible in the debugger (in the memory window), and also fills it with values that the programmer should notice if they are used either before the memory has been initialized correctly, or if the memory continues to be used after it has been freed.

Common Magic Hex Numbers are:

CCCCCCCC
CDCDCDCD
DEADBEEF
DEADDEAD
DDDDDDDD
FDFDFDFD

Use of magic numbers varies by platform. Often developers use their own magic numbers, and they tend to prefer those that can be read aloud, such as DEADBEEF.
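If you collect the fill patterns used on your platform in one place, checking a corrupted word against them takes seconds. The sketch below is one hedged way to do it in Python. The meanings given for CCCCCCCC, CDCDCDCD, DDDDDDDD and FDFDFDFD are those used by the Microsoft Visual C++ debug runtime; other compilers, consoles and in-house allocators use their own values, so treat the table as a starting point to replace with your own. The function name is hypothetical.

```python
# Known fill patterns. The first four are the MSVC debug runtime's values;
# the last two are hand-picked markers whose meaning depends on the codebase.
KNOWN_FILL_PATTERNS = {
    0xCCCCCCCC: "MSVC debug build: uninitialized stack memory",
    0xCDCDCDCD: "MSVC debug heap: allocated but not yet initialized",
    0xDDDDDDDD: "MSVC debug heap: freed memory",
    0xFDFDFDFD: "MSVC debug heap: guard bytes around an allocation",
    0xDEADBEEF: "hand-picked marker (meaning depends on the codebase)",
    0xDEADDEAD: "hand-picked marker (meaning depends on the codebase)",
}

def describe_magic(value):
    """Return a description if a 32-bit word looks like a fill pattern."""
    if value in KNOWN_FILL_PATTERNS:
        return KNOWN_FILL_PATTERNS[value]
    raw = value.to_bytes(4, "little")
    # One byte repeated four times is often a memset-style fill,
    # even when it is not a pattern we already know about.
    if len(set(raw)) == 1 and raw[0] not in (0x00, 0xFF):
        return "repeated byte 0x%02X: possibly a memset-style fill" % raw[0]
    return None
```

Zero and FF are excluded from the repeated-byte check because 00000000 and FFFFFFFF are far more likely to be an ordinary zero or minus one.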

Magic ASCII

Example:

474E5089 or 89 50 4E 47 or ‰PNG

Frequently asset files are identified by a four byte (partially) ASCII string that indicates the file type in some human readable way. It's quite unlikely that this will find its way into a single word corruption, but it's worth looking in the ASCII column in the memory window, just to check if this is the case, since if you recognize this, it should hopefully point you directly at the culprit.
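The same check can be done outside the debugger. The following Python sketch prints a word the way the ASCII column would and compares it against a few well-known four-byte file signatures. The short table is only an illustration; you would extend it with the identifiers of your own asset formats. Both function names are made up for the example.

```python
def ascii_view(word):
    """Show a 32-bit little-endian word as a memory window's ASCII column
    would: printable bytes as characters, everything else as a dot."""
    raw = word.to_bytes(4, "little")
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)

# A few real four-byte signatures, keyed by their bytes in file order.
FILE_SIGNATURES = {
    b"\x89PNG": "PNG image",
    b"RIFF": "RIFF container (WAV, AVI)",
    b"OggS": "Ogg container",
    b"DDS ": "DirectDraw Surface texture",
}

def identify_signature(word):
    """Return the file type whose signature matches the word, or None."""
    return FILE_SIGNATURES.get(word.to_bytes(4, "little"))

print(ascii_view(0x474E5089))          # prints .PNG
print(identify_signature(0x474E5089))  # prints PNG image
```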

Pointers

Example:

00434150 or 50 41 43 00

Your program usually occupies a relatively small amount of the available four gigabyte address range of a 32-bit pointer. Hence, pointers usually fall within a recognizable range.

Under Win32, your executable starts by default at address 00400000 (4MB from the start of its virtual address space), so function pointers and pointers to static data will often start with 004 (and 005, 006, etc. as your program increases in size).

On the PS2, your executable starts at 00100000 (1MB), so pointers will start with 001, 002, etc.

Function pointers are an unlikely candidate for corruption data, so if you see a pointer like this, it's more likely a pointer to some static data.

The most common type of pointer to static data that is passed around is a pointer to a string. If it looks like you have a pointer in your corruption data, then try following it and see if it points to a recognizable string.

Depending on your platform, pointers may be more likely to be word aligned. On the PS2, pointers to code or any word sized data must be word aligned. The PC allows all data referencing at the byte level.
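The range and alignment tests described above are easy to express as a check. This is a minimal Python sketch, assuming you know roughly how large your executable image is; the two base addresses come from the text, while the function name and the image sizes used in the usage lines are hypothetical.

```python
WIN32_IMAGE_BASE = 0x00400000  # default Win32 executable base, per the text
PS2_IMAGE_BASE = 0x00100000    # PS2 executable base, per the text

def looks_like_static_pointer(value, image_base, image_size, alignment=1):
    """Return True if value falls inside the executable image and meets
    the platform's alignment requirement (1 means no requirement)."""
    in_image = image_base <= value < image_base + image_size
    aligned = value % alignment == 0
    return in_image and aligned

# Hypothetical 4MB image on the PC, where byte alignment is allowed.
print(looks_like_static_pointer(0x00434150, WIN32_IMAGE_BASE, 0x00400000))
# Hypothetical 2MB image on the PS2, where word data must be 4-byte aligned.
print(looks_like_static_pointer(0x00123456, PS2_IMAGE_BASE, 0x00200000, 4))
```

The first call reports True and the second False, because 00123456 is not a multiple of four. A value that passes is only a candidate; following it in the memory window is still the real test.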

Random Numbers

Example:

9D29F113 or 13 F1 29 9D

When you look through the memory occupied by your game, you will find surprisingly little data that looks random. There are usually lots of zeros, and where the data is more closely packed, certain bytes or patterns predominate.

So when you find a number that looks random, it almost certainly has some meaning. Here are some of the things it could be.

A floating point number - as mentioned previously, a floating point number with several significant digits will look kind of random. The constant pi (3.141592654) comes out as 40490FDB - which looks random.

A checksum - if your code uses a checksum, such as CRC32, for some reason, such as identifying assets, then this could be a stray one. If you have the capability, then try seeing what string generates this checksum.

Compressed data - well compressed data should look random. It's unlikely that it would end up in a single word of corruption, but possible.

Text - It looks random at first sight, but if the bytes are mostly in the range 0x30 to 0x7F, then it is quite possible that it is a fragment of a string. See what it says in the ASCII column of the memory window.
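Three of these possibilities can be tested mechanically. The Python sketch below reinterprets a word as a 32-bit float, looks for asset names whose checksum matches it, and checks whether most of its bytes fall in the text range given above. It assumes a little-endian platform and the standard CRC-32 as implemented by zlib; your engine may use a different polynomial or hash, or may lowercase names first, so adjust accordingly. All function names, and the "three bytes out of four" reading of "mostly", are assumptions for this example.

```python
import struct
import zlib

def as_float32(word):
    """Reinterpret a 32-bit word as an IEEE 754 single-precision float."""
    (value,) = struct.unpack("<f", struct.pack("<I", word))
    return value

def matching_crc32(word, candidates):
    """Return the candidate strings whose standard CRC-32 equals word."""
    return [s for s in candidates if zlib.crc32(s.encode("ascii")) == word]

def looks_like_text(word, low=0x30, high=0x7F, min_hits=3):
    """Return True if at least min_hits of the four bytes are in range."""
    raw = word.to_bytes(4, "little")
    hits = sum(1 for b in raw if low <= b <= high)
    return hits >= min_hits

print(as_float32(0x40490FDB))       # pi, to single precision
print(looks_like_text(0x9D29F113))  # the example word is not text
```

For the checksum case, pass in the list of asset names your game loads; if one of them comes back, you have a strong lead on which system wrote the stray value.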

 
 
Comments

Cristian Cornea
Great article. I really like the fact that it goes in depth with memory debugging and how to identify patterns in the memory dump.

Great job!

Tom Newman
Great article! I am not even a programmer and I was able to make sense out of this. Very well written.

Roberto Alfonso
As a professional programmer (healthcare industry) I love technical articles, especially if related with video games. I still remember the first time I happened to cross the infamous Relm Sketch bug in Final Fantasy VI (http://en.wikipedia.org/wiki/Characters_of_Final_Fantasy_VI#Relm , third paragraph). Using it correctly, you could get hundreds of items to sell, including Ilumina, the sword that was supposed to be unique and only attainable by converting the Ragnarok esper into sword when asked.

I suggest everyone having these problems to use a memory checking library like Fortify, ElectricFence, or even Valgrind. Personally, I initialize certain memory zones with 0xBEBACAFE ("drink coffee" in Spanish) instead of the famous 0xDEADBEEF.

Gerard Green
We like to initialize memory with 0x7FBADFAD because it is sNaN if interpreted as a floating-point value. Trying to use this value causes an exception, which makes it much easier to track. (Even if on your system it doesn't cause an exception, NaN propagates in normal operations, so it's still easier to track down than 0xDEADBEEF.)

Ondrej Spanel
While debugging techniques to detect memory corruption are useful, I would like to stress that what is most important is to use coding techniques which reduce the chance of introducing such corruption in the first place. They can be summarized as: avoid low-level constructs wherever high-level constructs do the job fast enough. This includes:

- avoid explicit new / delete / malloc / free
- avoid "raw" pointers

Instead:

- use smart pointers
- use containers

In any case where you are using raw pointers or explicit allocation, make sure you have clearly defined (and documented) the responsibilities: who "owns" the memory, who is responsible for freeing it, and, when there is a raw pointer to memory which it does not own, why you can be sure the pointer will be valid during its existence.


Roberto Alfonso
I hate smart pointers, mostly because I am old schooled (I used to program in assembler for most optimization projects) and therefore think they create lazy programmers (just like garbage collectors). As long as you follow a few guidelines, you should have no problem working with them:
- Always initialize pointers with NULL
- Set pointers to NULL immediately after deleting or freeing them
- If a function returns allocated memory, there must be a reciprocal function that receives a pointer and frees it (create_node should have a destroy_node, for example)
- If a function does not allocate memory, it mustn't release it

Different strokes I guess :-)

Gopalakrishna Palem

A similar approach that can be used to track object state changes at code level can be found at:

http://blogs.msdn.com/gpalem/archive/2008/06/19/tracking-c-variable-state-changes.aspx

It's an off-line technique for tracking C++ objects. Similar to "break on access" of memory, C++ templates can be used to implement a "report on change" pattern for any member variable value changes. It may not solve all problems, but it can often be useful.

Wylie Garvin
A good article, for sure. There are a couple of things I want to add.

(1) Code Corruption, Jump Table Corruption, and Stack Overwriting Code are prevented by the memory protection hardware on most modern platforms. For PC and current consoles, it is typical for code space to be read-only, so the program will crash immediately if it tries to modify its own code. Most compilers also put v-tables and other jump tables (e.g. from switch statements) into a read-only code segment. You can also expect non-code to be non-executable on modern platforms, so wild jumps or overwritten return addresses have a good chance of being stopped dead by the memory protection hardware if they point into the data segment or stack. As the article mentions, stack-overflow on PC will usually hit a guard page, stopping the program before it destroys your debugging context. However, one processor which game developers have to deal with unfortunately has no memory protection at all: the SPUs of the PS3 Cell processor. SPU code can happily overwrite itself, jump to a data address and execute garbage, or overflow its stack.

(2) Faulty RAM on consoles - on my last project we discovered no less than separate devkits with faulty RAM problems of some sort. They were each exhibiting signs of memory corruption, and after the first one was discovered (it happened to be mine) we wrote a little test program to stress-test the console's RAM with a variety of test patterns. All our other devkits pass the test program, but these four failed its tests. In the first such kit (mine), the symptom was a 1-bit error in a particular offset into a 64 KB page of virtual memory -- different virtual address each time we ran it, but the bottom 16 bits were always the same -- so probably it was the same physical address underneath. I wasted a full day debugging the corruption before we caught on that the bug only happened on my kit, not my neighbor's. The code worked flawlessly on other kits. So we wrote that test program and proved that the 1-bit error had nothing to do with our program, but was just a hardware problem of some kind. After that, the first thing we did when memory corruption was suspected, was to run this test program on the affected kit. Over the course of several months, three other devkits failed its tests, so we knew immediately that it was a hardware problem and didn't waste our time trying to debug a non-existent bug. I know bad RAM sounds implausible, but I suppose we use our devkits heavily for years and years, and we don't always turn them off when we're not using them... anyways, for console developers, be aware of the possibility of bad RAM when investigating strange corruption bugs.

(3) Put Magic Numbers in structures to aid debugging: For non-final builds, you can add a 32-bit int field to the start of each class/structure you use, and in the constructor or initialization function, set this to a recognizable ASCII code (which is different for each type of structure). E.g. you might set it initially to 'ANIa' or 'ANId' for animation structures, 'OBJx' or 'CHAR' for objects or characters, etc. This can make it easier to recognize structures in raw memory when you have to debug something like a memory corruption. Also, when the structure is freed you can change the magic value to something else, such as the same value with the case of each character toggled ('aniA', 'objX', etc.). Whenever it accesses these objects, your code can also assert that the magic number has the original value, helping you to catch any dangling pointer errors. This technique has some runtime overhead, but depending on your circumstances it might be worth it.

Wylie Garvin
Oops.. no edit button?

"we discovered no less than four* separate devkits with faulty RAM problems of some sort."

