By Jelle van der Beek
[Author's Bio]

Gamasutra
April 14, 2004

Monitoring Your Console's Memory Usage, Part One

Making a memory dump

Xbox

Those who have experienced the joy of Xbox programming will already have found that Microsoft thought of almost everything concerning game programming. Luckily for us, they also provide a tool that dumps all allocated memory blocks, each with a callstack. It is called XbMemDump. If XbMemDump does not suit your needs, there is also a series of debugging functions to store callstack information and to walk the heap manually.

Lastly, the Xbox has a unified memory structure, which makes it possible to monitor all memory, including that used for sound and video.

Automatically dumping memory using XbMemDump

At first glance, XbMemDump seems to be everything we need. It has many benefits: it supports memory tracking at the kernel level, so it does not miss any allocations. It can display callstack information up to 32 levels deep per allocation, so you won't have to bother tracking these allocations yourself.

However, when I started building MemAnalyze half a year ago, XbMemDump ran horribly slowly if allocation tracking was enabled. It regularly crashed during a level load, and when it did not crash, the load took about 1.5 hours to complete. When I could finally dump the memory, it displayed just the return addresses and I couldn't get the symbol information to work.

Now, half a year later, I have tested XbMemDump again: there is no performance problem and the symbols load just fine. When I asked, Microsoft reported no changes to XbMemDump since December 2002, so you should check how it performs with your own code. It may be running smoothly now because of the different allocation strategies we have implemented in our game since I first began this work. In case XbMemDump doesn't perform well with your code, or if you are interested in how I worked around the problems, the following explains how to dump the Xbox's memory manually.

Manually dumping the memory

Intercepting all allocations

We first need to intercept all allocations. This can be a pretty tough job. The Xbox has two different types of allocations: PhysicalAllocs, typically used to allocate contiguous memory (video buffers, sound data), and HeapAllocs.

Xbox provides a global allocation function, XMemAlloc, which can be overloaded. XMemAlloc supports (almost) all types of allocations. Every third-party product should use XMemAlloc for its allocations, so that the game developer can intercept them. If the tool developer really needs behavior that XMemAlloc doesn't support, such as alignments of 32 bytes or more, a wrapper for the allocation function should be created, with the option of providing a callback function. This way, the application can respond to all allocations.

Sadly, not all third-party products conform to these rules. Even Microsoft has ignored them: up until the December 2003 SDK, the XACT and XMV modules did not use XMemAlloc. (They do now, however.)
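To make the idea of a wrapper with a callback concrete, here is a minimal, portable sketch. It assumes nothing about the real XMemAlloc signature: WrappedAlloc, WrappedFree, SetAllocCallback and CountingCallback are hypothetical names, and std::malloc only stands in for whatever platform allocator a real wrapper would call.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical callback type: invoked for every allocation and free
// that passes through the wrapper.
typedef void (*AllocCallback)(void* pBlock, std::size_t nSize, bool bIsFree);

static AllocCallback g_pAllocCallback = NULL;

// Lets the application register its own tracking function.
void SetAllocCallback(AllocCallback pCallback)
{
    g_pAllocCallback = pCallback;
}

// Stand-in for a middleware allocation entry point. std::malloc is used
// only so that the sketch runs anywhere (assumption, not the Xbox API).
void* WrappedAlloc(std::size_t nSize)
{
    void* pBlock = std::malloc(nSize);
    if (pBlock != NULL && g_pAllocCallback != NULL)
    {
        g_pAllocCallback(pBlock, nSize, false);
    }
    return pBlock;
}

void WrappedFree(void* pBlock)
{
    // Notify before freeing, while the pointer is still valid.
    if (pBlock != NULL && g_pAllocCallback != NULL)
    {
        g_pAllocCallback(pBlock, 0, true);
    }
    std::free(pBlock);
}

// Example callback that just counts what it sees.
static unsigned int g_nAllocCount = 0;
static unsigned int g_nFreeCount = 0;
static std::size_t g_nAllocBytes = 0;

void CountingCallback(void* /*pBlock*/, std::size_t nSize, bool bIsFree)
{
    if (bIsFree)
    {
        ++g_nFreeCount;
    }
    else
    {
        ++g_nAllocCount;
        g_nAllocBytes += nSize;
    }
}
```

A game would register its callback once at startup, for instance SetAllocCallback(CountingCallback), and in a tracking build the callback is where the callstack would be captured.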

Once we can intercept all allocations, or at least all the allocations needed, we can then store our callstack information.

Real-time callstack tracing

Microsoft offers a series of debugging functions with the prefix "Dm". To use them, you need to link with the debug library XbDm.lib. The function DmCaptureStackBackTrace is used to store callstack information. (If you would like to know more about callstack tracing on Intel-based machines, I suggest reading Chavdar Dimitrov's explanation [REF2].) Listing 1 shows my own callstack trace functions, which work on any IA-32 based architecture (and above), provided that you disable the omission of frame pointers in the compiler settings.


#include <stddef.h>   // NULL

unsigned int StoreCallStackCPP( unsigned int* pArray, unsigned int nCount )
{
    struct CStackFrame
    {
        CStackFrame* pPrevFrame;
        unsigned int nReturnAddress;
    };

    CStackFrame* pStackFrame;
    unsigned int nResult = 0;

    if(pArray != NULL)
    {
        _asm mov [pStackFrame], ebp

        // Point to the previous frame: the frame of the caller
        pStackFrame = pStackFrame->pPrevFrame;

        // Declared outside the loop so that it is still in scope below
        unsigned int i;
        for(i = 0; i < nCount; ++i)
        {
            pArray[i] = pStackFrame->nReturnAddress;

            // If return address is zero, we have reached the
            // end of the callstack
            if(pArray[i] == 0)
            {
                break;
            }
            pStackFrame = pStackFrame->pPrevFrame;
        }

        // Store the number of successful items
        nResult = i;
    }
    return nResult;
}

unsigned int __declspec(naked) StoreCallStackAsm( unsigned int* pArray, unsigned int nCount )
{
    __asm
    {
        // Note: this function has no prolog/epilog code, so we
        // save the callee-saved registers we use ourselves
        push ebx
        push esi
        push edi

        mov ebx, ebp                  // use ebp directly = framepointer of
                                      // previous function
        mov ecx, dword ptr [esp +20]  // Load nCount (8 + 12 bytes pushed)
        mov eax, ecx
        xor edi, edi                  // Fill edi with zero for
                                      // NULL pointer comparison
        mov esi, dword ptr [esp +16]  // Load pArray (4 + 12 bytes pushed)
        cmp esi, edi                  // Check for pArray NULL pointer
        jz done
        cmp ecx, edi                  // Nothing to do if nCount is zero
        jz done

    store_items:
        cmp ebx, edi                  // Check for framepointer NULL pointer
        jz done
        mov edx, dword ptr [ebx +4]   // Offset +4 from framepointer
                                      // = return address (edx keeps
                                      // the zero in edi intact)
        mov dword ptr [esi], edx      // Store RA
        mov ebx, dword ptr [ebx]      // Load the previous framepointer
        add esi, 4                    // Inc the array
        loop store_items

    done:
        sub eax, ecx                  // Store the number of successful items
        pop edi
        pop esi
        pop ebx
        ret
    }
}

Listing 1. Intel-based callstack tracing

Please note that we have not obtained the start addresses of the functions that preceded our function. Instead, we have found the return addresses! Each return address lies somewhere between the start and end addresses of the calling function.

The functions StoreCallStackAsm and StoreCallStackCPP return the number of items successfully placed in the array. Listing 2 shows how to use them.
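Because a return address falls inside the caller rather than at its start, turning it into a function name means finding the function whose address range contains it. The following is only a sketch of that lookup under stated assumptions: FunctionSymbol and FindFunction are hypothetical names, the table is assumed to be sorted by start address, and where the table comes from (a map file, debug symbols) is left open.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical symbol entry: the function occupies [nStart, nEnd).
struct FunctionSymbol
{
    unsigned int nStart;
    unsigned int nEnd;
    const char*  pName;
};

// Comparator for std::upper_bound: true if the address lies before the symbol.
static bool AddressBeforeSymbol(unsigned int nAddress, const FunctionSymbol& symbol)
{
    return nAddress < symbol.nStart;
}

// Returns the function containing nReturnAddress, or NULL if none does.
// 'sorted' must be ordered by nStart.
const FunctionSymbol* FindFunction(const std::vector<FunctionSymbol>& sorted,
                                   unsigned int nReturnAddress)
{
    // First symbol that starts beyond the address...
    std::vector<FunctionSymbol>::const_iterator it =
        std::upper_bound(sorted.begin(), sorted.end(), nReturnAddress, AddressBeforeSymbol);
    if (it == sorted.begin())
    {
        return NULL;    // The address lies before the first function
    }
    // ...so the candidate is the symbol just before it.
    --it;
    if (nReturnAddress >= it->nEnd)
    {
        return NULL;    // The address falls in a gap between functions
    }
    return &*it;
}
```

The binary search keeps the lookup cheap even for large symbol tables, which matters when every block in a dump carries several return addresses.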


#include <string.h>   // memset
#include <tchar.h>    // _tmain, _TCHAR

const unsigned int STACK_DEPTH = 3;
const unsigned int EXTRA_ALLOC_TAG = 0xCAFEBABE;

class CExtraAllocHeader
{
public:
    unsigned int tag;
    unsigned int RA[STACK_DEPTH];

    CExtraAllocHeader()
    {
        tag = EXTRA_ALLOC_TAG;
        memset(RA, 0, sizeof(RA));
    }
};

void Foo3()
{
    CExtraAllocHeader header;
    unsigned int nrItemsCPP;
    unsigned int nrItemsAsm;

    nrItemsCPP =
        StoreCallStackCPP(header.RA, sizeof(header.RA) / sizeof(header.RA[0]));
    nrItemsAsm =
        StoreCallStackAsm(header.RA, sizeof(header.RA) / sizeof(header.RA[0]));
}

void Foo2()
{
    Foo3();
}

void Foo1()
{
    Foo2();
}

int _tmain(int argc, _TCHAR* argv[])
{
    // Enter through Foo1 so that the stored callstack covers
    // Foo2, Foo1 and _tmain
    Foo1();
    return 0;
}

Listing 2. Example of the callstack tracers.

In this example, StoreCallStack will store return addresses that lie within the functions Foo2, Foo1 and _tmain. Neither the caller of StoreCallStack (Foo3) nor StoreCallStack itself is included in the callstack!

Storing the data

We must store the callstack somewhere. For heap allocations, I decided to enlarge each allocated block by 16 bytes and to add our information at the back of the block. Of those 16 bytes, 4 are used for a tag; choose a hexadecimal value such as 0xCAFEBABE for it. The tag is used later, when walking the heap: the heap walker must check whether the block it is processing has our callstack information, since there will always be allocations that we didn't track. In a test run of our first level, I found that we managed to track almost all allocations:

Heap summary: Total count=76162, of which: Tagged: 75756, Untagged: 406!

Heap summary: Total size=28244816 bytes, of which: Tagged: 26859088, Untagged: 1385728!

The Xbox memory manager aligns each heap allocation to a 16-byte address, and the size is always a multiple of 16 bytes. If you want to append your own data to a block, you have to do this math yourself. First round up the size of the allocated item to a multiple of 16 bytes, and then add another 16 bytes for your own data (or any multiple of 16). Using 16 bytes, we can store a callstack three functions deep. Figure 4 shows the layout of an allocation of 24 bytes on Xbox.

Figure 4: An allocation of 24 bytes that is rounded to 32 bytes. In release mode, a 16-byte heap header precedes the memory block. Our extra data is appended to the block, and 8 bytes are lost. Note: Xbox uses a different list for large allocations, so heap headers are slightly larger for these allocations.
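The size arithmetic above can be sketched as follows. The tag value and the callstack depth of three come from the text; the names ExtraAllocTrailer, PaddedSize and WriteTrailer are hypothetical, and the code assumes a 4-byte unsigned int, as on the 32-bit Xbox.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t  HEAP_GRANULARITY = 16;   // Block sizes are multiples of 16 bytes
const unsigned int STACK_DEPTH = 3;
const unsigned int EXTRA_ALLOC_TAG = 0xCAFEBABE;

// A 4-byte tag plus three 4-byte return addresses: 16 bytes in total.
struct ExtraAllocTrailer
{
    unsigned int tag;
    unsigned int RA[STACK_DEPTH];
};

// Round the requested size up to a multiple of 16 bytes,
// then add room for our own 16 bytes of data.
std::size_t PaddedSize(std::size_t nRequested)
{
    std::size_t nRounded =
        (nRequested + HEAP_GRANULARITY - 1) & ~(HEAP_GRANULARITY - 1);
    return nRounded + sizeof(ExtraAllocTrailer);
}

// Copy the trailer into the last bytes of a block that is nBlockSize bytes long.
void WriteTrailer(void* pBlock, std::size_t nBlockSize, const ExtraAllocTrailer& trailer)
{
    unsigned char* pEnd = static_cast<unsigned char*>(pBlock) + nBlockSize;
    std::memcpy(pEnd - sizeof(ExtraAllocTrailer), &trailer, sizeof(ExtraAllocTrailer));
}
```

For the 24-byte request of Figure 4, PaddedSize returns 48: 24 is rounded up to 32, and 16 bytes are added for the tag and the three return addresses.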

As you can see, we are losing 8 valuable bytes. There is not much we can do about this: during the heapwalk, there is no way to recover the original size that was requested for the block after the allocation. As a last resort you could add a byte at the back of the block indicating the number of callstack levels present. This way you could have a dynamic number of callstack levels, ranging from 3 to 6 levels deep, filling up unused bytes (the tag needs to shrink to 3 bytes though).

Although I have used the approach as described above, there are a few disadvantages to it:

  • The 16-byte overhead per allocation block pollutes the memory dump.
  • The callstack is quite limited, unless we add even more overhead per block.
  • There is a small chance that a memory block is recognized as a tagged block, even if it is not, since we can't guarantee our tag will be unique. This is not very harmful: the system won't crash; it will simply display a few blocks with incorrect or unknown callstack functions.

On the positive side, these downsides never really proved to be a problem for me. The system is easy to implement, and more importantly: there is no performance penalty involved when a block is allocated or freed!

Still, I would like to present another approach. Since the Xbox supports multiple heaps, we can create a separate list that contains the extra allocation data and put it on an alternative heap. The advantage of this technique is that our memory snapshot will be an exact representation of the memory in a normal build. It is also much easier to track deeper callstacks, as XbMemDump does, and it makes walking the heap easier: we can just run over this list. The disadvantage is that each free of a memory block needs to search this list in order to delete our extra data. We need a hash table or a similar optimization to keep the performance penalty down.

For physical allocations, you have no choice but to maintain a separate list with the addresses, sizes and return addresses, because there is no such thing as a "PhysicalAllocWalker" on the Xbox. Typically there will be far fewer PhysicalAllocs than HeapAllocs, so the performance penalty for walking the list on a deallocation is not too big. In the test run of our first level, the number of PhysicalAllocs was:
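A minimal sketch of such a side table, keyed by block address so that a free does not need a linear search, could look like this. CAllocTable and AllocRecord are hypothetical names. For brevity the sketch uses std::unordered_map with its default allocator; on the console the container would need a custom allocator that draws from the alternative heap, otherwise the table's own allocations would end up in the heap being measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

const unsigned int MAX_STACK_DEPTH = 32;

// Extra data kept per tracked allocation, outside the block itself.
struct AllocRecord
{
    std::size_t  nSize;
    unsigned int nDepth;
    unsigned int RA[MAX_STACK_DEPTH];
};

class CAllocTable
{
public:
    // Remember size and callstack for a freshly allocated block.
    void OnAlloc(const void* pBlock, std::size_t nSize,
                 const unsigned int* pRA, unsigned int nDepth)
    {
        AllocRecord record = AllocRecord();   // Zero-initialized
        record.nSize = nSize;
        record.nDepth = (nDepth < MAX_STACK_DEPTH) ? nDepth : MAX_STACK_DEPTH;
        for (unsigned int i = 0; i < record.nDepth; ++i)
        {
            record.RA[i] = pRA[i];
        }
        m_records[Key(pBlock)] = record;
    }

    // Drop the record for a freed block; returns false if it was not tracked.
    bool OnFree(const void* pBlock)
    {
        return m_records.erase(Key(pBlock)) == 1;
    }

    // Returns the record for a block, or NULL if the block is not tracked.
    const AllocRecord* Find(const void* pBlock) const
    {
        std::unordered_map<std::uintptr_t, AllocRecord>::const_iterator it =
            m_records.find(Key(pBlock));
        if (it == m_records.end())
        {
            return NULL;
        }
        return &it->second;
    }

    std::size_t Count() const
    {
        return m_records.size();
    }

private:
    static std::uintptr_t Key(const void* pBlock)
    {
        return reinterpret_cast<std::uintptr_t>(pBlock);
    }

    std::unordered_map<std::uintptr_t, AllocRecord> m_records;
};
```

Dumping the heap then amounts to iterating over the table, and the blocks themselves stay exactly as they would be in a normal build.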

*** Number of tracked physical allocations:39, total size: 12601656 ***
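With only a few dozen entries, a plain list with a linear search on deallocation is enough. The sketch below shows one way to keep such a list and print a summary line like the one above; CPhysicalAllocList and PhysicalAllocRecord are hypothetical names, and a std::vector stands in for whatever container the tracking code actually uses.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

const unsigned int PHYS_STACK_DEPTH = 3;

struct PhysicalAllocRecord
{
    const void*  pAddress;
    std::size_t  nSize;
    unsigned int RA[PHYS_STACK_DEPTH];
};

class CPhysicalAllocList
{
public:
    // pRA must point to PHYS_STACK_DEPTH return addresses.
    void Add(const void* pAddress, std::size_t nSize, const unsigned int* pRA)
    {
        PhysicalAllocRecord record;
        record.pAddress = pAddress;
        record.nSize = nSize;
        for (unsigned int i = 0; i < PHYS_STACK_DEPTH; ++i)
        {
            record.RA[i] = pRA[i];
        }
        m_records.push_back(record);
    }

    // Linear search; acceptable because the list stays short.
    bool Remove(const void* pAddress)
    {
        for (std::size_t i = 0; i < m_records.size(); ++i)
        {
            if (m_records[i].pAddress == pAddress)
            {
                m_records[i] = m_records.back();   // Order does not matter
                m_records.pop_back();
                return true;
            }
        }
        return false;
    }

    std::size_t Count() const
    {
        return m_records.size();
    }

    std::size_t TotalSize() const
    {
        std::size_t nTotal = 0;
        for (std::size_t i = 0; i < m_records.size(); ++i)
        {
            nTotal += m_records[i].nSize;
        }
        return nTotal;
    }

    void PrintSummary() const
    {
        std::printf("*** Number of tracked physical allocations:%lu, total size: %lu ***\n",
                    static_cast<unsigned long>(Count()),
                    static_cast<unsigned long>(TotalSize()));
    }

private:
    std::vector<PhysicalAllocRecord> m_records;
};
```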

Dumping all allocations

We can now create a snapshot of the memory. If we decided to put our heap data on a separate heap, we can simply run over the list. If we didn't, we need to walk the heap and, for each block, check the tag to see whether it was tracked by our code, then output the extra allocation data that we stored at the end of the block. For PhysicalAllocs, we simply run over the list of PhysicalAllocs.

We can walk the heap quite easily with Microsoft's debug function HeapWalk. It works perfectly, but unfortunately it is only available in the debug libraries. It is difficult, if not impossible, to make a release build that links with just XapiLibD.lib; whenever I tried this, I ended up with a complete debug build. The reason HeapWalk is in a debug library is simply that Microsoft does not want our final game to have low-level heapwalk functionality, which sounds plausible. Perhaps they should place the HeapWalk function in the XbDm library, which can easily be linked into a release build but is unapproved.

One key disadvantage of a debug build is that the data structures look quite different, because the memory manager behaves slightly differently in debug mode. For instance, the heap header for each allocation block is larger, and the memory manager adds 0xFF tags to check for memory overruns. Last but not least, most games run terribly slowly in debug mode.

Sadly, there is no simple way to walk the heap in a release configuration unless we write our own heapwalker. I have tried, and I got a long way, but it is not an approach I want to recommend. The Xbox kernel is way too complicated, and it is bad practice not to use Microsoft's existing code. For the PS2, however, my colleague Tom van Dijck wrote a heapwalker. A detailed description of his PS2 heapwalker can be found below.
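The tag check during a heap walk can be sketched as follows. The sketch does not call HeapWalk: HeapBlock is a hypothetical description of one block as some heap walker would report it, and HeapSummary, ReadTrailer and SummarizeBlocks are likewise illustrative names. It produces the tagged and untagged totals of the kind shown in the heap summary earlier.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

const unsigned int EXTRA_ALLOC_TAG = 0xCAFEBABE;
const unsigned int STACK_DEPTH = 3;

struct ExtraAllocTrailer
{
    unsigned int tag;
    unsigned int RA[STACK_DEPTH];
};

// Hypothetical description of one block as reported by a heap walker.
struct HeapBlock
{
    const void* pData;
    std::size_t nSize;
};

struct HeapSummary
{
    std::size_t nTaggedCount;
    std::size_t nUntaggedCount;
    std::size_t nTaggedBytes;
    std::size_t nUntaggedBytes;
};

// Copies the last bytes of the block into 'trailer' and
// returns true if they start with our tag.
bool ReadTrailer(const HeapBlock& block, ExtraAllocTrailer& trailer)
{
    if (block.nSize < sizeof(ExtraAllocTrailer))
    {
        return false;   // Too small to hold our data
    }
    const unsigned char* pEnd =
        static_cast<const unsigned char*>(block.pData) + block.nSize;
    std::memcpy(&trailer, pEnd - sizeof(ExtraAllocTrailer), sizeof(ExtraAllocTrailer));
    return trailer.tag == EXTRA_ALLOC_TAG;
}

HeapSummary SummarizeBlocks(const std::vector<HeapBlock>& blocks)
{
    HeapSummary summary = { 0, 0, 0, 0 };
    for (std::size_t i = 0; i < blocks.size(); ++i)
    {
        ExtraAllocTrailer trailer;
        if (ReadTrailer(blocks[i], trailer))
        {
            // A real dump would also write out trailer.RA here
            ++summary.nTaggedCount;
            summary.nTaggedBytes += blocks[i].nSize;
        }
        else
        {
            ++summary.nUntaggedCount;
            summary.nUntaggedBytes += blocks[i].nSize;
        }
    }
    return summary;
}
```

As noted earlier, an untracked block whose last 16 bytes happen to start with the tag value would be counted as tagged; the sketch makes no attempt to rule that out.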

Finally, we need to output the image's base address. We can retrieve the image base address by calling DmWalkLoadedModules. This function will return all currently loaded modules, including kernel and debugging modules. We need to output all the base addresses along with their names. An in-depth description of the image base address will be given in part two of this series.

As mentioned earlier, I personally decided not to output function names in the memory dump. If you would like to do so, the “Dm” functions provide functionality for parsing symbol information and converting addresses to function names. For more information on the Xbox memory functions, take a look at Forrest Trepte's Xstream training session on Xbox central [REF9].

Copyright © 2003 CMP Media LLC
