Monitoring Your Console's Memory Usage, Part One
Making a memory dump
Xbox
Those who have experienced the joy of Xbox programming will have
already found out that Microsoft thought of almost everything concerning
game programming. Luckily for us, they also have a tool that dumps
all allocated memory blocks, along with a callstack. It is called
XbMemDump. If XbMemDump does not suit your needs, they also have
a series of debugging functions to store callstack info and to walk
the heap manually.
Lastly, the Xbox has a unified memory structure,
which makes it possible to monitor all memory, including that used
for sound and video.
Automatically dumping memory using XbMemDump
At first look, XbMemDump seems like everything
we need. It tracks memory at the kernel level, so it does not miss
any allocations, and it can display callstack information up to 32
levels deep per allocation, so you won't have to bother tracking
these allocations yourself.
However, when I started building MemAnalyze
half a year ago, XbMemDump ran horribly slowly if allocation tracking
was enabled. It regularly crashed during a level load,
and when it did not crash, it took about 1.5 hours to complete.
When I could finally dump the memory, it displayed just the return
addresses, and I couldn't get the symbol information to work.
Now, half a year later, I tested XbMemDump again,
and there is no performance problem and the symbols load just
fine. When I asked, Microsoft reported no changes
to XbMemDump since December 2002, so it might be running smoothly now
because of the different allocation strategies we have implemented in our
game since I first began this work. You should therefore check
how it performs with your code. In case XbMemDump doesn't perform
well with your code, or if you are interested in how I worked around
the problems, the following sections explain how to dump the
Xbox's memory manually.
Manually dumping the memory
Intercepting all allocations
We first need to intercept all allocations.
This can be a pretty tough job. The Xbox has two different types
of allocations: PhysicalAllocs,
typically used to allocate contiguous memory (video buffers, sound
data), and HeapAllocs.
Xbox provides a global allocation function,
XMemAlloc, which
can be overloaded. XMemAlloc
supports (almost) all types of allocations. Every third-party product
should use XMemAlloc
for its allocations, so the game developer can intercept them.
If a tool developer really needs behavior that XMemAlloc
doesn't support, such as alignments of 32 bytes or higher, the tool should wrap
its allocation function and let the application register
a callback function. This way, the application can still
respond to all allocations.
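The wrapper-with-callback idea can be sketched as follows. This is a minimal, portable illustration and not the XDK interface: TrackedAlloc, TrackedFree, SetAllocCallback and AllocCallback are hypothetical names, and std::malloc merely stands in for the platform's allocation function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical callback type: invoked for every allocation and free.
// nSize is 0 when a block is being freed.
typedef void (*AllocCallback)(void* pBlock, std::size_t nSize, bool bIsFree);

static AllocCallback g_pAllocCallback = NULL;

// The application registers its hook here (hypothetical API).
void SetAllocCallback(AllocCallback pCallback)
{
    g_pAllocCallback = pCallback;
}

// Wrapper a library would call instead of the raw allocator.
// std::malloc stands in for the platform's allocation function.
void* TrackedAlloc(std::size_t nSize)
{
    void* pBlock = std::malloc(nSize);
    if(pBlock != NULL && g_pAllocCallback != NULL)
    {
        g_pAllocCallback(pBlock, nSize, false);
    }
    return pBlock;
}

void TrackedFree(void* pBlock)
{
    if(pBlock != NULL && g_pAllocCallback != NULL)
    {
        g_pAllocCallback(pBlock, 0, true);
    }
    std::free(pBlock);
}

// Example hook: keeps simple running statistics.
static unsigned int g_nLiveBlocks = 0;
static std::size_t  g_nTotalRequested = 0;

void CountingCallback(void* /*pBlock*/, std::size_t nSize, bool bIsFree)
{
    if(bIsFree)
    {
        assert(g_nLiveBlocks > 0);
        --g_nLiveBlocks;
    }
    else
    {
        ++g_nLiveBlocks;
        g_nTotalRequested += nSize;
    }
}
```

After calling SetAllocCallback(CountingCallback), every TrackedAlloc and TrackedFree passes through the hook, which is the place where a real tracker would capture the callstack.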
Sadly, not all third-party products conform
to these rules. Even Microsoft has ignored them: up until
the December 2003 SDK, the XACT and XMV modules did not use XMemAlloc.
(They do now, however.)
Once we can intercept all allocations, or at
least all the allocations needed, we can then store our callstack
information.
Real-time callstack tracing
Microsoft offers a series of debugging functions
with the prefix "Dm". To use them, you need to link with
the debug library XbDm.lib.
The function DmCaptureStackBackTrace
is used to store callstack information. (If you would like to
know more about callstack tracing on Intel-based machines, I suggest
reading Chavdar Dimitrov's explanation [REF2].) Listing 1 shows
my own callstack trace functions, which work on any IA-32 based architecture
(and above), provided that you disable the omission of frame pointers
in the compiler settings.
unsigned int StoreCallStackCPP(
    unsigned int* pArray,
    unsigned int nCount
    )
{
    struct CStackFrame
    {
        CStackFrame* pPrevFrame;
        unsigned int nReturnAddress;
    };
    CStackFrame* pStackFrame;
    unsigned int nResult = 0;
    if(pArray != NULL)
    {
        // Read the frame pointer of this function (MSVC inline assembly)
        _asm mov [pStackFrame], ebp
        // Point to the previous frame: the frame of the caller
        pStackFrame = pStackFrame->pPrevFrame;
        // Declared outside the loop so it is still in scope afterwards
        unsigned int i;
        for(i = 0; i < nCount; ++i)
        {
            pArray[i] = pStackFrame->nReturnAddress;
            // If return address is zero, we have reached the
            // end of the callstack
            if(pArray[i] == 0)
            {
                break;
            }
            pStackFrame = pStackFrame->pPrevFrame;
        }
        // Store the number of successful items
        nResult = i;
    }
    return nResult;
}
unsigned int __declspec(naked) StoreCallStackAsm(
    unsigned int* pArray,
    unsigned int nCount
    )
{
    __asm
    {
        // Note: this function has no prolog/epilog code, so we
        // preserve the callee-saved registers ourselves
        push ebx
        push esi
        mov ebx, ebp                    // Use ebp directly = framepointer
                                        // of previous function
        mov ecx, dword ptr [esp + 16]   // Load nCount (+8 for the two pushes)
        mov eax, ecx
        mov esi, dword ptr [esp + 12]   // Load pArray (+8 for the two pushes)
        test esi, esi                   // Check for pArray NULL pointer
        jz done
        jecxz done                      // Nothing to store if nCount is zero
    store_items:
        test ebx, ebx                   // Check for framepointer NULL pointer
        jz done
        mov edx, dword ptr [ebx + 4]    // Offset +4 from framepointer
                                        // = return address
        mov dword ptr [esi], edx        // Store RA
        mov ebx, dword ptr [ebx]        // Load the previous framepointer
        add esi, 4                      // Inc the array
        loop store_items
    done:
        sub eax, ecx                    // Store the number of successful items
        pop esi
        pop ebx
        ret
    }
}
Listing 1. Intel-based callstack tracing
Please note that we have not obtained the start addresses of the
functions that preceded our function. Instead, we have found the
return addresses! Each return address lies somewhere between
the start and end addresses of the calling function.
The functions StoreCallStackAsm
and StoreCallStackCPP
return the number of items successfully placed in the array. Listing
2 shows how to use StoreCallStack.
In this example, StoreCallStack
will store addresses of instructions inside the functions Foo2,
Foo1 and _tmain.
Neither the caller of StoreCallStack
(Foo3) nor StoreCallStack
itself is included in the callstack!
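To illustrate which frames end up in the array without relying on inline assembly, the sketch below simulates the frame chain with plain structs instead of reading ebp. WalkFrameChain and TraceFromFoo3 are hypothetical helpers, and the numeric return addresses are invented; each one simply stands for "somewhere inside" the named function.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as in Listing 1: saved frame pointer, then return address.
struct CStackFrame
{
    CStackFrame* pPrevFrame;
    unsigned int nReturnAddress;
};

// Hypothetical helper: walks a frame chain starting at the frame of the
// function that asked for the trace, in the same spirit as Listing 1.
unsigned int WalkFrameChain(const CStackFrame* pCallerFrame,
                            unsigned int* pArray, unsigned int nCount)
{
    unsigned int nResult = 0;
    if(pArray == NULL)
    {
        return 0;
    }
    const CStackFrame* pFrame = pCallerFrame;
    // Stop at the array limit, a NULL frame, or a zero return address
    while(nResult < nCount && pFrame != NULL && pFrame->nReturnAddress != 0)
    {
        pArray[nResult++] = pFrame->nReturnAddress;
        pFrame = pFrame->pPrevFrame;
    }
    assert(nResult <= nCount);
    return nResult;
}

// Simulated chain for the call sequence _tmain -> Foo1 -> Foo2 -> Foo3,
// where Foo3 requests the trace. All addresses are made up.
unsigned int TraceFromFoo3(unsigned int* pArray, unsigned int nCount)
{
    CStackFrame tmainFrame = { NULL,        0          }; // end of chain
    CStackFrame foo1Frame  = { &tmainFrame, 0x00401030 }; // returns into _tmain
    CStackFrame foo2Frame  = { &foo1Frame,  0x00401020 }; // returns into Foo1
    CStackFrame foo3Frame  = { &foo2Frame,  0x00401010 }; // returns into Foo2
    return WalkFrameChain(&foo3Frame, pArray, nCount);
}
```

The walk starts at Foo3's frame, whose stored return address points into Foo2, so the array receives one address each for Foo2, Foo1 and _tmain, and nothing for Foo3 or the trace function itself.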
Storing the data
We must store the callstack somewhere. For heap allocations, I
decided to enlarge the block that was allocated by 16 bytes, and
add our information at the back of the allocated block. I also provide
a tag of 4 bytes in the 16 bytes. Choose a hexadecimal value such
as 0xCAFEBABE for
the tag value. The tag value is used later, when walking the heap.
The heap walker must check if the allocated block it is processing
has our callstack information, since there will always be allocations
that we didn't track. In running a test of our first level, I found
that we managed to track almost all allocations:
Heap summary: Total count=76162, of which: Tagged: 75756, Untagged: 406!
Heap summary: Total size=28244816 bytes, of which: Tagged: 26859088, Untagged: 1385728!
The Xbox memory manager aligns each heap allocation to a 16-byte
address, and the block size is always a multiple
of 16 bytes. If you want to append your own data to a block, do this
math yourself. First round up the size of the allocated item to
a multiple of 16 bytes, and then add another 16 for your own data
(or any multiple of 16). Using 16 bytes (a 4-byte tag plus three 4-byte return addresses), we can store a callstack
three functions deep. Figure 4 shows the layout of an allocation
of 24 bytes on Xbox.
As you can see, we are losing 8 valuable bytes. There is not much
we can do about this: during the heapwalk, there is no way to recover
the original size that was requested for the block after the allocation.
As a last resort you could add a byte at the back of the block indicating
the number of callstack levels present. This way you could have
a dynamic number of callstack levels, ranging from 3 to 6 levels
deep, filling up unused bytes (the tag needs to shrink to 3 bytes
though).
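The size arithmetic and the fixed 16-byte footer described above can be sketched as follows. The names PaddedSize, WriteFooter and CAllocFooter are hypothetical, and the field order inside the footer is an assumption; the text only states that the 16 bytes hold a 4-byte tag and a callstack three functions deep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t  kHeapGranularity = 16;         // Block sizes are multiples of 16
const unsigned int kTrackTag        = 0xCAFEBABE; // Tag value suggested in the text
const unsigned int kCallStackDepth  = 3;          // (16 - 4) / 4 return addresses

// Hypothetical footer layout: 4-byte tag plus three 4-byte return addresses.
struct CAllocFooter
{
    unsigned int nTag;
    unsigned int aReturnAddress[kCallStackDepth];
};

static_assert(sizeof(CAllocFooter) == 16, "footer must occupy exactly 16 bytes");

// Round the requested size up to a multiple of 16, then add the footer.
std::size_t PaddedSize(std::size_t nRequested)
{
    std::size_t nRounded =
        (nRequested + kHeapGranularity - 1) & ~(kHeapGranularity - 1);
    return nRounded + sizeof(CAllocFooter);
}

// Write the footer into the last 16 bytes of a block of nBlockSize bytes.
// pCallStack must point to kCallStackDepth return addresses.
void WriteFooter(void* pBlock, std::size_t nBlockSize,
                 const unsigned int* pCallStack)
{
    assert(pBlock != NULL && pCallStack != NULL);
    assert(nBlockSize >= sizeof(CAllocFooter));
    CAllocFooter footer;
    footer.nTag = kTrackTag;
    std::memcpy(footer.aReturnAddress, pCallStack, sizeof(footer.aReturnAddress));
    unsigned char* pEnd = static_cast<unsigned char*>(pBlock) + nBlockSize;
    std::memcpy(pEnd - sizeof(CAllocFooter), &footer, sizeof(footer));
}
```

For the 24-byte request used in the text, PaddedSize returns 48: 32 bytes after rounding plus the 16-byte footer, which leaves the 8 unused bytes mentioned above between the payload and the footer.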
Although I have used the approach as described above, there are
a few disadvantages to it:
- The 16-byte overhead per allocation block pollutes the memory
dump.
- The callstack is quite limited, unless we add even more overhead
per block.
- There is a small chance that a memory block is recognized as
a tagged block, even if it is not, since we can't guarantee our
tag will be unique. This is not very harmful: the system won't
crash; it will simply display a few blocks with incorrect or unknown
callstack functions.
On the positive side, these downsides never really proved to be
a problem to me. The system is easy to implement, and more importantly:
there is no performance penalty involved when a block is allocated
or freed!
Still, I would like to present another approach. Since the Xbox
has support for multiple heaps, we can create a separate list that
contains the extra allocation data and put it on an alternative
heap. The advantage of this technique is that our memory snapshot
will be the exact representation of the memory in a normal build.
It is also much easier to track deeper callstacks, as XbMemDump
does, and it makes walking the heap easier: we can just run over
this list. The disadvantage is that each free of a memory
block needs to search this list in order to delete our extra
data. We need a hash table or a similar lookup structure
to keep the performance penalty down.
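A minimal sketch of such a side table, keyed by block address, is shown below. CAllocTracker and CAllocRecord are hypothetical names. On the console the table itself would be placed on the alternative heap; the sketch uses the default allocator to stay short and portable.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical record kept outside the tracked block.
struct CAllocRecord
{
    std::size_t               nSize;
    std::vector<unsigned int> aCallStack;   // Any depth, not limited to three
};

// Hypothetical tracker built on a hash table keyed by block address.
class CAllocTracker
{
public:
    void OnAlloc(const void* pBlock, std::size_t nSize,
                 const unsigned int* pCallStack, unsigned int nDepth)
    {
        assert(pBlock != NULL);
        CAllocRecord record;
        record.nSize = nSize;
        record.aCallStack.assign(pCallStack, pCallStack + nDepth);
        m_Records[pBlock] = record;
    }

    // Average constant-time lookup instead of a linear search on every free.
    // Returns false if the block was never tracked.
    bool OnFree(const void* pBlock)
    {
        return m_Records.erase(pBlock) == 1;
    }

    std::size_t Count() const { return m_Records.size(); }

    // Dumping is a plain iteration over the table; no heap walk is needed.
    std::size_t TotalSize() const
    {
        std::size_t nTotal = 0;
        for(std::unordered_map<const void*, CAllocRecord>::const_iterator it =
                m_Records.begin(); it != m_Records.end(); ++it)
        {
            nTotal += it->second.nSize;
        }
        return nTotal;
    }

private:
    std::unordered_map<const void*, CAllocRecord> m_Records;
};
```

The trade-off is the one described above: the tracked blocks keep their normal size and layout, but every allocation and free now pays for a table update.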
For physical allocations, you have no choice but to maintain a
separate list with the addresses, sizes and return addresses. We
have to, because there is no such thing as a "PhysicalAllocWalker"
on the Xbox. Typically there will be far fewer PhysicalAllocs
than HeapAllocs,
so the performance penalty for walking the list on a deallocation
is not too big. In the test run of our first level, the
PhysicalAllocs totaled:
*** Number of tracked physical allocations:39, total size: 12601656 ***
Dumping all allocations
We can now create a snapshot of the memory. If we decided to put
our heap data on a separate heap, we can simply run over the list.
If we didn't, we will need to walk the heap and, for each item,
check the tag to see if it was tracked by our code; if so, we output
the extra allocation data that we stored at the end of the block.
For PhysicalAllocs,
we simply run over the list of PhysicalAllocs.
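The tag check during the dump can be sketched as follows. CHeapBlock is a hypothetical stand-in for whatever the heap walker reports for each block, and the assumption that the tag occupies the first 4 bytes of the trailing 16 bytes is mine; the summary output imitates the format of the heap summary shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

const unsigned int kTrackTag   = 0xCAFEBABE;
const std::size_t  kFooterSize = 16;

// Hypothetical description of one block as reported by a heap walker.
struct CHeapBlock
{
    const unsigned char* pData;
    std::size_t          nSize;
};

struct CHeapSummary
{
    unsigned int nTagged;
    unsigned int nUntagged;
    std::size_t  nTaggedBytes;
    std::size_t  nUntaggedBytes;
};

// A block counts as tracked when its last 16 bytes start with the tag.
bool IsTagged(const CHeapBlock& block)
{
    if(block.pData == NULL || block.nSize < kFooterSize)
    {
        return false;
    }
    unsigned int nTag = 0;
    std::memcpy(&nTag, block.pData + block.nSize - kFooterSize, sizeof(nTag));
    return nTag == kTrackTag;
}

CHeapSummary SummarizeHeap(const CHeapBlock* pBlocks, std::size_t nCount)
{
    assert(pBlocks != NULL || nCount == 0);
    CHeapSummary summary = { 0, 0, 0, 0 };
    for(std::size_t i = 0; i < nCount; ++i)
    {
        if(IsTagged(pBlocks[i]))
        {
            ++summary.nTagged;
            summary.nTaggedBytes += pBlocks[i].nSize;
        }
        else
        {
            ++summary.nUntagged;
            summary.nUntaggedBytes += pBlocks[i].nSize;
        }
    }
    return summary;
}

void PrintSummary(const CHeapSummary& s)
{
    std::printf("Heap summary: Total count=%u, of which: Tagged: %u, Untagged: %u\n",
                s.nTagged + s.nUntagged, s.nTagged, s.nUntagged);
    std::printf("Heap summary: Total size=%lu bytes, of which: Tagged: %lu, Untagged: %lu\n",
                static_cast<unsigned long>(s.nTaggedBytes + s.nUntaggedBytes),
                static_cast<unsigned long>(s.nTaggedBytes),
                static_cast<unsigned long>(s.nUntaggedBytes));
}
```

In a real dump, the branch for tagged blocks would also write out the stored return addresses; the sketch only counts blocks so that the tagged and untagged totals can be compared.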
We can walk the heap pretty easily by using Microsoft's debug function
HeapWalk. It works
perfectly, but unfortunately, it is only available in the debug
libraries. It is difficult, if not impossible, to make a release
build while linking with just the XapiLibD.lib.
Whenever I tried this, I always ended up in a complete debug build.
The reason HeapWalk
is put in a debug library is purely that Microsoft does not want
our final game to have low-level heapwalk functionality, which sounds
plausible. Perhaps they should place the HeapWalk
function in the XbDm library, which can be easily linked into a
release build, but is unapproved.
One key disadvantage of a debug build is that the data structures
will look quite different. In debug mode the memory manager behaves
slightly differently. For instance: the heap header for each allocation
block is larger, and it adds 0xFF tags to check for memory overruns.
Last but not least, most games run terribly slowly in debug mode.
Sadly, there is no simple way to walk the heap in a release configuration
unless we write our own heapwalker. I have tried, and I got
a long way, but it is not an approach I want to recommend. The
Xbox kernel is far too complicated, and it is bad practice not to
use Microsoft's existing code. For the PS2, however, my colleague
Tom van Dijck wrote a heapwalker. A detailed description of his
PS2 heapwalker can be found below.
Finally, we need to output the image's base address. We can retrieve
the image base address by calling DmWalkLoadedModules.
This function will return all currently loaded modules, including
kernel and debugging modules. We need to output all the base addresses
along with their names. An in-depth description of the image base
address will be given in part two of this series.
As mentioned earlier, I personally decided not to output function
names in the memory dump. If you would like to do so, the “Dm”
functions provide functionality for parsing symbol information and
converting addresses to function names. For more information on
the Xbox memory functions, take a look at Forrest Trepte's Xstream
training session on Xbox central [REF9].