Monitoring Your Console's Memory Usage, Part Two
PS2
Map files
In CodeWarrior you can let the linker output an XMap file. This
file contains a start address, size, and decorated name for each
function, and parsing it should not be too difficult. Each return
address in our stack trace is tested against these address ranges;
when it falls within a function's range, that function's name is
stored. Listing 8 displays a piece of a CodeWarrior XMap file.
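The matching step can be sketched as follows. This is a minimal sketch, not the actual MemAnalyze code: it assumes the map file has already been parsed into a list of entries, and the names MapSymbol, SortSymbols and FindSymbol are hypothetical. Instead of testing every range, it sorts the entries once and uses a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One function entry as read from the map file (field names are hypothetical).
struct MapSymbol {
    std::uint32_t start;  // start address of the function
    std::uint32_t size;   // size of the function in bytes
    std::string name;     // decorated name
};

static bool StartsBefore(const MapSymbol& a, const MapSymbol& b) {
    return a.start < b.start;
}

// Sort once after parsing so that lookups can use a binary search.
void SortSymbols(std::vector<MapSymbol>& symbols) {
    std::sort(symbols.begin(), symbols.end(), StartsBefore);
}

// Returns the symbol whose [start, start + size) range contains the address,
// or nullptr when the address falls outside every known function.
const MapSymbol* FindSymbol(const std::vector<MapSymbol>& sorted,
                            std::uint32_t address) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), StartsBefore));
    // Find the first entry that starts after the address; the only
    // candidate is the entry just before it.
    auto it = std::upper_bound(
        sorted.begin(), sorted.end(), address,
        [](std::uint32_t addr, const MapSymbol& s) { return addr < s.start; });
    if (it == sorted.begin()) return nullptr;
    --it;
    // Compare offsets rather than start + size to avoid 32-bit overflow.
    return (address - it->start < it->size) ? &*it : nullptr;
}
```

Each return address from a stack trace would be passed to FindSymbol once; a null result means the address lies outside every function listed in the map file.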
PS2 symbol information
CodeWarrior uses debug information in the DWARF 1.1 format (Debugging
With Attributed Record Formats). For information on the format, please
refer to [Ref 1]. The other PS2 compilers, GCC and ProDG, use the
ECOFF/STABS debug format. I have no experience using any of these
formats, but I know that there is source code on the web for reading
the DWARF format: an executable called DwarfDump and an open source
library called DwarfLib.
For more information, refer to [Ref 2].
The details of MemAnalyze
In this section I would like to explain how we will process our
platform-independent allocation data. I guess you can figure out for
yourself how to build a memory layout view, so I will not go into
detail about that view. The other two views need a little more attention.
The TopX view
More on return addresses
The return addresses we stored in the memory dump are more valuable
than you might have thought at first. They do not just
point at the function that allocated the memory, they point to the
instruction within the function that allocated the memory (figure
2). Using this information, we can distinguish between multiple
allocations in a function. Do not be tempted to replace the return
addresses with function names unless you store the offset of the
instruction along with each name.
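One way to keep that information is to store each return address as a function name plus an offset. The sketch below is only an illustration; the CallSite type and its helper functions are hypothetical names, and the function's start address is assumed to come from the symbol information.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// A return address expressed relative to its function (hypothetical type).
struct CallSite {
    std::string function;  // resolved function name
    std::uint32_t offset;  // offset of the return address inside the function
};

// Two call sites are the same only if both function and offset match,
// so two allocations inside one function remain distinguishable.
bool operator==(const CallSite& a, const CallSite& b) {
    return a.offset == b.offset && a.function == b.function;
}

// functionStart is the start address of the function that contains
// returnAddress, as found in the symbol information.
CallSite MakeCallSite(const std::string& function, std::uint32_t functionStart,
                      std::uint32_t returnAddress) {
    assert(returnAddress >= functionStart);
    return CallSite{function, returnAddress - functionStart};
}

// Formats the call site for display, for example "CCar::Initialize+0x1c".
std::string ToString(const CallSite& site) {
    char buffer[16];  // "+0x" plus at most 8 hex digits and the terminator
    std::snprintf(buffer, sizeof(buffer), "+0x%x",
                  static_cast<unsigned>(site.offset));
    return site.function + buffer;
}
```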
Finding the allocators
Let's take another look at our list of allocated blocks with their
callstacks. Let's forget we have a complete callstack per allocation,
and first just focus on the return addresses on top of the callstack:
the actual calls to new, XMemAlloc, or any other allocation function.
We simply need to run over our complete list of blocks and find all
the different return addresses from callstack level zero. For each of
these return addresses, we need to accumulate the total size allocated
and the number of allocations performed. Doing so, we have an overview
of all allocations, and they can be sorted on allocation address,
total size allocated, and the number of allocations.
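The accumulation pass could look something like the sketch below. The Block layout and the names AllocatorStats and BuildTopX are assumptions made for this example, not the actual MemAnalyze data structures. The result is sorted on total size here; sorting on address or count only needs a different comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One allocated block from the dump (hypothetical layout).
struct Block {
    std::size_t size;
    std::vector<std::uint32_t> callstack;  // [0] is the call to the allocator
};

// Totals for one level-zero return address.
struct AllocatorStats {
    std::uint32_t returnAddress = 0;
    std::size_t totalSize = 0;
    std::size_t count = 0;
};

// Accumulates size and count per level-zero return address, then sorts
// the result so that the biggest consumers come first.
std::vector<AllocatorStats> BuildTopX(const std::vector<Block>& blocks) {
    std::map<std::uint32_t, AllocatorStats> perAddress;
    for (const Block& block : blocks) {
        assert(!block.callstack.empty());
        const std::uint32_t address = block.callstack[0];
        AllocatorStats& stats = perAddress[address];
        stats.returnAddress = address;
        stats.totalSize += block.size;
        stats.count += 1;
    }

    std::vector<AllocatorStats> result;
    result.reserve(perAddress.size());
    for (const auto& entry : perAddress) result.push_back(entry.second);

    std::sort(result.begin(), result.end(),
              [](const AllocatorStats& a, const AllocatorStats& b) {
                  if (a.totalSize != b.totalSize) return a.totalSize > b.totalSize;
                  return a.returnAddress < b.returnAddress;  // stable tie-break
              });
    return result;
}
```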
This gives us a great overview of our allocations. However, the
return addresses we are looking at are sometimes too deep into system
code. It may not be all that interesting to know that D3DAllocContiguousMemory
allocated 30 megabytes of memory. It provides us with some information,
but we would rather zoom out to see who called D3DAllocContiguousMemory.
This way we could see how much memory is spent on vertex buffers
or texture memory, for instance.
Zooming out
For a more global view, we first can collapse the data a bit by
sorting on the function that allocated the memory instead of the
actual instruction that performed the allocation. This will combine
all allocations in the scope of a function.
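Collapsing per function might be sketched as below. This is a simplified example: it assumes we already have totals per return address and a sorted list of function start addresses taken from the symbol information, and it ignores function sizes. All names (SiteTotal, FunctionTotal, EnclosingFunction, CollapseByFunction) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Totals for a single return address (one allocating instruction).
struct SiteTotal {
    std::uint32_t returnAddress;
    std::size_t totalSize;
    std::size_t count;
};

// Totals for a whole function.
struct FunctionTotal {
    std::size_t totalSize = 0;
    std::size_t count = 0;
};

// Maps a return address to the start address of the enclosing function.
// functionStarts must be sorted in ascending order. Returns 0 when no
// function starts at or below the address (treated as "unknown" here).
std::uint32_t EnclosingFunction(const std::vector<std::uint32_t>& functionStarts,
                                std::uint32_t address) {
    auto it = std::upper_bound(functionStarts.begin(), functionStarts.end(), address);
    return it == functionStarts.begin() ? 0 : *(it - 1);
}

// Merges all allocation sites that fall inside the same function.
std::map<std::uint32_t, FunctionTotal> CollapseByFunction(
    const std::vector<SiteTotal>& sites,
    const std::vector<std::uint32_t>& functionStarts) {
    assert(std::is_sorted(functionStarts.begin(), functionStarts.end()));
    std::map<std::uint32_t, FunctionTotal> perFunction;
    for (const SiteTotal& site : sites) {
        FunctionTotal& total =
            perFunction[EnclosingFunction(functionStarts, site.returnAddress)];
        total.totalSize += site.totalSize;
        total.count += site.count;
    }
    return perFunction;
}
```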
Theoretically, to zoom out even further, we could sort on a different
level in the callstack. Instead of using entry zero, we could sort
on entry one or entry two. However, this doesn't make much sense, and
I am not even sure what good this information would do. If we want
a better overview of our allocations, the hierarchy view, described
later, is much more elegant.
The Memory leaks view
When to make a memory dump
We have discussed the comparison of multiple memory dumps. Now
we need to decide at what point in the game we will make these dumps.
We need to find a situation in the game where the memory allocation
state of the game is exactly the same, time after time. The application's
exit is one of these places. In our case, and I think this will
work for many games out there, the menu is another such place. Each
time you re-enter the menu after playing the game, the memory state
should be exactly the same. Do not be confused by the fact that
the menu may have allocated its items at a different location in
memory. The number of allocations that have been performed and the
size of the allocations should not differ. If they do, we have
a memory leak.
There is one exception to this rule, and that is the use of memory
managers, such as freelists. Freelists may grow due to memory fragmentation.
I can tell you that freelists will grow and it will sometimes seem
that fragmentation is the cause. Disable the use of freelists to
make sure these are really fragmentation issues and not memory leaks.
Before I make my first memory dump, I usually load the level one
time and then go back to the menu. The game is likely to perform
a couple of one-time initial global allocations. I do not want these
to pop up in my memory report.
Finding the leaks
Now we have to come up with information on the memory dumps that
actually makes sense. I will discuss one of the algorithms I have
used. Because we may need to compare many thousands of blocks, performance
is an issue. I leave it up to you to optimize the algorithm.
We have two lists of, say, 10,000 blocks of memory, each with a
callstack and an allocation size. First we will delete all the blocks
with the exact same size and callstack that are present in both
memory dump 1 and memory dump 2 (figure 3). This way, only the differences
between the two dumps will remain. Naturally, you will need to make copies
of both lists first or you will destroy your source data.
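A compact way to express this step is sketched below. It is not the algorithm I used originally; it swaps in a sort followed by std::set_difference, which pairs up identical blocks one to one. The Block layout and the names DumpDifference and DiffDumps are assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <vector>

// One allocated block (hypothetical layout). The block's address is left
// out on purpose, since it may differ between two dumps.
struct Block {
    std::size_t size;
    std::vector<std::uint32_t> callstack;
};

// Orders blocks by callstack first, then by size.
bool operator<(const Block& a, const Block& b) {
    return std::tie(a.callstack, a.size) < std::tie(b.callstack, b.size);
}

struct DumpDifference {
    std::vector<Block> onlyInDump1;
    std::vector<Block> onlyInDump2;
};

// Both dumps are taken by value, so the caller's source data stays intact.
DumpDifference DiffDumps(std::vector<Block> dump1, std::vector<Block> dump2) {
    std::sort(dump1.begin(), dump1.end());
    std::sort(dump2.begin(), dump2.end());

    DumpDifference result;
    // set_difference respects multiplicity: a block that appears three
    // times in one dump and twice in the other is reported once.
    std::set_difference(dump1.begin(), dump1.end(), dump2.begin(), dump2.end(),
                        std::back_inserter(result.onlyInDump1));
    std::set_difference(dump2.begin(), dump2.end(), dump1.begin(), dump1.end(),
                        std::back_inserter(result.onlyInDump2));

    // Sanity check: the same number of matched blocks left each list.
    assert(dump1.size() - result.onlyInDump1.size() ==
           dump2.size() - result.onlyInDump2.size());
    return result;
}
```

Sorting makes the comparison O(n log n) instead of comparing every block against every other block, which matters once the lists hold many thousands of entries.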
In our figure, that leaves us with block 2 from memory dump 2.
If this was an actual situation, we could mark block 2 as a memory
leak and display the function name, size, and callstack. However,
it is not always this obvious. If we take a look at figure 4, which
represents a possible result of our difference algorithm, we can
see that the same callstack has allocated more memory in dump 2
than it did in memory dump 1.
In this case, this callstack performed the same number of allocations,
but the total size differs. This is a typical freelist situation, where
the freelist has grown. This is still quite straightforward. There
are a few other situations, and I'd like to point out one in particular.
Figure 5 displays a very odd situation.
In this scenario it is hard, if not impossible, to come up with
a verdict of what is going on. The callstack has not only allocated
more (or less!) memory in size, but has also allocated a greater
number of items. This seems like both a memory leak, and memory
growth or shrinkage. It is even very hard to tell if it would be
growth or shrinkage.
To handle all the situations in the resulting difference list, I
count the number of blocks and the total size that was allocated,
per callstack. For instance, in figure 5, the callstack 0x00001234,
0x00003456 and 0x00004567 has allocated 1 item of 128 bytes in memory
dump 1. It has allocated 2 items totaling 1536 bytes in memory dump 2.
Listing 9 displays all the different scenarios that I have come
up with.
    NrBlocksMD1 = CountNrBlocks(list1, CurCallStack);
    NrBlocksMD2 = CountNrBlocks(list2, CurCallStack);
    assert(!(NrBlocksMD1 == 0 && NrBlocksMD2 == 0)); // They cannot both be zero

    TotalSizeMD1 = CountTotalSize(list1, CurCallStack);
    TotalSizeMD2 = CountTotalSize(list2, CurCallStack);
    Diff = TotalSizeMD2 - TotalSizeMD1;

    if (NrBlocksMD1 == NrBlocksMD2)
    {
        if (Diff > 0)
        {
            // Dump 2 allocated more memory (growth) in the same
            // number of allocations
        }
        else
        {
            // Dump 2 allocated less memory (shrinkage) in the same
            // number of allocations
        }
    }
    else
    {
        // One of them has to be zero, else we have a very odd
        // situation (as in figure 5)
        if (NrBlocksMD1 != 0 && NrBlocksMD2 != 0)
        {
            if (NrBlocksMD2 > NrBlocksMD1)
            {
                // Dump 2: leak and growth/shrinkage
            }
            else
            {
                // Dump 1: leak and growth/shrinkage
            }
        }
        else if (NrBlocksMD1 == 0)
        {
            // Dump 2 has leaked
        }
        else
        {
            // Dump 1 has leaked (weird)
        }
    }

Listing 9: The different scenarios for our memory dump difference.
Using this code we can iterate over the remaining list of memory
dump 1 and compare it to the remaining list of memory dump 2. This
time I chose not to remove all the items that were processed, since
this can become quite complex to manage. Instead, I have marked
all the items that were processed, and skip over them on the next
iteration step. After we have compared list 1 to list 2, all the
items in list 2 that have not yet been processed are memory leaks!
So we need to run over the second list one more time, building a
list of all the leaks.
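As an alternative to marking processed items, the totals per callstack can be gathered in a single pass over both remaining lists and then classified with the scenarios of Listing 9. The sketch below takes that route; it is one possible arrangement rather than the implementation described above, and the names Totals, Verdict, Classify and ClassifyDifferences are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using CallStack = std::vector<std::uint32_t>;

struct Block {  // hypothetical layout of one remaining block
    std::size_t size;
    CallStack callstack;
};

struct Totals {  // block count and total size per dump, for one callstack
    std::size_t blocks1 = 0, size1 = 0;
    std::size_t blocks2 = 0, size2 = 0;
};

enum class Verdict {
    Grew, Shrank,               // same number of blocks, different total size
    LeakAndResizeInDump2,       // both non-zero, more blocks in dump 2
    LeakAndResizeInDump1,       // both non-zero, more blocks in dump 1
    LeakInDump2, LeakInDump1    // callstack present in only one dump
};

// Mirrors the branches of Listing 9.
Verdict Classify(const Totals& t) {
    assert(!(t.blocks1 == 0 && t.blocks2 == 0));  // they cannot both be zero
    if (t.blocks1 == t.blocks2)
        return t.size2 > t.size1 ? Verdict::Grew : Verdict::Shrank;
    if (t.blocks1 != 0 && t.blocks2 != 0)
        return t.blocks2 > t.blocks1 ? Verdict::LeakAndResizeInDump2
                                     : Verdict::LeakAndResizeInDump1;
    return t.blocks1 == 0 ? Verdict::LeakInDump2 : Verdict::LeakInDump1;
}

// rest1 and rest2 are the blocks left over after the identical blocks
// have been removed from both dumps.
std::map<CallStack, Verdict> ClassifyDifferences(const std::vector<Block>& rest1,
                                                 const std::vector<Block>& rest2) {
    std::map<CallStack, Totals> totals;
    for (const Block& b : rest1) {
        Totals& t = totals[b.callstack];
        t.blocks1 += 1;
        t.size1 += b.size;
    }
    for (const Block& b : rest2) {
        Totals& t = totals[b.callstack];
        t.blocks2 += 1;
        t.size2 += b.size;
    }
    std::map<CallStack, Verdict> verdicts;
    for (const auto& entry : totals)
        verdicts.emplace(entry.first, Classify(entry.second));
    return verdicts;
}
```

Every callstack that ends up as LeakInDump2 corresponds to items that exist only in the second list, which are the leaks we are after.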
The hierarchy view
An idea that I have not built yet, but that I think would look very
cool, is a sort of hierarchy view. It would look a lot like a
traditional profiler view.
Starting off with the return addresses of callstack level zero,
we can zoom out to their parents, and on to their parents. Keep
in mind that the parent of a return address from our callstack is
the function that performed the allocation, and that the parent
of a function is again the return address in the next callstack
level (figure 6). You can also decide always to collapse allocations
within a function.
Listing 10 shows an example of a hierarchy output.
+D3DAllocContiguousMemory() (16KBytes in 16 allocations, 40% of all allocations)
    +CTextureManager::CreateTexture() (6KBytes in 6 allocations, 15%)
        -CApplication::LoadScreen() (4KBytes in 4 allocations, 10%)
        -CCar::Initialize() (2KBytes in 2 allocations, 5%)
    +CSpecialEffectMgr::CreateVertexBuffer() (10KBytes in 10 allocations, 25%)
        -CDynamicTrailActor::Initialize() (4KBytes in 4 allocations, 10%)
        -CParticleManager::CreateEmitter() (4KBytes in 4 allocations, 10%)
        -Coverlay::CreateItem() (2KBytes in 2 allocations, 5%)

Listing 10: A possible hierarchy view.
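Since I have not built this view, the following is only a rough sketch of how such a hierarchy could be assembled. It assumes each callstack has already been resolved to function names and that allocations are always collapsed per function; the Allocation and Node types and the AddAllocation and Print functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// An allocation whose callstack is already resolved to function names,
// with the allocation function first and the outermost caller last.
struct Allocation {
    std::size_t size;
    std::vector<std::string> callstack;
};

// A node accumulates everything allocated through one function;
// its children are the callers of that function.
struct Node {
    std::size_t totalSize = 0;
    std::size_t count = 0;
    std::map<std::string, std::unique_ptr<Node>> callers;
};

// Walks the callstack from the allocator outwards, adding the
// allocation to every node along the path.
void AddAllocation(Node& root, const Allocation& allocation) {
    Node* node = &root;
    node->totalSize += allocation.size;
    node->count += 1;
    for (const std::string& function : allocation.callstack) {
        std::unique_ptr<Node>& child = node->callers[function];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
        node->totalSize += allocation.size;
        node->count += 1;
    }
}

// Appends one line per node to out: '+' for nodes that have callers,
// '-' for leaves, indented by depth.
void Print(const Node& node, int depth, std::size_t grandTotal, std::string& out) {
    assert(grandTotal > 0);
    for (const auto& entry : node.callers) {
        const Node& child = *entry.second;
        out += std::string(static_cast<std::size_t>(depth) * 4, ' ');
        out += child.callers.empty() ? '-' : '+';
        out += entry.first + "() (" + std::to_string(child.totalSize) +
               " bytes in " + std::to_string(child.count) + " allocations, " +
               std::to_string(100 * child.totalSize / grandTotal) + "%)\n";
        Print(child, depth + 1, grandTotal, out);
    }
}
```

Calling Print(root, 0, root.totalSize, text) after adding all allocations yields an indented listing in the spirit of Listing 10, with sizes in bytes.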
The next big step
This version of MemAnalyze uses a memory dump from disk. However,
it would be fantastic to expand MemAnalyze to do real-time analysis.
I am always interested in what section of what level uses the most
memory. We could make a view like Windows' CPU performance window
(figure 7).
We could even track the history of memory mutations and fast forward
or rewind our statistics, and do compares on them. Although this
sounds very difficult to do, I wonder how much additional work it
would cost. We won't even need to worry about intermediate data
storage. We just send the allocation data directly to the PC.
Wrap up
On both of our projects, Xyanide and Cyclone Circus, we found
our fragmentation problems and memory leaks within fifteen minutes
of starting the game. We ran MemAnalyze at regular intervals,
and it showed us which parts of our code allocated
less or more memory.
In the end, our PlayStation 2 game, Cyclone Circus, never
had more than 160K of lost space caused by fragmentation. Returning
from the game to the menu gave us the exact same memory layout,
with our heap end at exactly the same position, after each race, even
after 120 hours of demo mode racing. So these tools have proven
to be very useful. It would be great if the console manufacturers
would provide these sorts of tools in the next generation consoles
and development tools.
Acknowledgements
Many thanks to my colleague Tom van Dijck, who deserves all the
credit for his PS2 implementation. I would also like to thank the
Xbox Developer Support Desk for their professional support.
References
[1] Information on the DWARF 1.1 debug format
http://www.eagercon.com/dwarf/dwarf_1_1_0.pdf
[2] Information on the DWARF debugging format and binaries
http://reality.sgi.com/davea/