
Play
by Play: Effective Memory Management
By
Brian
Hixon, Daniel Martin, Rob Moore, Greg Schaefer, and Richard Wifall
Gamasutra
August
2, 2002
URL: http://www.gamasutra.com/features/20020802/hixon_01.htm
Back before Al Gore invented the Internet, back when 64KB was more memory than any computer would ever need, there was a time when memory managers didn’t exist. But gradually, new computer systems came out with larger amounts of memory (Figure 1). Game designers discovered ways to eat up RAM faster than any system could dish it out. It became obvious that managing all this data with a simple static memory map was too much work. Programmers needed a way to create memory maps that could change dynamically at run time depending on the state of the program. Thus, the memory manager was born.
For an excellent introduction to how a basic memory manager works, check out www.memorymanagement.org/articles/begin.html. This article will assume this basic knowledge and focus instead on things to think about when writing your own memory manager and point out some of the problems you may encounter in using a memory manager. This discussion is based on our experiences in writing and rewriting the mem-ory manager for Madden NFL 97 to Madden NFL 2002. This article is also slanted toward the C language as written for console game machines. The same principles would apply to C++ and Windows programming, but the details are likely different.
Growing Your Own
So you know that memory managers are great, but what’s wrong with just using
malloc and free?
They are part of the standard C library, so they should be the best, right?
One major problem is that you don’t really know what they are doing. If you
write your own memory manager, you can generate statistics, add debug code,
and add advanced features that might not be found in the default memory manager
— features like handles, heap compaction, and virtual memory. The ultimate weapon
of game programmers is context. You know the needs of your game and can tailor
the memory manager accordingly. Developing a successful memory manager requires
addressing five different issues: ease of use, performance, memory over-head,
fragmentation control, and debugging support. Unfortunately, these attributes
are all interconnected, and optimizing one can result in substandard results
for another. The memory manager designer must govern a delicate balancing act.
At a lower level, memory manager design also requires paying attention to platform-specific requirements. In addition, it may be possible to utilize hardware support to assist the memory manager.
Ease of Use
With respect
to a memory manager, the ease of use consideration really comes down to a single
question: Should the memory manager support pointers, handles, or both? When
design-ing a memory manager and dealing with the problem of fragmentation, the
use of handles can be very appealing. Unfortunately, while handles provide a
straight-forward solution to the fragmentation problem, they are much more difficult
to use than pointers. A memory manager that supports only handles is essentially
pushing its internal complexity back onto the user.
While supporting both handles and pointers is possible, the resulting memory
manager is more complicated than one that supports a single method. Madden
used to support both handles and point-ers until an analysis showed that pointers
were being used 99 percent of the time. Not surprisingly, when given a choice,
programmers used pointers, since they were the easiest solution. Therefore,
we simplified the latest Madden memory manager by removing handle support
and optimizing the pointer support.
Performance
Performance must be addressed both in terms of speed and consistency. A memory
manager that is fast most of the time but slows down dramatically at unpredictable
times is unacceptable in a gaming environment. Therefore it is important to
understand the issues that contribute to performance. From the memory manager
user’s point of view, two operations will impact the game: allocations and recycling.
Allocation performance is determined by allocation poli-cy, free list management,
and the use of garbage collection. The most popular allocation policies are
best fit, first fit, and next fit. By organizing the free blocks as a linked
list, best fit has consistent O(n) performance, while first fit and next
fit have worst-case O(n) and on average O(n/2). By organizing
the free blocks as a size-sorted binary tree, best fit has worst-case O(n)
and on average O(n log n). By organizing the free blocks as a
balanced binary tree, best fit has consistent O(n log n). Garbage
collection applies only to memory managers with handle support, and generally
involves a fairly significant performance penalty when it occurs, as it essentially
makes a copy of the entire memory heap during compaction.
|
||||
|
Figure 1: Main memory growth for console game machines. |
||||
Recycling performance
is based on free list management and the use of free block coalescing. Free
block coalescing requires the ability to locate the blocks immediate-ly preceding
and following any arbitrary block. Some memory structures, such as boundary
tag (see Donald Knuth’s Fundamental Algorithms under References for more
information on the boundary tag), allow this in O(1) time, while those that
don’t require an O(n) scan of the free block list. The current Madden
memory manager uses boundary tags to allow O(1) access to previous/subsequent
blocks and organizes the free list as an unbalanced binary tree. The result
is allocation and recycling performance both in the O(n log n)
to O(n) range.
MemoryOverhead
Memory overhead is the cost we have to pay to manage memory, since each allocation
will cost some memory. Memory overhead is an important consideration, especially
if the memory manager will be handling a large number of objects. The first
decision is whether the memory map state should be stored internally or externally
to the memory being managed. If it is stored internally, then the maximum number
of allocated blocks does not need to be known in advance, but memory that is
not directly CPU-addressable cannot be managed. If it is stored externally,
you must know the maximum allocated and free blocks in advance and set aside
memory for this state, but address spaces that are not directly CPU-addressable
can be managed.
Madden previously used external storage for the memory state, but this
required additional overhead because it was impossible to predict accurately
the maximum number of allocated and free blocks. We had to include a “safety
factor,” which turned directly into wasted memory. The new memory manager uses
internal storage as shown in Figure 2, thus avoiding the entire issue. All allocations
are 16-byte aligned and each block has 16-byte overhead. By limiting allocations
to a minimum of 16 bytes, every block is guaranteed to have this much storage
available. Therefore, when an allocated block is released, those 16 bytes can
be used to organize the block into the free list.
It is worthwhile to digress slightly and consider the management of non-CPU-addressable
memory. Because consoles are designed for low cost and high performance, they
sometimes incorporate special-purpose memory that is not directly CPU-addressable.
For example, the Gamecube has 16MB of auxiliary memory (ARAM) that supports
only DMA access. A memory manager that stores its state internal to the memory
it is managing cannot be used in such cases, while a memory manager that stores
its state externally can. While it may seem appealing to use external state
storage in order to support all kinds of memory, our experience with Madden
has shown this to be a mistake. Memory that is not directly CPU-addressable
is normally used only for special types of objects, due to the com-plexity of
using the memory, and often contains only a small number of objects. Therefore,
Madden now uses a single, general-purpose internal storage memory manager
for all CPU-addressable memory and an additional, customized external storage
memory manager for any special-purpose case, such as the Gamecube’s ARAM.
Fragmentation
Control
![]() |
|
Figure
2: Memory structure image.
|
Memory fragmentation
is a condition that occurs over time as memory is allocated and released and
isolated free blocks form. This is not usually an immediate problem, but as
more fragments form and they get smaller, the opportunity for an allocation
failure due to fragmentation increases. Eventually, if fragmentation gets severe
enough, an allocation may fail because no single free block is large enough
to accommodate the request even though the total amount of free memory is (Figure
3). Therefore, a good memory manager needs to either take steps to limit the
amount of fragmentation that occurs or be able to consolidate fragmentation
periodically through garbage collection. An allocation failure is often a lethal
error for game code and must be avoided at all costs. While it is generally
fairly easy to determine the maximum memory required by an application, fragmentation
can make such a calculation meaningless. Allocation policy, free block coalescing
and multiple heaps all play a part in minimizing fragmentation. Free memory
coalescing is a straight-forward technique for limiting fragmentation that attempts
to merge a newly released memory block with its neighbors. If the block immediately
preceding or following the newly released block is also free, merging the blocks
together results in a single, larger free block. This particular technique is
almost mandatory, as without it, fragmentation occurs very quickly. In addition,
this limits the size of the free list to the minimum number of blocks, which
generally has a positive performance impact on allocation. Since this technique
has no impact on how the memory manager is used externally, it is incorporated
into most memory manager designs.
The choice of allocation algorithm has a definite impact on fragmentation. Unfortunately,
because the size and timing of allocations and releases vary for every application
and are often different from run to run of an application, it is impossible
to say that a particular algorithm is always better or worse than another. However,
in general, it has been found that “best fit” algorithms tend to exhibit the
least fragmentation. See Johnstone and Wilson’s “The Memory Fragmentation Problem:
Solved?” in References for more information on the impact of allocation techniques
on fragmentation.
![]() |
|
Figure 3 : Although 128 free bytes are available, the largest possible allocation is only 64 bytes due to fragmentation. |
A technique that
requires more involvement from the memory manager user is to allocate blocks
with similar lifetimes close to each other. If the lifetime of a memory block
is known when it is being allocated, the allocation algorithm can attempt to
place it by other blocks with a similar lifetime. When these blocks are released,
they will be merged into a single large free block. The problem is that this
requires the caller to specify both the lifetime of each block as well as the
total size of the group of blocks of similar lifetime. As a result, a simpler
version of this technique is usually implemented through the use of multiple
heaps. By allocating blocks with similar lifetimes within their own heap, a
similar effect is achieved, though there are generally practical limitations
on the number of heaps that can be effectively utilized.
Debugging Support
One of the main benefits of writing a memory manager is the abilityto add capabilities
that are not provided in the supplied memory manager. By adding some debugging
features you can make sure that you are managing the memory rather than stepping
in the heap. To avoid stepping in the heap, you’ll need to be able to see the
heap. Adding information and tools to help visualize memory usage can be a big
help when managing memory. This can be done in a number of ways.
The debugging
techniques used by the Madden memory manager include the ability to audit
memory usage, collect usage statistics, check for consistency, and follow the
memory map from a debugger. Auditing is really nothing more than being able
to mark a group of blocks when they are allocated and later check to see if
they were all released. Usage statistics, such as maximum number of allocated
and free blocks, as well as maximum allocated memory, are valuable for evaluating
the memory utilization of the application.
Consistency checking goes through the memory map and makes sure there are no
detectable errors in it. This means verifying block sizes, checking boundary
tags, ensuring matching block counts, and so on. By performing consistency checking
prior to every allocation and release (in a debug build), certain forms of memory
corruption can be detected quickly. Also, the memory map contains ASCII debugging
information that can be viewed from within the debugger. Prefixing every allocated
block with a text string indicat-ing the module and line number of the caller
that allocated the memory greatly assists when debugging after a crash. Naming
the memory blocks is a necessity. Imagine going to a store where every item
was in a plain box that only had its UPC code and box size printed on it.
Finding the item you wanted to buy would be a big challenge. Likewise, if you
have a list of memory blocks that only contains their address and size, locating
memory blocks is going to be difficult unless you give those memory blocks names.
Using the source file name and line number where the allocation occurred would
be a good start. In C, this can be accomplished quite easily through the use
of macros and the C preprocessor. Now you can recognize your blocks in a crowd,
but it sure would be nice to know who they hang out with. By adding a group
label or flags, you can group your allocations at the conceptual level rather
than being limited to grouping your blocks by source file. This way you can
know that a memory block that was allocated is really being used by your main
character, even though the actual memory allocation occurred in your texture
library.
Figure 4 shows an example memory dump as seen by a (slightly drunk) debugger.
Including debugging information in textual form within the memory map allows
you can make sense of it from the debugger. For example, if you had an invalid
address access at 0x007cfea4 , you could find that address in the debugger and
page around it to see that it occurred in the Joystick Driver library and that
the memory in question was allocated by pJoyDrvr.c at line 225.
|
||||
|
Figure 4: Memory blocks with debug information as viewed in the debugger. |
||||
With all this information available, you will need to find ways to view that information that can help you when managing your game. A linear list of memory blocks can be useful for spotting potential memory fragmentation, while a list of memory blocks sorted by name can be useful when you have memory leaks.
If you do find
a memory block that has leaked, you will know from its name exactly where the
allocation occurred. With group labels you can print out compact memory maps
that show how each conceptual module in your code is using resources. By tracking
this throughout the project, you can easily spot modules that are using more
or less memory than you had budgeted in your original design. You can also create
functions to check whether groups of memory allocations have been freed. This
can help prevent memory leaks if you know that in certain situations some groups
have no allocations.
Keeping track of worst-case situations is also important. Have the memory manager
save the lowest amount of free memory it has ever encountered (update this every
time an allocation occurs). If you are using some sort of node system to keep
track of your memory blocks, keep track of the highest number of nodes that
the memory manager has ever used so that you know if you are close to running
out of memory blocks. Sentinels added to the beginning and end of memory allocations
can help detect overflow situations that would normally corrupt other data in
memory. If the sentinels don’t check correctly when the memory is freed, then
some code has been bad.
Filling memory with recognizable patterns can be extremely useful. We use a
different pattern for unallocated, allocated, and freed memory. This way, we
can tell in the debugger at a glance what the situation of a particular piece
of memory is. When someone forgets to initialize memory they allocated properly,
the “allocated” pattern shows up.
You can also have functions that scan free/unallocated memory and make sure
that it all still matches the prespecified patterns. If it doesn’t match, some
code out there is incorrectly writing to memory locations that it doesn’t own.
Finally, make sure that you set up an extra heap for debug memory and put all
this extra debug information there. You want your game memory to be as similar
as possible between a debug and final build.
Platform Specifics
A memory manager presents a logical view of memory in the same way that a file
system provides a logical view of a disk drive. Most often, memory managers
are concerned with managing dynamic RAM of some sort. Some console makers like
to make things more interesting by providing a relatively large main memory
but also scattering other smaller chunks of RAM around the system.
The memory manager
allows us to abstract away the physical details of the memory subsystem and
deal instead with a nice, logical model of memory. For example, on the PS2 we
don’t necessarily need to know that the first megabyte of RAM is reserved for
the operating system. It’s enough that the memory manager knows. By abstracting
away some of the details of the physical memory system, our game can become
more platform independent.
Most console hardware has alignment requirements that, not so surprisingly,
differ from platform to platform. Many platforms require anywhere from 4- to
64- byte alignment for buffers in graphics rendering or file IO. Each type of
hardware might need the memory manager to be tweaked to better fit the needs
and abilities of the platform. Often this information can be passed to the memory
manager using a heap setup structure.
Finally, you should be wary of third-party libraries that may use malloc and
free , effectively bypassing your memory manager’s interface. The printf function
in the PS2 standard C library uses malloc and free ; our solution was to write
our own printf function.
Hardware Support
On most modern computers, the issue of fragmentation has been greatly reduced
by the use of a hardware-based paged memory manager unit (PMMU).
Obviously, the fact that virtual memory provides an application with lots of
addressable memory means that even an inefficient memory manager can be used.
However, the more interesting point is that the PMMU without any secondary storage
can dramatically help with fragmentation. The PMMU takes a very large address
space (larger than the physical RAM it represents) and maps it onto physical
memory. Obviously, this mapping is not one-to-one, but rather it maps a subset
of the memory space onto a “working set” of memory pages.
The key impact of using a PMMU in terms of fragmentation is that when a memory
block is released, any portion of that block completely spanning one or more
PMMU pages can be remapped by the PMMU. The result is actually two forms of
fragmentation: address-space fragmentation and memory fragmentation. While this
effect might seem to make a bad problem worse, it actually simplifies things.
Because the PMMU provides a large address space, the address-space fragmentation
can be largely ignored. Instead, the allocation algorithm concentrates on minimizing
memory fragmentation by first attempting to utilize the unused portions of allocated
memory pages (pages that contain some allocation but are not completely full)
before remapping unused memory pages back into memory.
Managing Your Memory Manager
Memory managers are like most real-life managers. You have to keep them under
control, or else they tend to wander off without adequate information and make
strange decisions. Your memory manager sometimes needs your help to stay on
the right track too. Let’s look at some common problems that can occur and how
to work around them.
Fragged again.One of the major side effects of hand grenades and memory managers
is fragmentation. Previously, we discussed fragmentation in the context of designing
the memory manager to avoid or reduce the effects of fragmentation. However,
there are also some application- level techniques that can reduce fragmentation.
The use of static allocation (memory defined within a module with storage allocated
within the application image) avoids the memory manager completely and thus
avoids fragmentation. Of course, this usage really only makes sense when a single
object will be represented and when the lifetime of that object is approximately
the duration of the entire application. In such cases, static allocation can
provide a benefit by limiting utilization of the dynamic memory manager to those
memory objects that are truly dynamic.
Another strategy that relies entirely on the memory manager user is to perform
submanagement of memory based on specific program knowledge. For example, if
a module knows that it needs to allocate X objects of size Y, it may be far
more efficient for the user to allocate a single block of X * Y bytes and perform
its own management within that larger block. By reducing the number of blocks
that the memory manager has to deal with, the user has generally made the job
of course, a caveat. Depending on the amount of fragmentation and the size of
X * Y, it is possible that the application could find itself in the situation
where an allocation of X * Y fails due to fragmentation, whereas X allocations
of Y would have succeeded. We also try to discourage this practice when possible,
as there is code-maintenance overhead.
One way to help avoid memory fragmentation is to always free memory in the opposite
order from which it was allocated. In fact, if you were always able to allocate
and release in such an order, you would never have memory fragmentation. Realistically,
it’s not possible to follow this practice all the time, but doing it as much
as possible will help. Memory fragmentation is going to occur, and at some point
you will probably run into a situation where it causes a problem in your game.
You might have fragmentation that is occurring in one part of your game that
is causing a memory allocation to fail in a totally unrelated area. Fragmentation
might even manifest itself as a situation where your game will fail only when
arriving at part of your game through a specific path (that causes the fragmentation).
If you don’t have some advanced form of garbage collection, you are going to
have to use other, more crude methods to limit this problem. One possibility
is to change the code that is causing the fragmentation to use a different allocator
so that it doesn’t cause fragmentation. A common way to accomplish this is to
have an allocator that allocates from the other end of memory.
Depending on your game, fragmentation can become more problematic over time
(especially if your allocations don’t have close locality of lifespan). You
can use brute force to minimize these effects, such as shutting down and restarting
code modules between levels as a brute-force garbage collection technique.
Release builds.When running a release build, there isn’t any debugging information
in the game, as it will consume extra memory. But you still need a way to know
where the game runs out of memory. For Madden, we assume in memory. If
the game does run out, it will crash and optionally dump some memory info. With
a special build, we display some partial information about the memory manager
and the game state so that we can determine if it ran out of memory because
of fragmentation or other reasons.
Getting on the
Field
When talking about memory management, programmers often resort to words more
often associated with cow pastures than games. Terms like heaps, stacks, and
chunks are thrown around like flies buzzing around you-know-what. To see how
important a memory manager is to a game, you have to get past the abstract poo.
A good memory manager allows you to have more animations, more characters, more
textures, more sounds — in short, more of everything that your game-playing
customers love.
In this article we have described some of the issues that may come up in writing
and using your own memory manager. After years of writing and rewriting memory
managers for the Madden series, one piece of advice that bears highlighting
is simply to make sure you schedule adequate design time on this very important
piece of your system; you’ll be glad you did.
References
The Memory Management Reference Beginners Guide: www.memorymanagement.org/
articles/begin.html
Johnstone, Mark S., and Paul R. Wilson. The Memory Fragmentation Problem:
Solved? In Proceedings of the International Symposium on Memory Management.ACM
Press, 1998. pp. 2636. www.cs.utexas.edu/users/wilson/papers/fragsolved.pdf
Knuth, Donald E. The Art of Computer Programming, vol. 1, Fundamental Algorithms.
Reading, Mass.: Addison-Wesley, 1973. pp. 435444.
Copyright © 2003 CMP Media Inc. All rights reserved.