Profiling, Data Analysis, Scalability, and Magic Numbers, Part 2: Using Scalable Features and Conquering the Seven Deadly Performance Sins


August 16, 2000 (Page 2 of 3)
 

5. Inefficient memory usage. Poor performance can be caused by data structures that are not cache-line aligned, random access to main memory, excessive memory use, frequent memory allocation, and data dependencies. In AoK, memory problems could be especially severe since multiplayer games can last six hours or more, during which time tens of thousands of units can be created and destroyed.

Many data structures in performance-critical areas were compacted to fit in multiples or fractions of cache lines to improve memory access. There were also other areas that could have been improved by re-arranging data in structures of arrays, or streams (see For More Information section), but this would have made the code even more complicated.
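The two layouts mentioned above can be sketched roughly as follows. This is an illustration only, not AoK's actual data: the names UnitHot and UnitStreams, the field list, and the 32-byte cache-line size are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 32-byte cache lines; adjust for the target CPU.
constexpr std::size_t kCacheLineBytes = 32;

// Array-of-structures: the hot per-unit fields are packed and aligned so
// that each unit occupies exactly one cache line.
struct alignas(kCacheLineBytes) UnitHot {
    float x, y;              // position
    float vx, vy;            // velocity
    std::int32_t hitPoints;
    std::int32_t ownerId;
    std::uint32_t flags;
    std::uint32_t reserved;  // explicit padding up to 32 bytes
};
static_assert(sizeof(UnitHot) == kCacheLineBytes,
              "UnitHot must fill exactly one cache line");

// Structure-of-arrays ("streams"): each field lives in its own contiguous
// array, so a loop that touches only positions and velocities never pulls
// unrelated fields into the cache.
struct UnitStreams {
    std::vector<float> x, y, vx, vy;

    explicit UnitStreams(std::size_t count)
        : x(count, 0.0f), y(count, 0.0f), vx(count, 0.0f), vy(count, 0.0f) {}

    // Advance every unit by dt using only the four streams.
    void integrate(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};
```

The trade-off is the one noted above: the streamed layout is friendlier to the cache in tight loops, but code that needs "one whole unit" must now gather it from several arrays.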

To analyze and improve the memory usage of AoK, we used a number of different tools. The first tool that was a tremendous help was the set of Windows NT performance counters, which we used to examine memory statistics quickly. The NT performance counters provided a wide array of data about an application, including processor, process, memory, and network statistics. In the case of AoK, the most important memory statistic was Private Bytes, the amount of nonshared memory allocated for the AoK process.

By sampling the memory footprint at specific intervals, we created a general picture of the game memory footprint (Figures 3a and 3b). Since the game's memory requirements are effectively the same across Windows NT and Windows 98, the NT performance counters helped us examine how memory was used during a four-player game on the minimum specified player's system. This was key to helping us determine if AoK would fit within the minimum target memory size of 32MB.

Figure 3A. Four-player game memory usage over time.
Figure 3B. Eight-player game memory usage over time.

Given the minimum system game requirements (Figure 4), we estimated that a game should typically last about 45 to 60 minutes. In the four-player game example shown in Figure 3a, about 21MB of memory was allocated by the game at startup. Thirty minutes into the game, memory usage had risen to around 23MB.

Figure 4. AoK minimum system game specifications.
4 players; any combination of human and computer players
4-player map size
75-unit population cap
800x600 resolution
Low-detail terrain graphics quality*
*added as part of scalability effort

In contrast, look at the memory footprint of the eight-player game shown in Figure 3b. The addition of more players to the game requires more memory for their data at startup, as well as more memory to support the larger game map. The amount of memory consumed continues to grow during the game as more units and buildings are created until a plateau is reached. After reaching that plateau (not shown), the memory footprint starts tapering back down. The receding memory footprint occurs as players and units are defeated.

While these high-level memory statistics from the NT performance counters are quick and useful, often it's necessary to drill down to see which specific functions are allocating memory. To get that information, we created a simple memory instrumentation system to track memory allocations (see Listing 1). The memory allocation code tracked allocations and de-allocations by memory address, number of bytes requested, and file name and line number of the actual function call. It also provided a running count of the number of allocations and de-allocations, and the bytes of memory allocated in each game update loop.

Listing 1. A simple memory instrumentation system for AoK

//================================================
// memory.h header
//================================================
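#include <stddef.h> // size_t
#include <stdlib.h> // declare the real malloc/free before the macros below rename them

extern "C"
{
void *mymalloc( size_t size, const char *pfilename,
  const long dwline);
void myfree( void *memblock, const char *pfilename,
  const long dwline);
}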
//================================================
#ifdef _INSTRUMENTMEMORY
#define malloc DEBUG_MALLOC
#define free DEBUG_FREE
#endif

#define DEBUG_MALLOC(size) mymalloc(size, __FILE__,   __LINE__)
#define DEBUG_FREE(block) myfree(block, __FILE__,   __LINE__)
//================================================
#ifdef _INSTRUMENTMEMORY
void MemoryInit(void);
int MemorySave(void);
void MemoryUpdate(void);
#else
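// Function-like no-ops, so calls such as MemoryInit(); still compile
#define MemoryInit() ((void)0)
#define MemorySave() (0)
#define MemoryUpdate() ((void)0)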
#endif
//================================================
// eof: memory.h
//================================================

//================================================
// memory.cpp
//================================================
#include <windows.h>
#include <stdio.h>
#include <stdlib.h> // malloc, free
#include <io.h>
// !!! DO NOT include memory.h header file here !!!
//================================================
static FILE *pmemfile, *pupdatefile;
static bool binitialized = false;
//================================================
static DWORD gdwAllocCount;
static DWORD gdwByteCount;
static DWORD gdwDeletions;
static DWORD gdwFrameCount;
//================================================
void MemoryInit(void);
//================================================
void MemoryUpdate(void)
{
    if (pupdatefile)
    {
        fprintf(pupdatefile, "%lu\t%lu\t%lu\t%lu\n",
                gdwFrameCount, gdwAllocCount, gdwDeletions, gdwByteCount);
        gdwDeletions = 0;
        gdwAllocCount = 0;
        gdwByteCount = 0;
        gdwFrameCount++;
    }
} // MemoryUpdate
//================================================
extern "C" void *mymalloc( size_t size, const char *pfilename, const long dwline)
{
    // Open the log files on first use
    if (!binitialized)
        MemoryInit();
    // Per-frame counters, reset by MemoryUpdate()
    gdwAllocCount++;
    gdwByteCount += (DWORD)size;
    void *p = malloc(size);
    // Log: type, address, bytes requested, calling file and line
    if (pmemfile)
        fprintf(pmemfile, "malloc\t0x%p\t%lu\t%s\t%ld\n", p, (unsigned long)size, pfilename, dwline);
    return p;
} // mymalloc
//================================================
extern "C" void myfree( void *memblock, const char *pfilename, const long dwline)
{
    // Open the log files on first use
    if (!binitialized)
        MemoryInit();
    gdwDeletions++;
    // Log: type, address, (empty bytes column), calling file and line
    if (pmemfile)
        fprintf(pmemfile, "free\t0x%p\t\t%s\t%ld\n", memblock, pfilename, dwline);
    free(memblock);
} // myfree
//================================================
void MemoryInit(void)
{
    if (binitialized)
        return;
    pmemfile = fopen("c:\\memory-alloc.txt", "wb");
    pupdatefile = fopen("c:\\memory-update.txt", "wb");
    // Write a header row to each log that was opened successfully
    if (pmemfile)
        fputs("type\tptr\tbytes\tfilename\tline\n", pmemfile);
    if (pupdatefile)
        fputs("frame\tallocations\tdeletions\ttotal bytes\n", pupdatefile);
    binitialized = true;
} // MemoryInit
//================================================
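int MemorySave(void)
{
    // Close only the logs that were actually opened; fclose(NULL) is undefined
    if (pmemfile)
        fclose(pmemfile);
    if (pupdatefile)
        fclose(pupdatefile);
    pmemfile = 0;
    pupdatefile = 0;
    return 0;
} // MemorySave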
//================================================
// eof: memory.cpp
//================================================
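
Once a log like the one written by Listing 1 exists, one way to see which call sites allocate the most is to total the "malloc" rows by file name and line number. The sketch below is not part of the AoK tooling; it assumes the tab-separated column order from the listing's header row (type, ptr, bytes, filename, line), and the names SiteTotals, SiteKey, and SummarizeAllocLog are made up for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Running totals for one call site (hypothetical helper type).
struct SiteTotals {
    unsigned long calls = 0;
    unsigned long bytes = 0;
};

// A call site is identified by (file name, line number).
using SiteKey = std::pair<std::string, long>;

// Reads a tab-separated allocation log and totals the "malloc" rows per
// call site. The header row and "free" rows are skipped, as are rows that
// do not have all five columns.
std::map<SiteKey, SiteTotals> SummarizeAllocLog(std::istream &in)
{
    std::map<SiteKey, SiteTotals> totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string type, ptr, bytes, file, lineNo;
        if (!std::getline(fields, type, '\t') || type != "malloc")
            continue;
        if (!std::getline(fields, ptr, '\t') ||
            !std::getline(fields, bytes, '\t') ||
            !std::getline(fields, file, '\t') ||
            !std::getline(fields, lineNo))
            continue;
        const long lineNumber = std::strtol(lineNo.c_str(), nullptr, 10);
        SiteTotals &site = totals[SiteKey(file, lineNumber)];
        site.calls += 1;
        site.bytes += std::strtoul(bytes.c_str(), nullptr, 10);
    }
    return totals;
}
```

Sorting the resulting map by bytes or by call count gives a quick list of the functions worth examining first.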

The sheer number of memory allocation schemes used in AoK complicated our memory analysis. AoK uses the C++ new and delete operators; C library malloc, free, and calloc functions; and Win32 GlobalAlloc, GlobalFree, LocalAlloc, and LocalFree functions. In the future, we will be actively restricting ourselves to a subset of these functions.

To reduce memory fragmentation and eliminate overhead caused by allocating and de-allocating memory, memory pooling was used in many subsystems. While this significantly increased performance, it did create problems when trying to fix bugs where code referred to recycled data.
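
As a rough illustration of the pooling idea (not AoK's implementation), a fixed-capacity pool can reserve all of its storage up front and hand out slots from a free list. FixedPool and Unit are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical pooled object type for the example.
struct Unit {
    int id;
    int hitPoints;
    Unit(int id_, int hp_) : id(id_), hitPoints(hp_) {}
};

// Fixed-capacity pool: storage is allocated once, and acquire/release only
// move slots on and off a free list, so the heap is not touched during play.
template <typename T>
class FixedPool {
public:
    explicit FixedPool(std::size_t capacity) : slots_(capacity) {
        freeList_.reserve(capacity);
        // Push in reverse so the first acquire() returns slot 0.
        for (std::size_t i = capacity; i > 0; --i)
            freeList_.push_back(&slots_[i - 1]);
    }
    FixedPool(const FixedPool &) = delete;
    FixedPool &operator=(const FixedPool &) = delete;

    // Constructs a T in a free slot, or returns nullptr if the pool is full.
    template <typename... Args>
    T *acquire(Args &&...args) {
        if (freeList_.empty())
            return nullptr;
        Slot *slot = freeList_.back();
        freeList_.pop_back();
        return new (slot->bytes) T(std::forward<Args>(args)...);
    }

    // Destroys the object and makes its slot available for reuse.
    void release(T *object) {
        assert(object != nullptr);
        object->~T();
        freeList_.push_back(reinterpret_cast<Slot *>(object));
    }

    std::size_t available() const { return freeList_.size(); }

private:
    struct Slot {
        alignas(T) unsigned char bytes[sizeof(T)];
    };
    std::vector<Slot> slots_;
    std::vector<Slot *> freeList_;
};
```

Because a released slot is reused by the next acquire, any pointer kept to the old object silently refers to the new one. That is the recycled-data hazard described above, and it is why such bugs are harder to catch with pooling than with a debug heap.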

In an attempt to improve performance further, we utilized MicroQuill's SmartHeap to manage memory in release builds. (We were unable to use it in debug builds due to incompatibilities with interactive debugging.) In the final analysis, the performance benefit of SmartHeap over the standard heap manager wasn't clear to us, due to the efforts we made to reduce and pool memory allocations.

After profiling performance and memory usage, it turned out that the most performance-limiting factor in AoK could be the Windows 95/98 virtual memory system. Unlike Windows NT/2000, Windows 95/98 doesn't require or configure a fixed-size swap file for virtual memory. To make matters worse, the swap file can grow and shrink as a program runs. An expert user can create a swap file of fixed size, but it's not something the vast majority of users can do or should have to worry about.

AoK relies on the virtual memory system to handle the growing footprint of game data over time within the game. It also uses multiple read-only memory-mapped files to access game graphics and sounds residing in large aggregated resource files. These memory-mapped files ranged in size from 28MB to 70MB. Since the amount of virtual memory available can vary so widely on a user's Windows 95/98 system, this ended up being the number one AoK performance issue beyond our control. It should be noted that this virtual memory problem didn't affect every minimally configured system. Virtual memory problems in Windows 95/98 seemed to occur just on certain systems, even when identically configured systems performed with little or no problem.

6. Inefficient code. Rewriting inefficient code is probably the best-known performance optimization, but it was typically the last resort to fix our performance problems. In many cases, the performance problem was resolved by identifying and fixing one of the previously mentioned deadly sins.

The easiest place to attempt to improve inefficient code is with the compiler optimization settings. Due to the size of AoK, we chose to compile release builds with the default "maximize speed" setting for all modules. This may cause some code bloat (since speed is favored over size), but in general it's a good choice. We chose not to use "full optimization" since we've seen few programs that could run after using it.

Since shipping AoK we've been looking at the performance benefits of compiling with "minimize size" and then using #pragma (or module settings) to optimize specific hotspots for speed. This seems to be a better trade-off than just using the single speed optimization setting for everything.
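
A minimal sketch of that approach in Visual C++, assuming the module as a whole is built with the "minimize size" setting (/O1): the optimize pragma switches to "favor fast code" for a hot function and then restores the command-line settings. SumTerrainCosts is a hypothetical hot spot invented for the example, and the pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#pragma optimize("t", on)   // favor fast code for the functions that follow
#endif

// Hypothetical hot spot identified in a profile.
long SumTerrainCosts(const int *costs, std::size_t count)
{
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += costs[i];
    return total;
}

#if defined(_MSC_VER)
#pragma optimize("", on)    // restore the optimizations set on the command line
#endif
```

The pragma must appear outside any function body and applies to the functions defined after it, so hot functions are easiest to manage when grouped together in a module.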

In AoK we chose to use the "only _inline" option in Visual C++, instead of inlining "any suitable" function. This let us choose which functions to inline based on their appearance in the profile list. Inlining any suitable function would most certainly increase the code size and lead to slower performance.

Using an alternate compiler, such as Intel's C/C++ compiler, to optimize one or more performance-intensive modules is another way to realize additional performance gains. We decided against this for AoK, however, because of the risk associated with changing compilers (or even compiler versions) near the ship date.

7. Other programs. One of the greatest strengths of Microsoft Windows is its ability to preemptively run multiple programs at the same time. However, it can be a huge drawback when programs that the user is unaware of take CPU time away from a game or cause the game to lock up. For instance, during the play-testing phase of AoK's development, we received reports of problems that we couldn't reproduce on our own systems. Sometimes these issues were caused when the game entered an unstable state, but often other programs running in the background on the tester's computer caused the reported problems.

Dark Age village.

Virus scanners and other programs spontaneously running in the background while a tester was playing AoK were the most widespread cause of externally induced performance problems. Unfortunately, there's no way to easily and adequately interrogate a player's computer and warn them about potential problems that other programs can cause.

The most severe issue related to other programs involved the Windows Critical Update Notification. Play-testers sometimes reported input lock ups during game play for no apparent reason. We accidentally discovered that when AoK was in full-screen mode, the Critical Update Notification could pop up a dialog box behind AoK. This would take the focus off AoK and make it appear to players as if the game had stopped accepting input. Changing AoK to handle situations like this was relatively easy once the problem was identified. Other applications likely cause similar behavior to occur, but it's only by trial and error that these problems are identified.

