Gamasutra is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Gamasutra: The Art & Business of Making Gamesspacer
Profiling, Data Analysis, Scalability, and Magic Numbers, Part 2: Using Scalable Features and Conquering the Seven Deadly Performance Sins
View All     RSS
October 18, 2019
arrowPress Releases
October 18, 2019
Games Press
View All     RSS







If you enjoy reading this site, you might also want to check out these UBM Tech sites:


 

Profiling, Data Analysis, Scalability, and Magic Numbers, Part 2: Using Scalable Features and Conquering the Seven Deadly Performance Sins


August 16, 2000 Article Start Page 1 of 3 Next
 

Last month we discussed some of the performance issues facing Age Of Empires II: The Age Of Kings (AoK). I described some of the tools that we used at Ensemble to collect that data, including Intel's VTune, NuMega's TrueTime, and our own profiling code. In this concluding article, I'll describe how to improve performance by effectively using performance tools and instrumentation. We'll also look at general types of problems we encountered as we optimized AoK which can affect any game. Then we'll wrap things up by taking a look at the last bastion of getting a game to run on the minimum platform when all else fails: scalable features.

Performance Targets

In the first part of this article, we discussed the CPU and process utilization of AoK using Intel VTune and our own profiling code. However, the data and graphs presented only showed half the story - the culmination of optimizing AoK.

Luckily, depending on your point of view, we're in the middle of optimizing an expansion pack to AoK called Age of Empires II: The Conquerors (TC). In The Conquerors, we're again facing the same problems of raising performance to an acceptable level on the minimum platform: a 166 MHz Pentium computer with 32 MB of memory.

Having already gone through the performance process for AoK, we know rough CPU and process utilization breakdowns for the minimum platform (Figure 1c and 2c). Using this information, we can set performance targets that we need to reach in order to perform well on a given platform. Setting performance targets based on known data removes the guesswork of knowing when an optimization is reaching the point of diminishing returns, or when it's at least "good enough."

We were fortunate that AoK and TC share so much of the same code. This allowed us to use the shipping performance of AoK as the baseline performance target for TC. However, if you're working on a new game that doesn't follow on from an existing work, it's still possible to create test cases to establish performance targets for your game.

Optimizing Attila

In the case of TC, the data presented comes specifically from analyzing the performance of a new single player campaign based on Attila the Hun. The problem is that one of the scenarios only runs at a few frames per second on the minimum platform. The Attila Session 1 numbers show the CPU utilization (Figure 1a) and process utilization (Figure 2a) of the first performance test on Attila. Intel Vtune and profiling instrumentation were used to collect this data.

Examining the CPU utilization by itself (Figure 1a), it's not apparent where the slowdown actually occurs. By comparing the CPU utilization of the Attila scenario with data from a comparable scenario from AoK, it's obvious that TC needs to use about 20% less CPU on this scenario. This will give the virtual memory manager (VMM) time to run.

The reason the VMM needs to have a specific portion of the CPU is that AoK (and TC) trade CPU cycles for physical memory on the minimum platform. This trade, however, requires CPU time to run the VMM so that it can move data between physical and virtual memory. Since AoK/TC are typically memory bound, not CPU bound, this is an effective tradeoff on the minimum platform.

Figure 1. CPU Utilization by Process (Pentium 166, 32MB).

 

Figure 2. TC Process Breakdown (Pentium 166, 32MB).

To find the actual cause of the slowdown in Attila, we need to look at the breakdown of functionality within TC (Figure 2a). The first thing that's noticeable is that TC spends upwards of 40% of it's time in pathing code.

Increased pathing time could be a symptom of several problems: the pathing code is slow, a large number of units attempting to path, or units trying to path to an inaccessible location. Knowing that few changes were made to pathing code for TC ruled out slow pathing performance. Using statistics taken from AoK on a similar scenario, we can also see that we're executing pathing nearly four times as much as we would expect to be (Figure 2c).

With the pathing code ruled out, the next step is to watch the scenario to determine how many units are in it, and where they are going. By looking at the scenario, it became obvious that there were a number of units attempting to path to inaccessible locations. This was causing every unit that attempted to path to repeatedly try and fail to path to their desired location. After making the desired location accessible, the scenario was re-tested (Figure 2b). This change decreased pathing execution by nearly half.

Solving the pathing problem both reduced the amount of time TC spent in pathing code, and the amount of time the CPU spent executing TC (Figure 1b). Looking again to the performance targets, we can see that while we made an improvement, we still need to reduce TC's CPU utilization by nearly 20%, and still need to cut pathing by another 50%.

 

The next step will be to work with the designers to balance their goals for the scenario with the performance requirements for the minimum platform. This may be through either reducing the number of units, or creating special code in TC to handle the pathing situation that arises in this scenario.

The Seven Deadly Performance Sins

All the performance problems AoK encountered fell into one or more of seven general categories. These problems ranged from executing dead code to inefficient code and they can affect any game. Let's take a look at these categories.

1. Executing dead or superfluous code. Over the course of a long development cycle, a lot of code-based functionality is created, changed, and/or discarded. Sometimes discarded or superceded functionality is not removed from the game and continues to be executed. While it's a waste of effort to optimize code that should be removed in the first place, it can be difficult to determine whether a few lines of code, a function, or an entire subsystem is going unused.

One feature we had envisioned for AoK was renewable resources, so natural resources such as trees would increase over time if they weren't depleted. After play-testing the game, we found that this feature would often cause a game to last indefinitely, so we eliminated it. Later, when profiling game performance, we discovered that not all of the code had been removed -- the code that controlled tree regrowth appeared at the top of our profiler's function list, and we quickly removed it.

Unfortunately, superfluous code is not always so easily found, and often it's only when the code gets executed enough that you spot it on a profiling list. Such was the case with another problem also related to the trees in our game.

In our derived unit hierarchy of classes (described in last month's article), we easily added new units to the game by deriving new classes in the hierarchy. This hierarchy also is powerful in that functionality can be added or changed in a single place in the code to affect many different game units. One such change inadvertently added line-of-sight checking for trees, which is unnecessary since trees are not player-controlled. This was not an obvious performance problem and it was found only through logging data and stepping through code while trying to make the line-of-sight code faster.

2. Executing code too much. Trees, wall segments, and houses were often indicators of general performance issues in AoK, given the large amount of them on maps -- some AoK maps contain more than 15,000 trees. In order to process these units quickly, we created shortcuts in various derived functions within the unit hierarchy to avoid unnecessary unit processing. This became very complicated in some circumstances, since the computer player uses walls and houses as part of their line of sight. If it weren't for the differences between the way computer and human players used these units, the wall and house special processing would have been simpler. But the player's ability to use the buildings to scan for enemies made our AI processing simpler and more effective.

Pathing was another system that we spent a lot of time optimizing so that it wouldn't execute for too long. To do this, we capped the number of times the pathfinding system could be executed to a fixed number of iterations per unit per game update. When trying to optimize a pathing system by capping its execution, you have to balance the desire to limit CPU usage with the desire to not make players think the units exhibit dumb behavior when instructed to move or attack. This forced us to tweak the game a great deal to achieve the right balance between playability and speed, but that's often the trade-off you face when optimizing a game.

We tried a variety of caps to optimize the pathfinding system, and it was determined that at five or more pathing attempts, units attempting to retarget were the most responsive to the player. Five attempts were too many for the minimum platform, and we decided that two pathing attempts were too few based on the results of play-testing. We ultimately decided to cap the number of pathing attempts at three, once again based on our desire to balance playability with usability.

Briton Dark Age. Western European building set.
Viking Dark Age. Eastern European building set.
Castle Age. Western European building set.
Castle Age. Eastern European buidling set.
Post-Imperial Age towns. Western European building set.
Post-Imperial Age towns. Eastern European building set.

We also placed execution caps on other systems to improve performance. These included the number of pathing attempts made by a player's units, the amount of time the computer player could spend thinking during each game update, and the number of targets a unit could look for when retargeting.

3. Using inappropriate algorithms. While the pathing system in Age of Empires was a good general purpose system, it broke down in some specific circumstances (as discussed last month). Also, there were new performance issues raised by AoK, including a larger number of units and larger maps to path across.

We could have continued to attempt to optimize the single-pathing system, but it was obvious from the work performed on AoE that enough requirements had changed so that the algorithm could no longer stand on its own. What had been a good algorithm for AoE had become an inappropriate algorithm for AoK due to new and changing pathing requirements.

The AoE pathing system was used to path units from one general area to another over short distances in AoK. New pathing systems were added to path units quickly across the game map and to path units accurately within short distances. Also, as part of the new pathing system, a new unit obstruction manager (see Pottin-

ger in the For More Information section) was added for detecting unit collisions during pathing.

4. Encountering exceptional data. Built for efficiency from the start, the unit obstruction manager surprised us when it was identified by our performance profilers as one of the top problems. After reviewing the code to look for obvious (or not obvious) problems, we added instrumentation code that catalogued how units and their locations were stored within the quadtree.

With this logging code in place, we quickly saw that the majority of units placed in the quadtree ended up being not in the leaf nodes, but higher up in the quadtree branches. We also discovered that units touching the edge of a tile were interpreted as spanning two tiles, which caused performance problems. By bumping units back onto the proper tiles, we immediately saw a 300 percent performance boost in obstruction manager performance.

This code, as is most code, was written based on assumptions about the data. Programmers assume that the data processed by a function is of a certain type and will fall within certain limits or into certain sets. When data fell outside these expectations, our algorithm -- which would otherwise have performed well -- was identified as a performance problem.

Some sections of the game were instrumented from the very outset of development to help diagnose data processing problems that arose frequently in those sections of code. The unit AI, for instance, contained conditional #define statements to log approximately 50 different sets of performance information. These performance monitors could be used alone or in various combinations to help resolve performance issues related to data processing.


Article Start Page 1 of 3 Next

Related Jobs

University of Exeter
University of Exeter — Exeter, England, United Kingdom
[10.18.19]

Serious Games Developer
Square Enix Co., Ltd.
Square Enix Co., Ltd. — Tokyo, Japan
[10.17.19]

Experienced Game Developer
Wargaming Sydney
Wargaming Sydney — Sydney, New South Wales, Australia
[10.17.19]

Senior C++ Engineer (Gameplay Programmer)
Wargaming Sydney
Wargaming Sydney — Sydney, New South Wales, Australia
[10.17.19]

Gameplay Programmer, C++ - Vehicle Physics





Loading Comments

loader image