|
Last
month we discussed some of the performance issues facing Age Of Empires
II: The Age Of Kings (AoK). I described some of the tools that we
used at Ensemble to collect that data, including Intel's VTune, NuMega's
TrueTime, and our own profiling code. In this concluding article, I'll
describe how to improve performance by effectively using performance
tools and instrumentation. We'll also look at general types of problems
we encountered as we optimized AoK which can affect any game. Then we'll
wrap things up by taking a look at the last bastion of getting a game
to run on the minimum platform when all else fails: scalable features.
Performance
Targets
In the first part of this article, we discussed the CPU and process
utilization of AoK using Intel VTune and our own profiling code. However,
the data and graphs presented only showed half the story - the culmination
of optimizing AoK.
Luckily,
depending on your point of view, we're in the middle of optimizing an
expansion pack to AoK called Age of Empires II: The Conquerors
(TC). In The Conquerors, we're again facing the same problems
of raising performance to an acceptable level on the minimum platform:
a 166 MHz Pentium computer with 32 MB of memory.
Having
already gone through the performance process for AoK, we know rough
CPU and process utilization breakdowns for the minimum platform (Figure
1c and 2c). Using this information, we can set performance targets that
we need to reach in order to perform well on a given platform. Setting
performance targets based on known data removes the guesswork of knowing
when an optimization is reaching the point of diminishing returns, or
when it's at least "good enough."
We were
fortunate that AoK and TC share so much of the same code. This allowed
us to use the shipping performance of AoK as the baseline performance
target for TC. However, if you're working on a new game that doesn't
follow on from an existing work, it's still possible to create test
cases to establish performance targets for your game.
Optimizing Attila
In the
case of TC, the data presented comes specifically from analyzing the
performance of a new single player campaign based on Attila the Hun.
The problem is that one of the scenarios only runs at a few frames per
second on the minimum platform. The Attila Session 1 numbers show the
CPU utilization (Figure 1a) and process utilization (Figure 2a) of the
first performance test on Attila. Intel Vtune and profiling instrumentation
were used to collect this data.
Examining
the CPU utilization by itself (Figure 1a), it's not apparent where the
slowdown actually occurs. By comparing the CPU utilization of the Attila
scenario with data from a comparable scenario from AoK, it's obvious
that TC needs to use about 20% less CPU on this scenario. This will
give the virtual memory manager (VMM) time to run.
The reason
the VMM needs to have a specific portion of the CPU is that AoK (and
TC) trade CPU cycles for physical memory on the minimum platform. This
trade, however, requires CPU time to run the VMM so that it can move
data between physical and virtual memory. Since AoK/TC are typically
memory bound, not CPU bound, this is an effective tradeoff on the minimum
platform.
 |
|
Figure
1. CPU Utilization by Process (Pentium 166, 32MB).
|
 |
|
Figure
2. TC Process Breakdown (Pentium 166, 32MB).
|
To find
the actual cause of the slowdown in Attila, we need to look at the breakdown
of functionality within TC (Figure 2a). The first thing that's noticeable
is that TC spends upwards of 40% of it's time in pathing code.
Increased
pathing time could be a symptom of several problems: the pathing code
is slow, a large number of units attempting to path, or units trying
to path to an inaccessible location. Knowing that few changes were made
to pathing code for TC ruled out slow pathing performance. Using statistics
taken from AoK on a similar scenario, we can also see that we're executing
pathing nearly four times as much as we would expect to be (Figure 2c).
With the
pathing code ruled out, the next step is to watch the scenario to determine
how many units are in it, and where they are going. By looking at the
scenario, it became obvious that there were a number of units attempting
to path to inaccessible locations. This was causing every unit that
attempted to path to repeatedly try and fail to path to their desired
location. After making the desired location accessible, the scenario
was re-tested (Figure 2b). This change decreased pathing execution by
nearly half.
Solving
the pathing problem both reduced the amount of time TC spent in pathing
code, and the amount of time the CPU spent executing TC (Figure 1b).
Looking again to the performance targets, we can see that while we made
an improvement, we still need to reduce TC's CPU utilization by nearly
20%, and still need to cut pathing by another 50%.
The next
step will be to work with the designers to balance their goals for the
scenario with the performance requirements for the minimum platform.
This may be through either reducing the number of units, or creating
special code in TC to handle the pathing situation that arises in this
scenario.
The
Seven Deadly Performance Sins
All the
performance problems AoK encountered fell into one or more of seven
general categories. These problems ranged from executing dead code to
inefficient code and they can affect any game. Let's take a look at
these categories.
1.
Executing dead or superfluous code. Over the course of a long development
cycle, a lot of code-based functionality is created, changed, and/or
discarded. Sometimes discarded or superceded functionality is not removed
from the game and continues to be executed. While it's a waste of effort
to optimize code that should be removed in the first place, it can be
difficult to determine whether a few lines of code, a function, or an
entire subsystem is going unused.
One feature
we had envisioned for AoK was renewable resources, so natural resources
such as trees would increase over time if they weren't depleted. After
play-testing the game, we found that this feature would often cause
a game to last indefinitely, so we eliminated it. Later, when profiling
game performance, we discovered that not all of the code had been removed
-- the code that controlled tree regrowth appeared at the top of our
profiler's function list, and we quickly removed it.
Unfortunately,
superfluous code is not always so easily found, and often it's only
when the code gets executed enough that you spot it on a profiling list.
Such was the case with another problem also related to the trees in
our game.
In our
derived unit hierarchy of classes (described in last month's article),
we easily added new units to the game by deriving new classes in the
hierarchy. This hierarchy also is powerful in that functionality can
be added or changed in a single place in the code to affect many different
game units. One such change inadvertently added line-of-sight checking
for trees, which is unnecessary since trees are not player-controlled.
This was not an obvious performance problem and it was found only through
logging data and stepping through code while trying to make the line-of-sight
code faster.
2.
Executing code too much. Trees, wall segments, and houses were often
indicators of general performance issues in AoK, given the large amount
of them on maps -- some AoK maps contain more than 15,000 trees. In
order to process these units quickly, we created shortcuts in various
derived functions within the unit hierarchy to avoid unnecessary unit
processing. This became very complicated in some circumstances, since
the computer player uses walls and houses as part of their line of sight.
If it weren't for the differences between the way computer and human
players used these units, the wall and house special processing would
have been simpler. But the player's ability to use the buildings to
scan for enemies made our AI processing simpler and more effective.
Pathing
was another system that we spent a lot of time optimizing so that it
wouldn't execute for too long. To do this, we capped the number of times
the pathfinding system could be executed to a fixed number of iterations
per unit per game update. When trying to optimize a pathing system by
capping its execution, you have to balance the desire to limit CPU usage
with the desire to not make players think the units exhibit dumb behavior
when instructed to move or attack. This forced us to tweak the game
a great deal to achieve the right balance between playability and speed,
but that's often the trade-off you face when optimizing a game.
We tried
a variety of caps to optimize the pathfinding system, and it was determined
that at five or more pathing attempts, units attempting to retarget
were the most responsive to the player. Five attempts were too many
for the minimum platform, and we decided that two pathing attempts were
too few based on the results of play-testing. We ultimately decided
to cap the number of pathing attempts at three, once again based on
our desire to balance playability with usability.
 |
 |
|
Briton
Dark Age. Western European building set.
|
Viking
Dark Age. Eastern European building set.
|
 |
 |
|
Castle
Age. Western European building set.
|
Castle
Age. Eastern European buidling set.
|
 |
 |
|
Post-Imperial
Age towns. Western European building set.
|
Post-Imperial
Age towns. Eastern European building set. |
We also
placed execution caps on other systems to improve performance. These
included the number of pathing attempts made by a player's units, the
amount of time the computer player could spend thinking during each
game update, and the number of targets a unit could look for when retargeting.
3.
Using inappropriate algorithms. While the pathing system in Age
of Empires was a good general purpose system, it broke down in some
specific circumstances (as discussed last month). Also, there were new
performance issues raised by AoK, including a larger number of units
and larger maps to path across.
We could
have continued to attempt to optimize the single-pathing system, but
it was obvious from the work performed on AoE that enough requirements
had changed so that the algorithm could no longer stand on its own.
What had been a good algorithm for AoE had become an inappropriate algorithm
for AoK due to new and changing pathing requirements.
The AoE
pathing system was used to path units from one general area to another
over short distances in AoK. New pathing systems were added to path
units quickly across the game map and to path units accurately within
short distances. Also, as part of the new pathing system, a new unit
obstruction manager (see Pottin-
ger in
the For More Information section) was added for detecting unit collisions
during pathing.
4.
Encountering exceptional data. Built for efficiency from the start,
the unit obstruction manager surprised us when it was identified by
our performance profilers as one of the top problems. After reviewing
the code to look for obvious (or not obvious) problems, we added instrumentation
code that catalogued how units and their locations were stored within
the quadtree.
With this
logging code in place, we quickly saw that the majority of units placed
in the quadtree ended up being not in the leaf nodes, but higher up
in the quadtree branches. We also discovered that units touching the
edge of a tile were interpreted as spanning two tiles, which caused
performance problems. By bumping units back onto the proper tiles, we
immediately saw a 300 percent performance boost in obstruction manager
performance.
This code,
as is most code, was written based on assumptions about the data. Programmers
assume that the data processed by a function is of a certain type and
will fall within certain limits or into certain sets. When data fell
outside these expectations, our algorithm -- which would otherwise have
performed well -- was identified as a performance problem.
Some sections
of the game were instrumented from the very outset of development to
help diagnose data processing problems that arose frequently in those
sections of code. The unit AI, for instance, contained conditional #define
statements to log approximately 50 different sets of performance information.
These performance monitors could be used alone or in various combinations
to help resolve performance issues related to data processing.
|