Lessons Learned
A few
key lessons were learned in the development of the networking for Age
of Empires that are applicable to development of any game's multiplayer
system.
Know your user. Studying the user is key to understanding what their
expectations are for multiplayer performance, perceived lag, and command
latency. Each game genre is different, and you need to understand what
is right for your specific gameplay and controls.
Early
in the development process Mark sat down with the lead designer and
prototyped communications latency (this was something that was revisited
throughout the development process). Since the single-player game was
running, it was easy to simulate different ranges of command latency
and get player feedback on when it felt right, sluggish, jerky, or just
horrible.
For RTS
games, 250 milliseconds of command latency was not even noticed -- between
250 and 500 msec was very playable, and beyond 500 it started to be
noticeable. It was also interesting to note that players developed a
"game pace" and a mental expectation of the lag between when
they clicked and when their unit responded. A consistent slower response
was better than alternating between fast and slow command latency (say
between 80 and 500 msec) -- in that case a consistent 500 msec command
latency was playable, but one that varied was considered "jerky"
and hard to use.
In real
terms this directed a lot of the programming efforts at smoothness --
it was better to pick a longer turn length and be certain that everything
stayed smooth and consistent than to run as quickly as possible with
occasional slow-downs. Any changes to speed had to be gradual and in
as small increments as possible.
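The "gradual, small increments" rule can be sketched as a clamp on how far the turn length may move per adjustment. This is only an illustration: the function name, the step size, and the minimum and maximum bounds are assumptions, not values from the AoE code.

```python
def next_turn_length_ms(current_ms, target_ms, max_step_ms=10,
                        min_ms=100, max_ms=1000):
    """Move the turn length toward the target by at most max_step_ms.

    All numeric defaults here are hypothetical tuning values.
    """
    # Keep the requested target inside the allowed range.
    target_ms = max(min_ms, min(max_ms, target_ms))
    # Limit the change so players never feel a sudden jump in pace.
    delta = target_ms - current_ms
    step = max(-max_step_ms, min(max_step_ms, delta))
    return current_ms + step


# Conditions worsen: the target jumps from 200 to 500 msec,
# but the turn length only creeps up 10 msec per adjustment.
turn_ms = 200
for _ in range(3):
    turn_ms = next_turn_length_ms(turn_ms, 500)
# turn_ms is now 230
```

The trade-off is that the game reacts slowly to a sudden change in conditions, which matches the preference stated above for consistency over raw speed.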
We also
metered the users' demands on the system -- they would typically issue
commands (move, attack, chop trees) about once every 1.5 to 2 seconds on
average, with occasional spikes of 3 to 4 commands per second during heated
battles. Since our game built to crescendos of frantic activity, the heaviest
communications demands came in the middle and late game.
When you
take the time to study your user behavior you'll notice other things
about how they play the game that can help your network play. In AoE,
clicking repeatedly when the users were excitedly attacking (clik-lik-lik-lik-lik
-- go go go ) was causing huge spikes in the number of commands issued
per second -- and if they were pathing a large group of units -- huge
spikes in the network demand as well. A simple filter to discard repeat
commands at the same location drastically reduced the impact of this
behavior.
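A repeat-command filter of this kind might look like the sketch below. The class name, the idea of keying on a selection group id, and the exact-match comparison of locations are all assumptions made for illustration; the original filter may have compared commands differently.

```python
class RepeatCommandFilter:
    """Drops a command that repeats the previous one for the same group."""

    def __init__(self):
        # group id -> (command name, target location) last sent
        self._last = {}

    def should_send(self, group_id, command, location):
        key = (command, location)
        if self._last.get(group_id) == key:
            return False  # same command at the same spot: discard
        self._last[group_id] = key
        return True


clicks = RepeatCommandFilter()
sent = [clicks.should_send(1, "attack", (10, 12)) for _ in range(5)]
# Only the first of the five identical clicks is passed on to the network.
```

Filtering before the command reaches the network layer also avoids re-pathing a large group of units once per click.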
In summary, user observation will let you:
- Know the latency expectations of the user for your game
- Prototype multiplayer aspects of play early
- Watch for behavior that hurts multiplayer performance.
Metering is king. You will discover surprising things about how your
communications system is working if you put in metering early, make
it readable by testers, and use it to understand what is happening under
the hood of your networking engine.
Lesson:
Some of the problems with AoE communication happened when Mark took
the metering out too early and did not re-verify message levels (length
and frequency) after the final code was in. Undetected things like
occasional AI race conditions, difficult-to-compute paths, and poorly
structured command packets could cause huge performance problems in
an otherwise well-tuned system.
Have your
system notify testers and developers when it seems like it is exceeding
boundary conditions -- programmers and testers will notice during development
which tasks are stressing the system and let you know early enough to
do something about it.
Take the
time to educate your testers in how your communications system works,
and expose and explain the summary metering to them -- you might be
surprised what things they notice when the networking code inevitably
encounters strange failures.
In summary, your metering should:
- Be human readable and understandable by testers
- Reveal bottlenecks, slowdowns, and problems
- Be low impact and kept running all the time.
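As a rough sketch of metering that meets these three goals, the class below keeps a small rolling window of per-turn message counts and sizes, records a plain-language warning when a turn exceeds a limit, and produces a one-line summary a tester can read. The class name and the limits are hypothetical; real thresholds would come from measuring your own game.

```python
from collections import deque


class CommMeter:
    """Cheap per-turn metering with boundary warnings (limits are made up)."""

    def __init__(self, window=20, max_bytes_per_turn=200, max_msgs_per_turn=8):
        self.window = deque(maxlen=window)  # (message count, total bytes)
        self.max_bytes = max_bytes_per_turn
        self.max_msgs = max_msgs_per_turn
        self.warnings = []

    def record_turn(self, turn, message_sizes):
        msgs = len(message_sizes)
        total = sum(message_sizes)
        self.window.append((msgs, total))
        if msgs > self.max_msgs or total > self.max_bytes:
            self.warnings.append(
                f"turn {turn}: {msgs} msgs, {total} bytes exceeds limits")

    def summary(self):
        n = len(self.window)
        if n == 0:
            return "no turns recorded"
        avg_msgs = sum(m for m, _ in self.window) / n
        avg_bytes = sum(b for _, b in self.window) / n
        return (f"last {n} turns: avg {avg_msgs:.1f} msgs/turn, "
                f"avg {avg_bytes:.0f} bytes/turn, "
                f"{len(self.warnings)} warnings")
```

Because the window is bounded and each turn costs only a sum and an append, something like this can stay enabled in every build rather than being removed before the final code is in.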
Educating the developers. Getting programmers who are used to developing
single-player applications to start thinking about a detachment between
the command being issued, received, and being processed is tricky. It
is easy to forget that you are requesting something that might not happen,
or might happen seconds after you originally issue the command. Commands
have to be checked for validity both on send and receive.
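The need to validate twice can be illustrated with a toy example: a command that is valid when issued may no longer be valid when it is finally executed a turn or more later. The world layout, command fields, and function names below are invented for the sketch.

```python
def is_valid(command, world):
    """A command is valid only if its unit still exists and is alive."""
    unit = world.get(command["unit_id"])
    return unit is not None and unit["alive"]


def issue(command, world, outbox):
    """Sender side: check before queueing the command for transmission."""
    if not is_valid(command, world):
        return False
    outbox.append(command)
    return True


def execute(command, world):
    """Receiver side: check again, since the world may have changed."""
    if not is_valid(command, world):
        return False  # e.g. the unit died while the command was in flight
    world[command["unit_id"]]["target"] = command["target"]
    return True


world = {1: {"alive": True, "target": None}}
outbox = []
issue({"unit_id": 1, "target": (3, 4)}, world, outbox)
world[1]["alive"] = False            # unit is killed before the turn runs
applied = execute(outbox[0], world)  # False: the command is quietly dropped
```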
With the
synchronous model, programmers also had to be aware that the code must
not depend on any local factor (such as having free time, special hardware,
or different settings) when it was in the simulation. The code path
taken on all machines must match. For example, having random terrain
sounds inside the simulation would cause the games to behave differently.
Saving and re-seeding the pseudo-random number generator with the last
random number took care of things inside the simulation that we needed
to be random but that must not change the simulation.
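A comparable technique can be sketched with Python's standard `random.Random`, whose `getstate()` and `setstate()` methods save and restore the generator. This is an analogue of the save-and-re-seed approach described above, not the AoE implementation; the function name and the sound list are assumptions.

```python
import random


def pick_cosmetic(rng, options):
    """Make a random cosmetic choice without advancing the simulation RNG."""
    saved = rng.getstate()
    try:
        return rng.choice(options)
    finally:
        # Restore the generator so the simulation sees the same sequence
        # whether or not this machine played the sound.
        rng.setstate(saved)


machine_a = random.Random(42)  # both machines start from the shared seed
machine_b = random.Random(42)
pick_cosmetic(machine_a, ["wind", "birds", "surf"])  # only A plays a sound
in_sync = machine_a.random() == machine_b.random()   # True
```

An alternative with the same effect is a second, separate generator used only for cosmetic effects, so that simulation code never shares a random stream with local-only code.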
Other lessons. This should be common sense -- but if you depend on
a third-party network (in our case DirectPlay), write an independent
test application to verify that when they say "guaranteed delivery"
that the messages get there, that "guaranteed packet order"
truly is, and that the product does not have hidden bottlenecks or strange
behaviors handling the communications for your game.
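The checking half of such a test application can be transport-agnostic: have the sender number its messages, log the sequence numbers the receiver actually gets, and compare. The sketch below assumes messages numbered from 0 and a hypothetical function name; how the log is collected from the third-party layer is left out.

```python
def check_delivery(sent_count, received):
    """Compare received sequence numbers against what was sent.

    sent_count: number of messages sent, numbered 0 .. sent_count - 1
    received:   sequence numbers in the order the receiver saw them
    """
    seen = set()
    duplicates = []
    out_of_order = []
    highest = -1
    for seq in received:
        if seq in seen:
            duplicates.append(seq)
            continue
        seen.add(seq)
        if seq < highest:
            out_of_order.append(seq)  # arrived after a later message
        highest = max(highest, seq)
    missing = [s for s in range(sent_count) if s not in seen]
    return {"missing": missing,
            "duplicates": duplicates,
            "out_of_order": out_of_order}


report = check_delivery(5, [0, 1, 3, 2, 3])
# report == {"missing": [4], "duplicates": [3], "out_of_order": [2]}
```

Running a check like this under load, and with several simultaneous connections, is what exposes the gap between what "guaranteed" is documented to mean and what actually happens.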
Be prepared
to create simulation applications and stress test simulators. We ended
up with three different minimal test applications, all to isolate and
highlight problems like connection flooding, problems with simultaneous
matchmaking connects, and dropped guaranteed packets.
Test with
modems (and, if you are lucky, modem simulators) as early as possible
in the process; continue to include modem testing (as painful as it
is) throughout the development process. It is hard to isolate
problems (is that sudden performance drop because of the ISP, the game,
the communications software, the modem, the matchmaking service, or
the other end?), and users really don't want to hassle with flaky dialup
connections when they have been zipping along at instant-connection
LAN speeds, so it is vital that you ensure testing is done on modem
connections with the same zeal as the LAN multiplayer games.
Improvements for Age of Empires 2
In Age
of Empires 2: The Age of Kings, we added new multiplayer features
such as recorded games, file transfer, and persistent stat tracking
on The Zone. We also refined the multiplayer systems such as DirectPlay
integration and speed control to address bugs and performance issues
that had come up since the release of Age of Empires.
The game
recording feature was one of those things that you just happen to stumble
upon as an "I could really use this for debugging" task that
ends up as a full-blown game feature. Recorded games are incredibly
popular with the fan sites, as they allow gamers to trade and analyze
strategies, view famous battles, and review the games they played in.
As a debugging tool, recorded games are invaluable. Because our simulation
is deterministic, and recorded games are synchronous in the same way
that multiplayer is synchronous, a game recording gave us a great way
of passing around repro cases for bugs because it was guaranteed to
play out the exact same way every time.
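The reason a recording reproduces a bug exactly can be shown with a toy deterministic simulation: if the only inputs are the starting seed and the commands executed on each turn, then saving those two things is enough to replay the game. Everything in this sketch (the `Simulation` class, the state update, the JSON layout) is invented for illustration and is not the AoE recording format.

```python
import json
import random


class Simulation:
    """Toy deterministic simulation driven only by a seed and commands."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, commands):
        for value in commands:
            self.state += value
        # In-simulation randomness comes only from the seeded generator.
        self.state = (self.state * 31 + self.rng.randrange(100)) % 1_000_003


def play(seed, commands_by_turn):
    sim = Simulation(seed)
    for commands in commands_by_turn:
        sim.step(commands)
    return sim.state


def save_recording(seed, commands_by_turn):
    return json.dumps({"seed": seed, "turns": commands_by_turn})


def replay(recording_text):
    data = json.loads(recording_text)
    return play(data["seed"], data["turns"])


live_result = play(7, [[1, 2], [], [5]])
recording = save_recording(7, [[1, 2], [], [5]])
same = replay(recording) == live_result  # True on every run
```

The guarantee only holds while nothing outside the recorded inputs influences the simulation, which is the same constraint the synchronous multiplayer model already imposes.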
Our integration
with the matchmaking system on The Zone was limited to straightforward
game launching for Age of Empires. In Age of Kings we
extended this to allow for launch parameter control and persistent stat
reporting. While not a fully inside-out system, we utilized DirectPlay's
lobby launch functionality to allow The Zone to control certain aspects
of the game settings from the pre-game tables, and "lock"
those settings in once the game was actually launched. This allowed
users to better find the games they wanted to play in, because they
could see the settings at the matchmaking level, rather than waiting
to launch into the game setup screen. On the backend we implemented
persistent stat reporting and tracking. We provide a common structure
to The Zone, which we fill out and upload at the end of a game. The
data in this structure is used to populate a number of user ratings
and rankings viewable on The Zone's web site.
1) LAN play with AoE1 with TCP/IP was slooow compared to IPX, why was this?
2) I was playing Rise of Nations with a friend of mine on a LAN, and I believe, having played AoE2 to death, that RoN was heavily based upon AoE2. We won an Axis vs Allies game against 3 AI opponents (we were Germany/Japan) which was absolutely fantastic, and as a way of extending the fun we decided to watch the replay. It was fine at first but at some stage, just prior to the turning of the tide in our favour, history was changed -- my side stopped doing anything (whilst my mate's side carried on) and the AI pushed... and the Allies won...
What's with that? :)