A few key lessons were learned in the development of the networking for Age of Empires that are applicable to development of any game's multiplayer system.
Know your user. Studying the user is key to understanding what their expectations are for multiplayer performance, perceived lag, and command latency. Each game genre is different, and you need to understand what is right for your specific gameplay and controls.
Early in the development process Mark sat down with the lead designer and prototyped communications latency (this was something that was revisited throughout the development process). Since the single-player game was running, it was easy to simulate different ranges of command latency and get player feedback on when it felt right, sluggish, jerky, or just horrible.
For RTS games, 250 milliseconds of command latency was not even noticed -- between 250 and 500 msec was very playable, and beyond 500 it started to be noticeable. It was also interesting to note that players developed a "game pace" and a mental expectation of the lag between when they clicked and when their unit responded. A consistent slower response was better than alternating between fast and slow command latency (say between 80 and 500 msec) -- in that case a consistent 500 msec command latency was playable, but one that varied was considered "jerky" and hard to use.
In real terms this directed a lot of the programming efforts at smoothness -- it was better to pick a longer turn length and be certain that everything stayed smooth and consistent than to run as quickly as possible with occasional slow-downs. Any changes to speed had to be gradual and in as small increments as possible.
We also metered the users demands on the system -- they would typically issue commands (move, attack, chop trees) averaging about every 1.5 to 2 seconds, with occasional spikes of 3 to 4 commands per second during heated battles. Since our game built to crescendos of frantic activity the heaviest communications demands were middle and late game.
When you take the time to study your user behavior you'll notice other things about how they play the game that can help your network play. In AoE, clicking repeatedly when the users were excitedly attacking (clik-lik-lik-lik-lik -- go go go ) was causing huge spikes in the number of commands issued per second -- and if they were pathing a large group of units -- huge spikes in the network demand as well. A simple filter to discard repeat commands at the same location drastically reduced the impact of this behavior.
In summary, goals of user observation will let you:
Metering is king. You will discover surprising things about how your communications system is working if you put in metering early, make it readable by testers, and use it to understand what is happening under the hood of your networking engine.
Lesson: Some of the problems with AoE communication happened when Mark took the metering out too early, and did not re-verify message (length and frequency) levels after the final code was in. Undetected things like occasional AI race conditions, difficult-to-compute paths, and poorly structured command packets could cause huge performance problems in an otherwise well tuned system.
Have your system notify testers and developers when it seems like it is exceeding boundary conditions -- programmers and testers will notice during development which tasks are stressing the system and let you know early enough to do something about it.
Take the time to educate your testers in how your communications system works, and expose and explain the summary metering to them -- you might be surprised what things they notice when the networking code inevitably encounters strange failures.
In summary, your metering should:
Educating the developers. Getting programmers who are used to developing single-player applications to start thinking about a detachment between the command being issued, received, and being processed is tricky. It is easy to forget that you are requesting something that might not happen, or might happen seconds after you originally issue the command. Commands have to be checked for validity both on send and receive.
With the synchronous model, programmers also had to be aware that the code must not depend on any local factor (such as having free time, special hardware, or different settings) when it was in the simulation. The code path taken on all machines must match. For example having random terrain sounds inside the simulation would cause the games to behave differently (saving and re-seeding the pseudo-random number generator with the last random number took care of things inside the simulation that we needed to be random but not change the simulation.
Other lessons. This should be common sense -- but If you depend on a third-party network (in our case DirectPlay), write an independent test application to verify that when they say "guaranteed delivery" that the messages get there, that "guaranteed packet order" truly is, and that the product does not have hidden bottlenecks or strange behaviors handling the communications for your game.
Be prepared to create simulation applications and stress test simulators. We ended up with three different minimal test applications, all to isolate and highlight problems like connection flooding, problems with simultaneous matchmaking connects, and dropped guaranteed packets.
Test with modems (and, if you are lucky, modem simulators) as early as possible in the process; continue to include modem testing (as painful as it is) throughout the development process. Because it is hard to isolate problems (is that sudden performance drop because of the ISP, the game, the communications software, the modem, the matchmaking service, or the other end?) and users really don't want to hassle with flaky dialup connections when they have been zipping along at instant-connection LAN speeds. It is vital that you assure testing is done on modem connections with the same zeal as the LAN multiplayer games.
In Age of Empires 2: The Age of Kings, we added new multiplayer features such as recorded games, file transfer, and persistent stat tracking on The Zone. We also refined the multiplayer systems such as DirectPlay integration and speed control to address bugs and performance issues that had come up since the release of Age of Empires.
The game recording feature was one of those things that you just happen to stumble upon as an "I could really use this for debugging" task that ends up as a full-blown game feature. Recorded games are incredibly popular with the fan sites as it allows gamers to trade and analyze strategies, view famous battles, and review the games they played in. As a debugging tool, recorded games are invaluable. Because our simulation is deterministic, and recorded games are synchronous in the same way that multiplayer is synchronous, a game recording gave us a great way of passing around repro cases for bugs because it was guaranteed to play out the exact same way every time.
Our integration with the matchmaking system on The Zone was limited to straightforward game launching for Age of Empires. In Age of Kings we extended this to allow for launch parameter control and persistent stat reporting. While not a fully inside-out system, we utilized DirectPlay's lobby launch functionality to allow The Zone to control certain aspects of the game settings from the pre-game tables, and "lock" those settings in once the game was actually launched. This allowed users to better find the games they wanted to play in, because they could see the settings at the matchmaking level, rather than waiting to launch into the game setup screen. On the backend we implemented persistent stat reporting and tracking. We provide a common structure to The Zone, which we fill out and upload at the end of a game. The data in this structure is used to populate a number of user ratings and rankings viewable on The Zone's web site.