Half-Life and Team Fortress Networking: Closing the Loop on Scalable Network Gaming Backend Services
May 11, 2000 Page 1 of 3
This article focuses on some of the various back-end services you might wish to provide to your users as part of your game platform. The purpose of the article is to provide a sense for how these services can be designed, how they can be deployed, and hopefully, how you can avoid making incredibly painful mistakes in either their design or deployment.
More and more, having an on-line component to gaming is essential to the success of a title. In addition, as games become more of a consumer entertainment experience, making things easier on gamers becomes essential. The days of needing to do such things as manually type in IP addresses to connect to remote servers are coming rapidly to a close. Therefore, it is important for you to provide a seamless user experience for your gamers as they go on-line. To do this, you will possibly need to provision a variety of backend services. Doing so will dramatically increase the usability of your games, hopefully leading to increased sales and recognition.
This discussion will give you not only a sense of the many different backend services you should consider, but will also provide you with information about how some of these systems can be designed. Although much of the discussion is focused on a few specific kinds of backend services, the examples are relevant for any backend services you might need to deploy.
The main focus for this articlewill be on network messaging that is not directly related to the in-game flow of messages, though some design issues applicable to in-game flows will be covered. At the end, there will be a brief discussion of the PowerPlay industry initiative. PowerPlay is particularly relevant since it addresses Internet infrastructure problems and deployment problems that effect the ability of the Internet to handle both in-game and backend server traffic.
Game Master Server
For any game having servers spread around the Internet and hosted by end-users, it will be important to have a way for your client software to find these servers. While this example fits the client / server model of first person shooter style games quite well, the general rules are applicable to the back-end services that other kinds of games could use. With this in mind, an interesting back-end service that you will probably want to deploy is a game master server.
A game master server is used to collect a database of active game servers available on the Internet. Your game client (or external "server browser" applications) can then query this collection when users are searching for on-line games to join. The basic networking requirements for a game master server are:
- Ability to receive initiation / keepalive and termination of services messages from game servers; and
- · Ability to receive queries from game clients and return appropriate server identification information to those clients
Although this basic functionality requirements list is brief, there is a further set of additional services you might also choose to provide in deploying your game master server. For instance, you might want to support the capability to restrict the search with a set of search criteria.
Initiation / keepalive messages can be kept fairly small. The game server simply sends a packet to the master server using an agreed upon protocol. For instance, many games have chosen to signal "out of band" traffic by pre-pending a 32 bit integer negative one (0xffffffff) as a header to each query packet. The simplest initiation message in this scenario is then to send a single byte payload along with the packet header to signify that the game server is online. We used this mechanism in Half-Life. Thus, a packet having a five byte payload is about as small as these packets are going to get.
Because UDP packets also have a 28-byte packet header each, the simplest initiation / keepalive message weighs in at about 33 bytes (28 byte header plus 5 byte payload). Unless you are sending additional data only at startup of a server, it may be convenient to collapse keepalive and initiation messages into the same exact message. Also, rather than sending the server's IP address in the packet payload, we can just look at the IP address from which we received the packet to determine that information. This saves a few bytes and makes it marginally tougher to spoof the "from" address (more on that below). For Half-Life, the initiation and keepalive messages were the same. With this base message size in mind, traffic statistics can be estimated as follows (assuming a fairly popular on-line game having up to about 2500 active game servers reporting to the master server at any one time):
500 active game servers
300 seconds per keepalive
33 bytes per keepalive
= 275 bytes/second load and 8.33 transactions/second load
Using the above data, a game platform supporting this number of servers will create approximately (2500 * 33)/300 bytes/second in traffic to the game master server. This works out to about 275 bytes/second of inbound load on the server in the form of approximately 2500/300 or 8.33 transactions per second. Even a relatively low bandwidth connection at the master server should easily be able to accommodate the traffic coming in from game servers. In addition, termination messages are as simple as the basic keepalive message described so far (moreover, termination messages don't necessarily require sending any additional data even if initiation or keepalive messages do), and their frequency can be assumed to be quite low. Therefore, we'll assume that termination messages probably will not add much to the bandwidth requirements for the game master server.
Purging Servers From the List:
In the real world, game servers are known to crash or become disconnected fairly regularly. Therefore, the game master server must be able to purge non-responsive game servers from its list occasionally. Assuming that game servers send keepalive messages every five minutes, as in the example above, it is probably safe to discard the address of any game server that has not sent a keepalive message to the game master server over the last few multiples of the keepalive interval. For instance, if your keepalive messages come once every five minutes, then you would discard the server from the list if a keepalive message is not received at least once every fifteen minutes. Of course, receiving a termination packet causes the server to be discarded immediately. Because not all servers terminate cleanly, there are often a few non-responsive servers in the list of servers returned to users.
Denial Of Service Attacks on Game Master Server:
the basic game master server just described (which is directly responsive
to small keepalive messages from servers) is open to several straightforward
Denial of Service (DoS) attacks. For instance, the hacking community
is quite adept at spoofing the "from" address field in IP
headers. Thus, the following operations could cause a lot of problems
for a game master server:
while (< we think the gaming master is still alive >)
< create a random from address >
< send a fake "keepalive" message to master >
This kind of attack hurts the game master server by causing it to store a bunch of bogus IP addresses for purported game servers. One worry is that the attack would cause the machine to run out of memory. Assuming that each such record on the master occupies:
- 6 bytes IP address
- 4 bytes pointer to next server; and
- 4 bytes time of last server keepalive (for removing outdated servers)
Fortunately, in this example, at 14 bytes per record, it would be difficult, though not impossible, to force the game master server to die from running out of memory. This is especially true since you will almost certainly want to provision a high-end machine to act as the master server. Also, using this kind of attack, the malicious person might not be able to send enough keepalive messages before the master server starts removing old servers. Then again, the CPU load of iterating through a few hundred thousand servers and comparing timestamps every few seconds could still be a major problem. On the other hand, if your master server stores fairly substantial data about each server, then it is quite possible that the game master server could run out of resources.
The nastier problem with this attack, though, is that it makes the server lists returned to your game clients virtually useless. Not to mention that if the malicious user has added 100,000 or more bogus servers to your master server list that it would mean that every time a user asked for the server list that the master server would try to send them approximately 600Kb (100 K * 6 bytes per server) of data. The master server probably will not be able to serve up that amount of data if there are more than a few users querying it.
Challenge/Response System for Game Master Server
This is obviously a major problem. One solution to this kind of attack is to implement a challenge/response system that servers must go through in order to be listed on the game master server. To implement a basic challenge/response system, the game master server must store, for each potential server to be listed:
- IP address of the server
- the time of the request to send a keepalive; and
- a random number that must be returned to the server to allow the keepalive message
The master server creates this record (or updates the one for the same from IP address if it already exists) and then sends a packet to the game server containing the random number. The game server then must send a keepalive message to the master and must include the random number in that message. If the challenge request came in from a bogus IP address, then the requester would never receive the random number back from the game master server. In addition, if a keepalive message is received and the random number is wrong or the message is out of date (i.e., the challenge/response record is more than a couple of seconds old), then the keepalive message can be easily ignored.
Of course, even this system still has some vulnerability to a DoS attack where the attacker has sufficient bandwidth to occupy all of the "challenge" slots (depending on data structure used and number of slots allowed to be active and the timeout period on such slots) or simply to overwhelm the master server's connection itself, but those kinds of attacks are, at least, fairly traceable. For example, id Software, creators of the popular Quake series of games, experienced such an attack in January 2000. The attackers were able to saturate two full DS3 (T3) lines of 45Mbps capacity. I.e., the attackers must have had access to a better than T3 capacity connection.
The game master server described so far only encompasses the most basic functionality. Additional server specific data can be stored at the master server, especially as a way to streamline the amount of data sent back to users based on queries. This kind of tradeoff of processing and storage for bandwidth can often be a good idea. For instance, if the master server were to store current and maximum players, probably encoded as a byte or short each for most games, then the memory overhead wouldn't go up very much. Nor would the size of the keepalive packets from game servers containing this data. With this data, queries from users could request that the game master server filter out empty or full servers. In this fashion, the size of the server list returned by the master server is reduced, thereby lowering the server's outgoing bandwidth requirements. In addition, by not always requiring the end-user to talk to each game server to discover the number of players, the load placed on your game servers in responding to information queries could also be reduced.
Finally, the design of the master server should be looked at from a network reliability point of view. For example, is there much consequence to having a keepalive message packet dropped? In the Half-Life case, we decided that there probably wasn't much of a consequence.
Master Server Responses to Clients:
The more important part of the game master server is the client query response portion. Unlike traffic from keepalive messages from servers, the load from querying by users can by quite staggering. The main purpose of the game master server is to send each requester a list of the IP addresses of the active or relevant game servers. The request for the list of servers can be as simple as the keepalive message, except that instead of a keepalive code, the client sends a "list servers" code. Estimating the frequency and number of such list requests to the game master server is a little more difficult.
For instance, if we assume a community having 10,000 users simultaneously on-line at peak times and that each such user requests a new list of all servers from the master server every 15 minutes, we can generate some useful load statistics. Based on these numbers, the game master server must respond to about 11.1 such requests per second. If the requests are about 33 bytes each, then that's only 367 bytes or so per second of requests coming into the server. Where things become interesting is when we look at how expensive it is to the server to send out the responses.
Page 1 of 3