Handling Multiple Packet UDP Responses
Assuming again that there are 2500 active game servers, sending the server list back to the client requires 2500 * 6 bytes per server, or approximately 15 KB of data per query. At 11.1 such requests per second, the bandwidth needed to service the queries is about 166.5 KB/sec (1.33 Mbps). That data rate would nearly saturate a T1 line (1.544 Mbps). Of course, there is a major flaw in this scenario as well, at least if we are using UDP: sending the list as a single 15 KB UDP datagram is not practical. A datagram that large exceeds the typical 1500-byte Ethernet MTU and has to be fragmented at the IP layer, and routers or other equipment often discard packets larger than about 1450 bytes. In general, anything above 1450 bytes or so is considered bad form for IP-based packets such as UDP packets. This raises the question of how to solve the problem.
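The arithmetic above can be checked with a few lines of C. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the comment on the 6 bytes per server (a 4-byte IPv4 address plus a 2-byte port) is an assumption about how the entry is laid out.

```c
#include <stdio.h>

#define BYTES_PER_SERVER 6   /* assumed layout: 4-byte IPv4 address + 2-byte port */

/* Size in bytes of one complete server list reply. */
long list_bytes(long servers)
{
    return servers * BYTES_PER_SERVER;
}

/* Outbound bandwidth, in bits per second, needed at a given query rate. */
double outbound_bps(long servers, double queries_per_sec)
{
    return (double)list_bytes(servers) * queries_per_sec * 8.0;
}

/* Prints the reply size and the outbound rate in Mbps. */
void print_estimate(long servers, double queries_per_sec)
{
    printf("%ld bytes per reply, %.3f Mbps outbound\n",
           list_bytes(servers),
           outbound_bps(servers, queries_per_sec) / 1e6);
}
```

Calling print_estimate( 2500, 11.1 ) reproduces the 15,000-byte reply and the roughly 1.33 Mbps figure quoted above.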
There are at least two straightforward solutions to this problem. Both end up sending
the server list to the requester in batches. In the first method, the
master server stores off the requester's IP address and over the next
few seconds, sends batches of server IP addresses to the client. Assuming
these packets will be at most 1450 bytes long, there is enough room for
about 240 or so IP addresses per batch. For the example above, this would
require about eleven such batches to communicate all 2500 servers to the
client. How quickly should these packets be sent back to the client? That
depends on two factors, the client's inbound bandwidth capacity and the
master server's outbound bandwidth capacity. How do we know the bandwidth
capacity of the requester? We don't, but the requester could state its
bandwidth capacity as part of the request. Then the spacing of the packets
is easy to figure out. For example, if the requester states that it can
receive up to 2500 bytes per second on its link, then the master would
determine the time for sending the next packet as follows:
Next Packet Time = Current Time + Size of Current Packet / Remote Receiving Rate
Thus, for 1450 byte packets on a link that can handle 2500 bytes / s of data:
Next Packet Time = Current Time + (1450 bytes + 28 bytes of IP and UDP headers) / (2500 bytes/s)
                 = Current Time + 0.5912 seconds
The update to this client will take place over several seconds. Of course, if the client's link speed is stated incorrectly (especially overstated), the game master server could easily flood out the client's connection. In addition, the requester would also require a mechanism for knowing how long to listen for additional data packets (e.g., embedding "packet # 1 of 6" in the responses, or signaling how many total servers will be forthcoming).
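As a rough illustration of the first method, the sketch below computes how many entries fit in a packet, how many packets the full list needs (the "N" in "packet # 1 of N"), and when the next packet may be sent. The names and constants are assumptions chosen to match the numbers in the text, not part of any real master server.

```c
#define IP_UDP_HEADER_BYTES 28    /* 20-byte IPv4 header + 8-byte UDP header */
#define MAX_PAYLOAD_BYTES   1450  /* largest payload we are willing to send */
#define BYTES_PER_SERVER    6     /* size of one server entry, as in the text */

/* Number of server entries that fit in one packet. */
int servers_per_packet(void)
{
    return MAX_PAYLOAD_BYTES / BYTES_PER_SERVER;
}

/* Number of packets needed to send total_servers entries (rounded up). */
int packets_needed(int total_servers)
{
    int per_packet = servers_per_packet();
    return (total_servers + per_packet - 1) / per_packet;
}

/*
 * Earliest time (in seconds) at which the next packet should be sent.
 * recv_bytes_per_sec is the rate the requester claimed and must be > 0.
 */
double next_packet_time(double now, int payload_bytes, double recv_bytes_per_sec)
{
    return now + (payload_bytes + IP_UDP_HEADER_BYTES) / recv_bytes_per_sec;
}
```

With the figures from the text, packets_needed( 2500 ) is 11 and next_packet_time( 0.0, 1450, 2500.0 ) is 0.5912 seconds. A real server would also want to clamp the claimed rate to something sane, since an overstated rate is exactly the flooding risk described above.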
The second alternative is a bit simpler and avoids some of the potential problems noted above because it does not require that the master server remember any state info about the requester and it does not require that the game master server have any knowledge of the requester's link speed. The tradeoff here is that it can take a bit longer to get the full list from the server (depending on the ping from the requester to the game master server). The second method is more or less a batch request-response method. In this method, the master server stores the servers in sequential order (an implementation detail) with some sort of sequence number associated with each server and the requester simply requests the next batch of servers starting with the server after the last sequence numbered server it received. In other words, the requester code is something like this:
void RequestBatch( int nextbatch )
{
    < Create packet header >
    < Add in "list servers" request indicator >
    < Add in nextbatch parameter >
    < Send packet to master server >
}

// Start by requesting the first batch
RequestBatch( 0 );
As responses are received, either a new batch is requested or, if the "nextbatch" value the master server tells us to request is "0", we know we have received the last batch of servers. Care should be taken to handle the case where the exact requested "batch" number is no longer active on the server (i.e., it timed out between the server response saying it can be requested and the actual request packet coming in to the master server).
< Read nextbatch from response packet >
< Read list of IP addresses from response packet >
if ( nextbatch != 0 )
{
    // Continue requesting
    RequestBatch( nextbatch );
}
The nice thing about this approach is that it "self-regulates" the bandwidth required: the next batch of server IP addresses is not requested until the previous batch has been successfully received over the link. The downside is that every batch costs a full round trip, so if the requester has a high-latency connection to the master server, retrieving the whole list can take noticeably longer than if the master server simply transmitted packets as fast as the link allows, as in the first method.
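To make the request-response method concrete, here is a small in-memory simulation of both sides. It is a sketch under stated assumptions: the struct, the function names, and the choice to number servers by position are all hypothetical, and serve_batch() stands in for what would really be a UDP round trip.

```c
#define BATCH_SIZE 241   /* entries per reply, assuming 1450-byte payloads */

/* Hypothetical reply header: entries returned and where to resume. */
struct batch_reply {
    int count;       /* number of servers in this reply */
    int nextbatch;   /* sequence number to request next, 0 = list complete */
};

/*
 * Master server side: answer a request for the servers that follow
 * sequence number `start`. No per-requester state is kept. A stale
 * `start` beyond the end of the list yields an empty, final reply.
 */
struct batch_reply serve_batch(int start, int total)
{
    struct batch_reply r;
    int remaining = total - start;

    if (remaining < 0)
        remaining = 0;
    r.count = remaining < BATCH_SIZE ? remaining : BATCH_SIZE;
    r.nextbatch = (start + r.count < total) ? start + r.count : 0;
    return r;
}

/*
 * Requester side: keep asking until nextbatch comes back as 0.
 * Returns the number of servers received; *round_trips (if non-NULL)
 * receives the number of request/response exchanges used.
 */
int fetch_all(int total, int *round_trips)
{
    int received = 0, next = 0, trips = 0;

    do {
        struct batch_reply r = serve_batch(next, total);
        received += r.count;
        next = r.nextbatch;
        trips++;
    } while (next != 0);

    if (round_trips)
        *round_trips = trips;
    return received;
}
```

For 2500 servers this takes eleven exchanges, so the total fetch time is roughly eleven times the requester's ping to the master server.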
With either method of receiving the entire list, filtered queries can be accommodated. The master server simply culls from the returned list of server IP addresses any servers that don't meet the requested criteria.
TCP vs. UDP:
So far, we've talked about the game master server as using a datagram driven communication model -- UDP. However, for each of your backend services, you will need to decide which networking protocol makes the most sense for its actual use. The main choice you have if you are developing on the Windows Operating System is whether to use TCP/IP or UDP/IP. The general properties of each are as follows:
Based on the needs of the game master server described above, UDP is probably the better choice. In particular, the server is designed with more focus on handling the sheer volume of list requests and cannot spare the overhead of maintaining sufficient listening sockets for completing query requests that can take upwards of 5 to 10 seconds to complete.
Dealing with Failure
Assuming the game master server has the basic functionality above, dealing with failure cases is where you will spend the next large portion of time coding and debugging. Having a clear idea of how you want things to behave when everything goes wrong is essential.
Handling of the failure to receive responses from the game master server can take several forms. Perhaps missing one of the IP address packets from the first method is no big deal. If so, then assuming at least one such IP address packet has been received, the request can be considered successful. Otherwise, you will need to decide whether the protocol should detect dropping a particular packet ("packet 5 of 6" was never received) and whether the whole series should be re-requested at that point or whether a special query for just the missing packet should be undertaken. Under the request-response model of the second example, if a response from the server is dropped, then the request can be remade shortly thereafter.
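If the protocol does embed "packet X of N" in each response, detecting a dropped packet on the requester side can be as simple as a bitmask. The sketch below is one possible way to do it; the struct, the function names, and the 32-packet limit are assumptions made for the example.

```c
#include <stdint.h>

#define MAX_PACKETS 32   /* assumed upper bound for this sketch */

struct reply_tracker {
    uint32_t seen;   /* bit (i - 1) is set once packet i has arrived */
    int total;       /* the "of N" value, 0 until the first packet arrives */
};

void tracker_init(struct reply_tracker *t)
{
    t->seen = 0;
    t->total = 0;
}

/*
 * Record the arrival of "packet index of total" (1-based).
 * Returns 0 on success, -1 if the header is invalid or inconsistent.
 */
int tracker_mark(struct reply_tracker *t, int index, int total)
{
    if (total < 1 || total > MAX_PACKETS || index < 1 || index > total)
        return -1;
    if (t->total != 0 && t->total != total)
        return -1;
    t->total = total;
    t->seen |= (uint32_t)1 << (index - 1);
    return 0;
}

/*
 * Returns the number of the first missing packet, 0 if the series is
 * complete, or -1 if nothing has been received yet.
 */
int tracker_first_missing(const struct reply_tracker *t)
{
    int i;

    if (t->total == 0)
        return -1;
    for (i = 1; i <= t->total; i++) {
        if (!(t->seen & ((uint32_t)1 << (i - 1))))
            return i;
    }
    return 0;
}
```

Once the listening period expires, a non-zero result from tracker_first_missing() tells the requester which packet to re-request, or that the whole series should be requested again.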
For our master server, a failure occurs when either a request packet is dropped or a response packet is dropped. Generally, a packet can be assumed dropped if a response is not received within a specified timeout period. The important thing about timeouts is to avoid race conditions. Race conditions can occur when:
The best way to avoid this is to grow (doubling, for example) the timeout period with each retransmission of the request. After a few such resends, if no response is received, then the requester is either experiencing a ton of packet loss between his or her machine and the game master server or the game master server has gone off-line for some reason.
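A growing timeout of this kind might be computed as follows. This is a minimal sketch: the function names are hypothetical, and the initial timeout and cap are left to the caller because the text does not prescribe specific values.

```c
/*
 * Timeout, in milliseconds, to wait for a response to the given attempt
 * (attempt 0 is the first send). The timeout doubles with each resend
 * and is capped at max_ms.
 */
int retry_timeout_ms(int attempt, int initial_ms, int max_ms)
{
    int t = initial_ms;

    while (attempt-- > 0 && t < max_ms) {
        /* Jump straight to the cap rather than overshoot or overflow. */
        t = (t > max_ms / 2) ? max_ms : t * 2;
    }
    return t < max_ms ? t : max_ms;
}

/*
 * Total time spent waiting if all max_attempts sends go unanswered,
 * i.e. how long it takes before the requester gives up.
 */
int total_wait_ms(int max_attempts, int initial_ms, int max_ms)
{
    int i, sum = 0;

    for (i = 0; i < max_attempts; i++)
        sum += retry_timeout_ms(i, initial_ms, max_ms);
    return sum;
}
```

For example, with an initial timeout of 500 ms and a cap of 8000 ms (both arbitrary choices), four unanswered attempts wait 500, 1000, 2000, and 4000 ms, for 7.5 seconds in total before the requester concludes that the master server is unreachable.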
Multiple Master Servers
If the game master server goes off-line and you do not have another one available, then your system has failed catastrophically. Something to consider is at least having redundant game master servers positioned on the East and West coasts of the United States (and possibly Europe, Australia, or Japan) as it is quite common for the main East-West Internet backbone links to fail for short periods of time.
If you deploy multiple game master servers, then to avoid the above failure, your client must know how to talk to each available master server. If communication with the first one fails, it can try to talk to the next game master server, and so on.
There is a caveat with having multiple master servers, especially if your goal is to distribute workload between them. If you hard code all of your clients to always query the same master server first, instead of randomly distributing the requests, then that server will likely be overworked while the other (failover) servers are underutilized. Instead, you should consider scrambling the list of game master servers that the requester will contact so that the load on the servers is evenly distributed. The one competing concern is the desire to have requesters talk to the "closest" game master server (either geographically or by number of hops) so that latency and packet loss are reduced.
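Scrambling the list and failing over to the next entry could look roughly like the sketch below. It assumes a hypothetical struct in which the `up` flag simulates whether a master server answers; in a real client that check would be a query followed by a timeout. The caller is expected to seed the generator with srand() once at startup, and the slight modulo bias of rand() is ignored here.

```c
#include <stdlib.h>

/* Hypothetical master server entry. */
struct master {
    const char *host;   /* address of the master server */
    int up;             /* simulation only: 1 if this master answers */
};

/* Fisher-Yates shuffle, so that each client tries masters in a random order. */
void shuffle_masters(struct master *m, int count)
{
    int i;

    for (i = count - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        struct master tmp = m[i];
        m[i] = m[j];
        m[j] = tmp;
    }
}

/*
 * Walk the (already shuffled) list and return the index of the first
 * master that answers, or -1 if none of them do.
 */
int first_reachable(const struct master *m, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        if (m[i].up)   /* stands in for "send query, wait for response" */
            return i;
    }
    return -1;
}
```

A client that cares about talking to the "closest" master could shuffle only within groups of similar latency rather than across the whole list.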