The Internet Sucks: Or, What I Learned Coding X-Wing vs. TIE Fighter
September 3, 1999 (Page 5 of 6)

Lessons Learned (The Internet Sucks)

First lesson: If all players dial into the same phone number, you are not testing the Internet. You are testing the modems and the POP (point of presence) server, but you are not testing the Internet. It’s obvious when you think about it. Your packets go over the modem to the POP server, and it sends them right back out to the other player. The packets never get past the POP server.

When we finally tried our game on some real network connections, it would fail within seconds. We were mystified. It worked great on the LAN, even with 500ms of artificial latency. When we ran some diagnostics we discovered that we were seeing some simply unbelievable latencies. Latencies of 5 and 10 seconds were frequent, and we saw some as long as 50 seconds! Our game would simply fall apart under those conditions.

What was actually happening was that a packet would get lost. The TCP protocol specifies that packets will always be delivered, and furthermore, that they will always be delivered in order. TCP uses a system of acknowledgements to verify that packets are successfully delivered, and will re-send packets if they are lost in transmission. The "in order" specification means that if a packet must be re-sent, the packets that follow it are delayed until the lost packet is received. The problem is that when an Internet connection starts dropping packets, it becomes very likely that the re-sent packet will also get dropped. This means it can take several seconds for a packet to arrive at its destination.
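This head-of-line blocking effect is easy to see in a toy model. The sketch below is illustrative, not the game's code: it assumes a fixed retransmission timeout and a symmetric round-trip time, where real TCP adapts both.

```python
# Toy model of TCP's in-order rule: one lost packet delays everything
# queued behind it, because nothing can be delivered out of order.
# Fixed RTO and RTT are simplifying assumptions.

def delivery_times(send_times, lost, rtt=0.2, rto=1.0):
    """Return the time each packet becomes available to the game,
    given in-order delivery and per-loss retransmission delays.
    lost maps packet index -> number of times it was dropped."""
    times = []
    ready = 0.0  # earliest time the next in-order packet can be handed up
    for i, t in enumerate(send_times):
        arrival = t + rtt / 2
        # each time this packet is lost, a full timeout passes before resend
        arrival += lost.get(i, 0) * rto
        # in-order rule: can't be delivered before everything ahead of it
        ready = max(ready, arrival)
        times.append(ready)
    return times

# 5 packets sent 100 ms apart; packet 1 is dropped twice in a row
sends = [i * 0.1 for i in range(5)]
times = delivery_times(sends, lost={1: 2})
# packet 1 shows up over 2 seconds late, and packets 2-4, which all
# arrived on time, are stuck waiting behind it
```

Even with every later packet arriving promptly, the application sees one multi-second stall, which matches the 5-second latencies described above.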

Lesson two: TCP is evil. Don’t use TCP for a game. You would rather spend the rest of your life watching Titanic over and over in a theater full of 13-year-old girls. First of all, TCP refuses to deliver any of the other packets in the stream while it waits for the next "in order" packet. This is why we would see latencies in the 5-second range. Second of all, if a packet is having a tough time getting to its destination, TCP will actually stop re-sending it! The theory is that if packets are being dropped, it’s due to congestion. Therefore, it is worthless to try re-sending because that will only make the congestion worse. So TCP will actually stop sending packets, and start sending occasional little test packets. When the test packets start to get through reliably, TCP will gradually start sending real packets again. This "slow re-start" algorithm explains why we would see latencies in the 50-second range.
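The key to the 50-second figure is that TCP backs off exponentially between retries. This is a hedged sketch of that idea, not TCP's actual timer logic; the doubling-with-a-cap pattern is the standard behavior, but the base and cap values here are illustrative.

```python
# Sketch of exponential retransmission backoff: each failed retry
# doubles the wait (up to a cap), so a short bad stretch on the wire
# can balloon into tens of seconds before real data flows again.
# base and cap are assumed values, not TCP constants.

def backoff_delays(failures, base=1.0, cap=64.0):
    """Cumulative seconds waited before each retry, doubling per failure."""
    total, waits = 0.0, []
    delay = base
    for _ in range(failures):
        total += delay
        waits.append(total)
        delay = min(delay * 2, cap)
    return waits

# six consecutive failed retransmissions: 1+2+4+8+16+32 = 63 s of waiting
# backoff_delays(6) -> [1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
```

Six losses in a row, each individually brief, is enough to put the next delivery a full minute out. A game simply cannot wait that long.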

Lesson three: Use UDP. The solution to this evil protocol seems simple at first. Don’t use TCP, use UDP instead. Unlike TCP, UDP is an unreliable protocol. It does nothing to guarantee that a packet is delivered, and it does nothing to guarantee that a packet is delivered in order. In other words, it does nothing. So if you really need a packet to be delivered, you need to handle the re-sending and acknowledgements. There is one other extremely annoying thing about UDP. Modem connections are made using a protocol called PPP. When you send TCP packets over a PPP connection, it does some very clever compression of the Internet header data, reducing it from 22 bytes to 3 bytes (or less). When you send UDP packets over a PPP connection it does not perform this clever compression and sends the entire 22-byte header over the modem. So if you are using UDP, you shouldn’t send small packets.

Of course, our network system absolutely requires that every packet be delivered. If TCP actually worked, this would not be a problem. But TCP is hopelessly broken, so we had to write our own protocol to handle acknowledgements and re-sends. Unfortunately, we didn’t realize that right away, and it took us a while to get there.
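A minimal ack/resend layer of the sort described here can be sketched as follows. This is an assumption-laden illustration, not X-Wing vs. TIE Fighter's actual protocol: the class name, sequence numbering, and timeout value are all made up for the example.

```python
# Minimal reliability layer on top of UDP: track unacknowledged
# packets and re-send any that have waited past a timeout.
# resend_after is an illustrative value, not the game's.

class ReliableChannel:
    def __init__(self, resend_after=0.25):
        self.resend_after = resend_after
        self.unacked = {}    # seq -> (payload, time of last send)
        self.next_seq = 0

    def send(self, payload, now):
        """Assign a sequence number and remember the payload until acked."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return (seq, payload)           # datagram to hand to UDP

    def on_ack(self, seq):
        """Peer confirmed delivery; stop tracking this packet."""
        self.unacked.pop(seq, None)

    def due_for_resend(self, now):
        """Return packets that have gone unacked too long, and
        restart their timers."""
        out = []
        for seq, (payload, sent) in list(self.unacked.items()):
            if now - sent >= self.resend_after:
                self.unacked[seq] = (payload, now)
                out.append((seq, payload))
        return out

ch = ReliableChannel()
ch.send(b"input-frame-0", now=0.0)
ch.send(b"input-frame-1", now=0.0)
ch.on_ack(0)                           # frame 0 confirmed delivered
# frame 1 is still unacked past the timeout, so it gets re-sent:
# ch.due_for_resend(now=0.3) -> [(1, b"input-frame-1")]
```

In a real client this would sit between the game loop and the UDP socket, with acks piggybacked on outgoing packets to avoid the header overhead mentioned above.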

Our first step was to switch from TCP to UDP. This was as simple as passing a flag to DirectPlay. Of course, now the game would fail miserably as soon as the first packet was dropped. So, we implemented a simple re-sending mechanism to handle the dropped packets. This seemed to work a little better, but occasionally things would go horribly wrong exactly as they had before. Our first guess was that DirectPlay was actually ignoring the flag and using TCP anyway. But our diagnostics quickly showed us that the problem was even more evil than Microsoft: it was the Internet.

Lesson four: UDP is better than TCP, but it still sucks. We expected packets to be dropped occasionally, but the Internet is much worse than that. It turned out that on some connections, about every fifth packet we sent would just disappear into the Ethernet. When they say UDP is unreliable, they aren’t kidding! Our simple re-sending mechanism just didn’t perform well enough under these conditions. It was quite common for a re-sent packet to be dropped, and we saw several cases where the original packet and 4 or 5 re-sends of that packet would all be dropped. We were re-sending so many packets, we were starting to exceed the bandwidth of the modem, and then the latency would start to climb, and all hell would break loose.

Our solution was simple and surprisingly effective. Every packet would carry a copy of the last packet. This way if a packet were dropped, a copy of it would arrive with the next packet, and we could continue on our merry way. This would require nearly twice as much bandwidth, but fortunately our system required so little bandwidth that this was acceptable. This would only fail if two consecutive packets were dropped, and this seemed unlikely. If it did happen, then we would fall back on the re-sending code.
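The redundancy trick is simple enough to show in a few lines. This sketch assumes a dictionary-based packet format invented for illustration; the shipped game would have used a packed binary layout.

```python
# Sketch of "every packet carries a copy of the previous one":
# a single drop is repaired by the very next packet, with no
# round-trip needed. Field names are made up for the example.

def make_packet(seq, payload, prev_payload):
    """Bundle the current payload with a copy of the previous one."""
    return {"seq": seq, "cur": payload, "prev": prev_payload}

def receive(packet, delivered):
    """Recover the previous payload if it never arrived, then take
    the current one. delivered maps seq -> payload already seen."""
    out = []
    prev_seq = packet["seq"] - 1
    if prev_seq >= 0 and prev_seq not in delivered and packet["prev"] is not None:
        delivered[prev_seq] = packet["prev"]
        out.append((prev_seq, packet["prev"]))
    if packet["seq"] not in delivered:
        delivered[packet["seq"]] = packet["cur"]
        out.append((packet["seq"], packet["cur"]))
    return out

delivered = {}
p0 = make_packet(0, "frame0", None)      # sent, but dropped in transit
p1 = make_packet(1, "frame1", "frame0")  # arrives, carrying frame0 too
receive(p1, delivered)
# delivered now holds both frame0 and frame1; only two drops in a
# row would force a fall-back to the resend path
```

Doubling each packet's payload is the cost; covering the common single-drop case with zero added latency is the payoff.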

This seemed to work pretty well! We finally had the game working on the Internet! Sure the Internet had turned out to be far worse than we had thought, but we could deal with it.

Lesson five: Whenever you think the Internet can’t get any worse, it gets worse. More extensive testing showed that we still had some serious problems. Apparently we had some kind of bug in our re-sending code, because it seemed that occasionally players would just lose their connection and nothing would get through. After spending endless hours trying to find the bug in our code, we finally realized that our code was fine, it was the Internet that was broken!

It turns out that sometimes the Internet gets so bad, that practically no packets get through at all! We documented periods of 10 and even 20 seconds during which only 3 or 4 packets would be delivered. No wonder TCP decides to just give up! How can you possibly play a game under conditions like that? We had a major problem on our hands. This "lost connection" phenomenon was something we just weren’t prepared to deal with.

Fortunately, this condition is usually pretty short, on the order of a few seconds. We managed to get our code to handle that by just tweaking the re-sending code. The player who is suffering this condition will frequently have their game stopped while we wait for the connection to clear, but once the condition passes, they can resume playing.

Unfortunately, this "lost connection" condition can last pretty long, and when that happens, we just can’t handle it, and we end up having to disconnect that player from the game. This isn’t really a solution, but at least it meant one bad connection wouldn’t ruin everyone’s game.

One of the last refinements we made to the game to deal with the Internet involved the inaccuracy of the predicted world. Since latencies could be very long, the predicted world could drift a long way from the official one, and we needed a way to deal with that.

Our first clue that we had to address this issue was the result of implementing what we thought would be an improvement. We realized that if any one player had trouble getting their data to the host computer, then every player would suffer because the host would not send out the compiled data packets until it had received data from every player. We decided that if a player failed to get their data to the host within a reasonable amount of time, then we would simply drop that data and send out the compiled packet without it.
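The host-side rule just described can be sketched like this. The structure and timing values are assumptions for illustration, not the shipped code.

```python
# Sketch of the host's "don't wait forever" rule: compile the frame
# from whatever player input arrived by a deadline, and drop the
# stragglers rather than stalling every player. Deadline value is
# illustrative.

def compile_frame(inputs, deadline, arrival_times):
    """inputs: player -> input data for this frame.
    Exclude any player whose data arrived after the deadline."""
    included, dropped = {}, []
    for player, data in inputs.items():
        if arrival_times[player] <= deadline:
            included[player] = data
        else:
            dropped.append(player)
    return included, dropped

inputs = {"p1": "turn-left", "p2": "fire", "p3": "roll"}
arrivals = {"p1": 0.05, "p2": 0.06, "p3": 0.41}   # p3 is very late
frame, dropped = compile_frame(inputs, deadline=0.25,
                               arrival_times=arrivals)
# frame includes p1 and p2; p3's input is dropped from the official
# world, which is exactly what triggers the problem described next
```

Dropping the input keeps everyone else responsive, but the dropped player's own prediction is now wrong, which is where the trouble starts.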

If you follow through the consequences of that action you will realize that it creates a very evil situation. Players normally predict the position of their own craft with perfect accuracy. After all, they know exactly what they have done, so they know exactly where they should be. But if the host drops their input from the "official" version of the world which is the basis of their predicted version of the world, then they will actually have to change their own position if they are going to stay in sync with the other players. The visual result of changing the local player’s position is that everything in the world, including the star-field, will appear to jump.

This effect, dubbed "star-field warping," is extremely disconcerting, and makes the game practically unplayable. We eventually compromised by only dropping a player’s data if it was extremely late, which made this event fairly rare. However, in hindsight it might have been better to use the same solution we eventually implemented for the other players.

This instantaneous jump in position, or "warp", will always occur for the other players, since their position is always incorrectly predicted. If latency is fairly low (less than 200ms) this jumping is not very noticeable, but as latency increases the inaccuracy of the predicted world increases, and this "warping" effect becomes more noticeable.

To address this problem we implemented a "smoothing" effect. The smoothing algorithm keeps track of our last prediction of each player’s position. It then takes the current prediction and moves it closer to the last prediction. This effectively smooths out the motion of the other players’ craft, and it looks much better, even though it is probably less accurate.
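The smoothing described above amounts to blending each fresh prediction toward the last displayed position. This sketch assumes a simple per-axis linear blend; the article doesn't give the actual algorithm or constant used.

```python
# Sketch of prediction smoothing: instead of snapping a craft to its
# newly predicted position (a visible "warp"), close only a fraction
# of the gap each frame. The blend factor 0.5 is an assumed value.

def smooth(prev_shown, new_prediction, blend=0.5):
    """Move a fraction of the way from the last displayed position
    toward the newly predicted one (per axis)."""
    return tuple(p + blend * (n - p)
                 for p, n in zip(prev_shown, new_prediction))

shown = (0.0, 0.0, 0.0)
# a new packet says the craft is actually at x=10: instead of a
# 10-unit warp, each frame closes half the remaining gap
shown = smooth(shown, (10.0, 0.0, 0.0))   # -> (5.0, 0.0, 0.0)
shown = smooth(shown, (10.0, 0.0, 0.0))   # -> (7.5, 0.0, 0.0)
```

The displayed craft is deliberately a little behind (and a little wrong), but continuous motion reads far better to the eye than an instantaneous jump.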

