|
Resource
Guide

Of Internet Servers and SQL Databases:
Designing the Backend for Power and Performance
As online games proliferate, a growing number of Internet
servers are being created and deployed by game developers. These servers
must deal with massive quantities of data generated by the games that
connect to them. Off-the-shelf SQL database software seems like the ideal
way to handle all this data, but which RDBMS system is right for your
project, how will it impact server performance, and how should the data
be structured and used? This session will answer these questions and many
others. It is intended to give programmers a clear and practical understanding
of the things databases do well and the things they do horribly. Using
a database without this knowledge is a virtual guarantee of lousy server
performance. Topics include server architecture, ODBC vs. proprietary
connection APIs, keeping the query optimizer from killing you, and everything
you never wanted to know about designing tables, indexes, and stored procedures
to make your servers fly.
Background: the ITNet Network
The paper that follows grew out of my experiences of the last three years
during which I served as architect, lead programmer, and project manager
for Incredible Technologies Internet gaming network ITNet. ITNet is a
gaming network unlike any other: it provides Internet gaming services
to the company's flagship arcade product – the blockbuster coin-op
golf game Golden Tee Fore! Instead of interconnecting individual players,
ITNet links public gaming devices that are themselves accessed by hundreds
of individual players. This along with the phenomenal success of Golden
Tee Fore means that the ITNet system handles a truly staggering amount
of data: over 20,000 online machines worldwide dump 150,000 individual
transactions per day into the system. ITNet provides service around the
clock to games in 7 foreign countries and the United States, as well as
tracking individual game play statistics for over 700,000 registered players
worldwide. In addition to player statistics, ITNet manages the exchange
of complete financial and game play statistics, multiple national and
international tournament leader boards, user-defined option settings,
collection and diagnostic data, recorded “shadow games” for
multi-machine play, operator registration information, card purchase data,
detailed course statistics, code patches, advertising graphics and promotional
contests, as well as allowing for the sending, receiving, and deleting
of any file on the remote machine's hard drive. In addition, plans are
already underway to add two more arcade products to the ITNet network.
Since I will be constantly referring to ITNet throughout the remainder
of this paper, this description is intended to give you some idea of the
capabilities of the system I will be discussing.
Do You Need a SQL Database?
If you are in the process of designing or building an Internet game server
of any size, you have probably wondered whether you need a database. Considering
the amount of data that a modern Internet server must deal with, integrating
an enterprise-grade SQL database into your server backend becomes a more
and more attractive option all the time. Before you go rushing off to
buy that shiny copy of Oracle 9i, though, let's pause a moment and clarify
when a SQL database is really necessary.
There are basically three options you can take with respect to databases.
You can skip the database entirely and create a “stateless”
server, you can roll your own database solution, or you can buy database
software off the shelf and integrate it with your server. For the most
part, I recommend you go with the first or third options. If your project
is particularly small and simple, you may be able to get away without
any database at all. This type of architecture can work well for a simple
matchmaking and chat server that won't be tracking information about individual
players. If you can get away with building a stateless server, I highly
recommend you give it a try. The Internet server I created for Incredible
Technologies Golden Tee PC Golf game used this architecture, and it's
one of the most trouble-free server systems I've ever seen. It quite literally
requires attention once or twice a year.
If, on the other hand, your game is going to require some amount of state
data, then I recommend you go with an off the shelf SQL database rather
than a home-brewed solution. Although it might seem like the home-brewed
solution could be made cheaper and faster, you will rapidly discover that
this approach is more trouble than it's worth. The minute you need to
do anything halfway interesting with your data – even something
as simple as accessing it from an external viewing or reporting tool,
you'll wish you'd gone with SQL. In addition to remarkably fast performance,
an enterprise-grade SQL database will store your data in a format that
can be easily accessed and used by a wealth of external tools and systems
– from accounting packages to web servers. Even if you never do
anything other than issue queries interactively, you'll rapidly grow to
love the flexibility and power of SQL. In addition, you'll be able to
hire experienced help from a huge pool of potential applicants thanks
to the ubiquitous use of the SQL standard in business.
Physical Infrastructure
Once you've decided you need a SQL database, setting the system up is
pretty straightforward. Your SQL database will run on its own box and
will be accessed via a fast LAN link by your Internet servers, web server,
development tools, etc. Since you are going to want extremely fast network
speeds between your database and critical Internet server components,
you should carefully choose the physical networking infrastructure that
will connect these machines. A dedicated, high-speed Ethernet backplane
is a good choice here (we use 100 Mbps Ethernet for ITNet).
When purchasing computer hardware, I recommend that you buy the biggest
and buffest box you can afford for your database server. We discovered
after running ITNet for 2 years that the database is almost always the
speed bottleneck when your Internet servers slow down. Our main database
server is currently running on a duel Pentium IV machine with 1 Gig. of
RAM, and 5 of the largest and fastest SCSI hard drives (connected and
mirrored via a RAID array) that we could find.
Internet Server Design
Your servers will be the conduit between your game clients connecting
over the Internet and your database, so your first design questions are:
how should your clients connect to the servers and how should the servers
connect to the database?
Connecting to The Database
There are two basic ways of connecting to a database. You can either use
the Open Database Connectivity (ODBC) API, or connect using your database
vendor's proprietary access API. ODBC is generally considered to be slightly
slower, but has the advantage of being a universal standard on all major
databases. The major advantage of a native API over ODBC is it's purported
speed improvement, and a richer set of database access calls.
We used ODBC for ITNet and our query performance has exceeded all our
expectations. I strongly recommend you use it. In practice, you'll find
that you have other much more terrifying performance problems to worry
about than latency introduced by ODBC.
Connecting to The Games
The first and most important choice you will face when architecting your
servers is whether you should use TCP/IP or UDP as the underlying communications
protocol.
If you've attended any server design lectures in the past, you may have
heard it said that you should always use UDP for games. This is true as
long as fastest possible delivery of data is the overriding design consideration
for your project. In games where players are competing head-to-head over
the Internet in real time, this is clearly the case. In turn-based games
or other types of server-to-game interactions (like sending patch files),
however, things are not so clear cut. I'm not suggesting you make your
next multiplayer quake-killer using TCP/IP, but we need to be careful
not to throw the baby out with the bath water here. For applications where
it's appropriate, TCP/IP can cut months off a development schedule by
providing a foundation protocol with world-class reliable delivery, sliding
window packet delivery, flow control, retransmission characteristics,
and countless esoteric refinements and optimizations. After all, the last
thing you want to be doing while staring down the muzzle of a milestone
is trying to work out your own solution for send-side silly window avoidance.
For all the reasons noted above, I decided to use TCP/IP as the underlying
protocol when architecting my Internet server for Incredible Technologies
Golden Tee PC Golf game. PC Golf was a turn based game, so it was a perfect
fit for TCP/IP. This worked great except for one problem: with TCP you're
much farther from the metal than you are with UDP, so you don't have as
much information about what's going on when the network starts acting
up. With PC Golf, my strategy had been to open TCP connections between
the host and server and keep these connections open throughout a multiplayer
match. What I discovered, though, was that it's really difficult to recover
a lost TCP connection and figure out how to pick up where the last connection
left off.
When it came time to design ITNet we had another great fit for TCP/IP
on our hands, but I was wary of the problems I'd seen with TCP while creating
the PC Golf multiplayer code. Again I faced a difficult choice: should
we use UDP and reinvent all those expensive wheels, or should we somehow
try to work around the problems of TCP? The solution we came up with turned
out to one of the best architectural decisions we made while creating
ITNet. After several grueling design meetings, we decided to use TCP/IP
with a twist: instead of holding a TCP connection open during an entire
communication session, we would adopt the same brief connection model
used by web servers. Our design specified that ITNet clients would be
required to open a new TCP/IP connection for every message they needed
to send, and that each connection would stay open only until the server
had sent one or more messages in reply.
The design worked great. The brief connection strategy neatly solved
the connection recovery problem with TCP/IP: if a connection went south,
it was very easy for the client to recover by simply throwing away the
connection in progress, and then opening a new connection and resending
the last message. All those thorny issues of determining how much data
had already been sent and building servers that could resume processing
at any point in the message handling pipeline simply went away.
The only thing that bothered us about brief connection was the fear that
all those connects and disconnects would kill our data transfer efficiency.
Happily, this was a non-issue. As it turns out, TCP/IP connection setup
and teardown are actually pretty speedy affairs: opening a connection
requires only 3 IP message sends, and closing requires only 4. The fast
connection setup and teardown allowed us to make 50 to 60 brief connections
per call and still blow away all our targets for call speed.
Brief connection was one of the very best choices we made when designing
the ITNet servers. All told, it probably cut 3 or 4 months off our development
schedule, and has continued to work exceptionally well even after 2 years
of nonstop growth. Based on my experience, TCP/IP with brief connection
has earned a permanent place in my network programmers bag of tricks.
If it fits your project's parameters, I strongly urge you to give it a
try.
The Standard Server Architecture
Now that you know how to establish basic connections to the database
and your games, the next step is to settle on an overall architecture
for the server itself.
Important
Safety Tip: At Incredible Technologies, we write all our servers
in C and C++ to run under Windows NT. Since servers don't need much
of a user interface, I recommend avoiding MFC so that you don't add
unnecessary overhead. Also, any serious server application will need
to be written as a Windows NT service. A service is a successor to the
venerable DOS TSR. One of the most useful properties of a service is
that it can be set to run every time your NT server boots up - even
when no user is logged in. This is very important for robustness: if
anything should cause your server PC to reboot, you can sleep well knowing
your server software will be restarted right along with it.
When I originally sat down to design the ITNet servers, I had planned
to use a variant of the classical multithreaded server architecture. In
this architecture, a dedicated thread waits for incoming connections and
spawns a new thread for each client that connects to the server. Each
spawned thread is responsible for handling all communication with a single
client and is shut down when that client disconnects.
The variant on this design uses a small pool of pre-allocated “worker”
threads that can handle any request/reply message transaction with any
client. When a new message arrives from a game, it's handed off to an
idle worker thread for processing. This architecture improves on the classical
design by reducing the amount of context switching on a busy server, as
well as eliminating the need to constantly create and destroy threads.
Another nice feature of this architecture is that Windows NT has native
support for it through an object called an I/O Completion Port. A full
discussion of I/O Completion Ports is beyond the scope of this paper,
but you can find a good overview of the topic in the MSDN document “Writing
Windows NT Server Applications in MFC Using I/O Completion Ports”
by Ruediger Asche (it should be included in your Visual Studio 6.0 help).
This multithreaded pooling architecture was the design I had used for
the PC Golf server, and although it was very fast, it did have some problems.
In particular, it took a ton of work to resolve all the thread synchronization
bugs that cropped up. Also, having all the server's code in a single process
made the server extremely fragile: any bug in any part of the code could
bring the entire server crashing down. This made evolving the existing
code or adding new features exceptionally hazardous… sort of like
performing open heart surgery on a patient who's awake and walking around.
A New Hope
It was while mulling over these problems that we hit upon an alternative
scheme. The idea was to unbundle all the pieces of a standard multithreaded
server into a family of single-threaded servers that could each handle
a single, specialized task. Figuring out how to chop up one big server
into a lot of little servers was actually pretty easy. Most of the servers
would be dedicated to processing specific messages or groups of messages.
For example, we could have one server that handled incoming game data,
and another that maintained and distributed tournament leader boards.
The glue binding all these servers together would be a “meta”
server, which would be responsible for telling clients where to send individual
messages to get them processed. Clients would know only the IP address
of the meta server, but would begin each communication session by requesting
a “server resolution map” to get the addresses of the other
servers. A server resolution map is just a look-up table that matches
every client message type with an IP address and port number of the server
that processes that message.
In the end, we decided to use this second “server family”
architecture for ITNet. In addition to solving the multithreading and
fault-tolerance problems, the server family approach had a number of very
cool properties:
- A single
server could only ever be as powerful as the PC it was running on, but
a server family could be spread across multiple machines for greater
performance.
- The meta
server could dynamically create server resolution maps from data in
the database. This meant that new servers could be added dynamically
to a family, and also that we could override server assignments on a
client-by-client basis (useful for sending specific messages from just
one client to a debug server, for example).
- By designing
the meta to randomly route messages between two or more identical servers
in the same family we could build in automatic load-balancing for free.
This was our reasoning, at any rate, when we embarked on the creation
of ITNet. After running the ITNet server family under heavy load for two
years, though, my opinion of the server family architecture is mixed.
It did many of the things that we had hoped – and even surprised
us with some unexpected benefits – but it also had its share of
problems. The following sections explain what worked and what didn't work.
The Good
One of the biggest wins with the server family approach was being
able to selectively move parts of the “virtual server” around
to different computers on the network. This made debugging in the production
environment incredibly easy: we could run a problematic server right from
our development machines under the debugger when hunting bugs. The beauty
of this approach was that the bulk of the “virtual server”
continued to function exactly as it always did, which was a huge help
in tracking down code that misbehaved only when the servers were running
under load.
Being able to add new servers at will was also hugely beneficial. This
capability came in especially handy when adding new products to ITNet.
The architecture allowed us to easily add a new product-specific server
to the server family which contained all the code under development for
the new product. Even when this code was incredibly fragile and buggy,
it never effected the stable parts of ITNet: since it was running as a
separate process under a protected memory OS, it could crash all day long
without impacting the other servers.
Another big boon was the meta server. Having a dedicated “traffic
cop” managing the other servers let us add a lot of advanced behavior
very easily. It also gave us a natural place to put code that handled
global connection properties. The meta server eventually evolved into
a session and call management server as well as a place to get server
resolution maps.
There was one last benefit of the server family approach that really
took me by surprise. Having many small servers broken down by project
and function gave us an unexpectedly powerful tool for managing the production
process. It turned out that making individual programmers responsible
for specific servers neatly solved many of the problems that come with
shared programming projects. In particular, when an ITNet server crashed,
it was immediately obvious who's code had failed and who should be looking
into it. Up until this time, I hadn't realized how much energy could be
spent wrangling over who should do the work of finding certain tricky
bugs. This ambiguity is unavoidable in a typical programming project,
but it completely disappeared with ITNet. The best part about this was
the way it protected careful programmers from mistakes made by other members
of the team. Programmers who caught most of their bugs during coding just
naturally ended up doing less nasty debugging work. The efficiencies this
added to the server creation process are hard to overstate. Indeed, this
is such a shockingly valuable feature that it almost justifies using the
server family architecture all by itself!
The Bad
Although the idea of having a server that could “gracefully
degrade” sounded good on paper, in practice it didn't buy us much.
It didn't take long for us to realize that the problems caused by allowing
partial calls to complete more than outweighed the benefit of having a
system that could function with one or more failed servers. In the end,
it was much less trouble to just detect and deal with downed servers than
to try to build a system that could run robustly with “partially
successful” communication sessions.
Managing the dozen or so servers that eventually formed part of the ITNet
server family required more work than expected. Even checking the status
of the system meant looking at and digesting 12 separate status values
instead of just one. While this problem wasn't a deal-breaker, it was
nonetheless a pain in the neck.
Data sharing between the servers turned out to be a bigger problem than
originally anticipated. In particular, we found that the database was
a very unsuitable channel for transferring shared state between different
servers.
By far the biggest problem, though, was the single-threaded design of
the individual servers. Without multiple threads to spread out the load,
individual servers could easily build up a backlog of unprocessed messages
when sudden spikes in server activity occurred. The only solution to this
was essentially to hand-tune the servers: we ended up shifting message
processing responsibilities around among servers until each one was carrying
a roughly equal share of the processing load. This enabled the server
family to pipeline data efficiently, but it involved a lot of tedious
hand tweaking. Automatic load balancing might have helped alleviate the
problem, which brings us to our next section…
The Ugly
Somehow or other, the automatic load balancing feature never quite
got finished. Although it worked early on, something broke about six months
after rollout and I never got around to fixing it. By that time, we were
in full-blown crisis mode, and only the most urgent projects were being
worked on. Since we never actually started using the load balancing feature,
it went straight to the bottom of the project list and has stayed there
ever since.
________________________________________________________
Database
Design
|