|
Resource
Guide

Of Internet Servers and SQL Databases:
Designing the Backend for Power and Performance
Database Design
The first question that must be answered when setting up an enterprise-grade
database is what software you will use to run it. This is a complicated
decision influenced by many factors, but I'll try to distill it down to
the basic questions.
The major products to choose from are Microsoft's SQL Server, Sybase's
Adaptive Server Enterprise, Oracle's 9i Server, and IBM's DB2 software.
Any of these enterprise-grade Relational Database Management Systems (RDBMSs)
will do the job, however some will do it better than others. I am by no
means an expert on all these different packages, so I would urge you to
do your own research on any package you're considering. I can say from
experience, though, that migrating databases is a nightmare, so you'll
probably end up living with whatever software you choose for a long time
to come.
If your company is a Windows-only shop, your best bet is probably SQL
Server. Sybase's ASE product also runs well under Windows NT and is significantly
cheaper, however I advise you to steer clear of it. We went with Adaptive
Server Enterprise over SQL Server and have regretted it ever since. The
problem with ASE isn't performance – we've tweaked and tuned the
heck out of our database and it fairly screams - the problem is the lack
of tools and utilities bundled with it. SQL Server includes a ton of tools
to help you monitor the health of your database, do regular maintenance,
and tune its performance. With Sybase, you end up building or buying nearly
everything you need just to perform routine tasks. This is more than just
a painful chore: as newcomers to the world of enterprise-grade databases,
we had no idea what tools were even required to run our shop. It was a
long and painful process to first figure out what we needed, then look
for features in ASE that supported it, and finally determine how to buy
or build it ourselves. With SQL Server, the most important tools are ready
and waiting for you right in the box.
If your company is a Unix shop, then you probably should take a long,
hard look at Oracle. Although this is the most difficult database to setup
and keep running (you probably don't want to attempt it without hiring
a trained Oracle SA and/or DBA), it's also generally considered to be
the fastest and most scalable RDBMS out there. Other databases will run
under Linux or some other flavor of Unix, but Oracle is the best of the
lot.
If you're not strongly biased towards a particular OS, then make your
decision based on cost and how data-intensive your game will be. All of
these RDBMS products will require ongoing upkeep, but some are significantly
cheaper than others. SQL Server is at the low-end of the continuing cost
scale: it requires less work than most to keep running, and there are
plenty of qualified (and relatively inexpensive) professionals around
who can help. Oracle is at the high end of the scale, requiring more effort
from generally more expensive employees. One the plus side, you'll have
no trouble finding experienced candidates. Sybase and DB2 fall somewhere
in the middle in terms of cost. Be forewarned that since Sybase's market
share has slipped into the low single-digits it can be very difficult
to find potential employees with solid Sybase experience. We ended up
hiring professionals with related experience (SQL Server) and training
them ourselves.
In terms of determining how much database horsepower your game will need,
try to get a sense for how much database activity will go on during an
average client/server session and how many games will be connected to
your servers simultaneously. If you're building a Massively Multiplayer
Online game, you're probably going to need a high-end solution like Oracle.
If you're building a matchmaking and chat server for a simple PC game,
SQL Server or some other low-end RDBMS will probably give you plenty of
horsepower – and be cheaper and easier to use too. These factors
are intensely game-specific, though, so you'll need to make this call
based on your own intimate knowledge of your game.
A Server Programmer in Dataland
As a longtime C/C++ programmer, I started work on ITNet with some
very naïve attitudes about databases. I thought I could just treat
the database as a “black box” that was capable of storing
and retrieving data, and that I wouldn't have to concern myself with the
details of how that data was represented internally. We had one database
programmer at the time, and I imagined that he and I could just split
the ITNet server project neatly at the database/server interface: I would
figure out what the servers needed, and he would work his magic behind
the scenes to make the database comply.
It didn't take long for reality to disabuse me of this happy fantasy.
I quickly realized that databases could, indeed, serve up data however
I wanted it provided I was willing to pay a steep performance price. The
ugly truth is that if you want your servers to run quickly and reliably
with a database (and you want to avoid driving your DBA to drink) you
need to learn something about how databases work. More to the point, you
need to write your server code to take advantage of a database's strengths
and accommodate its weaknesses.
The rest of this paper is an attempt to pass on the most important bits
of information that server programmers needs to know when working with
databases. My hope is that you will be able to use this information to
cut short the multi-year learning curve I had to climb when building ITNet.
I will not attempt to cover all of the most basic database concepts here
since these can be filled in by any reasonably decent database book. Instead,
I will focus on the things that server programmers really need to pay
attention to, and that can only be learned through experience.
Tools of The Trade
The first thing I want to impress upon you when working with an enterprise-grade
database is that you need to get yourself a good database book. Experienced
programmers won't need to be told this, but even they may be surprised
at just how hamstrung they will be if they don't pick up a good book right
at the start of a project. We use the book “Sybase SQL Server 11
Unleashed” by Rankins, Garbus, Solomon, and McEwan, and I can vouch
for the fact that it is excellent. Make sure you buy a book that is specific
to the RDBMS software you will be using.
A standard tool suite for working with databases consists of the following:
a database management app (usually provided with the database software)
for creating users, tables, indexes, and other database components, a
text editor for writing stored procedures, and in interactive SQL interpreter
for issuing SQL queries to the database and viewing the results. You might
add some special purpose apps to this list later on, but this is the standard
set of development tools. Some or all of these components may come with
your database software, however if you don't get a good Windows-based
SQL interpreter (as we didn't with Sybase's ASE), check out the WinSQL
3.5 shareware package on the Internet.
The Two Basic Table Types
One of the first things I learned about working with databases is
that there are really only two effective strategies for storing data in
a table. In C++ you can create all sorts of crazy data structures and
manipulate them any way we like, but databases are very different beasts.
The two storage strategies that work best are what I call historical and
summary tables.
A historical table is one that is designed to add a new record every
time data arrives that needs to be saved. A good example of this is the
GameUnitPlayEvent (GUPE) table at the heart of the ITNet system: GUPE
stores a single record for every individual game that is played, and includes
detailed financial and game play data about that game. Whenever data about
a new game is sent to ITNet, another record is added to GUPE to hold it.
I call this type of table “historical” because it contains
a complete history of all data that has ever been added to the table.
This type of table can be used to store actual histories of various kinds.
For example, we use a historical table called CallSession that gets a
new record added every time a game initiates a connection to the meta
server. CallSession stores the ID number of the game, and the time and
date when the connection occurred, so it is literally possible to get
a “history” of all calls a game has ever made to ITNet by
requesting all records from this table with the game's ID number sorted
by the date when the call occurred.
A summary table holds a limited number of rows of data that are designed
to be updated with different values when new data is saved to the table.
A good example of this sort of table is ITNet's CourseStats table. CourseStats
holds statistics for every golf course available in the game, such as
the length of the longest drive ever hit on the course, the average number
of strokes to hole out, and so on. Whenever data about a new game arrives,
the record for the course the game was played on is updated. For example,
if the player beat the longest drive on that course, the CourseStat record
is updated to show the new longest drive distance and the name of the
new record holder.
Notice that both the historical GUPE table and the summary CourseStats
table are updated with data from the same server event: a message describing
a new game play. The moral of the story is that the same data can usually
be stored in multiple ways in the database, so the table design choices
you make must reflect how you plan to access and use the data.
Speaking of table design choices, just what are the advantages and disadvantages
of the two table types? Broadly speaking, historical tables are good at
providing very fast data inserts and maximum querying flexibility. The
disadvantages of historical tables are that they can grow at a terrifying
rate, and getting data out of them can take a long time (because they
usually have so many records to hunt through). Summary tables, on the
other hand, have a very small memory footprint, and are usually fast and
easy to get data out of. The downside is that they offer much slower data
inserts and are very inflexible when it comes to querying data.
Here are some rules of thumb to keep in mind when working with historical
tables:
- Keep
the amount of data stored in each record as small as possible. This
will help your query times immensely by limiting the amount of I/O the
database needs to perform when sifting through table rows.
- Even
a massive historical table can still provide fast query times when retrieving
a single row of data. You will need to have a proper index in place
to make this work, but it can come in very handy.
- With
fast growing tables, you will need a strategy for preventing them from
growing arbitrarily large. A good way to do this is to run a batch job
to periodically back up and delete old data from the table.
- Consider
adding an insertion date and time with every record you put in a historical
table. This can be an invaluable debugging tool, and also lets you batch
process new data added to the table quickly and cleanly. I now do this
with every historical table I create as a matter of course.
Here are a few rules of thumb to keep in mind when working with summary
tables:
- Think
long and hard about what values you need to track in each column of
the table. Once you start collecting data, you can't go back and change
your mind.
- You can
save off periodic “snapshots” of the records in a summary
table if you need some limited historical behavior. This works especially
well for periodic events (for example, you could record all the course
records at the start of a tournament if you wanted “before and
after” snapshots).
- Save
as much data per record as you want since you're usually not space-limited,
but be aware that “insert” times will go up with each additional
value that needs to be updated.
- Watch
for accumulators that may exceed the maximum value that can be saved
in the column's data type. This is especially important when you have
an integer column that is accumulating values that can change in large
increments.
Knowing the type of table your servers are dealing with will help you
work with it effectively. For ITNet, we use a mix of the two table types.
I'd set the proportion at about 70 percent historical tables and 30 percent
summary tables. The preponderance of historical tables reflects the fact
that slow-growing historical table can often give you a “best of
both worlds” solution: all the detail of a historical table without
the data bloat and excessive query times of a summary table.
________________________________________________________
How
Big is Big?
|