It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.    

Search articles, jobs, buyers guide, and more.

by Pete Hallenberg
Gamasutra
[Author's Bio]
September 16, 2002

Do You Need SQL?

Database Design

How Big is Big?

Printer Friendly Version

[Back To] Online Games Resource Guide

Sponsored by:

This feature originally appeared in the Proceedings of the 2002 Game Develoers Conference.

 

 


Resource Guide

Of Internet Servers and SQL Databases:
Designing the Backend for Power and Performance

Database Design

The first question that must be answered when setting up an enterprise-grade database is what software you will use to run it. This is a complicated decision influenced by many factors, but I'll try to distill it down to the basic questions.

The major products to choose from are Microsoft's SQL Server, Sybase's Adaptive Server Enterprise, Oracle's 9i Server, and IBM's DB2 software. Any of these enterprise-grade Relational Database Management Systems (RDBMSs) will do the job, however some will do it better than others. I am by no means an expert on all these different packages, so I would urge you to do your own research on any package you're considering. I can say from experience, though, that migrating databases is a nightmare, so you'll probably end up living with whatever software you choose for a long time to come.

If your company is a Windows-only shop, your best bet is probably SQL Server. Sybase's ASE product also runs well under Windows NT and is significantly cheaper, however I advise you to steer clear of it. We went with Adaptive Server Enterprise over SQL Server and have regretted it ever since. The problem with ASE isn't performance – we've tweaked and tuned the heck out of our database and it fairly screams - the problem is the lack of tools and utilities bundled with it. SQL Server includes a ton of tools to help you monitor the health of your database, do regular maintenance, and tune its performance. With Sybase, you end up building or buying nearly everything you need just to perform routine tasks. This is more than just a painful chore: as newcomers to the world of enterprise-grade databases, we had no idea what tools were even required to run our shop. It was a long and painful process to first figure out what we needed, then look for features in ASE that supported it, and finally determine how to buy or build it ourselves. With SQL Server, the most important tools are ready and waiting for you right in the box.

If your company is a Unix shop, then you probably should take a long, hard look at Oracle. Although this is the most difficult database to setup and keep running (you probably don't want to attempt it without hiring a trained Oracle SA and/or DBA), it's also generally considered to be the fastest and most scalable RDBMS out there. Other databases will run under Linux or some other flavor of Unix, but Oracle is the best of the lot.

If you're not strongly biased towards a particular OS, then make your decision based on cost and how data-intensive your game will be. All of these RDBMS products will require ongoing upkeep, but some are significantly cheaper than others. SQL Server is at the low-end of the continuing cost scale: it requires less work than most to keep running, and there are plenty of qualified (and relatively inexpensive) professionals around who can help. Oracle is at the high end of the scale, requiring more effort from generally more expensive employees. One the plus side, you'll have no trouble finding experienced candidates. Sybase and DB2 fall somewhere in the middle in terms of cost. Be forewarned that since Sybase's market share has slipped into the low single-digits it can be very difficult to find potential employees with solid Sybase experience. We ended up hiring professionals with related experience (SQL Server) and training them ourselves.

In terms of determining how much database horsepower your game will need, try to get a sense for how much database activity will go on during an average client/server session and how many games will be connected to your servers simultaneously. If you're building a Massively Multiplayer Online game, you're probably going to need a high-end solution like Oracle. If you're building a matchmaking and chat server for a simple PC game, SQL Server or some other low-end RDBMS will probably give you plenty of horsepower – and be cheaper and easier to use too. These factors are intensely game-specific, though, so you'll need to make this call based on your own intimate knowledge of your game.

A Server Programmer in Dataland
As a longtime C/C++ programmer, I started work on ITNet with some very naïve attitudes about databases. I thought I could just treat the database as a “black box” that was capable of storing and retrieving data, and that I wouldn't have to concern myself with the details of how that data was represented internally. We had one database programmer at the time, and I imagined that he and I could just split the ITNet server project neatly at the database/server interface: I would figure out what the servers needed, and he would work his magic behind the scenes to make the database comply.

It didn't take long for reality to disabuse me of this happy fantasy. I quickly realized that databases could, indeed, serve up data however I wanted it provided I was willing to pay a steep performance price. The ugly truth is that if you want your servers to run quickly and reliably with a database (and you want to avoid driving your DBA to drink) you need to learn something about how databases work. More to the point, you need to write your server code to take advantage of a database's strengths and accommodate its weaknesses.

The rest of this paper is an attempt to pass on the most important bits of information that server programmers needs to know when working with databases. My hope is that you will be able to use this information to cut short the multi-year learning curve I had to climb when building ITNet. I will not attempt to cover all of the most basic database concepts here since these can be filled in by any reasonably decent database book. Instead, I will focus on the things that server programmers really need to pay attention to, and that can only be learned through experience.

Tools of The Trade
The first thing I want to impress upon you when working with an enterprise-grade database is that you need to get yourself a good database book. Experienced programmers won't need to be told this, but even they may be surprised at just how hamstrung they will be if they don't pick up a good book right at the start of a project. We use the book “Sybase SQL Server 11 Unleashed” by Rankins, Garbus, Solomon, and McEwan, and I can vouch for the fact that it is excellent. Make sure you buy a book that is specific to the RDBMS software you will be using.

A standard tool suite for working with databases consists of the following: a database management app (usually provided with the database software) for creating users, tables, indexes, and other database components, a text editor for writing stored procedures, and in interactive SQL interpreter for issuing SQL queries to the database and viewing the results. You might add some special purpose apps to this list later on, but this is the standard set of development tools. Some or all of these components may come with your database software, however if you don't get a good Windows-based SQL interpreter (as we didn't with Sybase's ASE), check out the WinSQL 3.5 shareware package on the Internet.

The Two Basic Table Types
One of the first things I learned about working with databases is that there are really only two effective strategies for storing data in a table. In C++ you can create all sorts of crazy data structures and manipulate them any way we like, but databases are very different beasts. The two storage strategies that work best are what I call historical and summary tables.

A historical table is one that is designed to add a new record every time data arrives that needs to be saved. A good example of this is the GameUnitPlayEvent (GUPE) table at the heart of the ITNet system: GUPE stores a single record for every individual game that is played, and includes detailed financial and game play data about that game. Whenever data about a new game is sent to ITNet, another record is added to GUPE to hold it. I call this type of table “historical” because it contains a complete history of all data that has ever been added to the table. This type of table can be used to store actual histories of various kinds. For example, we use a historical table called CallSession that gets a new record added every time a game initiates a connection to the meta server. CallSession stores the ID number of the game, and the time and date when the connection occurred, so it is literally possible to get a “history” of all calls a game has ever made to ITNet by requesting all records from this table with the game's ID number sorted by the date when the call occurred.

A summary table holds a limited number of rows of data that are designed to be updated with different values when new data is saved to the table. A good example of this sort of table is ITNet's CourseStats table. CourseStats holds statistics for every golf course available in the game, such as the length of the longest drive ever hit on the course, the average number of strokes to hole out, and so on. Whenever data about a new game arrives, the record for the course the game was played on is updated. For example, if the player beat the longest drive on that course, the CourseStat record is updated to show the new longest drive distance and the name of the new record holder.

Notice that both the historical GUPE table and the summary CourseStats table are updated with data from the same server event: a message describing a new game play. The moral of the story is that the same data can usually be stored in multiple ways in the database, so the table design choices you make must reflect how you plan to access and use the data.

Speaking of table design choices, just what are the advantages and disadvantages of the two table types? Broadly speaking, historical tables are good at providing very fast data inserts and maximum querying flexibility. The disadvantages of historical tables are that they can grow at a terrifying rate, and getting data out of them can take a long time (because they usually have so many records to hunt through). Summary tables, on the other hand, have a very small memory footprint, and are usually fast and easy to get data out of. The downside is that they offer much slower data inserts and are very inflexible when it comes to querying data.

Here are some rules of thumb to keep in mind when working with historical tables:

  • Keep the amount of data stored in each record as small as possible. This will help your query times immensely by limiting the amount of I/O the database needs to perform when sifting through table rows.
  • Even a massive historical table can still provide fast query times when retrieving a single row of data. You will need to have a proper index in place to make this work, but it can come in very handy.
  • With fast growing tables, you will need a strategy for preventing them from growing arbitrarily large. A good way to do this is to run a batch job to periodically back up and delete old data from the table.
  • Consider adding an insertion date and time with every record you put in a historical table. This can be an invaluable debugging tool, and also lets you batch process new data added to the table quickly and cleanly. I now do this with every historical table I create as a matter of course.

Here are a few rules of thumb to keep in mind when working with summary tables:

  • Think long and hard about what values you need to track in each column of the table. Once you start collecting data, you can't go back and change your mind.
  • You can save off periodic “snapshots” of the records in a summary table if you need some limited historical behavior. This works especially well for periodic events (for example, you could record all the course records at the start of a tournament if you wanted “before and after” snapshots).
  • Save as much data per record as you want since you're usually not space-limited, but be aware that “insert” times will go up with each additional value that needs to be updated.
  • Watch for accumulators that may exceed the maximum value that can be saved in the column's data type. This is especially important when you have an integer column that is accumulating values that can change in large increments.

Knowing the type of table your servers are dealing with will help you work with it effectively. For ITNet, we use a mix of the two table types. I'd set the proportion at about 70 percent historical tables and 30 percent summary tables. The preponderance of historical tables reflects the fact that slow-growing historical table can often give you a “best of both worlds” solution: all the detail of a historical table without the data bloat and excessive query times of a summary table.

________________________________________________________

How Big is Big?


join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2002 CMP Media LLC. All rights reserved.
privacy policy | terms of service