Streaming for Next Generation Games

By Fredrik Lönn

Introduction

Next Generation

With Sony's PS2 console, you had about 32 MB of memory to fill with data. With Xbox 360, there is 16 times as much space to fill, but the hardware to do it is essentially the same standard DVD reader. Getting anywhere near optimal use out of the DVD is turning out to be more important than ever. In Just Cause we were faced with the task of filling a huge 32x32 km game world with interesting content - without loading screens.

Possible Ways To Load Data

There are three major ways to read resources into memory.

Resources can be read at the game's or level's startup and kept in memory. This is best for resources that are used all the time, and for critical resources that we must guarantee are in memory at a specific time. Since these resources are read before the start of the game, we can optimize loading time by moving resources into this category at the cost of memory.

Data can also be read based on the camera's world position. As the camera moves, new resources are read and old resources are evicted from memory. This group is best for resources spread uniformly in the world, and resources that are kept inside an area or a zone.
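As a rough illustration of position-based loading, the sketch below decides which zones to load and evict as the camera moves. The `Zone` layout, the two radii (a larger evict radius to avoid thrashing at the boundary) and the function name are assumptions made for this example, not the scheme used in Just Cause.

```cpp
#include <cassert>
#include <vector>

// Hypothetical zone description: a circular area whose resources stream together.
struct Zone {
    float x, z;         // zone centre in world units
    float loadRadius;   // request the zone when the camera is within this distance
    float evictRadius;  // evict the zone when the camera is beyond this distance
    bool resident;      // true while the zone is loaded (or has been requested)
};

struct ZoneUpdate {
    std::vector<int> toLoad;   // indices of zones to request from the streamer
    std::vector<int> toEvict;  // indices of zones whose memory can be released
};

// Call once per frame (or less often) with the current camera position.
ZoneUpdate UpdateZones(std::vector<Zone>& zones, float camX, float camZ)
{
    ZoneUpdate result;
    for (int i = 0; i < static_cast<int>(zones.size()); ++i) {
        Zone& zone = zones[i];
        const float dx = zone.x - camX;
        const float dz = zone.z - camZ;
        const float distSq = dx * dx + dz * dz;  // compare squared distances, no sqrt
        if (!zone.resident && distSq <= zone.loadRadius * zone.loadRadius) {
            zone.resident = true;
            result.toLoad.push_back(i);
        } else if (zone.resident && distSq > zone.evictRadius * zone.evictRadius) {
            zone.resident = false;
            result.toEvict.push_back(i);
        }
    }
    return result;
}
```

Keeping the evict radius larger than the load radius means a camera hovering near a zone border does not repeatedly load and evict the same data.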

Finally, data can be read based on some event, for example when the player talks to an NPC. When we load data based on an event, we need to ensure we do not load data that directly affects the player, such as a physical item placed at the character's position. This has to be taken into consideration in the game design, as we can never guarantee the latency of the read.

When we designed our streaming system, the most important design criterion was to minimize seeks while staying within our memory budget. I recommend that you load all data at the initial game/level loading, if you can get away with it in your type of game.

Hardware

Some consoles only have a DVD for reading data. How fast we can read data depends on data layout and the quality of the media. Every time we switch the layer we read from, it costs us about 100 ms. In practice, you want all streamed data on a single layer and use the second layer for in-game movies or other data that is not used frequently. Each seek also costs us about 100 ms, and a safe estimate for the sustained data rate is 10 MB/s. In other words, during the time we spend on one seek, we could have read 1 MB of data instead. It is almost always a good idea to duplicate data if it helps to avoid seeks. If you are designing for Blu-ray and PS3, you will need to adjust these estimates.
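To make the trade-off concrete, here is a small estimate built only from the two figures above (about 100 ms per seek, about 10 MB/s sustained). The function name is made up for this example, and 1 MB is taken as 1024 x 1024 bytes; treat the result as a back-of-the-envelope number, not a measurement.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Estimates taken from the text; measure on real discs before relying on them.
const double kSeekSeconds = 0.1;                         // about 100 ms per seek
const double kBytesPerSecond = 10.0 * 1024.0 * 1024.0;   // about 10 MB/s sustained

// Estimated wall-clock time to read `bytes` using `seeks` separate seeks.
double EstimateReadSeconds(unsigned int seeks, std::uint64_t bytes)
{
    return seeks * kSeekSeconds + static_cast<double>(bytes) / kBytesPerSecond;
}
```

For example, twenty 64 KB files scattered over the disc come out at roughly `EstimateReadSeconds(20, 20 * 64 * 1024)` = 2.125 seconds, while the same data laid out contiguously needs roughly `EstimateReadSeconds(1, 20 * 64 * 1024)` = 0.225 seconds. This is why duplicating data to keep reads contiguous usually pays off.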

Reading Data - The Basics

A good streaming system should be designed to always read data asynchronously, as nothing kills performance like blocking synchronous I/O. Asynchronous I/O can be implemented by using asynchronous I/O functions such as ReadFileEx() in Win32, by having a separate thread call the I/O functions, or by using a dedicated processor such as the IOP on the PS2. When using system calls for asynchronous I/O, note that these may be synchronous when reading from a development kit's hard drive. Always measure your actual read performance using burned discs or, if possible, a DVD emulator.

On Microsoft's systems, asynchronous completion callbacks will not be processed until the thread that issued the request enters an alertable wait, for example by calling the SleepEx() function. A typical solution calls SleepEx() once per frame, which leaves the device idle for a short time between the completion of the I/O request and the call to SleepEx(). All of these small gaps quickly add up, especially when reading many small files.

The best approach here is probably to use a hardware thread to read data, which will work on all platforms and give good performance. The downside is that it makes the system harder to debug.
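A minimal sketch of the dedicated reader thread idea follows, written with the C++11 standard library for portability rather than any console API. The class, the `ReadRequest` fields and the injected `BlockingRead` function are all assumptions for illustration; a real system would call the platform's blocking read inside that function.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

struct ReadRequest {
    std::size_t offset;   // where to read from
    std::size_t size;     // how many bytes to read
    char* destination;    // where the data should end up
};

class ReaderThread {
public:
    typedef std::function<void(const ReadRequest&)> BlockingRead;

    explicit ReaderThread(BlockingRead read)
        : m_read(std::move(read)), m_quit(false), m_worker([this] { Run(); }) {}

    ~ReaderThread() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_quit = true; }
        m_wake.notify_all();
        m_worker.join();  // the worker drains pending requests, then exits
    }

    void Submit(const ReadRequest& request) {
        { std::lock_guard<std::mutex> lock(m_mutex); m_pending.push_back(request); }
        m_wake.notify_one();
    }

    // Called from the game thread, e.g. once per frame. Never blocks on I/O.
    bool PollCompleted(ReadRequest& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_completed.empty()) return false;
        out = m_completed.front();
        m_completed.pop_front();
        return true;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_wake.wait(lock, [this] { return m_quit || !m_pending.empty(); });
            if (m_pending.empty()) return;  // quit requested and nothing left to read
            const ReadRequest request = m_pending.front();
            m_pending.pop_front();
            lock.unlock();
            m_read(request);                // the blocking read happens off the game thread
            lock.lock();
            m_completed.push_back(request);
        }
    }

    BlockingRead m_read;
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::deque<ReadRequest> m_pending;
    std::deque<ReadRequest> m_completed;
    bool m_quit;
    std::thread m_worker;  // declared last so the members above exist before it starts
};
```

Because the worker starts the next read as soon as the previous one completes, the device does not sit idle waiting for the game thread to notice a completion.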

Using asynchronous I/O has some implications for level design. Game logic can never assume that a streamed resource is ready at the moment it is requested. For example, if a character is scripted to shout "charge" before attacking, the script has to wait for the resource to finish loading before actually attacking.


File Archive System

Console File Systems

The file systems used on consoles are rather simple, which has some implications. For example, simply opening and closing files takes a lot more time than you first might realize. When there are too many files in a folder, we need extra seeks to read file table data. There are hard limits on the number of files in a folder and the length of filenames. We want to allow artists and level designers to organize their data in a logical way, and we want to be able to quickly find data during development. However, even changing the folder being read from will cause an implicit seek. Checking whether a file exists, or querying a file's size, may also cause an implicit seek.

Archives

Our solution to the overhead of handling files is to have a simple, platform-independent game file system. All files are combined into several large archive files, and these files are kept open at all times. When the archive files are created, we hash filenames and store the hash. The archive file's table of contents is always kept in memory and contains the hashed filenames, the archive file handle, the offsets of the files in the archive and the sizes of the files in the archive.

Since we have all archive files open at all times, we never need to use the expensive open/close system calls. As we always have the table of contents in memory, we can quickly return the size of every file. As all file requests are simply seeks inside the already opened archive files, we do not cause extra seeks when we change to read from another directory. Hashing filenames alone does require that all filenames be unique, regardless of where they are stored - something that needs to be verified by the content pipeline. We can use this to our advantage when we need to share files between different models.

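struct SArchiveToc
{
    HANDLE ArchiveFile;   // handle of the (always open) archive that holds the file
    unsigned int offset;  // byte offset of the file inside the archive
    unsigned int size;    // size of the file in bytes
};
// Keyed by the hashed filename. The template arguments were missing from the
// listing; an unsigned int hash is assumed here.
std::map<unsigned int, SArchiveToc> TableOfContents;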
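The lookup path described above can be sketched as follows. The article does not say which hash function was used, so 32-bit FNV-1a stands in here purely as an assumption, and an integer archive index replaces the platform-specific handle so the example stays portable. `TocEntry`, `ArchiveToc` and `HashFilename` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct TocEntry {
    int archiveIndex;      // which open archive holds the file
    std::uint32_t offset;  // byte offset inside that archive
    std::uint32_t size;    // file size in bytes
};

// 32-bit FNV-1a, used only as a stand-in for whatever hash the pipeline uses.
std::uint32_t HashFilename(const std::string& name)
{
    std::uint32_t hash = 2166136261u;
    for (unsigned char c : name) {
        hash ^= c;
        hash *= 16777619u;
    }
    return hash;
}

class ArchiveToc {
public:
    // Returns false if the hash is already present (duplicate name or hash
    // collision); the archive build tool should treat that as an error.
    bool Add(const std::string& name, const TocEntry& entry) {
        return m_entries.insert(std::make_pair(HashFilename(name), entry)).second;
    }

    // Pure in-memory lookup: no open/close call and no disc access.
    const TocEntry* Find(const std::string& name) const {
        std::map<std::uint32_t, TocEntry>::const_iterator it =
            m_entries.find(HashFilename(name));
        return it == m_entries.end() ? nullptr : &it->second;
    }

private:
    std::map<std::uint32_t, TocEntry> m_entries;
};
```

Having `Add` reject an existing hash is one way for the content pipeline to verify the uniqueness requirement at archive build time.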

Correct data alignment inside the archives is important to minimize reads. Each file must be aligned to start on a sector boundary; a sector is typically 2048 bytes on a DVD. If files were not aligned, the system would have to read extra sectors for files that straddle sector boundaries. The drawback is that we waste space on the DVD.
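The effect of alignment is easy to check with a little arithmetic. The helpers below are an illustrative sketch (the names are not from the original code) that round an offset up to the next 2048-byte sector and count how many sectors a read touches.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kSectorSize = 2048;  // typical DVD sector size, as stated in the text

// Rounds an archive offset up to the next sector boundary.
// Works because kSectorSize is a power of two.
std::uint32_t AlignToSector(std::uint32_t offset)
{
    return (offset + kSectorSize - 1) & ~(kSectorSize - 1);
}

// Number of sectors that must be read to fetch `size` bytes starting at `offset`.
std::uint32_t SectorsTouched(std::uint32_t offset, std::uint32_t size)
{
    if (size == 0) return 0;
    const std::uint32_t first = offset / kSectorSize;
    const std::uint32_t last = (offset + size - 1) / kSectorSize;
    return last - first + 1;
}
```

A 3000-byte file stored at offset 2000 touches three sectors, while the same file stored at the aligned offset 2048 touches only two. The cost is the padding between files.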

Content Pipeline And Tools

In a production system, you do not want to be forced to build new archive files each time your content changes. For development, you need to be able to read files as separate files. The archive system is for final builds, and archives should be created by the nightly build system. Another implication of the archive system is that you can no longer use DVD emulator logs directly: the log no longer shows the seek/read time of the actual files, only of "stuff" inside one of your archives. You need to write a tool to convert the emulator log files to a readable format, or add code to log streaming performance yourself.

Streaming System

Goals And System Overview

A good streaming system should spend as much time as possible reading data, as little as possible seeking, and never ever be idle. It has to read data while at the same time feeding data into the game. Feeding the game with data should finish before new data is read.


Our solution is to have one reader thread that fires callbacks when data is read. We use two buffers to be able to load new data at the same time as we process the recently loaded data.

Double Buffers

We use two large aligned static buffers of equal size. The size is a multiple of the sector size, 2048 bytes. The size of the static buffers sets the size limit of individual resources being read, for example how large texture files can be when streamed. It is still possible to get around the limitation by having the game issue several read requests and then combine the data chunks. By using static buffers, we avoid memory fragmentation in the stream system, as we never need to do dynamic memory allocation.
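Working around the buffer size limit amounts to splitting one large resource into several buffer-sized read requests. A minimal sketch, with `Chunk` and `SplitIntoChunks` as hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Chunk {
    std::uint32_t offset;  // offset of this chunk inside the resource
    std::uint32_t size;    // bytes in this chunk, never more than the buffer size
};

// Splits a resource that is larger than one stream buffer into several reads.
// If bufferSize is a multiple of the sector size, every chunk offset stays
// sector aligned relative to the start of the resource.
std::vector<Chunk> SplitIntoChunks(std::uint32_t resourceSize, std::uint32_t bufferSize)
{
    assert(bufferSize > 0);
    std::vector<Chunk> chunks;
    std::uint32_t offset = 0;
    while (offset < resourceSize) {
        const std::uint32_t size = std::min(bufferSize, resourceSize - offset);
        chunks.push_back(Chunk{offset, size});
        offset += size;
    }
    return chunks;
}
```

The game system that issued the requests is then responsible for copying each chunk into its own memory at the matching offset as the callbacks arrive.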


Stream Queue Items

When a game system needs to load a resource through the stream system, it creates a stream item and passes it to the streamer. The stream item has the id of the file (the name or the hash, depending on whether we run in debug or release), the offset we need to read in the file, and the size of the read. It also contains a pointer to an object implementing a callback interface that will be called when the data is available or if something goes wrong.

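struct StreamQueueItem
{
    std::string filename;                // file id: the name (debug) or its hash (release)
    unsigned int file_offset;            // offset to start reading at, inside the file
    unsigned int size;                   // number of bytes to read
    IStreamQueueItemCallback *callback;  // notified when data is available or on error
};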


The stream items are kept in one of several queues in the stream system. The streamer pops as many disc-stored items in the current queue as possible and fits them into the buffer. Data is then read by issuing one read request from the reader thread. The buffer will eventually be filled with data when the threaded read operation completes, and during this time the buffer is marked as locked. When the read operation completes, the buffer is unlocked and the reader thread switches to repeat the process on the other buffer.
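One possible reading of the batching step is sketched below: items are taken from the front of the queue as long as they sit back to back on disc (after sector alignment) and the whole span still fits in one buffer, so the batch can be fetched with a single read. This interpretation, and the names `QueuedRead`, `PlacedRead` and `TakeBatch`, are assumptions rather than the original implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

const std::uint32_t kSector = 2048;

struct QueuedRead {
    std::uint32_t archiveOffset;  // sector-aligned start of the item in the archive
    std::uint32_t size;           // bytes to read
};

struct PlacedRead {
    QueuedRead item;
    std::uint32_t bufferOffset;   // where the item's data will land in the buffer
};

std::uint32_t AlignUp(std::uint32_t value)
{
    return (value + kSector - 1) & ~(kSector - 1);
}

// Pops a run of contiguous items that fits in one buffer. The caller then
// issues one read covering the whole run. An item larger than the buffer is
// never popped; the caller is expected to have split it beforehand.
std::vector<PlacedRead> TakeBatch(std::deque<QueuedRead>& queue, std::uint32_t bufferSize)
{
    std::vector<PlacedRead> batch;
    if (queue.empty()) return batch;
    const std::uint32_t start = queue.front().archiveOffset;
    std::uint32_t expected = start;  // where the next contiguous item must begin
    while (!queue.empty()) {
        const QueuedRead next = queue.front();
        if (next.archiveOffset != expected) break;                  // would need a seek
        if ((next.archiveOffset - start) + next.size > bufferSize) break;  // does not fit
        batch.push_back(PlacedRead{next, next.archiveOffset - start});
        expected = AlignUp(next.archiveOffset + next.size);
        queue.pop_front();
    }
    return batch;
}
```

Each item's `bufferOffset` is what the callback stage later uses to hand the game a pointer into the static buffer.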

After the switch, a callback to the game system is generated for each stream queue item with a pointer to the newly loaded data (a pointer into one of the static buffers) and the size of the read data. While the callbacks are being issued, the buffer is marked as locked to prevent the reader thread from overwriting the data before the game has had a chance to process it. When code called from one of the callback functions needs to access the data after the callback returns, it has to lock the buffer itself. It is then up to the game system to unlock the buffer - until this happens, streaming will stall.
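The callback stage might look roughly like this. The interface name `IStreamQueueItemCallback` comes from the listing above, but its methods (`OnDataReady`, `OnError`), the lock counter and the `std::vector` standing in for a static buffer are assumptions made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Method names are hypothetical; the article only names the interface.
class IStreamQueueItemCallback {
public:
    virtual ~IStreamQueueItemCallback() {}
    virtual void OnDataReady(const char* data, unsigned int size) = 0;
    virtual void OnError() = 0;
};

struct LoadedItem {
    unsigned int bufferOffset;           // where the item's data starts in the buffer
    unsigned int size;                   // bytes read for this item
    IStreamQueueItemCallback* callback;  // may be null if nobody is interested
};

struct StreamBuffer {
    std::vector<char> data;  // stands in for one of the static aligned buffers
    int lockCount = 0;       // the reader may only refill the buffer when this is 0
    bool IsLocked() const { return lockCount > 0; }
    void Lock() { ++lockCount; }
    void Unlock() { assert(lockCount > 0); --lockCount; }
};

// Issues one callback per item. The buffer stays locked for the duration; a
// callback that needs the data later must call Lock() itself and Unlock() when done.
void DispatchCallbacks(StreamBuffer& buffer, const std::vector<LoadedItem>& items)
{
    buffer.Lock();
    for (const LoadedItem& item : items) {
        if (item.callback)
            item.callback->OnDataReady(buffer.data.data() + item.bufferOffset, item.size);
    }
    buffer.Unlock();
}

// Example consumer that copies its data out immediately, so it never holds the lock.
class CopyingCallback : public IStreamQueueItemCallback {
public:
    std::string copy;
    bool failed = false;
    void OnDataReady(const char* data, unsigned int size) override { copy.assign(data, size); }
    void OnError() override { failed = true; }
};
```

Copying or converting the data inside the callback, as `CopyingCallback` does, keeps the buffer free for the next read; holding the lock across frames trades simplicity for stalled streaming.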

When running from a DVD, processing the data is typically much faster than reading it. As always, do profiling and tuning on the actual system you are designing for.

Priorities

Stream items are added to the streamer in one of several priority queues. The queues are processed in priority order, but we only change the current queue when reading the next item would cause a seek. In practice, you probably do not need more than three priority queues.
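The "only switch queues when a seek is unavoidable" rule can be sketched as below. The class name, the three-queue layout (0 is the highest priority) and the way the read head position is tracked are assumptions for illustration; sector alignment is ignored to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <deque>

struct StreamItem {
    std::uint32_t offset;  // start of the item on disc
    std::uint32_t size;    // bytes to read
};

class PriorityStreamQueues {
public:
    static const int kQueueCount = 3;  // queue 0 is the highest priority

    PriorityStreamQueues() : m_current(0), m_head(0) {}

    void Add(int priority, const StreamItem& item) {
        assert(priority >= 0 && priority < kQueueCount);
        m_queues[priority].push_back(item);
    }

    // Returns the next item to read, or false if all queues are empty.
    bool PopNext(StreamItem& out) {
        const std::deque<StreamItem>& current = m_queues[m_current];
        if (current.empty() || current.front().offset != m_head) {
            // The next read needs a seek anyway, so re-evaluate priorities.
            int q = 0;
            while (q < kQueueCount && m_queues[q].empty()) ++q;
            if (q == kQueueCount) return false;
            m_current = q;
        }
        out = m_queues[m_current].front();
        m_queues[m_current].pop_front();
        m_head = out.offset + out.size;  // where the read head ends up afterwards
        return true;
    }

private:
    std::array<std::deque<StreamItem>, kQueueCount> m_queues;
    int m_current;         // queue we are currently draining
    std::uint32_t m_head;  // disc position after the last read
};
```

With this rule a high-priority request that arrives mid-run waits for the contiguous items already in flight, which delays it slightly but avoids paying for two extra seeks.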

When the streaming system is under stress, changes in the game's state might cause the data to be obsolete before it is read. Imagine an NPC that wants to play back dialogue, but before the dialogue has finished loading, the NPC gets killed. If the dialogue is still scheduled as a stream item, we can simply remove the item from its queue. If we have already started reading, the best option is to let it load but ignore the callback.
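Both cancellation cases can be modelled in a few lines. The sketch below uses integer request ids and a class name invented for this example; it only tracks bookkeeping and performs no I/O.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

enum CancelResult { kNotFound, kRemovedFromQueue, kCallbackSuppressed };

class CancellableStreamQueue {
public:
    void Enqueue(int id) { m_queued.push_back(id); }

    // Moves the front item from "queued" to "being read". Returns false if idle.
    bool BeginRead() {
        if (m_queued.empty()) return false;
        m_inFlight.push_back(InFlight{m_queued.front(), false});
        m_queued.pop_front();
        return true;
    }

    CancelResult Cancel(int id) {
        // Case 1: not started yet, so simply drop it from the queue.
        std::deque<int>::iterator queued = std::find(m_queued.begin(), m_queued.end(), id);
        if (queued != m_queued.end()) {
            m_queued.erase(queued);
            return kRemovedFromQueue;
        }
        // Case 2: already being read, so let it finish but remember to skip the callback.
        for (InFlight& read : m_inFlight) {
            if (read.id == id) {
                read.cancelled = true;
                return kCallbackSuppressed;
            }
        }
        return kNotFound;
    }

    // Called when the read of `id` finishes. Returns true if the callback should fire.
    bool CompleteRead(int id) {
        for (std::vector<InFlight>::iterator it = m_inFlight.begin(); it != m_inFlight.end(); ++it) {
            if (it->id == id) {
                const bool fire = !it->cancelled;
                m_inFlight.erase(it);
                return fire;
            }
        }
        return false;
    }

private:
    struct InFlight { int id; bool cancelled; };
    std::deque<int> m_queued;
    std::vector<InFlight> m_inFlight;
};
```

Suppressing the callback instead of aborting the read keeps the reader thread simple and also protects against calling into an object, such as the dead NPC, that may no longer exist.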

Music Streaming

Streaming music is really no different from streaming other types of data. When music streaming is implemented by middleware, we must take care to synchronize the different streaming systems and make sure the stream buffers for music are large enough. If music is allowed to interrupt our reads, our streaming system performance will suffer. One solution is to allow music streaming to take place only when we are finished with our own data, by adding a function that pauses our streamer while music is loaded. Create a budget and decide how many data blocks we should be able to read after each music block. This kind of budget is easy to verify and serves as a guideline when you choose the size of the streaming buffers needed for music.
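A budget of this kind can be estimated from the seek and throughput figures given in the Hardware section (about 100 ms per seek, about 10 MB/s). The model below is a simplifying assumption: one music block must be read in the time the previous block takes to play, and every music or data block costs one seek. The function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

const double kSeekSeconds = 0.1;                             // about 100 ms per seek
const double kDiscBytesPerSecond = 10.0 * 1024.0 * 1024.0;   // about 10 MB/s

// Worst-case time to read one block: a seek plus the transfer.
double BlockReadSeconds(double bytes)
{
    return kSeekSeconds + bytes / kDiscBytesPerSecond;
}

// How many data blocks fit in the time one music block takes to play,
// after subtracting the time needed to read the next music block.
int DataBlocksPerMusicBlock(double musicBlockBytes, double musicBytesPerSecond,
                            double dataBlockBytes)
{
    const double playbackSeconds = musicBlockBytes / musicBytesPerSecond;
    const double spareSeconds = playbackSeconds - BlockReadSeconds(musicBlockBytes);
    if (spareSeconds <= 0.0) return 0;  // music buffer too small to leave any room
    return static_cast<int>(std::floor(spareSeconds / BlockReadSeconds(dataBlockBytes)));
}
```

As a worked example, uncompressed 44.1 kHz 16-bit stereo audio plays at 176,400 bytes per second. With 256 KB music blocks and 1 MB data blocks, `DataBlocksPerMusicBlock(256 * 1024, 176400, 1024 * 1024)` gives a budget of 6 data blocks per music block, while a 16 KB music block leaves no room at all because the seek alone takes longer than the block plays.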

Compressed Data

Our streaming system is easily extended to support on-the-fly decompression of data by adding a third static buffer - a decompression buffer. As data is read into the static buffer, instead of passing it directly to the game system in a callback, it is decompressed into the decompression buffer one stream queue item at a time. When the data is decompressed, we issue the callback and pass the game a pointer to the decompression buffer instead. We need an extra lock to handle decompression, and no stream queue item can be larger when decompressed than the decompression buffer itself.
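The flow can be sketched with a deliberately trivial codec. The article does not name the compression scheme, so a byte-pair run-length format (count, value) stands in here purely as an assumption; the function and type names are likewise hypothetical. Locking is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy run-length decoder: input is (count, value) byte pairs.
// Returns the number of bytes written, or -1 if the input is malformed or the
// output does not fit in the decompression buffer.
long DecompressRle(const unsigned char* src, std::size_t srcSize,
                   unsigned char* dst, std::size_t dstCapacity)
{
    if (srcSize % 2 != 0) return -1;
    std::size_t written = 0;
    for (std::size_t i = 0; i < srcSize; i += 2) {
        const std::size_t run = src[i];
        if (run > dstCapacity - written) return -1;  // item too large for the buffer
        for (std::size_t j = 0; j < run; ++j) dst[written++] = src[i + 1];
    }
    return static_cast<long>(written);
}

typedef void (*DataCallback)(const unsigned char* data, std::size_t size, void* user);

struct CompressedItem {
    std::size_t bufferOffset;    // where the compressed item starts in the read buffer
    std::size_t compressedSize;  // its size as stored on disc
};

// Decompresses one item at a time into the shared scratch buffer and hands the
// game a pointer to the scratch buffer instead of the read buffer.
bool DeliverDecompressed(const unsigned char* readBuffer,
                         const CompressedItem* items, std::size_t itemCount,
                         unsigned char* scratch, std::size_t scratchCapacity,
                         DataCallback callback, void* user)
{
    for (std::size_t i = 0; i < itemCount; ++i) {
        const long size = DecompressRle(readBuffer + items[i].bufferOffset,
                                        items[i].compressedSize, scratch, scratchCapacity);
        if (size < 0) return false;
        callback(scratch, static_cast<std::size_t>(size), user);
    }
    return true;
}
```

Because the scratch buffer is reused for every item, each callback has to consume or copy its data before returning, which is the same contract the uncompressed path already imposes.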


Just Cause

Conclusion

Using all of these methods in Just Cause, our hero Rico can do everything from walking around to flying jets through 32x32 km game environments. Everything is streamed directly from DVD on both current generation (PS2/XBOX) and next generation platforms (X360).

Copyright © UBM Tech, All rights reserved