Managing Data Relationships
June 25, 2009 Page 1 of 3
[How does your game data relate to each other? In this independent technical article, originally published in Game Developer magazine and posted online as part of Intel's Visual Computing section, game development veteran Noel Llopis looks at how coders should structure and retrieve the intertwined world of game data in memory.]
From a 10,000-foot view, all video games are just a sequence of bytes. Those bytes can be divided into code and data. Code is executed by the hardware and it performs operations on the data. This code is generated by the compiler and linker from the source code in our favorite computer language. Data is just about everything else.
As programmers, we're obsessed with code: beautiful algorithms, clean logic, and efficient execution. We spend most of our time thinking about it and make most decisions based on a code-centric view of the game.
Modern hardware architectures have turned things around. A data-centric approach can make much better use of hardware resources, and can produce code that is much simpler to implement, easier to test, and easier to understand.
Data is everything that is not code: meshes and textures, animations and skeletons, game entities and pathfinding networks, sounds and text, cut scene descriptions, and dialog trees. Our lives would be made simpler if data simply lived in memory, each bit totally isolated from the rest, but that's not the case.
In a game, just about all the data is intertwined in some way. A model refers to the meshes it contains, a character needs to know about its skeleton and its animations, and a special effect points to textures and sounds.
How are those relationships between different parts of data described? There are many approaches we can use, each with its own set of advantages and drawbacks. There isn't a one-size-fits-all solution. What's important is choosing the right tool for the job.
Pointing the Way
In C++, regular pointers (as opposed to "smart pointers," which we'll discuss later on) are the easiest and most straightforward way to refer to other data. Following a pointer is a very fast operation, and pointers are strongly typed, so it's always clear what type of data they're pointing to.
However, they have their share of shortcomings. The biggest drawback is that a pointer is just the memory address where the data happens to be located. We often have no control over that location, so pointer values usually change from run to run. This means if we attempt to save a game checkpoint which contains a pointer to other parts of the data, the pointer value will be incorrect when we restore it.
Pointers represent a many-to-one relationship. You can only follow a pointer one way, and it is possible to have many pointers pointing to the same piece of data (for example, many models pointing to the same texture). All of this means that it is not easy to relocate a piece of data that is referred to by pointers.
Unless we do some extra bookkeeping, we have no way of knowing what pointers are pointing to the data we want to relocate. And if we move or delete that data, all those pointers won't just be invalid, they'll be dangling pointers. They will point to a place in memory that contains something else, but the program will still think it has the original data in it, causing horrible bugs that are no fun to debug.
One last drawback of pointers is that even though they're easy to use, somewhere, somehow, they need to be set. Because the actual memory location addresses change from run to run, they can't be computed offline as part of the data build. So we need to have some extra step in the runtime to set the pointers after loading the data so the code can use them.
This is usually done either by explicit creation and linking of objects at runtime, by using other methods of identifying data, such as resource uids created from hashes, or through pointer fixup tables converting data offsets into real memory addresses. All of it adds some work and complexity to using pointers.
Given those characteristics, pointers are a good fit to model relationships to data that is never deleted or relocated, from data that does not need to be serialized. For example, a character loaded from disk can safely contain pointers to its meshes, skeletons, and animations if we know we're never going to be moving them around.
One way to get around the limitation of not being able to save and restore pointer values is to use offsets into a block of data. The problem with plain offsets is that the memory location pointed to by the offset then needs to be cast to the correct data type, which is cumbersome and prone to error.
A more common approach is to use indices into an array of data. Indices, in addition to being safe to save and restore, have the same advantage as pointers in that they're very fast, with no extra indirections or possible cache misses.
Unfortunately, they still suffer from the same problem as pointers of being strictly a many-to-one relationship and making it difficult to relocate or delete the data pointed to by the index. Additionally, arrays can only be used to store data of the same type (or different types but of the same size with some extra trickery on our part), which might be too restrictive for some uses.
A good use of indices into an array would be particle system descriptions. The game can create instances of particle systems by referring to their description by index into that array. On the other hand, the particle system instances themselves would not be a good candidate to refer to with indices because their lifetimes vary considerably and they will be constantly created and destroyed.
It's tempting to try and extend this approach to holding pointers in the array instead of the actual data values. That way, we would be able to deal with different types of data. Unfortunately, storing pointers means that we have to go through an extra indirection to reach our data, which incurs a small performance hit. Although this performance hit is something that we're going to have to live with for any system that allows us to relocate data, the important thing is to keep the performance hit as small as possible.
An even bigger problem is that, if the data is truly heterogeneous, we still need to cast it to the correct type before we use it. Unless all data referred to by the pointers inherits from a common base class that we can use to query for its derived type, we have no easy way to find out what type the data really is.
Page 1 of 3