[In an Intel-sponsored reprint of a Game Developer magazine article from September 2010, game industry coder and veteran Noel Llopis takes a closer look at Data-Oriented Design.]
In late 2009 I wrote about the basics of Data-Oriented Design in this column (see the September 2009 issue of Game Developer). In the time since that article, Data-Oriented Design has gained a lot of traction in game development and many teams are thinking in terms of data for some of the more performance-critical systems.
As a quick recap, the main goal of Data-Oriented Design is achieving high performance on modern hardware platforms. Specifically, that means making good use of memory accesses, multiple cores, and removing any unnecessary code. A side effect of Data-Oriented Design is that code becomes more modular and easier to test.
Data-Oriented Design concentrates on the input data available and the output data that needs to be generated. Code is not something to focus on, as it is with traditional Computer Science, but something that is written to transform the input data into the output data in an efficient way. In modern hardware, that often means applying the same code to large, contiguous blocks of homogeneous memory.
It's pretty easy to apply these ideas to a self-contained system that already works over mostly homogeneous data. Most particle systems in games are probably designed that way because one of their main goals is to be very efficient and handle a large number of particles at high frame rates. Sound processing is another system that is naturally implemented by thinking about data first and foremost.
So, what's stopping us from applying it to all the performance-sensitive systems in a game code base? Mostly just the way we think about the code. We need to be ready to really look at the data and be willing to split up the code into different phases. Let's take a high-level example and see how the code would have to be restructured when optimizing for data access.
Listing 1 shows pseudo code for what could be a typical update function for a generic game AI. To make things worse, that function might even be virtual and different types of entities might implement the code in different ways. Let's ignore that for now and concentrate on what it does. In particular, the pseudo code highlights that, as part of the entity update, it does many conditional ray casting queries, and also updates a state based on the results of those queries. In other words, we're confronted with the typical tree-traversal code structure so common in Object-Oriented Programming.
Ray casts against the world are a very common operation for game entities. That's how they "see" what's around them, and that's what allows them to react correctly to their surroundings. Unfortunately, ray casting is a heavyweight operation, and it involves potentially accessing many different areas in memory: a spatial data structure, other entity representations, polygons in a collision mesh, etc.
Additionally, the entity update function would be very hard to parallelize on multiple cores. It's unclear how much data is read or written in that function, and some of that data (like the world data structure) might be particularly hard and expensive to protect from updates coming from multiple threads.
If we reorganize things a bit, we can significantly improve performance and parallelization.
Without looking at any of the details inside the entity update, we can see the ray casts sticking out like a sore thumb in the middle. A ray cast operation is fairly independent of anything else related to the entity; it's heavyweight, and there could be many of them, so it's a perfect candidate to break up into a separate step.
Listing 2 shows what the broken up entity update code would look like. The update is now split in two different passes. The first pass does some of the updating that can be accomplished independently of any ray casts and decides which, if any, ray casts need to be performed sometime during this frame.
The game code in charge of updating the game processes all AI entities in batches (Listing 3). So instead of calling InitialUpdate(), solve ray casts, and FinalUpdate() for each entity, it iterates over all the AI entities calling InitialUpdate() and adds all the ray cast query requests to the output data. Once it has collected all the ray cast queries, it can process them all at once and store their results. Finally, it does one last pass and calls FinalUpdate() with the ray cast results on each entity.
By removing the ray cast calls from within the entity update function, we've shortened the call tree significantly. The functions are more selfcontained, easier to understand, and probably much more efficient because of better cache utilization. You can also see how it would be a lot easier to parallelize things now by sending all ray casts to one core while another core is busy updating something unrelated (or maybe by spreading all ray casts across multiple cores, depending on your level of granularity).
Note that after calling InitialUpdate() on all entities, we could do some processing on other game objects that might also need ray cast queries and collect them all. That way, we can batch all the ray casts and compute them at once. For years we've been drilled by graphics hardware manufacturers about how we should batch our render calls and avoid drawing individual polygons. This works the same way: by batching all ray casts in a single call, we have the potential to achieve a much higher performance.