Book Excerpt: Game Engine Architecture
November 25, 2009 Page 6 of 6
One easy and low-cost way to improve the consistency of game object states is to time-stamp them. It is then a trivial matter to determine whether a game object's state vector corresponds to its configuration at a previous time or the current time. Any code that queries the state of another game object during the update loop can assert or explicitly check the time stamp to ensure that the proper state information is being obtained.
Time-stamping does not address the inconsistency of states during the update of a bucket. However, we can set a global or static variable to reflect which bucket is currently being updated. Presumably every game object "knows" in which bucket it resides. So we can check the bucket of a queried game object against the currently updating bucket and assert that they are not equal in order to guard against inconsistent state queries.
14.6.4. Designing for Parallelism
In Section 7.6, we introduced a number of approaches that allow a game engine to take advantage of the parallel processing resources that have become the norm in recent gaming hardware. How, then, does parallelism affect the way in which game object states are updated?
18.104.22.168. Parallelizing the Game Object Model Itself
Game object models are notoriously difficult to parallelize, for a few reasons. Game objects tend to be highly interdependent upon one another and upon the data used and/or generated by numerous engine subsystems. Game objects communicate with one another, sometimes multiple times during the update loop, and the pattern of communication can be unpredictable and highly sensitive to the inputs of the player and the events that are occurring in the game world. This makes it difficult to process game object updates in multiple threads, for example, because the amount of thread synchronization that would be required to support inter-object communication is usually prohibitive from a performance standpoint. And the practice of peeking directly into a foreign game object's state vector makes it impossible to DMA a game object to the isolated memory of a coprocessor, such as the PLAYSTATION 3's SPU, for updating.
That said, game object updating can theoretically be done in parallel. To make it practical, we'd need to carefully design the entire object model to ensure that game objects never peek directly into the state vectors of other game objects. All inter-object communication would have to be done via message-passing, and we'd need an efficient system for passing messages between game objects even when those objects reside in totally separate memory spaces or are being processed by different physical CPU cores. Some research has been done into using a distributed programming language, such as Ericsson's Erlang (http://www.erlang.org), to code game object models. Such languages provide built-in support for parallel processing and message passing and handle context switching between threads much more efficiently and quickly than in a language like C or C++, and their programming idioms help programmers to never "break the rules" that allow concurrent, distributed, multiple agent designs to function properly and efficiently.
22.214.171.124. Interfacing with Concurrent Engine Subsystems
Although sophisticated, concurrent, distributed object models are theoretically feasible and are an area of extremely interesting research, at present most game teams do not use them. Instead, most game teams leave the object model in a single thread and use an old-fashioned game loop to update them. They focus their attention instead on parallelizing many of the lower-level engine systems upon which the game objects depend. This gives teams the biggest "bang for their buck," because low-level engine subsystems tend to be more performance-critical than the game object model. This is because low-level subsystems must process huge volumes of data every frame, while the amount of CPU power used by the game object model is oft en somewhat smaller. This is an example of the 80-20 rule in action.
Of course, using a single-threaded game object model does not mean that game programmers can be totally oblivious to parallel programming issues. The object model must still interact with engine subsystems that are themselves running concurrently with the object model. This paradigm shift requires game programmers to avoid certain programming paradigms that may have served them well in the pre-parallel-processing era and adopt some new ones in their place.
Probably the most important shift a game programmer must make is to begin thinking asynchronously . As described in Section 7.6.5, this means that when a game object requires a time-consuming operation to be performed, it should avoid calling a blocking function -- a function that does its work directly in the context of the calling thread, thereby blocking that thread until the work has been completed. Instead, whenever possible, large or expensive jobs should be requested by calling a non-blocking function -- a function that sends the request to be executed by another thread, core, or processor and then immediately returns control to the calling function. The main game loop can proceed with other unrelated work, including updating other game objects, while the original object waits for the results of its request. Later in the same frame, or next frame, that game object can pick up the results of its request and make use of them.
Batching is another shift in thinking for game programmers. As we mentioned in Section 14.6.2, it is more efficient to collect similar tasks into batches and perform them en masse than it is to run each task independently. This applies to the process of updating game object states as well. For example, if a game object needs to cast 100 rays into the collision world for various purposes, it is best if those ray cast requests can be queued up and executed as one big batch. If an existing game engine is being retrofitted for parallelism, this oft en requires code to be rewritten so that it batches requests rather than doing them individually.
One particularly tricky aspect of converting synchronous, unbatched code to use an asynchronous, batched approach is determining when during the game loop (a) to kick off the request and (b) to wait for and utilize the results. In doing this, it is oft en helpful to ask ourselves the following questions:
- How early can we kick off this request? The earlier we make the request, the more likely it is to be done when we actually need the results -- and this maximizes CPU utilization by helping to ensure that the main thread is never idle waiting for an asynchronous request to complete. So for any given request, we should determine the earliest point during the frame at which we have enough information to kick it off , and kick it there.
- How long can we wait before we need the results of this request? Perhaps we can wait until later in the update loop to do the second half of an operation. Perhaps we can tolerate a one-frame lag and use last frame's results to update the object's state this frame. (Some subsystems like AI can tolerate even longer lag times because they update only every few seconds.) In many circumstances, code that uses the results of a request can in fact be deferred until later in the frame, given a little thought, some code re-factoring, and possibly some additional caching of intermediate data.
Page 6 of 6