Gamasutra: The Art & Business of Making Games
Book Excerpt: Game Engine Architecture

Book Excerpt: Game Engine Architecture

November 25, 2009 | Page 6 of 6

Time-Stamping

One easy and low-cost way to improve the consistency of game object states is to time-stamp them. It is then a trivial matter to determine whether a game object's state vector corresponds to its configuration at a previous time or the current time. Any code that queries the state of another game object during the update loop can assert or explicitly check the time stamp to ensure that the proper state information is being obtained.
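To make this concrete, here is a minimal sketch of a time-stamped state query. The names (GameObject, g_currentFrameIndex) are hypothetical, not code from the book:

```cpp
#include <cassert>
#include <cstdint>

// Frame counter advanced once per iteration of the game loop.
static std::uint64_t g_currentFrameIndex = 0;

struct Vector3 { float x = 0.0f, y = 0.0f, z = 0.0f; };

class GameObject
{
public:
    void Update()
    {
        // ... integrate motion, run behaviors, etc. ...
        m_stateTimestamp = g_currentFrameIndex;  // stamp the freshly written state
    }

    // Any code that queries this object's state can verify that it is
    // reading the current frame's data, not last frame's.
    const Vector3& GetPosition() const
    {
        assert(m_stateTimestamp == g_currentFrameIndex
               && "stale state: object not yet updated this frame");
        return m_position;
    }

private:
    Vector3       m_position;
    std::uint64_t m_stateTimestamp = 0;
};
```

In a release build the assert compiles away, so the check costs nothing when shipping.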

Time-stamping does not address the inconsistency of states during the update of a bucket. However, we can set a global or static variable to reflect which bucket is currently being updated. Presumably every game object "knows" in which bucket it resides. So we can check the bucket of a queried game object against the currently updating bucket and assert that they are not equal in order to guard against inconsistent state queries.
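A minimal sketch of that guard, again with hypothetical names (Bucket, g_updatingBucket), might look like this:

```cpp
#include <cassert>

// Example buckets; kBucketNone means no bucket is being updated right now.
enum Bucket
{
    kBucketNone = -1,
    kBucketVehiclesAndPlatforms,
    kBucketCharacters,
    kBucketAttachedObjects,
    kBucketCount
};

// Global/static variable reflecting which bucket is currently updating.
static Bucket g_updatingBucket = kBucketNone;

class GameObject
{
public:
    explicit GameObject(Bucket bucket) : m_bucket(bucket) {}

    // Called by other objects that want to read our state. If our own
    // bucket is mid-update, our state may be inconsistent, so we assert.
    float QueryHealth() const
    {
        assert(m_bucket != g_updatingBucket
               && "querying an object whose bucket is currently updating");
        return m_health;
    }

    Bucket GetBucket() const { return m_bucket; }

private:
    Bucket m_bucket;
    float  m_health = 100.0f;
};

// Each bucket update brackets its work by setting the global.
inline void BeginBucketUpdate(Bucket bucket) { g_updatingBucket = bucket; }
inline void EndBucketUpdate()                { g_updatingBucket = kBucketNone; }
```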

14.6.4. Designing for Parallelism

In Section 7.6, we introduced a number of approaches that allow a game engine to take advantage of the parallel processing resources that have become the norm in recent gaming hardware. How, then, does parallelism affect the way in which game object states are updated?

Parallelizing the Game Object Model Itself

Game object models are notoriously difficult to parallelize, for a few reasons. Game objects tend to be highly interdependent upon one another and upon the data used and/or generated by numerous engine subsystems. Game objects communicate with one another, sometimes multiple times during the update loop, and the pattern of communication can be unpredictable and highly sensitive to the inputs of the player and the events that are occurring in the game world. This makes it difficult to process game object updates in multiple threads, for example, because the amount of thread synchronization that would be required to support inter-object communication is usually prohibitive from a performance standpoint. And the practice of peeking directly into a foreign game object's state vector makes it impossible to DMA a game object to the isolated memory of a coprocessor, such as the PLAYSTATION 3's SPU, for updating.

That said, game object updating can theoretically be done in parallel. To make it practical, we'd need to carefully design the entire object model to ensure that game objects never peek directly into the state vectors of other game objects. All inter-object communication would have to be done via message-passing, and we'd need an efficient system for passing messages between game objects even when those objects reside in totally separate memory spaces or are being processed by different physical CPU cores. Some research has been done into using a distributed programming language, such as Ericsson's Erlang, to code game object models. Such languages provide built-in support for parallel processing and message passing, handle context switching between threads much more efficiently and quickly than a language like C or C++, and their programming idioms help programmers to never "break the rules" that allow concurrent, distributed, multi-agent designs to function properly and efficiently.

Interfacing with Concurrent Engine Subsystems

Although sophisticated, concurrent, distributed object models are theoretically feasible and are an area of extremely interesting research, at present most game teams do not use them. Instead, most game teams leave the object model in a single thread and use an old-fashioned game loop to update it. They focus their attention instead on parallelizing many of the lower-level engine systems upon which the game objects depend. This gives teams the biggest "bang for their buck," because low-level engine subsystems tend to be more performance-critical than the game object model. This is because low-level subsystems must process huge volumes of data every frame, while the amount of CPU power used by the game object model is often somewhat smaller. This is an example of the 80-20 rule in action.

Of course, using a single-threaded game object model does not mean that game programmers can be totally oblivious to parallel programming issues. The object model must still interact with engine subsystems that are themselves running concurrently with the object model. This shift requires game programmers to abandon certain programming patterns that may have served them well in the pre-parallel-processing era and adopt some new ones in their place.

Probably the most important shift a game programmer must make is to begin thinking asynchronously. As described in Section 7.6.5, this means that when a game object requires a time-consuming operation to be performed, it should avoid calling a blocking function -- a function that does its work directly in the context of the calling thread, thereby blocking that thread until the work has been completed. Instead, whenever possible, large or expensive jobs should be requested by calling a non-blocking function -- a function that sends the request to be executed by another thread, core, or processor and then immediately returns control to the calling function. The main game loop can proceed with other unrelated work, including updating other game objects, while the original object waits for the results of its request. Later in the same frame, or next frame, that game object can pick up the results of its request and make use of them.
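The kick-now-collect-later pattern can be sketched as follows, using std::async as a stand-in for a real engine's job system; all names here are hypothetical:

```cpp
#include <future>

struct RayCastResult { bool hit; float distance; };

// Stand-in for an expensive collision query that runs on another
// thread or core; a real engine would route this through its job system.
RayCastResult CastRay(float originY, float dirY)
{
    return RayCastResult{ dirY < 0.0f, originY };  // toy "hit the ground" test
}

class EnemyObject
{
public:
    // Early in the frame: kick the request and return immediately.
    void KickLineOfSightQuery()
    {
        m_pendingRay = std::async(std::launch::async, CastRay, 10.0f, -1.0f);
    }

    // Much later in the frame (or next frame): collect and use the result.
    void CollectLineOfSightQuery()
    {
        const RayCastResult result = m_pendingRay.get();
        m_canSeeTarget = result.hit;
    }

    bool CanSeeTarget() const { return m_canSeeTarget; }

private:
    std::future<RayCastResult> m_pendingRay;
    bool m_canSeeTarget = false;
};
```

The calling code only blocks in CollectLineOfSightQuery(), and only if the job hasn't finished by then; everything between kick and collect runs concurrently with the query.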

Batching is another shift in thinking for game programmers. As we mentioned in Section 14.6.2, it is more efficient to collect similar tasks into batches and perform them en masse than it is to run each task independently. This applies to the process of updating game object states as well. For example, if a game object needs to cast 100 rays into the collision world for various purposes, it is best if those ray cast requests can be queued up and executed as one big batch. If an existing game engine is being retrofitted for parallelism, this often requires code to be rewritten so that it batches requests rather than doing them individually.
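A batched ray-cast interface might be sketched like this (hypothetical names; the "collision test" is a toy that only inspects the ray's direction):

```cpp
#include <cstddef>
#include <vector>

struct Ray       { float originY; float dirY; };
struct RayResult { bool hit; };

class RayCastBatch
{
public:
    // Queue a request; the returned index is used to fetch the result later.
    std::size_t Add(const Ray& ray)
    {
        m_rays.push_back(ray);
        return m_rays.size() - 1;
    }

    // Execute every queued ray in one pass. A real engine would hand the
    // whole array to the collision system (or a worker job) at once,
    // amortizing setup and cache costs across the batch.
    void KickAndWait()
    {
        m_results.resize(m_rays.size());
        for (std::size_t i = 0; i < m_rays.size(); ++i)
            m_results[i].hit = (m_rays[i].dirY < 0.0f);  // toy "hit the ground" test
    }

    const RayResult& GetResult(std::size_t index) const { return m_results[index]; }

private:
    std::vector<Ray>       m_rays;
    std::vector<RayResult> m_results;
};
```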

One particularly tricky aspect of converting synchronous, unbatched code to use an asynchronous, batched approach is determining when during the game loop (a) to kick off the request and (b) to wait for and utilize the results. In doing this, it is often helpful to ask ourselves the following questions:

  • How early can we kick off this request? The earlier we make the request, the more likely it is to be done when we actually need the results -- and this maximizes CPU utilization by helping to ensure that the main thread is never idle waiting for an asynchronous request to complete. So for any given request, we should determine the earliest point during the frame at which we have enough information to kick it off, and kick it off there.
  • How long can we wait before we need the results of this request? Perhaps we can wait until later in the update loop to do the second half of an operation. Perhaps we can tolerate a one-frame lag and use last frame's results to update the object's state this frame. (Some subsystems, like AI, can tolerate even longer lag times because they update only every few seconds.) In many circumstances, code that uses the results of a request can in fact be deferred until later in the frame, given a little thought, some code refactoring, and possibly some additional caching of intermediate data.
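The one-frame-lag idea above can be sketched as a double-buffered result store; the "query" here is a toy stand-in for a real visibility ray cast, and all names are hypothetical:

```cpp
#include <utility>
#include <vector>

struct SightResult { bool visible; };

// Double-buffered query results: game objects read answers that are one
// frame old while this frame's batch is still in flight.
class AISightSystem
{
public:
    // Called once per frame, early: publish last frame's answers and
    // start collecting this frame's work.
    void BeginFrame()
    {
        std::swap(m_readable, m_inFlight);  // last frame's results become readable
        m_inFlight.clear();                 // begin gathering this frame's queries
    }

    // Stand-in for queuing a real visibility ray cast.
    void QueueQuery(bool toyAnswer)
    {
        m_inFlight.push_back(SightResult{ toyAnswer });
    }

    // Results are one frame stale -- acceptable for AI-style decisions.
    const std::vector<SightResult>& Results() const { return m_readable; }

private:
    std::vector<SightResult> m_inFlight;
    std::vector<SightResult> m_readable;
};
```

Because consumers only ever touch m_readable, the in-flight buffer can be filled by worker threads without any locking on the read path.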



James Hofmann
Cool read. I've started using something akin to buckets, but I call them "choreographers" and they're conceptually "fatter" - they aren't just there to resolve update dependencies. Instead they replace the actor model, and execute their own code to drive the dependent objects as completely as possible. A squad of soldiers, a tank and its driver, and a group of projectiles can each be expressed in a choreographed form: the choreographer just needs to be aware of which groups of objects it's acting on, and then it can "take over" their AI. Once I did this, the biggest dependency problems disappeared. Individual behavior can still vary in this model, so I haven't "lost" anything as far as I can tell.

Taure Anthony
A great read

Edelmar Schneider
Worth every line.

Quentin Smith
Very nice read. I wonder what effect event listeners would have on these various designs. The core of the system revolves around buckets, which should resolve 80% of timing issues, with the exceptions, the other 20%, handled by listeners attaching to update events. In your example, B would hold A, and A depends on B. Even if A and B are in the same bucket, A knows it depends on B, so A skips updating and instead waits for a broadcast from B. Cross-dependency resolution rules would have to be applied, say when A depends on B, yet B depends on C, who depends on A. To ensure all objects update, however, more than likely we could throw away confusing linkages for simulation's sake.

Timothy Ryan
Interesting, except that for multi-threaded solutions it's common practice to update animation, physics, etc. using separate threads.

Jason Gregory
Very true... using SIMD, fork/join, one-thread-per-subsystem, and/or jobs for parallelism is described in detail in chapter 7 of the book. When major subsystems are updated via threads or jobs, the calls to "update" those systems from the main thread are still there, but they are responsible only for setting up and "kicking" the SPU jobs, or sending "start" commands to the threads involved.

Stephen Northcott
The spooky thing for me about this article is that it's like reading through the trial and error that I (and others I am sure) went through over the years when refining various engines I/they have been involved in....

Perhaps "spooky" is not the right word, and "reassuring" or "familiar" could be....

A good read.

Eirik Moseng
The article is for sure a great read, but I can really recommend the book too, even if you are a professional game programmer. It's worth every single penny. I have read through a few books on game engine design and architecture, and in my opinion, Jason's book is really outstanding in many ways.

Alban Wood
Caching object state of previous frames and subsequent interpolation of frames can also allow a system that can only update its simulation (gameplay, animation, physics, etc) at a given rate to still render at a higher rate. Of course, this introduces added latency since we must wait for two simulation frames to be ready before we can show anything, and the input feedback isn't any faster since it is tied to the simulation update, but useful nonetheless.

I agree that updating game objects in parallel is tricky. If static analysis can be performed (what data does the object read, what data does it write, what behaviour runs on it), it can be automated to a certain point (like most job trees would do), but it's easier said than done. Super interesting problem though.

Alban Wood
Jason, by the way, is your gameplay/simulation still deterministic? Do all jobs for a given frame absolutely non-optionally need to complete before the frame is considered over? Would the jobs being executed in a different order or finishing at different times for two identical runs of the simulation yield a different result?

One thing I consider paramount is deterministic behaviour, for many reasons. But one important reason is to be able to reproduce problems, behaviour, and tune the game against reproducible scenarios (and have the certainty that a problem is fixed or something is properly tuned by re-simulating the game to the problematic point with the fix).

Daniel Koenig-Schieber
I'll have to second Eirik's post, you did a great job Jason. I came across a couple of books about various programming fields and this particular one goes with the small selection that is as informative as it is just fun to read. I really hope that you may find the time and energy to do some more books like this on other topics.

Believe it or not, although I am not working within the game industry, some of the principles that are explained are just as useful for other programming tasks, and I always love to get a fresh perspective on handling data management problems. So it is actually both an interesting look behind the scenes at the nuts and bolts of game engines and a valuable resource for concepts that also apply to other fields.

Yuri Syrov
All patterns mentioned were used 20-some years ago in pre-MT universe. Perhaps still relevant for mobiles. Now, with tens of cores, hundreds of threads and VNUMAs, things are not at all the same.

Jason Gregory
Alban: In the Uncharted 2 engine, SPU jobs are kicked by the central "game loop" thread on the PPU, as described above, or kicked automatically by previous SPU jobs. So all job "scheduling" is ultimately derived from what happens on the PPU. Given a set of initial conditions, what happens during a given frame is therefore *mostly* deterministic. We don't always control which of the 6 SPUs a particular job runs on, and certain subsystems like audio introduce some minor indeterminisms as well. But for the most part, it's deterministic. Our concept of "frame" is a bit loose, because rendering is tied to vsync, but the main game loop is only loosely coupled to rendering. During a given frame it's possible for, say, post processing from the previous frame to be running on the SPUs even *after* the next game frame has started executing on the PPU and kicking its own SPU jobs. This ensures we don't idle the PPU or SPUs waiting for vsync unless we absolutely have to.

Michael Dube
Nice and informative read. I'm thinking about buying the book. Looks very interesting indeed.

Dave Smith
Lucid and enlightening. Congratulations on the Front Line Award nomination for your book! I can't wait to read the rest of it.