[Single-threaded game engines can still work, but they are becoming increasingly outclassed by multithreaded solutions that are more sophisticated, but also more complex to create and optimize. In this sponsored article, part of the Intel Visual Computing Microsite, Intel application engineer Jeff Andrews lays that process bare.]
With the advent of multiple cores within a processor, the need to create a parallel game engine has become increasingly important. Although it is still possible to focus primarily on the GPU and keep a single-threaded game engine, using all the processors on a system, whether CPU or GPU, can give the user a much better experience. For example, by using more CPU cores, a game can increase the number of rigid body physics objects for richer effects on the screen. The extra cores might also be spent on a smarter AI.
This white paper covers the basic methods for creating and optimizing a multi-core game engine. It also describes the state manager and messaging mechanism that keep data in sync, and offers several tips for optimizing parallel execution. Block diagrams give theoretical overviews of managing tasks and states, of interfacing with scenes, objects, and tasks, and of initializing, loading, and looping within synchronized threads.
The Parallel Game Engine Framework, or engine, is a multi-threaded game engine that is designed to scale to all available processors within a platform. To do this, the engine executes different functional blocks in parallel so that it can use all available processors. However, this is easier said than done, because many pieces in a game engine interact with one another, and those interactions can cause many threading errors. The engine takes these scenarios into account and has mechanisms for keeping data properly synchronized without being bound by synchronization locks. The engine also has a method for executing data synchronization in parallel to keep serial execution time to a minimum.
The concept of a parallel execution state in an engine is crucial to an efficient multi-threaded runtime. For a game engine to truly run in parallel, with as little synchronization overhead as possible, each system must operate within its own execution state and interact minimally with anything else going on in the engine. Data still needs to be shared, but instead of each system accessing a common data location to get position or orientation data, for example, each system has its own copy, which removes the data dependency that exists between different parts of the engine. If a system changes shared data, it sends a notice to a state manager, which queues the change. Once the different systems have finished executing, they are notified of the queued state changes and update their internal data structures. This queue-and-notify process is called messaging. Using this mechanism greatly reduces synchronization overhead, allowing systems to act more independently.
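The messaging flow described above can be illustrated with a minimal, single-threaded sketch. The names StateManager, System, notify, and distribute are hypothetical and do not come from the engine itself; a real multi-threaded implementation would also need a thread-safe queue, which is left out here to keep the idea visible.

```python
from dataclasses import dataclass, field


@dataclass
class StateManager:
    """Hypothetical state manager: queues change notices, distributes them later."""
    systems: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def register(self, system):
        self.systems.append(system)
        system.manager = self

    def notify(self, sender, key, value):
        # Called while systems are executing: only record the change.
        self.queue.append((sender, key, value))

    def distribute(self):
        # Called once every system has finished executing for this clock.
        pending, self.queue = self.queue, []
        for sender, key, value in pending:
            for system in self.systems:
                if system is not sender:
                    system.local[key] = value


class System:
    """Hypothetical system that works only on its private copy of shared data."""

    def __init__(self, name):
        self.name = name
        self.local = {}
        self.manager = None

    def change(self, key, value):
        self.local[key] = value
        self.manager.notify(self, key, value)


manager = StateManager()
physics, graphics = System("physics"), System("graphics")
manager.register(physics)
manager.register(graphics)

physics.change("crate.position", (1.0, 2.0, 0.0))
seen_during_execution = graphics.local.get("crate.position")  # not delivered yet
manager.distribute()
seen_after_distribution = graphics.local["crate.position"]
```

In this sketch the graphics system never reads the physics system's data directly. It sees the new position only after distribute() runs, which is what lets the two systems execute without locking each other's data.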
Execution state management works best when operations are synchronized to a clock, so that the different systems execute synchronously. The clock frequency does not need to match the frame time. The clock does not even have to run at a fixed frequency; it can instead be tied to frame count, so that one clock step lasts as long as it takes to complete one frame, however long that is. The implementation chosen for the execution state determines the clock time. Figure 1 shows the different systems operating in the free step mode of execution, which means they don't have to complete their execution on the same clock. There is also a lock step mode of execution (Figure 2) in which all systems complete in one clock. The main difference between the two modes is that free step gains flexibility at the cost of simplicity, while lock step gains simplicity at the cost of flexibility.
Figure 1. Execution state using the free step mode.
Free Step Mode
This execution mode allows a system to operate in the time it needs to complete its calculations. "Free" can be misleading, because a system is not free to complete whenever it wants, but is free to select the number of clocks it needs to execute. With this method, a simple notification of a state change to the state manager is not enough. Data must also be passed along with the state-change notification, because a system that has modified shared data may still be executing when another system that wants the data is ready to do an update. This requires more memory and more copies, so free step may not be the best mode for every situation. Given the extra memory operations, free step might be slower than lock step, although this is not necessarily true.
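A small sketch can show why a free step notification has to carry the data itself. The class FreeStepSystem, its clocks_needed parameter, and the outbox list are illustrative assumptions, not part of the engine described here. The system below takes two clocks per update and keeps modifying its own state afterwards, so the notice holds a copy.

```python
import copy


class FreeStepSystem:
    """Hypothetical system that chooses how many clocks each update takes."""

    def __init__(self, name, clocks_needed):
        self.name = name
        self.clocks_needed = clocks_needed
        self.elapsed = 0
        self.state = {"position": [0.0, 0.0]}

    def tick(self, outbox):
        """Advance one clock; publish a snapshot when an update completes."""
        self.elapsed += 1
        self.state["position"][0] += 1.0  # ongoing work on the private state
        if self.elapsed % self.clocks_needed == 0:
            # The notice carries a copy of the data, so receivers are not
            # affected by whatever this system does on later clocks.
            outbox.append((self.name, copy.deepcopy(self.state)))


outbox = []
physics = FreeStepSystem("physics", clocks_needed=2)
for clock in range(3):
    physics.tick(outbox)

name, snapshot = outbox[0]
```

After three clocks the single published snapshot still holds the value from the end of clock two, while the sender's live state has moved on. The deep copy is the extra memory cost mentioned above.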
Lock Step Mode
In this mode, all systems must complete their execution in a single clock. Lock step mode is simpler to implement and does not require data to be passed with the notification, because systems that are interested in a change made by another system can simply query that system for the value at the end of execution.
Figure 2. Execution state using the lock step mode.
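For comparison, here is a minimal lock step sketch, again with hypothetical names (LockStepSystem, run_clock). Each notice records only which system changed which value. Because every system has finished by the end of the clock, interested systems can safely query the owner at that point.

```python
class LockStepSystem:
    """Hypothetical system that finishes all of its work within one clock."""

    def __init__(self, name):
        self.name = name
        self.values = {}

    def get(self, key):
        return self.values[key]


def run_clock(work, systems):
    """Run one clock. `work` is a list of (system, update_function) pairs;
    each update function returns the keys it changed."""
    notices = []
    for system, update in work:
        for key in update(system):
            notices.append((system, key))  # the notice carries no data
    # Every system has completed, so querying the owner is now safe.
    for owner, key in notices:
        for system in systems:
            if system is not owner:
                system.values[key] = owner.get(key)


def physics_update(system):
    system.values["crate.position"] = (4.0, 0.0, 0.0)
    return ["crate.position"]


physics = LockStepSystem("physics")
graphics = LockStepSystem("graphics")
run_clock([(physics, physics_update)], [physics, graphics])
```

Unlike the free step case, nothing is copied into the notice here. The value is read from its owner only once, after execution has ended.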
The lock step mode can also implement a pseudo-free-step mode of operation by staggering calculations across multiple steps. One use for this might be an AI that calculates its initial "large view" goal in the first clock and then, instead of repeating the goal calculation in the next clock, derives a more focused goal from the initial one.
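As a rough illustration of staggering, the sketch below splits an AI decision across two lock step clocks. The StaggeredAI class, the region names, and the one-dimensional waypoints are all made up for the example. The first clock picks a coarse region, and the second picks a specific waypoint inside it.

```python
class StaggeredAI:
    """Hypothetical AI that splits goal selection across two clocks."""

    def __init__(self, regions):
        self.regions = regions  # region name -> list of waypoint coordinates
        self.phase = 0
        self.region = None
        self.goal = None

    def tick(self, position):
        if self.phase == 0:
            # Clock 1: "large view" goal, the region whose center is nearest.
            def distance_to_center(name):
                points = self.regions[name]
                return abs(sum(points) / len(points) - position)

            self.region = min(self.regions, key=distance_to_center)
        else:
            # Clock 2: focused goal, the nearest waypoint in that region.
            candidates = self.regions[self.region]
            self.goal = min(candidates, key=lambda p: abs(p - position))
        self.phase = (self.phase + 1) % 2


ai = StaggeredAI({"north": [10.0, 12.0, 15.0], "south": [-8.0, -20.0]})
ai.tick(position=11.5)
region_after_first_clock, goal_after_first_clock = ai.region, ai.goal
ai.tick(position=11.5)
goal_after_second_clock = ai.goal
```

Each call to tick still completes within a single clock, so the system stays lock step, yet the full decision takes two clocks to produce.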
It is possible for multiple systems to make changes to the same shared data. To handle this, the messaging needs a mechanism that can determine the correct value to use. Two such mechanisms can be used:
Time. The last system to make the change time-wise has the correct value.
Priority. A system with a higher priority is the one that has the correct value. This can also be combined with the time mechanism to resolve changes from systems of equal priority.
Data values that are determined to be stale, via either mechanism, will simply be overwritten or thrown out of the change notification queue.
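The two mechanisms can be sketched as a single resolution pass over the change notification queue. The Change record and the resolve function are hypothetical. The sketch assumes each change is stamped with a priority and a time, and that larger numbers win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical queued change notice."""
    key: str
    value: object
    priority: int  # higher priority wins
    time: int      # later time wins


def resolve(changes, by_priority=True):
    """Return the surviving value per key; stale changes are dropped."""

    def rank(change):
        # With priority enabled, time only breaks ties between equal priorities.
        return (change.priority, change.time) if by_priority else (0, change.time)

    winners = {}
    for change in changes:
        current = winners.get(change.key)
        if current is None or rank(change) >= rank(current):
            winners[change.key] = change
    return {key: change.value for key, change in winners.items()}


queue = [
    Change("crate.position", "from_physics", priority=2, time=1),
    Change("crate.position", "from_ai", priority=1, time=2),
    Change("crate.health", 80, priority=1, time=1),
    Change("crate.health", 75, priority=1, time=3),
]

by_priority = resolve(queue)
by_time = resolve(queue, by_priority=False)
```

With priority enabled, the physics change to the position survives even though the AI change is newer. With time alone, the AI change wins. The health value is decided by time in both cases because both changes have equal priority.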
Using relative values for the shared data can be difficult because some data may be order-dependent. To alleviate this problem, use absolute data values so that when systems update their local values they replace the old with the new. A combination of both absolute and relative data is ideal, although its use depends on each specific situation. For example, common data, such as position and orientation, should be kept absolute because a transformation matrix built from relative updates depends on the order in which the data are received. By contrast, a custom system that generates particles via the graphics system and fully owns the particle information can simply send relative value updates.
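A short sketch shows the order dependence. The helper functions are illustrative only, and integer 2D points stand in for a full transformation matrix. Applying the same two relative updates in a different order gives a different position, while an absolute update just replaces the old value and additive deltas, such as a particle count, commute.

```python
def rotate90(point):
    """Relative update: rotate 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)


def translate(point):
    """Relative update: move one unit along the x axis."""
    x, y = point
    return (x + 1, y)


def apply_relative(value, updates):
    # Every update builds on the previous result, so order matters.
    for update in updates:
        value = update(value)
    return value


start = (1, 0)
rotate_then_translate = apply_relative(start, [rotate90, translate])
translate_then_rotate = apply_relative(start, [translate, rotate90])

# An absolute update simply replaces the local value with the new one.
local_position = start
local_position = (1, 1)

# Additive deltas give the same total in any order.
particle_count_a = sum([5, -2])
particle_count_b = sum([-2, 5])
```

Here rotate_then_translate is (1, 1) and translate_then_rotate is (0, 2). A receiver applying relative transforms would therefore need every update in the right order, whereas a receiver of absolute values needs only the surviving one.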