Final Thoughts
Designing systems for functional decomposition, coupled with data
decomposition, will deliver a good amount of parallelization and also
ensure scalability with future processors that have an even larger
number of cores. Remember to use the state manager along with
the messaging mechanism to keep the data in sync with only minimal
synchronization overhead.
The observer design pattern is part of the messaging mechanism,
so it is worth spending some time learning it. With experience,
you can choose the most efficient implementation for your
engine's needs. After all, the observer design pattern
is the mechanism of communication between the different systems
that keeps all shared data synchronized.
Tasking plays an important role in proper load balancing. Follow the
tips in Appendix D to create an efficient task manager for your engine.
Designing a highly parallel engine is a manageable task if you use
clearly defined messaging and structure. Properly building parallelism
into your game engine will give it significant performance gains on
modern and future processors.
Appendix A
Example of an Engine Diagram
This diagram gives an example of how the different systems are
connected to the engine. All communication between the engine and
the systems goes through a common interface. Systems are loaded
via the platform manager (not shown).
The engine manager and system initializations.
Appendix B
Engine and System Relationship Diagram
Appendix C
The Observer Design Pattern
The observer design pattern is documented in Design Patterns:
Elements of Reusable Object-Oriented Software.
With this pattern, items interested in data or state changes in
other items do not have to poll those items periodically to see whether
anything has changed. The pattern (Figure 13) defines a subject and
an observer that are used for the change notification: the observer
observes a subject for any changes. The change controller acts as a
mediator between the two.
Figure 13. The observer design pattern.
1. The observer, via the change controller, registers itself with
the subject for which it wants to observe changes.
2. The change controller is itself an observer. Instead of
registering the observer with the subject, it registers itself
with the subject and keeps its own list of which observers are
registered with which subject.
3. The subject inserts the observer (actually the change controller)
into its list of interested observers. Optionally, a change type
can identify which kinds of changes the observer is interested
in, which helps speed up the distribution of change
notifications.
4. When the subject changes its data or state, it notifies
the observer via a callback mechanism and passes along the
types of changes that were made.
5. The change controller queues the change notifications and
waits for the signal to distribute them.
6. During distribution the change controller calls the
actual observers.
7. The observers query the subject for the changed data or state
(or get the data from the message).
8. When the observer is no longer interested in the subject or is
being destroyed, it deregisters itself from the subject via the
change controller.
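As a rough illustration, the eight steps above could be sketched in C++ along the following lines. Every name here (Subject, Observer, ChangeController, PositionMirror, and the kPosition and kVelocity change bits) is hypothetical and chosen for this example only; none of it is the interface of a particular engine. The sketch is single-threaded and omits the locking a real change controller would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// All names are hypothetical and exist only for this sketch.
using ChangeBits = std::uint32_t;
constexpr ChangeBits kPosition = 1u << 0;
constexpr ChangeBits kVelocity = 1u << 1;

class Subject;

class Observer {
public:
    virtual ~Observer() = default;
    // Receives the subject that changed and the change types that occurred.
    virtual void OnChange(Subject& subject, ChangeBits changes) = 0;
};

class Subject {
public:
    // Step 3: remember the observer and the change types it cares about.
    void Attach(Observer* observer, ChangeBits interest) { observers_[observer] = interest; }
    void Detach(Observer* observer) { observers_.erase(observer); }

    void SetPosition(float x) { position_ = x; Notify(kPosition); }
    float Position() const { return position_; }  // Step 7: queried by observers.

private:
    // Step 4: call back every observer interested in these change types.
    void Notify(ChangeBits changes) {
        for (const auto& entry : observers_) {
            if (entry.second & changes) entry.first->OnChange(*this, entry.second & changes);
        }
    }
    std::map<Observer*, ChangeBits> observers_;
    float position_ = 0.0f;
};

class ChangeController : public Observer {
public:
    // Steps 1-2: record the real observer, then attach the controller itself.
    void Register(Subject& subject, Observer* observer, ChangeBits interest) {
        assert(observer != nullptr);
        auto& entries = registry_[&subject];
        entries.push_back({observer, interest});
        ChangeBits all = 0;
        for (const auto& e : entries) all |= e.second;
        subject.Attach(this, all);
    }

    // Step 8: drop the observer; detach from the subject when nobody is left.
    void Unregister(Subject& subject, Observer* observer) {
        auto it = registry_.find(&subject);
        if (it == registry_.end()) return;
        auto& entries = it->second;
        for (auto e = entries.begin(); e != entries.end();) {
            if (e->first == observer) e = entries.erase(e); else ++e;
        }
        if (entries.empty()) { subject.Detach(this); registry_.erase(it); }
    }

    // Step 5: queue the notification instead of forwarding it immediately.
    void OnChange(Subject& subject, ChangeBits changes) override {
        pending_.push_back({&subject, changes});
    }

    // Step 6: on the distribution signal, call the actual observers.
    void Distribute() {
        std::vector<std::pair<Subject*, ChangeBits>> batch;
        batch.swap(pending_);
        for (const auto& note : batch) {
            auto it = registry_.find(note.first);
            if (it == registry_.end()) continue;
            for (const auto& e : it->second) {
                if (e.second & note.second) e.first->OnChange(*note.first, e.second & note.second);
            }
        }
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::map<Subject*, std::vector<std::pair<Observer*, ChangeBits>>> registry_;
    std::vector<std::pair<Subject*, ChangeBits>> pending_;
};

// Example observer: copies the subject's position when notified (step 7).
class PositionMirror : public Observer {
public:
    void OnChange(Subject& subject, ChangeBits) override {
        last_seen = subject.Position();
        ++calls;
    }
    float last_seen = 0.0f;
    int calls = 0;
};
```

In this sketch a caller registers a PositionMirror through the controller, lets the subject change, and then calls Distribute() at the synchronization point. Until that call the mirror sees nothing, which is what allows notifications to be batched.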
Appendix D
Tips on Implementing Tasks
Although task distribution can be implemented in many ways, try
to keep the number of worker threads equal to the number of
available logical processors of the platform. Avoid setting the affinity
of tasks to a specific thread, because the tasks from the different
systems will not complete at the same time. Specific affinities can
lead to a load imbalance among the worker threads, effectively reducing
your parallelization. Also, consider using a tasking library, such as
Intel® Threading Building Blocks, which can simplify task distribution.
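A minimal sketch of this advice in standard C++ follows, assuming a fixed list of tasks per frame. WorkerCount and RunTasks are hypothetical helpers, not part of Intel Threading Building Blocks or of any engine. No task is tied to a thread: each worker claims the next unclaimed task, so a worker that finishes early simply picks up more work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: one worker per logical processor.
// hardware_concurrency() may return 0 when the count is unknown, so fall back to 1.
inline unsigned WorkerCount() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Hypothetical helper: runs every task exactly once across WorkerCount() threads,
// counting the calling thread as one of the workers.
inline void RunTasks(const std::vector<std::function<void()>>& tasks) {
    std::atomic<std::size_t> next{0};

    // Each worker repeatedly claims the next unclaimed index until none remain.
    auto worker = [&tasks, &next] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;
            assert(tasks[i] && "empty task");
            tasks[i]();
        }
    };

    std::vector<std::thread> pool;
    const unsigned count = WorkerCount();
    for (unsigned t = 1; t < count; ++t) pool.emplace_back(worker);
    worker();  // The calling thread does its share of the work.
    for (auto& th : pool) th.join();
}
```

A production task manager would keep its worker threads alive between frames rather than creating them on every call. The sketch creates them per call only to stay short.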
Two optimizations can be made in the task manager to ensure
CPU-friendly execution of the different tasks submitted.
- Reverse Issuing. If the order of primary tasks being issued is fairly static, the tasks can be alternately issued in reverse order from
frame to frame. The last task to execute in a previous frame will
more than likely still have its data in the cache, so issuing the tasks
in reverse order for the next frame will all but guarantee that the
CPU caches will not have to be repopulated with the correct data.
- Cache Sharing. Some multi-core processors have their shared cache split into sections, so that two processors may share a cache,
while another two share a separate cache. Issuing sub-tasks from
the same system onto processors sharing a cache increases the
likelihood that the data will already be in the shared cache.
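Reverse issuing could look something like the following sketch. ReverseIssuer is a hypothetical class, and calling each task directly stands in for submitting it to the task manager. The task list is assumed to be the same from frame to frame.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: issues a static task list front to back on one frame
// and back to front on the next, so the tasks that ran last are issued first.
class ReverseIssuer {
public:
    explicit ReverseIssuer(std::vector<std::function<void()>> tasks)
        : tasks_(std::move(tasks)) {}

    // Issues every task once, then flips the direction for the next frame.
    void IssueFrame() {
        const std::size_t n = tasks_.size();
        for (std::size_t k = 0; k < n; ++k) {
            const std::size_t i = reverse_ ? n - 1 - k : k;
            tasks_[i]();  // Stand-in for handing the task to the task manager.
        }
        reverse_ = !reverse_;
    }

private:
    std::vector<std::function<void()>> tasks_;
    bool reverse_ = false;  // The first frame is issued in forward order.
};
```

With three tasks numbered 0 to 2, successive calls to IssueFrame() issue them as 0, 1, 2, then 2, 1, 0, then 0, 1, 2 again.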
Bibliography
Gamma, Erich, Richard Helm, Ralph Johnson, and John M. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. USA:
Addison-Wesley, 1994.
Intel® Threading Building Blocks (TBB). Available from: http://www.threadingbuildingblocks.org/
Intel and Gamasutra - Visual Computing. Available from: http://www.gamasutra.com/visualcomputing/
Multi-threaded Game Programming and Hyper-Threading Technology. Available from:
http://software.intel.com/en-us/articles/multithreaded-game-programming-and-hyper-threading-technology
Reinders, James. Intel Threading Building Blocks. USA: O'Reilly Media, Inc., 2007.
Smoke - Game Technology Demo. Available from: http://software.intel.com/en-us/articles/smoke-game-technology-demo
Threading Basics for Games. Available from: http://software.intel.com/en-us/articles/threading-basics-for-games
Threading Methodology: Principles and Practice. Available from:
http://software.intel.com/en-us/articles/threading-methodology-principles-and-practice