Figure 3. SystemAI and SystemGraphic are subscribing to position changes from SystemPhysics.
The registration of objects between systems occurs during initialization of the framework using the interfaces developed in Smoke. When a system modifies its data it tells the CCM, and that information is passed on during the next frame refresh (see Figure 3). The CCM does not pass the data through to each subscribed system, but instead leaves it to the system itself to copy the data only if necessary for its present operation.
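The subscription flow described above can be sketched in C++ as follows. This is a minimal illustration, not Smoke's actual API: `ChangeManager`, `PhysicsSystem`, `GraphicsSystem`, and every method name are hypothetical. The point it shows is that the manager only tells subscribers which object changed, and each subscriber decides when (or whether) to copy the data.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// All names here are hypothetical stand-ins, not Smoke's real classes.
using ObjectId = std::uint32_t;
struct Position { float x, y, z; };

// Subscribers are told which object changed; they are not handed the data.
using ChangeCallback = std::function<void(ObjectId)>;

class ChangeManager {
public:
    // Registration happens once, during framework initialization.
    void subscribe(ObjectId id, ChangeCallback cb) {
        subscribers_[id].push_back(std::move(cb));
    }
    // Called by the owning system whenever it modifies an object.
    void reportChange(ObjectId id) { pending_.push_back(id); }
    // Called at the frame boundary: notify subscribers, then reset.
    void distribute() {
        for (ObjectId id : pending_) {
            auto it = subscribers_.find(id);
            if (it == subscribers_.end()) continue;
            for (auto& cb : it->second) cb(id);
        }
        pending_.clear();
    }
private:
    std::unordered_map<ObjectId, std::vector<ChangeCallback>> subscribers_;
    std::vector<ObjectId> pending_;
};

// Owns the authoritative positions and reports every modification.
class PhysicsSystem {
public:
    explicit PhysicsSystem(ChangeManager& ccm) : ccm_(ccm) {}
    void move(ObjectId id, Position p) {
        positions_[id] = p;
        ccm_.reportChange(id);
    }
    Position positionOf(ObjectId id) const { return positions_.at(id); }
private:
    ChangeManager& ccm_;
    std::unordered_map<ObjectId, Position> positions_;
};

// Records notifications and copies data only when it prepares a frame.
class GraphicsSystem {
public:
    explicit GraphicsSystem(const PhysicsSystem& physics) : physics_(physics) {}
    void onChanged(ObjectId id) { dirty_.push_back(id); }
    void prepareFrame() {
        for (ObjectId id : dirty_) cached_[id] = physics_.positionOf(id);
        dirty_.clear();
    }
    std::size_t dirtyCount() const { return dirty_.size(); }
    float cachedX(ObjectId id) const { return cached_.at(id).x; }
private:
    const PhysicsSystem& physics_;
    std::vector<ObjectId> dirty_;
    std::unordered_map<ObjectId, Position> cached_;
};
```

A graphics system would subscribe with something like `ccm.subscribe(id, [&](ObjectId o) { graphics.onChanged(o); })`, and nothing reaches it until `distribute()` runs at the end of the frame.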
Figure 4. Interfaces make communication between systems easy.
A key feature of Smoke’s highly modular design is the set of interfaces developed between the various systems and the framework itself. The framework’s job is to enable communication between the various systems (see Figure 4). For example, we might have a geometry interface for changing the positions of models or a behavior interface for changing the AI states of specific objects. The Scheduler uses the task interface to schedule and invoke work with each system.
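As a rough sketch of what such interfaces might look like in C++, the example below defines a geometry interface, a behavior interface, and a task interface as abstract classes. The names (`IGeometryObject`, `IBehaviorObject`, `ISystemTask`, `AiAgent`, `AiTask`, `runFrame`) are assumptions made for illustration and are not taken from the Smoke source.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface names; Smoke's real headers may differ.
struct Vec3 { float x, y, z; };

// Geometry interface: lets other systems read or change a position.
class IGeometryObject {
public:
    virtual ~IGeometryObject() = default;
    virtual void setPosition(const Vec3& p) = 0;
    virtual Vec3 position() const = 0;
};

// Behavior interface: lets other systems read or change an AI state.
class IBehaviorObject {
public:
    virtual ~IBehaviorObject() = default;
    virtual void setState(const std::string& state) = 0;
    virtual std::string state() const = 0;
};

// Task interface: the only thing the scheduler needs to know about a system.
class ISystemTask {
public:
    virtual ~ISystemTask() = default;
    virtual void update(float deltaSeconds) = 0;
};

// An object that exposes both geometry and behavior to the framework.
class AiAgent : public IGeometryObject, public IBehaviorObject {
public:
    void setPosition(const Vec3& p) override { pos_ = p; }
    Vec3 position() const override { return pos_; }
    void setState(const std::string& s) override { state_ = s; }
    std::string state() const override { return state_; }
private:
    Vec3 pos_{0.0f, 0.0f, 0.0f};
    std::string state_ = "idle";
};

// The AI system's per-frame work, reached only through ISystemTask.
class AiTask : public ISystemTask {
public:
    explicit AiTask(std::vector<AiAgent*> agents) : agents_(std::move(agents)) {}
    void update(float deltaSeconds) override {
        for (AiAgent* agent : agents_) {
            if (agent->state() != "walking") continue;
            Vec3 p = agent->position();
            p.x += 1.0f * deltaSeconds;  // walk along x at 1 unit per second
            agent->setPosition(p);
        }
    }
private:
    std::vector<AiAgent*> agents_;
};

// Stand-in for the scheduler: it sees tasks, never concrete systems.
void runFrame(const std::vector<ISystemTask*>& tasks, float deltaSeconds) {
    for (ISystemTask* task : tasks) task->update(deltaSeconds);
}
```

Because `runFrame` depends only on `ISystemTask`, a new system can be added without touching the scheduling code, which is the modularity the interfaces are meant to provide.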
Running through a typical frame process in Smoke helps clarify this activity. Imagine a single frame in the engine being rendered, where the process starts with the framework’s systems subdividing the tasks for processing. The Scheduler invokes each system once per frame, allowing the system to divide its own work into granular pieces that can be broken up into various jobs. Competent middleware is able to accomplish this quite easily, which minimizes the Scheduler’s necessary work.
Figure 5. Worker threads are assigned jobs from the pool of available processing work.
Once all of these tasks have been created, they are collected into a single pool that all available worker threads can access. A properly designed framework allows for one worker thread per core, and this is where the power and scalability of the design really starts to play out.
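A minimal C++ sketch of a shared job pool with one worker per hardware thread is shown below. It is an assumption-laden illustration rather than Smoke's scheduler: `Job` and `runJobPool` are hypothetical names, and here each worker simply pulls the next unclaimed job from the pool instead of having jobs assigned to it by estimated cost.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job type: any callable that takes no arguments.
using Job = std::function<void()>;

// Runs every job exactly once. With workerCount == 0, one worker is
// started per hardware thread reported by the standard library.
void runJobPool(const std::vector<Job>& jobs, unsigned workerCount = 0) {
    if (workerCount == 0) {
        // hardware_concurrency() may return 0 when the count is unknown.
        workerCount = std::max(1u, std::thread::hardware_concurrency());
    }

    // Shared cursor into the pool; fetch_add hands out each index once.
    std::atomic<std::size_t> next{0};

    auto worker = [&jobs, &next] {
        for (;;) {
            const std::size_t index = next.fetch_add(1);
            if (index >= jobs.size()) return;  // pool is drained
            jobs[index]();
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (unsigned w = 0; w < workerCount; ++w) workers.emplace_back(worker);

    // The frame is complete once every worker has run out of jobs.
    for (std::thread& t : workers) t.join();
}
```

Pulling from a shared cursor keeps busy workers from sitting idle while others finish, at the price of one atomic operation per job.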
Each job in the pool is then assigned to a thread (see Figure 5) based on the load of each particular thread and the framework’s estimate of how processor-intensive each job is.
The one area that the Smoke developers admit is problematic is this step: properly balancing the load of jobs across the threads requires some advance knowledge about the work in each job. In a very lightweight framework, such as the one we are describing, that information is difficult to come by.
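To make the balancing problem concrete, here is a small C++ sketch of one common heuristic: sort jobs by estimated cost, largest first, and give each job to the currently least-loaded worker. This is a generic greedy scheme chosen for illustration, not the method Smoke uses, and `assignJobs` and the cost estimates are hypothetical. Its quality depends entirely on how good the estimates are, which is exactly the difficulty described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns, for each job, the index of the worker it was assigned to.
// estimatedCost[i] is a caller-supplied guess at how expensive job i is.
std::vector<std::size_t> assignJobs(const std::vector<double>& estimatedCost,
                                    std::size_t workerCount) {
    workerCount = std::max<std::size_t>(workerCount, 1);

    // Visit jobs from most to least expensive.
    std::vector<std::size_t> order(estimatedCost.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return estimatedCost[a] > estimatedCost[b];
                     });

    std::vector<double> load(workerCount, 0.0);
    std::vector<std::size_t> assignment(estimatedCost.size(), 0);
    for (std::size_t job : order) {
        // Pick the worker with the smallest accumulated load so far
        // (the first such worker when several are tied).
        auto lightest = std::min_element(load.begin(), load.end());
        *lightest += estimatedCost[job];
        assignment[job] = static_cast<std::size_t>(lightest - load.begin());
    }
    return assignment;
}
```

With estimated costs of 8, 7, 6, 5, and 4 on two workers, the jobs costing 8, 5, and 4 land on the first worker and those costing 7 and 6 on the second, for loads of 17 and 13.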
Cache coherency on the processor is another important issue to keep in mind when subdividing tasks. Ideally, threads should work on contiguous blocks of data rather than random or interleaved data, allowing the processing core to access cache and memory resources in a more linear, and thus quicker, fashion.
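One way to follow this advice is to split an array of objects into contiguous ranges, one per job, so each thread walks its own block linearly. The C++ sketch below assumes a simple position and velocity layout; `splitIntoBlocks` and `integrateBlock` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second) owned by one job.
using Block = std::pair<std::size_t, std::size_t>;

// Splits [0, count) into `parts` contiguous blocks of near-equal size.
std::vector<Block> splitIntoBlocks(std::size_t count, std::size_t parts) {
    parts = std::max<std::size_t>(parts, 1);
    const std::size_t base = count / parts;   // minimum block size
    const std::size_t extra = count % parts;  // first `extra` blocks get +1

    std::vector<Block> blocks;
    blocks.reserve(parts);
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t size = base + (p < extra ? 1 : 0);
        blocks.emplace_back(begin, begin + size);
        begin += size;
    }
    return blocks;
}

// Work for one job: a linear walk over its own block only.
void integrateBlock(std::vector<float>& positions,
                    const std::vector<float>& velocities,
                    float deltaSeconds, Block block) {
    for (std::size_t i = block.first; i < block.second; ++i) {
        positions[i] += velocities[i] * deltaSeconds;
    }
}
```

The interleaved alternative, where thread `t` of `n` handles elements `t`, `t + n`, `t + 2n`, and so on, does the same arithmetic but makes neighboring threads write to the same cache lines.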
Figure 6. Each of the worker threads has a change queue that is accessed by the CCM.
During the processing of the current frame, the various worker threads and jobs post messages to the CCM (see Figure 6) indicating updates to the status of any registered objects.
Once the frame has been completed, the CCM sends those messages on to the appropriate systems that have subscribed to the updated objects before the next frame’s processing begins, allowing the updated data to propagate through the framework. And thus, the cycle repeats, and we have a fully threaded, independent-processing model for a game engine.
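The per-thread change queues and the end-of-frame hand-off might look roughly like this in C++. The sketch assumes that each worker writes only to its own queue during the frame and that a single thread calls `distribute()` after all workers have finished; `QueuedChangeManager`, `Change`, and the field names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical change record: which object changed and what kind of change.
struct Change {
    std::uint32_t object;
    std::uint32_t changedFields;  // bit mask, e.g. 0x1 = position
};

class QueuedChangeManager {
public:
    using Callback = std::function<void(const Change&)>;

    explicit QueuedChangeManager(std::size_t workerCount) : queues_(workerCount) {}

    // Registration happens before any frame is processed.
    void subscribe(std::uint32_t object, Callback cb) {
        subscribers_[object].push_back(std::move(cb));
    }

    // Called by worker `workerIndex` during the frame. Each worker appends
    // only to its own queue, so this sketch takes no lock here.
    void post(std::size_t workerIndex, Change change) {
        queues_.at(workerIndex).push_back(change);
    }

    // Called by one thread once the frame's jobs are done. Forwards every
    // queued change to the subscribers of that object and empties the
    // queues. Returns the number of change records processed.
    std::size_t distribute() {
        std::size_t processed = 0;
        for (std::vector<Change>& queue : queues_) {
            for (const Change& change : queue) {
                ++processed;
                auto it = subscribers_.find(change.object);
                if (it == subscribers_.end()) continue;  // nobody subscribed
                for (Callback& cb : it->second) cb(change);
            }
            queue.clear();
        }
        return processed;
    }

private:
    std::vector<std::vector<Change>> queues_;  // one queue per worker thread
    std::unordered_map<std::uint32_t, std::vector<Callback>> subscribers_;
};
```

Keeping one queue per worker avoids contention while the frame is running; the cost is a single serial pass over all queues at the frame boundary.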
The real challenge would be using a more complex scene to render from 1 thread to 8 threads. For example, a farm 8 times bigger or with a lot more rigid bodies.
The Smoke team is working on another demo called Horsepower. This demo is supposed to show a “perceivable difference” based on the number of available threads, very similar to what you are suggesting. In Horsepower, with an increasing number of threads the demo will be able to process more objects animating, using physics, and running AI. All of the source code for Smoke is available here: http://software.intel.com/en-us/articles/smoke-game-technology-demo. Please feel free to experiment with Smoke and I’d love to hear what you do with it.
There is a lot of work to do for the last part :)
About Horsepower, it seems very interesting. Can you give us more information?
I'm interested in Horsepower because I'm actually working on a demo of my own with physics and 3D only, no animation (object or 2D), no particles, no AI. And of course I expect more objects to be computed based on the number of available threads :p
We will release Horsepower early next year. This demo adds LODs and multithreaded animation to the Smoke framework. The Horsepower demo shows a hilly scene with a large number of herding horses. The number of horses is dependent on the number of available threads and compute power. Just like Smoke, we are planning on giving away all the source code.
Keep me up-to-date on your demo. It sounds like an interesting project ^_^
You can find videos of our demo here: http://www.acclib.com/2009/01/load-balancing-23012009.html
Like I said before it's just a technological demo, not a game demo :)
Rgds
Vincent
On a Core i7, I can handle 4x more spheres on 8 threads (compared to 1 thread). The test is here: http://www.acclib.com/2009/01/load-balancing-corei7-27012009.html
Vincent
developers will find these series of articles helpful in reaching greater performance in their games.
We have already presented our first phase of optimization work on Smoke at the Intel Developer Forum in September. Luckily, the presentation was recorded and is already live on Intel Software Network. Part 1 of the video can be found here: http://software.intel.com/en-us/videos/optimizing-a-video-game-smoke-fanning-the-flames-to-really-make-it-burn-part-1/ which also has links to the rest of the presentation. The slides can be found here: http://www.intel.com/go/idfsessions
The presentation reviewed the demo code design and then showed a complete performance study, with step-by-step use of Intel tools:
- Benchmark and measure a baseline
- Find common memory and data race bugs with Intel® Parallel Studio
- Drill down into hot spots in the code, and highlight why they're hot with Intel Parallel Studio and Intel® VTune™ Analyzer
- Find concurrency problems with Intel® Thread Profiler
- Show some speedups made in the code
We are planning to release the updates we made to the Smoke code in the coming weeks, which include a performance speedup and a port of the code to Visual Studio 2008. Both an executable and source code will be made available, as usual.
Please let us know if you have any suggestions for making Smoke better!