Gamasutra is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Gamasutra: The Art & Business of Making Gamesspacer
Sponsored Feature: Do-it-yourself Game Task Scheduling
View All     RSS
May 22, 2019
arrowPress Releases
May 22, 2019
Games Press
View All     RSS

If you enjoy reading this site, you might also want to check out these UBM Tech sites:


Sponsored Feature: Do-it-yourself Game Task Scheduling

February 24, 2010 Article Start Previous Page 3 of 3

A Parallel Game Loop

Figure 2

There are traditionally two main phases in a frame: the update that advances time and the draw that makes an image. In Nulstein, these phases have been subdivided further to achieve parallelism.

The update is split into two phases. The first is a pre-update phase where every entity can read from every other but cannot modify its public state. This allows every entity to make decisions based on the state at "previous frame." They will then apply changes during the second phase, which is the actual Update. The rule for this second phase is that an entity can write to its state but must not access any other entity.

This enables both of these phases to run as simple ParallelFor's and is trivial to implement unless there is a hard dependency between entities and you can't use the previous frame's state. A classic example would be a camera attached to a car: you don't want the viewport to move inside the car and need to know its exact position and orientation before you can update the camera.

In these cases an entity can declare itself dependent on another entity (or several entities) and be updated only once it has been updated. And because you know they have finished updating, it's okay for the dependent object to read the updated states. In the demo, this is how the small cubes manage to stay tightly attached to the corners of bigger cubes.

Figure 3

The draw phase is split into three phases. During the Draw, every entity is called to list items it needs to render and adds the items to a display list. The list has a 64 bits key that encodes an ID and other data such as z-order, alpha-blending, material, and so on.

This is done through a ParallelFor, with each thread adding to independent sections of an array. During this phase, things like visibility culling and filling of dynamic buffers can be done in parallel.

Once every entity has declared what it wants to draw, the array goes through Sort which can be done in parallel too (although here, with entities in the order of a thousand, it doesn't make a difference). Finally, the scene is rendered, each item in the sorted list calling back the parent entity which does the actual draw calls.

Figure 3

In figure 3, there are two Intel Thread Profiler captures of a release build running on a Intel Core i7 processor at 3.2GHz, at the same scale, with the task scheduler on and off. Because this is a release build, there is no annotation but the benefit of using the task scheduler is nevertheless quite clear; work is shown as green bars, with the serial case above and the parallel case below. The phase that remains serial is the render phase and it is mainly spent in DirectX and the graphics driver, with the gray line representing time spent waiting for vblank.

Figure 4, below, shows the demo in profile mode where it is instrumented to show actual work as sections in solid. This gives an idea of how tasks spread over all threads (timings are not accurate: instrumentation has a massive impact on the performance of our spinning mutex).

Figure 4

The resulting executable for this project is under 40K and if you use an exe packer, like kkrunchy by Farbrausch, it actually gets down to 16K. So, today, if you were to ask me whether you can use a task scheduler with stealing in a 64K, I can give you a definite yes!

Beyond this feat, and because I believe we need to experiment with things to really understand them, I'm hoping that this project will provide people interested in parallel programming with a nice toy to mess around with.

(For any project with less drastic size constraints, I recommend you turn to Intel Threading Building Blocks as it provides a lot more optimizations and features.)


Reinders, James. Intel Threading Building Blocks. USA: O'Reilly Media, Inc., 2007.

Pietrek, Matt. Remove Fatty Deposits From Your Applications Using Our 32-Bit Liposuction Tools. Microsoft Systems Journal, October 1996 issue.

Ericson Christer. Order your graphics draw calls around!

Article Start Previous Page 3 of 3

Related Jobs

Square Enix Co., Ltd.
Square Enix Co., Ltd. — Tokyo, Japan

Experienced Game Developer
Deep Silver Volition
Deep Silver Volition — Champaign, Illinois, United States

Technical Artist - Cinematics
Flight School
Flight School — Montreal, Quebec, Canada

Game Programmer
Dream Harvest
Dream Harvest — Brighton, England, United Kingdom

Technical Game Designer

Loading Comments

loader image