Sponsored Feature: Optimizing Game Architectures with Intel Threading Building Blocks
March 30, 2009
 

Games are some of the most performance-demanding applications around. The scientist studying proteins or the animator working on the next photorealistic computer-animated film can grudgingly wait for a computation to finish; a game player cannot. Game developers have the challenging task of squeezing as much performance as possible out of today's hardware.

This quest for performance has typically focused on graphics tricks and low-level instruction optimization. The increasing popularity of multi-core CPUs in the consumer market has created an opportunity for large performance gains through multi-threaded execution. Intel has created a library, Intel Threading Building Blocks (Intel TBB), to help achieve this goal.

This article demonstrates multiple paths to success for game architectures that optimize with Intel TBB. The techniques described are oriented primarily toward optimizing game architectures that already have some threading, showing how Intel TBB can enhance the performance of these architectures with relatively small amounts of coding effort. Even for a serial architecture, these techniques demonstrate straightforward ways of introducing performance threading.

This article is divided into three sections, ordered by increasing coding commitment.

  • The first section shows techniques in which Intel TBB provides optimization opportunities with minor coding effort and no algorithmic changes.
  • The second section details how Intel TBB's efficient implementation of loop parallelism can provide performance enhancements throughout a game architecture.
  • The final section demonstrates techniques for using Intel TBB as the basis for the threading in a game architecture and shows how to implement common threading paradigms using Intel TBB.

Applying these techniques will ensure that a game architecture is maximizing performance on the computers in the market now and will automatically take advantage of future advances in hardware.

The samples presented are available in complete form as a Microsoft Visual Studio project. Most of the samples can be ported to any platform where Intel TBB is available.

This article refers to the performance characteristics of these samples, as measured on a test system. Performance may vary on other systems. The specification of the test system:

Toes in the Water with Efficient Work-Alikes

One of the easiest ways to optimize a game's architecture is to swap in Intel TBB's high-performance implementations of standard containers or memory allocators. Almost all game architectures use containers and allocate memory dynamically, but the standard implementation of these common operations can carry some performance penalties when accessed from multiple threads. Using Intel TBB to optimize these operations requires minimal code changes.

Concurrent containers

Intel TBB provides concurrent implementations of common standard containers, including a vector, a queue, and a hash map. When standard containers are accessed from multiple threads, write accesses must be protected with mutual exclusion. Depending on the exclusion mechanism and the amount of contention, this can slow down execution considerably. The Intel TBB containers instead rely on fine-grained locking or lock-free techniques internally, so simultaneous accesses from multiple threads contend far less than they would on a single container-wide lock.
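The contention argument above can be illustrated without Intel TBB. The following is a minimal sketch in standard C++ of fine-grained ("striped") locking, in which each slot of a table is guarded by one of several mutexes rather than by one global lock. It is not how Intel TBB implements its containers; the class `StripedCounters`, the helper `hammer`, and the stripe count of 16 are hypothetical names and values chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration: a fixed-size table of counters guarded by
// several mutexes ("stripes") instead of a single global lock.
class StripedCounters {
public:
    explicit StripedCounters(std::size_t size) : m_values(size, 0) {}

    // Increment one slot, locking only the stripe that owns it.
    void increment(std::size_t index) {
        assert(index < m_values.size());
        std::lock_guard<std::mutex> guard(m_locks[index % kStripes]);
        ++m_values[index];
    }

    // Read one slot under the same stripe lock.
    long get(std::size_t index) {
        assert(index < m_values.size());
        std::lock_guard<std::mutex> guard(m_locks[index % kStripes]);
        return m_values[index];
    }

private:
    static constexpr std::size_t kStripes = 16;  // assumed value, tune per workload
    std::array<std::mutex, kStripes> m_locks;
    std::vector<long> m_values;
};

// Run `threads` workers that each increment every slot `rounds` times.
inline void hammer(StripedCounters& counters, std::size_t size,
                   unsigned threads, unsigned rounds) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, size, rounds] {
            for (unsigned r = 0; r < rounds; ++r)
                for (std::size_t i = 0; i < size; ++i)
                    counters.increment(i);
        });
    }
    for (std::thread& worker : pool) worker.join();
}
```

Two threads touching slots that map to different stripes never wait on each other, which is the property that a single container-wide lock cannot offer.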

Sample 1: Intel TBB containers don't require mutual exclusion

class Sample1StandardKernel: public Kernel
{
    ...
    // access a standard container, but protect it first
    EnterCriticalSection(&s_tLock);
    s_tStandardVector[i] = i;
    LeaveCriticalSection(&s_tLock);
    ...
};

class Sample1TBBKernel: public Kernel
{
    ...
    // thread-safe containers need no protection
    s_tTBBVector[i] = i;
    ...
};

Sample 1 shows how a game architecture might access a standard container and an Intel TBB container. The syntax is similar, but the standard container must be guarded by a mutual exclusion object, here a Win32 critical section. On the 4-core test system, the code using the Intel TBB container is faster than the code using the standard container by a factor of 1.21 (21% faster).
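For readers who are not on Windows, the standard-container path of Sample 1 can be sketched with portable C++ primitives. The example below is an assumption-laden illustration, not the article's sample: it uses `std::mutex` and `std::lock_guard` in place of the critical section, and it grows the vector with `push_back` rather than assigning by index, because growth is the case in which a standard vector most plainly needs protection. The function name `fillWithLock` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Fill a shared std::vector from several threads. push_back may reallocate
// the vector's storage, so every call is serialized with a mutex.
inline std::vector<unsigned> fillWithLock(unsigned threads, unsigned perThread) {
    std::vector<unsigned> shared;
    std::mutex lock;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&shared, &lock, t, perThread] {
            for (unsigned i = 0; i < perThread; ++i) {
                // The guard unlocks automatically at the end of each iteration.
                std::lock_guard<std::mutex> guard(lock);
                shared.push_back(t * perThread + i);
            }
        });
    }
    for (std::thread& worker : pool) worker.join();
    // Every value was appended exactly once, in an unspecified order.
    assert(shared.size() == static_cast<std::size_t>(threads) * perThread);
    return shared;
}
```

Calling `fillWithLock(4, 1000)` returns 4000 values in whatever order the threads happened to run. Every append pays for the lock, which is the overhead a concurrent container is designed to reduce.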

Multi-threaded memory allocators

Any game that dynamically allocates memory from multiple threads may be paying a hidden performance penalty. The standard implementations of C-style and C++-style allocators use internal mutual exclusion objects to allow multi-threaded access. Intel TBB provides a more efficient multi-threaded memory allocator that maintains a heap per thread to avoid contention.

Sample 2: Intel TBB allocators have performance advantages

class Sample2StandardKernel: public Kernel
{
    ...
    // allocate and deallocate some memory in the standard fashion
    unsigned int *m_pBuffer = (unsigned int *)
        malloc(sizeof(unsigned int) * 1000);
    free(m_pBuffer);
    ...
};

class Sample2TBBKernel: public Kernel
{
    ...
    // allocate and deallocate some memory in a TBB fashion
    unsigned int *m_pBuffer = (unsigned int *)
        scalable_malloc(sizeof(unsigned int) * 1000);
    scalable_free(m_pBuffer);
    ...
};

Sample 2 shows memory allocation and deallocation with both the standard C-style functions and the multi-threaded Intel TBB allocator. The only syntactic difference is the function names. On the 4-core test system, the code using the Intel TBB allocator achieves a 1.17 speedup (17% faster) relative to the standard code.
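The heap-per-thread idea can be sketched in standard C++ as a toy. The example below is not Intel TBB's allocator and makes several simplifying assumptions: blocks have one fixed size (matching the 1000-element buffer in Sample 2), a block freed on a thread simply joins that thread's cache, and the cache itself lives in a `std::vector`. The names `ThreadLocalPool` and `localPool` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed-size block pool. Each thread gets its own instance
// through thread_local storage, so reusing a cached block takes no lock.
class ThreadLocalPool {
public:
    // Assumed block size: room for 1000 unsigned ints, as in Sample 2.
    static constexpr std::size_t kBlockSize = sizeof(unsigned int) * 1000;

    // Hand cached blocks back to the shared heap when the thread exits.
    ~ThreadLocalPool() {
        for (void* block : m_free) std::free(block);
    }

    // Reuse a cached block if one is available, else fall back to malloc.
    void* allocate() {
        if (m_free.empty()) return std::malloc(kBlockSize);
        void* block = m_free.back();
        m_free.pop_back();
        return block;
    }

    // Return a block to this thread's cache instead of the shared heap.
    void deallocate(void* block) {
        assert(block != nullptr);
        m_free.push_back(block);
    }

    // Number of blocks currently cached by this thread.
    std::size_t cached() const { return m_free.size(); }

private:
    std::vector<void*> m_free;
};

// Access the calling thread's private pool.
inline ThreadLocalPool& localPool() {
    thread_local ThreadLocalPool pool;
    return pool;
}
```

A thread that repeatedly calls `localPool().allocate()` and `localPool().deallocate(p)` reaches the shared heap only when its cache is empty, which is the contention-avoidance effect the article attributes to a per-thread heap. A production allocator must also handle varied sizes, cross-thread frees, and memory returned to the system, none of which this sketch attempts.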

