Sponsored Feature: Optimizing Game Architectures with Intel Threading Building Blocks
 
 
by Brad Werth [Programming, Visual Computing]
 
 
March 30, 2009
 

Games are some of the most performance-demanding applications around. The scientist studying proteins or the animator working on the next photorealistic computer animated film can grudgingly wait for a computation to finish; a game player cannot. Game developers have the challenging task of squeezing as much performance as possible out of today's hardware.

This quest for performance has typically focused on graphics tricks and low-level instruction optimization. The increasing popularity of multi-core CPUs in the consumer market has created an opportunity to make large performance gains by optimizing for multi-threaded execution. Intel has created a library called Intel Threading Building Blocks (Intel TBB) to help achieve this goal.

This article demonstrates multiple paths to success for game architectures that optimize with Intel TBB. The techniques described are oriented primarily toward optimizing game architectures that already have some threading, showing how Intel TBB can enhance the performance of these architectures with relatively small amounts of coding effort. Even for a serial architecture, these techniques demonstrate straightforward ways of introducing performance threading.

This article is divided into three sections, ordered by increasing coding commitment.

  • The first section shows techniques in which Intel TBB provides optimization opportunities with minor coding effort and no algorithmic changes.
  • The second section details how Intel TBB's efficient implementation of loop parallelism can provide performance enhancements throughout a game architecture.
  • The final section demonstrates techniques for using Intel TBB as the basis for the threading in a game architecture and shows how to implement common threading paradigms using Intel TBB.

Applying these techniques helps a game architecture get the most performance out of the multi-core computers on the market now, and positions it to take advantage of future advances in hardware without further code changes.

The samples presented are available in complete form as a Microsoft Visual Studio project. Most of the samples can be ported to any platform where Intel TBB is available.

This article refers to the performance characteristics of these samples, as measured on a test system. Performance may vary on other systems. The specification of the test system:

Toes in the Water with Efficient Work-Alikes

One of the easiest ways to optimize a game's architecture is to swap in Intel TBB's high-performance implementations of standard containers or memory allocators. Almost all game architectures use containers and allocate memory dynamically, but the standard implementation of these common operations can carry some performance penalties when accessed from multiple threads. Using Intel TBB to optimize these operations requires minimal code changes.

Concurrent containers

When standard containers are accessed from multiple threads, write accesses must be protected with mutual exclusion. Depending on the exclusion mechanism and the amount of contention, this can slow down execution considerably. Intel TBB provides concurrent implementations of common standard containers, including a vector, a queue, and a hash map. These containers use per-element locking, so simultaneous accesses from multiple threads do not contend for a single container-wide lock.

Sample 1: Intel TBB containers don't require mutual exclusion

class Sample1StandardKernel: public Kernel
{
    ...
    // access a standard container, but protect it first
    EnterCriticalSection(&s_tLock);
    s_tStandardVector[i] = i;
    LeaveCriticalSection(&s_tLock);
    ...
};

class Sample1TBBKernel: public Kernel
{
    ...
    // thread-safe containers need no protection
    s_tTBBVector[i] = i;
    ...
};

Sample 1 shows how a game architecture might access a standard container and an Intel TBB container. The syntax is similar, but the standard container requires a mutual exclusion object around each access. On the 4-core test system, the code using the Intel TBB container runs 1.21 times as fast as the code using the standard container (21% faster).
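To make the cost on the standard-container side concrete, here is a minimal portable sketch of the locked pattern from Sample1StandardKernel. It is not taken from the article's project: it assumes C++11 threads, uses std::mutex in place of the Win32 critical section, and appends with push_back rather than assigning by index. The name fill_with_lock and its parameters are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` workers appends `per_thread` values
// to one shared std::vector. push_back may reallocate the storage, so every
// append has to be serialized. std::mutex stands in for the critical section.
std::vector<unsigned> fill_with_lock(unsigned threads, unsigned per_thread)
{
    assert(threads > 0 && per_thread > 0);

    std::vector<unsigned> shared;
    std::mutex lock;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&shared, &lock, t, per_thread] {
            for (unsigned i = 0; i < per_thread; ++i) {
                // All workers contend for this one lock on every append.
                std::lock_guard<std::mutex> guard(lock);
                shared.push_back(t * per_thread + i);
            }
        });
    }

    // Wait for every worker before handing the vector back.
    for (std::thread &w : workers) {
        w.join();
    }
    return shared;
}
```

Every append passes through the single lock, so the workers largely take turns. In a build that links Intel TBB, the shared container could instead be a tbb::concurrent_vector<unsigned>, whose push_back may be called from several threads at once, and the lock_guard line would be dropped.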

Multi-threaded memory allocators

Any game that dynamically allocates memory from multiple threads may be paying a hidden performance penalty. The standard implementations of C-style and C++-style allocators use internal mutual exclusion objects to allow multi-threaded access. Intel TBB provides a more efficient multi-threaded memory allocator that maintains a heap per thread to avoid contention.

Sample 2: Intel TBB allocators have performance advantages

class Sample2StandardKernel: public Kernel
{
    ...
    // allocate and deallocate some memory in the standard fashion
    unsigned int *m_pBuffer = (unsigned int *)
        malloc(sizeof(unsigned int) * 1000);
    free(m_pBuffer);
    ...
};

class Sample2TBBKernel: public Kernel
{
    ...
    // allocate and deallocate some memory in a TBB fashion
    unsigned int *m_pBuffer = (unsigned int *)
        scalable_malloc(sizeof(unsigned int) * 1000);
    scalable_free(m_pBuffer);
    ...
};

Sample 2 is an example of memory allocation and deallocation with both the standard C-style methods and with the multi-threaded Intel TBB allocator. The only syntax difference is the name of the function. The code using the Intel TBB allocator has a 1.17 speedup (17% faster) relative to the standard code when run on the 4-core test system.
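The idea of a heap per thread can be illustrated with a toy example. The sketch below is not Intel TBB's allocator and makes no claim about its internals. It assumes fixed-size blocks and C++11 thread_local storage, and the names ThreadLocalBlockCache and local_cache are hypothetical. It only shows why per-thread state removes the shared lock from the common path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy cache of fixed-size blocks. Each thread owns one instance,
// so reusing a cached block touches no shared state and needs no lock.
class ThreadLocalBlockCache {
public:
    explicit ThreadLocalBlockCache(std::size_t block_size)
        : block_size_(block_size) {}

    // Return any cached blocks to the shared heap when the owner thread exits.
    ~ThreadLocalBlockCache() {
        for (void *p : free_) std::free(p);
    }

    ThreadLocalBlockCache(const ThreadLocalBlockCache &) = delete;
    ThreadLocalBlockCache &operator=(const ThreadLocalBlockCache &) = delete;

    void *allocate() {
        if (free_.empty()) {
            // Slow path: fall back to the shared heap (may lock internally).
            return std::malloc(block_size_);
        }
        // Fast path: reuse a block this thread released earlier.
        void *p = free_.back();
        free_.pop_back();
        return p;
    }

    void release(void *p) {
        assert(p != nullptr);
        free_.push_back(p); // keep the block for this thread's next request
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::size_t block_size_;
    std::vector<void *> free_;
};

// One cache per thread, sized like the buffer in Sample 2.
ThreadLocalBlockCache &local_cache() {
    thread_local ThreadLocalBlockCache cache(sizeof(unsigned int) * 1000);
    return cache;
}
```

A thread that repeatedly calls local_cache().allocate() and local_cache().release() mostly stays on the lock-free fast path. A production allocator also has to handle variable sizes and blocks freed by a different thread than the one that allocated them, which this sketch ignores. For C++ containers, Intel TBB also ships tbb::scalable_allocator<T> alongside scalable_malloc and scalable_free.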

 