GAME JOBS
Contents
Sponsored Feature: Multi-Threading Goo!: A Programmer’s Diary
 
 
Printer-Friendly VersionPrinter-Friendly Version
 


Part of:



Latest Jobs
spacer View All     Post a Job     RSS spacer
 
June 7, 2013
 
Sledgehammer Games / Activision
Level Designer (Temporary)
 
High Moon / Activision
Senior Environment Artist
 
LeapFrog
Associate Producer
 
EA - Austin
Producer
 
Zindagi Games
Senior/Lead Online Multiplayer
 
Off Base Productions
Senior Front End Software Engineer
spacer
Latest Blogs
spacer View All     Post     RSS spacer
 
June 7, 2013
 
Tenets of Videodreams, Part 3: Musicality
 
Post Mortem: Minecraft Oakland
 
Free to Play: A Call for Games Lacking Challenge [1]
 
Cracking the Touchscreen Code [3]
 
10 Business Law and Tax Law Steps to Improve the Chance of Crowdfunding Success
spacer
About
spacer Editor-In-Chief:
Kris Graft
Blog Director:
Christian Nutt
Senior Contributing Editor:
Brandon Sheffield
News Editors:
Mike Rose, Kris Ligman
Editors-At-Large:
Leigh Alexander, Chris Morris
Advertising:
Jennifer Sulik
Recruitment:
Gina Gross
Education:
Gillian Crowley
 
Contact Gamasutra
 
Report a Problem
 
Submit News
 
Comment Guidelines
 
Blogging Guidelines
Sponsor
Features
  Sponsored Feature: Multi-Threading Goo!: A Programmer’s Diary
by Tommy Refenes [Programming, Visual Computing]
13 comments Share on Twitter Share on Facebook RSS
 
 
September 10, 2008 Article Start Page 1 of 3 Next
 

[In this Intel-sponsored feature, part of the Gamasutra Visual Computing microsite, Goo! developer Tommy Refenes of PillowFort matches wits with multi-threading in four gripping acts -- and emerges victorious]

I'll admit it. I'm a speed freak. (I tried to get help, but it turns out those clinics by the train station are for something totally different than my addiction.) Simply put, I like what I write to run fast, getting a certain amount of personal joy when something I've written runs at over 100 FPS.

I've always strived to make my code as efficient as possible, but I didn't realize how little I knew until my previous job. Back then I was an Xbox 360 engine programmer at a studio working crazy hours to get a game up on XBLA. I was thrown into the world of memory management, cache optimizations, rendering optimizations, and briefly touched on the land of threading.

In May 2006 I left that company to start working on Goo!, a game in which you control big globs of liquid and use those globs to envelop enemy goos. Players have total control over their goo, which means a ton of collision calculations and even more physics calculations.

My new job, once again, was an engine programmer, and I knew that to have real-time interactive liquid rendering on the screen at a speed that wouldn't give me FPS withdrawal, I would need to thread the collision and physics to the extreme.

(A small disclaimer: Goo! is my first attempt at multi-threaded programming. I've learned everything I know about threading reading MSDN help, a few random Intel documents, and various other resources on threads. I didn't have the fancy pants books or money to buy said books. So in this article I may miss a few official terms and concepts and I may be totally wrong about some things, but like all programming, it's all about results.)

Act 1: Power Overwhelming

Goo! began life on a Intel Pentium 4 processor 3.0 GHz with Hyper-Threading Technology. Goos are rendered by combining thousands of small blob particles on the screen to make up a height map, which is then passed through filters to make the goos look like liquid or mercury. That's how they are rendered now, but in the beginning they were hundreds of smiley faces.

Back then, Goo! rendered and calculated around 256 blobs on the screen. Blobs' collision calculations were culled based on velocity and position, but it was all pushed through the Intel Pentium 4 processor and needless to say the results weren't great.

With the game rendering at around 20 FPS I knew it was time to thread the collision detection. I broke down and purchased an Intel Pentium D 965 Extreme Edition processor. The dual 3.73 GHz cores would be more than enough to start building a new multi-threaded collision detection system. Like a five-year old back in 1986 with a brand new Optimus Prime, I was excited and anxious to get started on the new collision engine.

So began my first attempt at threading. I adopted a philosophy inspired by Ron Popeil (of Ronco fame): "Set it and forget it." Logically data that isn't needed immediately is a perfect candidate for a threaded operation. Collision detection fell into this thinking very nicely because most of the time collision results will be the same for several frames before changing. Therefore, I reasoned, collision could run on a separate thread and be checked every frame to see if the routine had finished, copying over the results when finished, and using them in the current frame's physics calculations.

At the time, this seemed like a great idea. Not waiting for collision detections to finish every frame allowed the engine to move on and continue crunching physics and gameplay calculations with old data, while still maintaining a pretty high frame rate. At this point, Goo! was pushing through around 625 blobs rendering at about 60 FPS.

The threading model was simple: A collision thread was created when the game loaded, sleeping until it had something to work on. The main thread copied position and size data for every blob into a static array the thread could access and then continued on to physics calculations and rendering. The collision thread woke up, saw it had work to do, and proceeded with a very brute force method of figuring out who's touching who. It then saved this data into an array of indices that corresponded to every blob in the game.

Once finished, it posted a flag to the main thread. When the main thread saw that the collision thread had completed its calculations, it copied the newly calculated collision data over the old data, copied new position and size data, and once again sent it to the collision thread to calculate. At this time, each collision calculation took around 20 ms, which meant that collision data was updated just about every 1.5 frames.

With 625 blobs on the screen rendering at around 60 FPS, gameplay could now start development. This is where the first problems began to hit. The goos themselves didn't really feel like goos. When 625 blobs represent one goo it looks fine, but Goo! is a game where you battle against other goos so having more than one on the screen made the goos look a little chunky.

Chunky goos (unless you are talking about GooGoo Clusters) are no good. The solution: more blobs! I expanded the threading model by adding another collision thread and splitting up the work over the two threads. The main thread now broke the collision work up into two pieces, telling collision thread 1 to calculate the first half of the data, and collision thread 2 to calculate the second half. Once the two threads finished, they sent two flags and waited for the main thread to see that data, copy it over, and send more work. This method yielded some better results, allowing around 900 blobs to be rendered to the screen at 60 FPS.

With 900 blobs on the screen goos started looking like goos, and it was time to put the collision to rest for a while and focus on rendering. But as gameplay development progressed, various little unexplained bugs started to pop up: Memory would appear as if it were being accessed after being freed, blobs wouldn't change ownership properly, and the simulation would occasionally explode. Although these were gamebreaking bugs, they were so infrequent that I didn't bother with them until about seven months later after some core gameplay was flushed out.

 
Article Start Page 1 of 3 Next
 
Top Stories

image
Microsoft's official stance on used games for Xbox One
image
Keeping the simulation dream alive
image
A 15-year-old critique of the game industry that's still relevant today
image
The demo is dead, revisited
Comments

Christoffer Lantz
profile image
Umm, creating and destroying threads constantly cannot possibly be the most efficient way of doing things (stack allocation, other resources, tear-down). I would suggest that you read up on the synchronization primitives that are available from the system. Putting a thread to sleep by having it wait on an Win32 Event or any other primitive will make it consume zero cpu time, while allowing for it to be moved to the ready queue instantly when the event is signalled.

jan Ciger
profile image
Yay for using brute force instead of a smarter algorithm. Why didn't you use something like kdTrees to quickly get the closest neighbours and collide only those? I was doing collision detection on 2000+ animated people in real time, on a single-core CPU (old P III Xeon) without threading, with collisions being calculated on EVERY frame - and it worked just fine because I explicitly *DIDN'T* collide everything with everything (O(^2) complexity), but used kdTrees for the nearest neighbor search (http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm - good overview).



Being smarter always beats the brute force approach ...



You can see the video of it here: http://vrlab.epfl.ch/multimedia/BA_behavioural_animation.html#Riot

Tommy Refenes
profile image
Goo! is no longer using brute force collision or the constant creation of threads, this article was about the very first attempts, not the latest build breifly mentioned in the epilogue.

SONG Vincent
profile image
Thank you for this interesting article, new things in multithreading programing are quite rare this days :)

Can you tell us more about the scaling from 1 to 4 cores ? and 2 to 4 cores ?

Do you think your threading architecture can handle a 16 or 32 core processor ?



Vincent

Matt Benic
profile image
"The new threading model is similar to the last one in which I create threads as I need them"

I also find it strange that frequently creating and destroying threads would be more efficient than waiting on a system object. How do you decide when you 'need' to create a new thread, and how do you allocate work to those threads? Are they dedicated to specific tasks or do they have some sort of generic queue system just running through any kind of task that may be available?

Erik deJong
profile image
Maybe I am missing some obvious piece of information in the article but what was wrong with using semaphores and set up this problem like a sort of producer/consumer problem? Wake up threads when there is work for them to be done instead of having them do busy waiting.

Florian T.
profile image
Nice read. Thanks for sharing your experience.

Tommy Refenes
profile image
Vincent:



The next article is going explain the new engine which has a threading model that self profiles itself on multiple core machines.



Matt / Erik:



The new threading model is similar because I do create more threads to do work when I need them, but when they are not working they are set to a wait state. These threads are assigned to specific functions instead of a giant thread pool that I pick out of when needed. I found this to be better due to the nature and frequency of the work needed to be ran on these threads. There are some threads that are created and then allowed to expire, but the ones that do collision detection / physics and object updates are activated by the main CPU, determine through their self profiling how many threads they can split the work over, create threads if needed (i.e. the profiling determines that instead of running over 3 threads, it will benefit from 4) and then does that work while the main CPU does something then they are put into a wait state and then activated by the main CPU once again when they are needed. This model is by far the most efficient. The purpose of this article was to explain the first attemps and put forth the obvious problems with that old threading model.

Matt Benic
profile image
Thanks Tommy, I'm looking forward to the followup :)

Erik deJong
profile image
Indeed, thanks for sharing the experience and I'm looking forward to reading more about your final solution.

fabian mejia
profile image
Hi Tommy, it's a great article.

I hope you the best in the contest. Personally I think using multiprocessor multithreaded power in games is the next step.

I have used threads in Embedded systems and they are really challenging but as soon as you understand them they help a lot optimizing your application.



I will be waiting for you next article

Tommy Refenes
profile image
Hey Fabian,



Thank you very much for the complement! http://softwarecommunity.intel.com/articles/eng/3978.htm

SONG Vincent
profile image
Great Job Tommy !!!

"First Place "Goo" by Tommy Refenes, PillowFort Games."



Vincent, as an appetizer to your next article, just tell us if your threading model may work on a 32 core processor :p


none
 
Comment:
 




UBM Tech