It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

Gamasutra
September 10 2008

Sponsored Feature: Multi-Threading Goo!: A Programmer’s Diary

arrowrightPage 1
arrowrightPage 2
arrowrightPage 3


Printer Friendly Version




Part of:



[More information...]
 

 

Sign up for the Gamasutra Daily Newsletter!

Telltale Games : Technical Artist [11.22.08]
Namco Networks America Inc. : Flash Games Engineer [11.21.08]
War Drum Studios, LLC : Network Game Programmer [11.21.08]
First Act Interactive : Project Lead Software Engineer [11.21.08]
Sony Computer Entertainment America : Senior Manager of Game System Engineering [11.21.08]
Cryptic Studios, Inc. : Lead Game Master Support Representative [11.21.08]
Sparkplay Media : QA Manager [11.21.08]
Cryptic Studios, Inc. : User Interface Designer [11.21.08]
Treyarch / Activision : Technical Director [11.21.08]
Ignition Entertainment : Motion Director [11.21.08]
Ignition Entertainment : Producer [11.21.08]
Sony Pictures Entertainment : Software Engineer / Lua Scripter [11.20.08]
Sony Computer Entertainment America : SR. TECHNICAL PROJECT MANAGER [11.20.08]
Edge of Reality : Project Manager [11.20.08]
Stealth Startup : 3d Graphics Programmer - Virtual Worlds [11.20.08]

View All    Post A Job

Post Resume


Upcoming Events:
DIG London Game Conference
London, Canada
11.27.08

5th Australasian Conference on Interactive Entertainment
Brisbane, Australia
12.03.08

IEEE Symposium on Computational Intelligence and Games
Perth, Australia
12.15.08

2K Bot Prize
Perth, Australia
12.15.08

The 6th Annual Mobile Games Forum 2009
London, United Kingdom
01.21.09

Submit Event

View All


Sponsored Feature: Multi-Threading Goo!: A Programmer’s Diary

(Page 1/3)
Next arrow


[In this Intel-sponsored feature, part of the Gamasutra Visual Computing microsite, Goo! developer Tommy Refenes of PillowFort matches wits with multi-threading in four gripping acts -- and emerges victorious]

I'll admit it. I'm a speed freak. (I tried to get help, but it turns out those clinics by the train station are for something totally different than my addiction.) Simply put, I like what I write to run fast, getting a certain amount of personal joy when something I've written runs at over 100 FPS.

I've always strived to make my code as efficient as possible, but I didn't realize how little I knew until my previous job. Back then I was an Xbox 360 engine programmer at a studio working crazy hours to get a game up on XBLA. I was thrown into the world of memory management, cache optimizations, rendering optimizations, and briefly touched on the land of threading.

In May 2006 I left that company to start working on Goo!, a game in which you control big globs of liquid and use those globs to envelop enemy goos. Players have total control over their goo, which means a ton of collision calculations and even more physics calculations.

My new job, once again, was an engine programmer, and I knew that to have real-time interactive liquid rendering on the screen at a speed that wouldn't give me FPS withdrawal, I would need to thread the collision and physics to the extreme.

(A small disclaimer: Goo! is my first attempt at multi-threaded programming. I've learned everything I know about threading reading MSDN help, a few random Intel documents, and various other resources on threads. I didn't have the fancy pants books or money to buy said books. So in this article I may miss a few official terms and concepts and I may be totally wrong about some things, but like all programming, it's all about results.)

Act 1: Power Overwhelming

Goo! began life on a Intel Pentium 4 processor 3.0 GHz with Hyper-Threading Technology. Goos are rendered by combining thousands of small blob particles on the screen to make up a height map, which is then passed through filters to make the goos look like liquid or mercury. That's how they are rendered now, but in the beginning they were hundreds of smiley faces.

Back then, Goo! rendered and calculated around 256 blobs on the screen. Blobs' collision calculations were culled based on velocity and position, but it was all pushed through the Intel Pentium 4 processor and needless to say the results weren't great.

With the game rendering at around 20 FPS I knew it was time to thread the collision detection. I broke down and purchased an Intel Pentium D 965 Extreme Edition processor. The dual 3.73 GHz cores would be more than enough to start building a new multi-threaded collision detection system. Like a five-year old back in 1986 with a brand new Optimus Prime, I was excited and anxious to get started on the new collision engine.

So began my first attempt at threading. I adopted a philosophy inspired by Ron Popeil (of Ronco fame): "Set it and forget it." Logically data that isn't needed immediately is a perfect candidate for a threaded operation. Collision detection fell into this thinking very nicely because most of the time collision results will be the same for several frames before changing. Therefore, I reasoned, collision could run on a separate thread and be checked every frame to see if the routine had finished, copying over the results when finished, and using them in the current frame's physics calculations.

At the time, this seemed like a great idea. Not waiting for collision detections to finish every frame allowed the engine to move on and continue crunching physics and gameplay calculations with old data, while still maintaining a pretty high frame rate. At this point, Goo! was pushing through around 625 blobs rendering at about 60 FPS.

The threading model was simple: A collision thread was created when the game loaded, sleeping until it had something to work on. The main thread copied position and size data for every blob into a static array the thread could access and then continued on to physics calculations and rendering. The collision thread woke up, saw it had work to do, and proceeded with a very brute force method of figuring out who's touching who. It then saved this data into an array of indices that corresponded to every blob in the game.

Once finished, it posted a flag to the main thread. When the main thread saw that the collision thread had completed its calculations, it copied the newly calculated collision data over the old data, copied new position and size data, and once again sent it to the collision thread to calculate. At this time, each collision calculation took around 20 ms, which meant that collision data was updated just about every 1.5 frames.

With 625 blobs on the screen rendering at around 60 FPS, gameplay could now start development. This is where the first problems began to hit. The goos themselves didn't really feel like goos. When 625 blobs represent one goo it looks fine, but Goo! is a game where you battle against other goos so having more than one on the screen made the goos look a little chunky.

Chunky goos (unless you are talking about GooGoo Clusters) are no good. The solution: more blobs! I expanded the threading model by adding another collision thread and splitting up the work over the two threads. The main thread now broke the collision work up into two pieces, telling collision thread 1 to calculate the first half of the data, and collision thread 2 to calculate the second half. Once the two threads finished, they sent two flags and waited for the main thread to see that data, copy it over, and send more work. This method yielded some better results, allowing around 900 blobs to be rendered to the screen at 60 FPS.

With 900 blobs on the screen goos started looking like goos, and it was time to put the collision to rest for a while and focus on rendering. But as gameplay development progressed, various little unexplained bugs started to pop up: Memory would appear as if it were being accessed after being freed, blobs wouldn't change ownership properly, and the simulation would occasionally explode. Although these were gamebreaking bugs, they were so infrequent that I didn't bother with them until about seven months later after some core gameplay was flushed out.


(Page 1/3)
Next arrow


Comments


Christoffer Lantz 11 Sep 2008 at 2:43 am PST
Umm, creating and destroying threads constantly cannot possibly be the most efficient way of doing things (stack allocation, other resources, tear-down). I would suggest that you read up on the synchronization primitives that are available from the system. Putting a thread to sleep by having it wait on an Win32 Event or any other primitive will make it consume zero cpu time, while allowing for it to be moved to the ready queue instantly when the event is signalled.

jan Ciger 11 Sep 2008 at 3:35 am PST
Yay for using brute force instead of a smarter algorithm. Why didn't you use something like kdTrees to quickly get the closest neighbours and collide only those? I was doing collision detection on 2000+ animated people in real time, on a single-core CPU (old P III Xeon) without threading, with collisions being calculated on EVERY frame - and it worked just fine because I explicitly *DIDN'T* collide everything with everything (O(^2) complexity), but used kdTrees for the nearest neighbor search (http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm - good overview).

Being smarter always beats the brute force approach ...

You can see the video of it here: http://vrlab.epfl.ch/multimedia/BA_behavioural_animation.html#Riot


Tommy Refenes 11 Sep 2008 at 7:24 am PST
Goo! is no longer using brute force collision or the constant creation of threads, this article was about the very first attempts, not the latest build breifly mentioned in the epilogue.

SONG Vincent 11 Sep 2008 at 2:10 pm PST
Thank you for this interesting article, new things in multithreading programing are quite rare this days :)
Can you tell us more about the scaling from 1 to 4 cores ? and 2 to 4 cores ?
Do you think your threading architecture can handle a 16 or 32 core processor ?

Vincent

Matt Benic 12 Sep 2008 at 6:31 am PST
"The new threading model is similar to the last one in which I create threads as I need them"
I also find it strange that frequently creating and destroying threads would be more efficient than waiting on a system object. How do you decide when you 'need' to create a new thread, and how do you allocate work to those threads? Are they dedicated to specific tasks or do they have some sort of generic queue system just running through any kind of task that may be available?

Erik de Jong 13 Sep 2008 at 1:15 pm PST
Maybe I am missing some obvious piece of information in the article but what was wrong with using semaphores and set up this problem like a sort of producer/consumer problem? Wake up threads when there is work for them to be done instead of having them do busy waiting.

Florian T. 13 Sep 2008 at 11:26 pm PST
Nice read. Thanks for sharing your experience.

Tommy Refenes 14 Sep 2008 at 6:38 am PST
Vincent:

The next article is going explain the new engine which has a threading model that self profiles itself on multiple core machines.

Matt / Erik:

The new threading model is similar because I do create more threads to do work when I need them, but when they are not working they are set to a wait state. These threads are assigned to specific functions instead of a giant thread pool that I pick out of when needed. I found this to be better due to the nature and frequency of the work needed to be ran on these threads. There are some threads that are created and then allowed to expire, but the ones that do collision detection / physics and object updates are activated by the main CPU, determine through their self profiling how many threads they can split the work over, create threads if needed (i.e. the profiling determines that instead of running over 3 threads, it will benefit from 4) and then does that work while the main CPU does something then they are put into a wait state and then activated by the main CPU once again when they are needed. This model is by far the most efficient. The purpose of this article was to explain the first attemps and put forth the obvious problems with that old threading model.

Matt Benic 15 Sep 2008 at 2:58 am PST
Thanks Tommy, I'm looking forward to the followup :)

Erik de Jong 15 Sep 2008 at 4:04 am PST
Indeed, thanks for sharing the experience and I'm looking forward to reading more about your final solution.

fabian mejia 18 Sep 2008 at 1:04 pm PST
Hi Tommy, it's a great article.
I hope you the best in the contest. Personally I think using multiprocessor multithreaded power in games is the next step.
I have used threads in Embedded systems and they are really challenging but as soon as you understand them they help a lot optimizing your application.

I will be waiting for you next article

Tommy Refenes 18 Sep 2008 at 5:16 pm PST
Hey Fabian,

Thank you very much for the complement! http://softwarecommunity.intel.com/articles/eng/3978.htm

SONG Vincent 21 Sep 2008 at 1:57 pm PST
Great Job Tommy !!!
"First Place "Goo" by Tommy Refenes, PillowFort Games."

Vincent, as an appetizer to your next article, just tell us if your threading model may work on a 32 core processor :p







join | contact us | advertise | write | my profile
news | features | contract work | jobs | resumes | education | product guide | store