  Sponsored Feature: Common Performance Issues in Game Programming
by Becky Heineman [XNA]
June 18, 2008
 

[In this technical article, part of Microsoft's XNA-related Gamasutra microsite, XNA Developer Connection staffer and Interplay co-founder Becky Heineman gives tips on avoiding the 'Load-Hit-Store' performance-killer when making games.]

"90% of the time is spent in 10% of the code, so make that 10% the fastest code it can be."

One of the most common problems encountered in creating computer games is performance. Issues like disk access, GPU performance, CPU performance, race conditions, and memory bandwidth (or lack thereof) can cause stalls or delays that may turn a 30-frames-per-second game into a 9-frames-per-second game.

This article will describe one of the most common CPU performance killers, the Load-Hit-Store, and give tips and tricks on how to avoid it.

Load-Hit-Store

Ask any Xbox 360 performance engineer about Load-Hit-Store and they usually go into a tirade. The sequence of a memory read operation (the Load), the assignment of the value to a register (the Hit), and the actual writing of the value into a register (the Store) is usually hidden away in stages of the pipeline, so these operations cause no stalls. However, if the memory location being read was recently written to by a previous write operation, it can take as many as 40 cycles before the "Store" operation can complete.

Example:

stfs fr3,0(r3) ;Store the float
lwz r9,0(r3) ;Read it back into an integer register
oris r9,r9,0x8000 ;Force to negative

The first instruction writes a 32-bit floating-point value into memory, and the following instruction reads it back. What's interesting is that the load instruction isn't where the stall occurs; it's the "oris" instruction. That instruction can't complete until the "store" into r9 finishes, and it's waiting for the L1 cache to update.

What's going on? The first instruction stores the data and marks the L1 cache as "dirty". It takes about 40 cycles for the data to be written into the L1 cache and become available for the CPU to use. During this window of time, the next instruction requests that data from the cache and then "hits" r9 for a "store". Since the last instruction can't execute until the store is complete, you've got a stall.

The Microsoft tool, PIX, can locate these issues. Since it's confusing to tag the "oris" instruction as the cause of the stall (which it is), PIX flags the load instruction that started the chain of events so the programmer has a better chance of fixing the issue.
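In C++ terms, the assembly sequence above is roughly what you get when you edit a float's bits through an integer. The following is a minimal sketch, not code from the original article: both function names are hypothetical, it assumes a 32-bit IEEE 754 float, and whether a particular compiler keeps the second version entirely in the floating-point unit is an assumption you should confirm in the disassembly.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sets the sign bit by editing the float's bits as an
// integer. On a CPU with no direct path between float and integer registers,
// the value has to travel through memory (a store followed by a load of the
// same address), which is the pattern described above.
// Assumes float is a 32-bit IEEE 754 value.
inline float ForceNegativeViaInteger(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));   // float -> integer
    bits |= 0x80000000u;                        // same effect as oris r9,r9,0x8000
    std::memcpy(&value, &bits, sizeof(value));  // integer -> float
    return value;
}

// Hypothetical alternative: the same result expressed purely as float math,
// so the value never needs to be handed to the integer unit.
inline float ForceNegativeInFloatUnit(float value)
{
    return -std::fabs(value);
}
```

Both functions return the same value for ordinary inputs; the second simply gives the compiler no reason to move the value between register files.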

Three CPUs in One Thread

Think of the PowerPC as three completely separate CPUs, each with its own instruction set, register set, and ways of performing operations on the data. The first is the integer unit with its 32 integer registers, which is considered the workhorse, handling a large percentage of the operations.

The second is the floating-point unit with its 32 floating-point registers, handling all of the simple mathematics. Finally, the third is the VMX unit with its 128 registers dealing with complex vector operations.

Why think of the units as three CPUs that share a common instruction stream? These units have no way of directly transferring data between one another internally. Because there is no instruction to move the contents of an integer register to a floating-point register, the CPU must write the integer value to memory and then load it into a floating-point register using a memory read instruction. That pattern of operation is, by nature, a Load-Hit-Store.

Moving data from the integer unit to the floating-point unit is as simple as...

Example:

int iTime = 0; // Initialized so the conversion reads a defined value
float fTime;
fTime = static_cast<float>(iTime);

This is extremely simple code and very common, but on the PowerPC, an instruction is generated to store the integer value to memory so that a floating-point instruction can load it from memory into a floating-point register. A fix-up instruction follows that converts the integer representation into a floating-point representation, and the sequence is complete.
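One place this conversion tends to hide is a loop that converts its integer counter to float on every pass. The sketch below is an illustration under stated assumptions, not code from the original article: the function names and parameters are hypothetical, and it relies on the fact that a float can represent every integer up to 2^24 exactly, so a parallel float counter produces the same values without any integer-to-float transfer inside the loop.

```cpp
// Converts the integer loop counter to float on every pass, so each
// iteration pays for an integer-to-float transfer.
inline void FillRampConverting(float* pOut, int iCount, float fStep)
{
    for (int i = 0; i < iCount; ++i)
    {
        pOut[i] = static_cast<float>(i) * fStep;
    }
}

// Keeps a float copy of the counter, so the loop body never converts.
// fIndex stays exact as long as iCount is below 2^24 (16,777,216).
inline void FillRampFloatCounter(float* pOut, int iCount, float fStep)
{
    float fIndex = 0.0f;
    for (int i = 0; i < iCount; ++i)
    {
        pOut[i] = fIndex * fStep;
        fIndex += 1.0f;
    }
}
```

The integer counter still drives the loop; only the value used in the math lives in a floating-point register.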

A common way to generate a Load-Hit-Store is to use member variables, or values accessed through a pointer or reference, as counters in tight loops.

Example:

for (int i=0;i<100;++i)
{
    m_iData++;
}

Compilers are seldom smart enough to figure out that the above loop resolves into m_iData+=100 and optimize it into a single operation. Most will happily load m_iData at runtime, increment it, and store it back into memory referenced by the "this" pointer. The first pass of the loop will run at full speed, but once it loops back, the load of m_iData will incur a Load-Hit-Store from the write operation of the previous pass through the loop.

Registers incur no such penalty, so the code can be rewritten to look like this:

int iData = m_iData;
for (int i=0;i<100;++i) {
    iData++;
}
m_iData = iData;

Not only will the code run much faster, since the operations are all in registers, but you also increase the chances that the compiler will reduce this to iData+=100 and remove any chance of a Load-Hit-Store bottleneck.
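The same copy-to-a-local technique applies to loops that do real work. Here is a minimal sketch in which the class and method names are hypothetical. One reason a compiler may keep reloading the member is that it cannot always prove pValues does not point at m_iData; the two methods below behave identically only when it does not.

```cpp
class CScoreTotal   // hypothetical class, for illustration only
{
public:
    // Adds into the member on every pass: each iteration may load m_iData,
    // add to it, and store it back through the "this" pointer.
    void AddViaMember(const int* pValues, int iCount)
    {
        for (int i = 0; i < iCount; ++i)
        {
            m_iData += pValues[i];
        }
    }

    // Copies the member into a local, accumulates in the local, and
    // writes the result back to the member exactly once.
    void AddViaLocal(const int* pValues, int iCount)
    {
        int iData = m_iData;
        for (int i = 0; i < iCount; ++i)
        {
            iData += pValues[i];
        }
        m_iData = iData;
    }

    int GetData() const { return m_iData; }

private:
    int m_iData = 0;
};
```

The local copy makes the no-aliasing intent explicit, which leaves the compiler free to keep the running total in a register for the whole loop.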

 
Comments

Ben Garcia
FWIW, these are pretty common traits/behaviors with many RISC processor architectures (which is why they're also referred to as "load/store" architectures).

Carlos O'Donell
Note: In general GCC 4.2 and above, on most targets, will turn the example code into m_iData+=100. More complex statements may require tweaking.

William Botti
Correction on __stvewx: First two arguments should be switched:
Example that works:

XMVECTOR Radius = { 1.0f, 2.0f, 3.0f, 4.0f };
float Z;
Z = Radius.z; //LHS

__stvewx(__vspltw(Radius,2), &Z,2); //Avoids LHS.

Cheers,
Will


