Contents
Sponsored Feature: Common Performance Issues in Game Programming
 
 
Printer-Friendly VersionPrinter-Friendly Version
 
Latest News
spacer View All spacer
 
February 10, 2010
 
Analysts: EA On The Right Track At Last
 
GamesBeat@GDC Confirms OnLive, GameStop, PlayStation Home Speakers
 
Ubisoft Q3 Sales Edge Down, As It Ramps Up Big Franchises
spacer
Latest Jobs
spacer View All     Post a Job     RSS spacer
 
February 10, 2010
 
Konami Digital Entertainment Co., Ltd.
Programmer
 
THQ
Animator - Motion Builder (contract)
 
LucasArts
Senior Systems Designer
 
Trion Redwood City
<b>Sr. Brand Manager</b>
 
Telltale Games
Game Designer
 
Telltale Games
Senior Software Engineer - Core Technology
 
Airtight Games
IT System Administrator
 
Roblox
Apple Game Engineer - Kids' Virtual World
spacer
Latest Features
spacer View All spacer
 
February 10, 2010
 
arrow Television, Meet Games
 
arrow Two Halves, Together: Patrick Gilmore On Double Helix [1]
 
arrow The Road To Hell: The Creative Direction of Dante's Inferno [20]
 
arrow The Sensible Side of Immersion [11]
 
arrow Jumpstarting Your Creativity [6]
 
arrow Truth in Game Design [49]
 
arrow Postmortem: Vicious Cycle's Matt Hazard: Blood Bath and Beyond [4]
 
arrow Developers React: The iPad's Future [16]
spacer
Latest Blogs
spacer View All     Post     RSS spacer
 
February 10, 2010
 
Lineage 2 Interview - 'Freya Update Is Just a Beginning' - Pt.2
 
Fixing the GDC 2010 Schedule Builder [3]
 
Swashbuckling for Landlubbers: Why you may already be encouraging piracy! [20]
spacer
About
spacer News Director:
Leigh Alexander
Features Director:
Christian Nutt
Editor At Large:
Chris Remo
Advertising:
John 'Malik' Watson
Recruitment/Education:
Gina Gross
 
Feature Submissions
Features
  Sponsored Feature: Common Performance Issues in Game Programming
by Becky Heineman
3 comments
Share RSS
 
 
June 18, 2008 Article Start Previous Page 2 of 3 Next
 

References

A Load-Hit-Store can happen in code, even when it looks like it shouldn't.

void foo(int &count)
{
count = 0;
for (int i=0;i<100;++i) {
if (Test(i)) {
++count;
}
}
}

Advertisement

That code generates a Load-Hit-Store. How?

The variable "count" is memory bound. All writes to it, and in many cases reads, go through memory. Anytime a variable is memory bound and in a tight loop, it can cause Load-Hit-Stores. A way of fixing this is similar to the previous code example.

void foo(int &output)
{
int count = 0;
for (int i=0;i<100;++i) {
if (Test(i)) {
++count;
}
}
output = count; // Write the result
}

VectorLoad-Hit-Store

The previous examples demonstrated how easy it is to cause Load-Hit-Store stalls with floating-point and integer transactions. The VMX register sets suffer from the same problem. It's common that some math operations could be done more efficiently in a VMX operation, but what if it involves non-vector data?

On the Xbox 360, the VMX register intrinsic __vector4 is mapped onto a structure. Run-time accessing of the elements of the structure should be discouraged for the reason below.

XMVECTOR Radius = CalcBounds();
pOut->fRadius = Radius.x;

The second line creates a Load-Hit-Store because the VMX register is used as a structure. As a result, the compiler has to write the contents of the entire register to local memory; then the first element is read with a floating-point register, and only then is the value written into pOut->fRadius.

Here is a way to write the same code without incurring the hidden Load-Hit-Store:

XMVECTOR Radius = CalcBounds();
__stvewx(&pout->fRadius,__vspltw(Radius,0),0);

VMX has the ability to write any specific entry as a single float. The vspltw() operation will copy the requested entry into a temp vector register and the stvewx() operation will handle the writing the float. Using the compiler's feature of accessing the value isn't recommended.

 
Article Start Previous Page 2 of 3 Next
 
Comments

Ben Garcia
profile image
FWIW, these are pretty common traits/behaviors with many RISC processor architectures (which is why they're also referred to as "load/store" architectures).

Carlos O'Donell
profile image
Note: In general GCC 4.2 and above, on most targets, will turn the example code into m_iData+=100. More complex statements may require tweaking.

William Botti
profile image
Correction on __stvewx: First two arguments should be switched:
Example that works:

XMVECTOR Radius = { 1.0f, 2.0f, 3.0f, 4.0f };
float Z;
Z = Radius.z; //LHS

__stvewx(__vspltw(Radius,2), &Z,2); //Avoids LHS.

Cheers,
Will



none
 
Comment:
 


Submit Comment