Sponsored Feature: Common Performance Issues in Game Programming
by Becky Heineman
June 18, 2008
 

References

A Load-Hit-Store can happen in code, even when it looks like it shouldn't.

void foo(int &count)
{
    count = 0;
    for (int i=0;i<100;++i) {
        if (Test(i)) {
            ++count;    // read-modify-write through the reference
        }
    }
}

That code generates a Load-Hit-Store. How?

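The variable "count" is memory bound. Because it is a reference, the compiler generally cannot prove that nothing else (such as Test()) reads or writes the same memory, so every write to it, and in many cases every read, goes through memory. Any time a memory-bound variable is updated in a tight loop, it can cause Load-Hit-Stores. The fix is similar to the previous code example: accumulate into a local variable and write the result out once.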

void foo(int &output)
{
    int count = 0;
    for (int i=0;i<100;++i) {
        if (Test(i)) {
            ++count;
        }
    }
    output = count; // Write the result
}
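
As a rough, self-contained sketch of the two versions side by side, the code below gives them distinct names so that both can be compiled and compared. The names foo_reference and foo_local, and the body of Test(), are assumptions made for illustration; the article does not say what Test() does. With a Test() this simple and visible to the compiler, an optimizer may well generate identical code for both, so treat this as a check that the rewrite preserves behavior rather than as a benchmark.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's Test(); here it accepts even values.
static bool Test(int i)
{
    return (i % 2) == 0;
}

// Original form: each increment is a read-modify-write through the reference.
void foo_reference(int &count)
{
    count = 0;
    for (int i = 0; i < 100; ++i) {
        if (Test(i)) {
            ++count;
        }
    }
}

// Rewritten form: accumulate in a local, then write the result once.
void foo_local(int &output)
{
    int count = 0;
    for (int i = 0; i < 100; ++i) {
        if (Test(i)) {
            ++count;
        }
    }
    output = count; // Single store at the end
}
```

Both functions leave the same value in the caller's variable; only the number of stores to that variable inside the loop differs.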

Vector Load-Hit-Store

The previous examples demonstrated how easy it is to cause Load-Hit-Store stalls with floating-point and integer operations. The VMX register set suffers from the same problem. Some math operations can often be done more efficiently with VMX instructions, but what if they involve non-vector data?

On the Xbox 360, the VMX register intrinsic type __vector4 is mapped onto a structure. Accessing individual elements of that structure at run time should be avoided, for the reason below.

XMVECTOR Radius = CalcBounds();
pOut->fRadius = Radius.x;

The second line creates a Load-Hit-Store because the VMX register is used as a structure. As a result, the compiler has to write the contents of the entire register to local memory; then the first element is read with a floating-point register, and only then is the value written into pOut->fRadius.

Here is a way to write the same code without incurring the hidden Load-Hit-Store:

XMVECTOR Radius = CalcBounds();
__stvewx(&pOut->fRadius,__vspltw(Radius,0),0);

VMX can write any single entry of a vector register to memory as a float. The __vspltw() operation copies the requested entry into every lane of a temporary vector register, and the __stvewx() operation then writes a single float from it. Accessing the value through the compiler's structure-member syntax isn't recommended.
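
To show why the splat comes first, here is a small portable C++ model of the two operations. It is only a sketch: Vec4, splat_word, store_element_word and the Bounds layout are hypothetical names invented for illustration, not the Xbox 360 intrinsics, and the model assumes the single-element store picks its lane from the destination address's position within a 16-byte block, as the AltiVec documentation describes for stvewx. Because the lane is chosen by the address, the value has to be replicated into all four lanes so that whichever lane is selected holds the wanted float.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for a 128-bit vector register holding four floats.
using Vec4 = std::array<float, 4>;

// Model of a splat: replicate the chosen lane into all four lanes.
Vec4 splat_word(const Vec4 &v, int lane)
{
    return Vec4{v[lane], v[lane], v[lane], v[lane]};
}

// Model of a single-element store (assumed semantics): the lane that
// gets written is selected by where dst sits inside its 16-byte block.
void store_element_word(const Vec4 &v, float *dst)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(dst);
    const int lane = static_cast<int>((addr >> 2) & 3);
    *dst = v[lane];
}

// Hypothetical output structure; fRadius sits at byte offset 12.
struct alignas(16) Bounds
{
    float fCenter[3];
    float fRadius;
};
```

In this model, storing the unsplatted vector into fRadius would write lane 3 rather than lane 0, which is the situation the splat avoids.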

 
Comments

Ben Garcia
FWIW, these are pretty common traits/behaviors with many RISC processor architectures (which is why they're also referred to as "load/store" architectures).

Carlos O'Donell
Note: In general GCC 4.2 and above, on most targets, will turn the example code into m_iData+=100. More complex statements may require tweaking.

