

Gamasutra: The Art & Business of Making Games
Sponsored Feature: Who Moved the Goal Posts? The Rapidly Changing World of CPUs
October 19, 2009 (Page 6 of 7)

Intel Hyper-Threading Technology Can Benefit Performance

Introduced with the Intel Pentium 4 processor, Intel HT Technology enables a second thread of execution on a single physical core at a small cost in transistors (Figure 13). In theory it is a power-efficient way to increase performance, because the two threads share the same execution units and resources. In many cases a core has "bubbles" in which a single thread cannot execute the maximum four instructions per clock due to dependencies; these bubbles can be filled with instructions and work from another thread, so both threads complete earlier than they otherwise would.

The implication is that while the core's overall throughput (instructions per clock) improves, individual thread performance can drop slightly as the processor juggles the data and instructions for each thread. This can normally be ignored, but it occasionally becomes a problem when a performance-critical thread executes alongside a thread doing low-priority work, because SMT does not distinguish between them and treats both threads as equals. Where a thread executes depends on the OS; newer OSs are better at distributing multiple threads across cores to maximize hardware performance.


Figure 13. Intel Hyper-Threading Technology removes "bubbles" from process threads, improving the core's instructions per clock.

The SMT implementation on the Intel Core i7 processor removes many of the bottlenecks that limited its efficiency on earlier architectures. With a threefold increase in memory bandwidth, the processor is much less likely to be bandwidth limited and can keep multiple threads fed with data. Resources are no longer split at boot time when Intel HT Technology is enabled, as was the case with the Intel Pentium 4 processor; the Intel Core i7 processor's partitioning logic is designed to be fully dynamic.

When the processor detects that only a single thread is running on a physical core, that thread gains access to all of the core's compute resources; they are shared only when the software and OS run more than one thread on that core. The number of resources available to the threads has also increased relative to the previous generation (Figure 14).


Figure 14. Simultaneous multi-threading improvements on the Intel Core i7 processor microarchitecture.


Figure 15. Programming for SMT can avoid performance penalties.

There are also some simple application-coding approaches that allow SMT to help performance or, at the very least, prevent potential problems (Figure 15) when work is unevenly distributed across logical and physical cores. The first fix is to break the work into more granular tasks that can be shuffled between logical processors more often. This lowers the likelihood that the program ends up with its two biggest tasks running on the same physical core.
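The task-granularity idea can be sketched in portable C++ (the names run_granular and the shared-cursor scheme are illustrative, not from the article): many small tasks are pulled from a shared counter, so a worker that lands on a busy logical core simply completes fewer tasks rather than stalling one oversized chunk.

```cpp
// Sketch: instead of statically splitting work into one big chunk per
// thread, expose many small tasks that idle logical processors can pull.
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

void run_granular(const std::vector<std::function<void()>>& tasks,
                  unsigned worker_count)
{
    std::atomic<size_t> next{0};          // shared task cursor
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&] {
            // Each worker claims the next small task; load balancing
            // emerges from the fine granularity rather than from any
            // up-front partitioning of the work.
            for (size_t t; (t = next.fetch_add(1)) < tasks.size(); )
                tasks[t]();
        });
    }
    for (auto& w : workers) w.join();
}
```

With, say, 100 small tasks and 4 workers, a worker sharing a physical core with a heavy thread naturally claims fewer tasks, instead of holding one quarter of the total work hostage.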

Another issue arises when a developer spins threads for scheduling purposes. A thread spinning on a logical processor consumes execution resources that the other thread on the core could put to better use. The solution is to use SMT-aware methods, such as a critical section with a spin wait (for example, initialized via InitializeCriticalSectionAndSpinCount on Windows), which can put the thread to sleep briefly before a recheck, or to use events to signal when a thread can actually run rather than relying on Sleep.
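The "signal, don't spin" advice can be sketched portably; the Windows-specific calls the text mentions are approximated here with std::condition_variable, and the SignalGate class name is illustrative:

```cpp
// Sketch: a waiter blocks in the kernel instead of burning the shared
// core's execution units in a spin loop, and is woken only when work
// is actually available.
#include <condition_variable>
#include <mutex>

class SignalGate {
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
public:
    // Consumer: sleep until signaled, leaving the sibling logical
    // processor's execution resources to the other thread.
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return ready_; });
        ready_ = false;   // consume the signal
    }
    // Producer: wake exactly one waiter when there is work to do.
    void signal() {
        { std::lock_guard<std::mutex> lk(m_); ready_ = true; }
        cv_.notify_one();
    }
};
```

On Windows the same pattern maps onto an auto-reset event with SetEvent/WaitForSingleObject; the point is that the waiting thread yields its execution slots rather than contending for them.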

A similar effect can occur when a developer uses Sleep(0) to make a background task yield to an application's primary threads. If there are fewer threads than logical processors on the system, the yield won't stop the background task from running, because the OS sees a completely free logical processor. The yield is essentially ignored, and the thread that was supposed to run only occasionally runs over and over, slowing down any thread sharing the same physical core. In this case more explicit synchronization is needed, such as a Windows event, to ensure the thread runs only once and at an appropriate time.
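A portable sketch of that fix (the BackgroundTask/kick names are illustrative): instead of a Sleep(0) loop, the background thread blocks until it is explicitly kicked, so it runs exactly once per request and never thrashes a sibling logical processor.

```cpp
// Sketch: background work gated by explicit signaling, replacing a
// "sleep zero" yield loop that the OS would otherwise ignore.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

struct BackgroundTask {
    std::mutex m;
    std::condition_variable cv;
    unsigned pending = 0;          // outstanding requests to run
    bool quit = false;
    std::atomic<int> runs{0};      // stands in for the real work
    std::thread worker;

    BackgroundTask() : worker([this] { loop(); }) {}
    ~BackgroundTask() { shutdown(); }

    void kick() {                  // e.g. called once per frame
        { std::lock_guard<std::mutex> lk(m); ++pending; }
        cv.notify_one();
    }
    void shutdown() {              // drain pending work, then stop
        { std::lock_guard<std::mutex> lk(m); quit = true; }
        cv.notify_one();
        if (worker.joinable()) worker.join();
    }
    void loop() {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            // Blocks in the kernel; consumes no execution slots on the
            // shared physical core while idle.
            cv.wait(lk, [&] { return pending > 0 || quit; });
            if (quit && pending == 0) return;
            --pending;
            lk.unlock();
            runs.fetch_add(1, std::memory_order_relaxed);  // do the work
            lk.lock();
        }
    }
};
```

Each kick() produces exactly one run of the task, whereas the Sleep(0) version would re-run it on every pass through the yield loop.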

