[This Intel-sponsored Gamasutra feature explores a game-related "onloading" technique called Onloaded Shadows, examining its performance ramifications and possibilities for future improvement.]
With the recent introduction of 2nd Generation Intel® Core™ processors (formerly code named "Sandy Bridge"), graphics functionality is becoming more tightly integrated with the CPU.
There are many interesting opportunities and techniques to increase the cooperation of the CPU and GPU, including "onloading" graphics techniques, which several of my colleagues are working on.
This article explores an "onloading" technique called Onloaded Shadows, developed by Zane Mankowski with support from Josh Doss, Steve Smith, and Doug Binks. In addition to explaining the technique itself, Zane and team include interesting performance numbers on processor graphics and discrete graphics cards.
Once you have read through the details, download the source code and give it a try.
-- Orion Granatir
Many games have outdoor scenes where the sun is often the primary light and changes direction slowly over time. Generating shadow maps for these outdoor scenes and for static objects isn't required every frame. They can be generated asynchronously to frame rendering, at a cadence of only a few times a second or even once every few seconds.
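As a rough illustration of that cadence, the sketch below decides, once per rendered frame, whether a new shadow map should be requested. The `ShadowMapCadence` type and its interval parameter are hypothetical names introduced for this example and are not taken from the sample's source code.

```cpp
#include <cassert>

// Hypothetical helper (not from the sample): requests a new shadow map
// at a fixed interval instead of every frame.
struct ShadowMapCadence {
    double intervalSeconds;         // e.g. 0.25 for four updates per second
    double sinceLastRequest = 0.0;  // time accumulated since the last request
    bool firstFrame = true;

    explicit ShadowMapCadence(double interval) : intervalSeconds(interval) {
        assert(interval > 0.0);
    }

    // Call once per rendered frame with that frame's delta time in seconds.
    // Returns true when a new shadow map should be requested.
    bool tick(double deltaSeconds) {
        if (firstFrame) {           // always request one map at startup
            firstFrame = false;
            return true;
        }
        sinceLastRequest += deltaSeconds;
        if (sinceLastRequest >= intervalSeconds) {
            sinceLastRequest -= intervalSeconds;  // keep the average cadence
            return true;
        }
        return false;
    }
};
```

With an interval of 0.5 seconds, a game running at 60 frames per second would request a new shadow map roughly once every 30 frames and reuse the previous one in between.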
When the GPU generates these shadow maps synchronously, we can split the workload apart and distribute it across several frames. Alternatively, the CPU can perform this workload asynchronously with Microsoft's Windows Advanced Rasterization Platform (WARP) software rasterizer.
The Onloaded Shadows technique uses WARP to asynchronously generate shadow maps. Copying the data from the CPU to the GPU is the only synchronous work required. The overhead of the copy operation is distributed across several frames to reduce the impact.
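One way to picture spreading the copy across frames is to upload a fixed number of rows per frame. The sketch below is a minimal, portable illustration under that assumption: the `SlicedShadowCopy` class is a hypothetical name, and a plain memory copy between two buffers stands in for the real CPU-to-GPU transfer, which in a Direct3D application would be a region update of the GPU texture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the sample): copies a finished CPU-side
// shadow map into a destination buffer a few rows at a time, one slice
// per rendered frame, so that no single frame pays for the whole copy.
class SlicedShadowCopy {
public:
    SlicedShadowCopy(std::size_t width, std::size_t height,
                     std::size_t framesPerCopy)
        : width_(width), height_(height) {
        assert(width > 0 && height > 0 && framesPerCopy > 0);
        // Round up so the last rows are not dropped.
        rowsPerFrame_ = (height + framesPerCopy - 1) / framesPerCopy;
    }

    // Copies the next slice. Returns true once the whole map is copied.
    bool copyNextSlice(const std::vector<float>& src, std::vector<float>& dst) {
        assert(src.size() == width_ * height_ && dst.size() == src.size());
        if (nextRow_ >= height_) return true;
        const std::size_t rows = std::min(rowsPerFrame_, height_ - nextRow_);
        // Stand-in for the real upload of this row range to the GPU.
        std::memcpy(dst.data() + nextRow_ * width_,
                    src.data() + nextRow_ * width_,
                    rows * width_ * sizeof(float));
        nextRow_ += rows;
        return nextRow_ >= height_;
    }

    void restart() { nextRow_ = 0; }  // begin copying a new shadow map
    std::size_t rowsCopied() const { return nextRow_; }

private:
    std::size_t width_;
    std::size_t height_;
    std::size_t rowsPerFrame_ = 0;
    std::size_t nextRow_ = 0;
};
```

The renderer would keep sampling the previous shadow map until `copyNextSlice` returns true, then switch to the new one. The number of frames per copy trades per-frame cost against how stale the shadows may become.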
Figure 1: Screenshot of the application with Onloaded Shadows technique.
This technique uses WARP to rasterize the shadow map on the CPU. By default, WARP uses all available cores on a system, which causes stalls on the main thread due to thread contention. The WARP device also supports running on a single core; we chose this approach for Onloaded Shadows, so only two threads are in use.
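The two-thread arrangement can be sketched in portable C++ as a main thread that polls without blocking and a single worker that produces the shadow map. This is only an illustration of the threading pattern: `AsyncShadowMap` and `renderShadowMapOnCpu` are hypothetical names, and the trivial fill loop stands in for the actual WARP rasterization, which this sketch does not attempt to reproduce.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the WARP render: fills a depth buffer on the CPU.
inline void renderShadowMapOnCpu(std::vector<float>& depth, float sunAngle) {
    for (std::size_t i = 0; i < depth.size(); ++i)
        depth[i] = sunAngle + static_cast<float>(i);
}

// Hypothetical helper (not from the sample): one worker thread renders the
// shadow map while the main thread keeps rendering frames.
class AsyncShadowMap {
public:
    explicit AsyncShadowMap(std::size_t texels) : buffer_(texels, 0.0f) {}
    ~AsyncShadowMap() { if (worker_.joinable()) worker_.join(); }

    // Main thread: starts one background render. Returns false if a render
    // is still in flight or its result has not been consumed yet.
    bool requestUpdate(float sunAngle) {
        if (state_.load(std::memory_order_acquire) != Idle) return false;
        state_.store(Rendering, std::memory_order_relaxed);
        worker_ = std::thread([this, sunAngle] {
            renderShadowMapOnCpu(buffer_, sunAngle);
            state_.store(Ready, std::memory_order_release);
        });
        return true;
    }

    // Main thread: non-blocking poll, intended to run once per frame.
    // Copies the finished map into 'out' and returns true when one is ready.
    bool tryConsume(std::vector<float>& out) {
        if (state_.load(std::memory_order_acquire) != Ready) return false;
        worker_.join();  // the worker has finished its render; join returns promptly
        out = buffer_;
        state_.store(Idle, std::memory_order_release);
        return true;
    }

private:
    enum State { Idle, Rendering, Ready };
    std::vector<float> buffer_;
    std::atomic<int> state_{Idle};
    std::thread worker_;
};
```

Because the main thread only checks an atomic flag each frame, it does not wait on the worker; the only work it does itself is taking the finished result, which corresponds to the copy step described earlier.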