As faster rasterization hardware has become more widespread on PCs, the throughput of the front end of the 3D pipeline (i.e., the geometry setup and lighting portions of the process) has become more critical to overall performance. Not too long ago, application performance was limited by poorly performing 3D acceleration hardware, or by the lack of an accelerator altogether. Now game developers must be more concerned about feeding the 3D accelerator enough data to keep it busy.
Computing 3D geometry is a fairly straightforward task. The challenge is producing processed vertices fast enough to keep the accelerator well fed. The results of per-vertex lighting (usually computed within the geometry engine) are used by the rasterizer to compute polygon fills and perform texture modulation.
Intel’s Streaming SIMD Extensions include SIMD floating-point operations that work on new registers, each holding four packed single-precision floating-point values. It’s no coincidence that this feature is well suited to 3D geometry, and we’ll demonstrate how to exploit this new extension using a high-level language (C++) and still achieve outstanding performance.
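To give a feel for what 4-wide packed operations look like from C++, here is a minimal sketch using the intrinsics declared in `xmmintrin.h`. The helper name `ScaleAdd4` is hypothetical and chosen only for illustration. The unaligned load and store variants are used so the sketch makes no assumption about how the caller's arrays are aligned.

```cpp
#include <xmmintrin.h>  // Streaming SIMD Extensions intrinsics

// Hypothetical helper (illustration only):
// computes out[i] = a[i] * scale + b[i] for i = 0..3 with packed operations.
void ScaleAdd4(const float* a, const float* b, float scale, float* out)
{
    __m128 va = _mm_loadu_ps(a);      // load four floats (no alignment assumed)
    __m128 vb = _mm_loadu_ps(b);
    __m128 vs = _mm_set1_ps(scale);   // broadcast scale into all four lanes

    // One packed multiply and one packed add cover all four elements.
    __m128 result = _mm_add_ps(_mm_mul_ps(va, vs), vb);

    _mm_storeu_ps(out, result);       // write the four results back
}
```

Each intrinsic here corresponds to a packed single-precision operation, so the four multiplies and four adds of the scalar version collapse into two arithmetic operations.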
Why Custom 3D Pipelines?
By nature, custom 3D pipelines can take advantage of known, a priori characteristics of 3D content that a general engine cannot always exploit. This can lead to performance advantages, quality advantages, or both. Let’s consider the pros and cons of a custom engine:
Pros:
- Data structures can be optimized. Some APIs (e.g., Direct3D) require input data to be in an array of structures (AOS – more on that shortly). With these APIs, the data must be reformatted (i.e., swizzled) into a structure of arrays (SOA) format for vertical SIMD-style processing, in which each register holds the same component of several vertices. A custom engine can store its data in the optimized layout from the start, which increases performance.
- Custom lighting can be used, which often looks better than a standard API’s lighting support. This results in better visual quality.
- Custom engines can be built to support advanced techniques (e.g., NURBS tessellation), resulting in better performance and better visual quality than non-specialized APIs. For instance, a custom 3D engine could just transform a small number of control points and tessellate to the proper level of detail (resulting in better performance), as well as properly deform when external forces are applied (resulting in improved visual quality).
Cons:
- You have to tune and maintain your own engine instead of relying on the developer of the API to do it for you.
The bottom line is that the higher the priority placed on performance and quality, the greater the motivation for using a custom engine. What we intend to show here is that a high-performing custom engine can be written entirely in C++ and still take advantage of Intel’s new Streaming SIMD Extensions. The Intel C/C++ compiler supports Streaming SIMD Extensions in C++ through intrinsics. For more information on developing with this tool (including hybrid development using Intel's compiler for some code and another compiler for other code), please see http://developer.intel.com/vtune/icl.
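The AOS-to-SOA swizzle mentioned in the list of pros, and the vertical processing it enables, can be sketched with intrinsics as follows. This is an illustrative sketch rather than the engine's actual code: the type and function names (`VertexAOS`, `Vertex4SOA`, `Swizzle`, `Transform`) are hypothetical, and the matrix is assumed to be row-major and applied to column vectors.

```cpp
#include <xmmintrin.h>  // Streaming SIMD Extensions intrinsics

// AOS layout: one structure per vertex (hypothetical type).
struct VertexAOS { float x, y, z, w; };

// SOA layout for a batch of four vertices: one register per component.
struct Vertex4SOA { __m128 x, y, z, w; };

// Swizzle four consecutive AOS vertices into SOA form.
Vertex4SOA Swizzle(const VertexAOS* v)
{
    __m128 r0 = _mm_loadu_ps(&v[0].x);   // x0 y0 z0 w0
    __m128 r1 = _mm_loadu_ps(&v[1].x);   // x1 y1 z1 w1
    __m128 r2 = _mm_loadu_ps(&v[2].x);   // x2 y2 z2 w2
    __m128 r3 = _mm_loadu_ps(&v[3].x);   // x3 y3 z3 w3
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);   // now r0 = x0 x1 x2 x3, r1 = y0..y3, ...
    Vertex4SOA out = { r0, r1, r2, r3 };
    return out;
}

// Transform four vertices at once by a 4x4 matrix.
// Assumption: m is row-major and vertices are column vectors (out = M * v).
Vertex4SOA Transform(const float m[16], const Vertex4SOA& in)
{
    Vertex4SOA out;
    __m128* dst[4] = { &out.x, &out.y, &out.z, &out.w };
    for (int row = 0; row < 4; ++row) {
        const float* r = m + 4 * row;
        // Each matrix element is broadcast and applied to four vertices.
        __m128 acc = _mm_mul_ps(_mm_set1_ps(r[0]), in.x);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[1]), in.y));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[2]), in.z));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[3]), in.w));
        *dst[row] = acc;
    }
    return out;
}
```

Once the data is in SOA form, every arithmetic instruction does useful work on four vertices, with no lanes wasted on padding or horizontal shuffling. An engine that stores its vertices in SOA form to begin with can skip the `Swizzle` step entirely.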