Backface culling is an important part of how a 3D engine performs visibility checks. Its purpose is to detect polygons that are invisible in a particular scene – that is, polygons that face away from the viewer. Detecting back-facing polygons allows us to eliminate them from an engine's rendering pipeline at an early stage, thus reducing the amount of computation and memory traffic. There are various ways to detect whether a polygon faces towards or away from the viewer, and each method has its own level of complexity and accuracy.
This article describes a new culling technique that reduces the storage requirements for facet normals when performing backface culling. This highly accurate technique, which performs accurate backface culling in object space, is particularly applicable to front-end culling, and requires only half as much storage as the standard facet-normal technique. Included at the end of this article is an appendix containing C++ implementations of both the traditional method and the compact method, both of which come in scalar and SIMD modes. The code for these methods is contained in a downloadable archive.
Motivation for front-end culling
Backface culling can be performed at several stages in the 3D pipeline. Though we could just sit back and let the rasterizer cull for us, it's advantageous to perform the culling step earlier. After all, the earlier we get rid of irrelevant data, the less data needs to be pushed around the system (saving bandwidth) and the fewer calculations that must be performed (saving CPU load). Cull may be performed during one of three stages:
1. Before transformation and lighting
2. After transformation but before lighting
3. In the rasterizer (after transformation and lighting).
Culling during stages 2 and 3 is typically performed in screen space by checking the clockwise/counterclockwise order of a polygon's vertices. Front-end culling (stage 1) is typically performed by calculating the dot product of the viewing vector and the facet normal of the polygon. The facet normal can be calculated on the fly or precalculated and stored with the data set. In any case, culling up front (stage 1) typically is faster than the other culling strategies (stages 2 or 3) since it saves bandwidth and requires fewer calculations.
The real expense of front-end culling is that we either have to calculate the facet normal on the fly or use a precalculated facet normal, which increases the size of our model database. However, this increase can be reduced by half, as we show below. And although the precalculated facet normals add to the model size, the method accesses memory sequentially, with is advantageous to us.
On the other hand, with the culling based on vertex-order testing (stages 2 or 3), the culling procedure loops through the triangles and accesses the vertices for each triangle. The vertices of neighboring triangles (and even of the same triangle) can be spread all over the vertex pool, causing random memory accesses during the culling. These random accesses are slow and lead to poor cache utilization.
The culling technique based on facet normals also loops through the triangles, but it accesses their facet normals instead of their vertices. Since the facet normals are stored on per-triangle basis, they are fetched sequentially in the culling loop. Therefore the sequential memory accesses are fast, they utilize the cache effectively, and they can be further accelerated using prefetch.
Our work with game developers has shown that front-end culling often increases performance substantially (a 10 to 20 percent boost in the frame-rate) if done properly. In the following sections, we show how to implement efficient front-end culling, using precalculated facet normals stored in a compact form with only half the storage requirements of the standard form.