|
Backface
culling is an important part of how a 3D engine performs visibility checks.
Its purpose is to detect polygons that are invisible in a particular scene
– that is, polygons that face away from the viewer. Detecting back-facing
polygons allows us to eliminate them from an engine's rendering pipeline
at an early stage, thus reducing the amount of computation and memory
traffic. There are various ways to detect whether a polygon faces towards
or away from the viewer, and each method has its own level of complexity
and accuracy.
This article
describes a new culling technique that reduces the storage requirements
for facet normals when performing backface culling. This highly accurate
technique, which performs accurate backface culling in object space, is
particularly applicable to front-end culling, and requires only half as
much storage as the standard facet-normal technique. Included at the end
of this article is an appendix containing C++ implementations of both
the traditional method and the compact method, both of which come in scalar
and SIMD modes. The code for these methods is contained in a downloadable
archive.
Motivation
for front-end culling
Backface
culling can be performed at several stages in the 3D pipeline. Though
we could just sit back and let the rasterizer cull for us, it's advantageous
to perform the culling step earlier. After all, the earlier we get rid
of irrelevant data, the less data needs to be pushed around the system
(saving bandwidth) and the fewer calculations that must be performed (saving
CPU load). Cull may be performed during one of three stages:
1. Before
transformation and lighting
2. After
transformation but before lighting
3. In the
rasterizer (after transformation and lighting).
Culling
during stages 2 and 3 is typically performed in screen space by checking
the clockwise/counterclockwise order of a polygon's vertices. Front-end
culling (stage 1) is typically performed by calculating the dot product
of the viewing vector and the facet normal of the polygon. The facet normal
can be calculated on the fly or precalculated and stored with the data
set. In any case, culling up front (stage 1) typically is faster than
the other culling strategies (stages 2 or 3) since it saves bandwidth
and requires fewer calculations.
The real
expense of front-end culling is that we either have to calculate the facet
normal on the fly or use a precalculated facet normal, which increases
the size of our model database. However, this increase can be reduced
by half, as we show below. And although the precalculated facet normals
add to the model size, the method accesses memory sequentially, with is
advantageous to us.
On the other
hand, with the culling based on vertex-order testing (stages 2 or 3),
the culling procedure loops through the triangles and accesses the vertices
for each triangle. The vertices of neighboring triangles (and even of
the same triangle) can be spread all over the vertex pool, causing random
memory accesses during the culling. These random accesses are slow and
lead to poor cache utilization.
The culling
technique based on facet normals also loops through the triangles, but
it accesses their facet normals instead of their vertices. Since the facet
normals are stored on per-triangle basis, they are fetched sequentially
in the culling loop. Therefore the sequential memory accesses are fast,
they utilize the cache effectively, and they can be further accelerated
using prefetch.
Our work
with game developers has shown that front-end culling often increases
performance substantially (a 10 to 20 percent boost in the frame-rate)
if done properly. In the following sections, we show how to implement
efficient front-end culling, using precalculated facet normals stored
in a compact form with only half the storage requirements of the standard
form.
|