Occlusion Culling Algorithms

By Eric Haines, Tomas Möller
Excerpted from Real-Time Rendering
(AK Peters, 1999)

One of the great myths concerning computers is that one day we will have enough processing power. Even in a relatively simple application like word processing, we find that additional power can be applied to all sorts of things, such as on-the-fly spell and grammar checking, more elaborate graphic presentation, antialiased text display, automated voice recognition and dictation, etc.

In real-time rendering we have at least three performance goals: more frames per second, higher resolution, and more (and more realistic) objects in the scene. A speed of 60-72 frames per second is generally considered enough, and perhaps 1600x1200 is enough resolution for a while, but there is no real upper limit on scene complexity. The rendering of a Boeing-777 would include 132,500 unique parts and over 3,000,000 fasteners, which would yield a polygonal model with over 500,000,000 polygons [Cripe98]. Our conclusion: speed-up techniques and acceleration schemes will always be needed.

In this article, we will talk about a certain class of acceleration scheme called occlusion culling techniques. Most of this article is an excerpt from chapter 7, "Speed-Up Techniques" from our book Real-Time Rendering (www.realtimerendering.com or www.acm.org/tog/resources/RTR/). In the book, the occlusion culling section is preceded by sections on backface and clustered culling, hierarchical view-frustum culling, portal culling, and detail culling. Sections on impostor algorithms, level-of-detail techniques, triangle fan, strip and polygon mesh techniques follow after.

To cull can mean to "select from a flock," and in the context of computer graphics this is exactly what culling techniques do. The flock is the whole scene that we want to render, and the selection is limited to those portions of the scene that are not considered to contribute to the final image. The rest of the scene is sent through the rendering pipeline. The actual culling can theoretically take place at any stage of the rendering pipeline. For culling algorithms that are implemented in hardware, we can typically only enable/disable or set some parameters for the culling function. For full control, the programmer can implement the algorithm in the application stage (on the CPU). Culling is often achieved by using geometric calculations but is in no way limited to these. For example, an algorithm may also use the contents of the frame buffer.

As we all know, visibility may be solved via a hardware construction called the Z-buffer. Even though it may solve visibility correctly, the Z-buffer is not a very smart mechanism in all respects. For example, it has the following implications. Imagine that the viewer is looking along a line where 10 spheres are placed. This is illustrated in Figure 1.

Figure 1. An illustration of how occlusion culling can be useful. Ten spheres are placed in a line, and the viewer is looking along this line (left). The depth complexity image in the middle shows that some pixels are written to several times, even though the final image (on the right) only shows one sphere.

An image rendered from this viewpoint will show but one sphere, even though all 10 spheres will be scan-converted and compared to the Z-buffer and then potentially written to the color buffer and Z-buffer. The simple conclusion in this case is that nine spheres will be drawn unnecessarily. This uninteresting scene is not that likely to be found in reality, but it describes (from its viewpoint) a densely populated model. These sorts of configurations are found in such real scenes as a rain forest, an engine, a city, and the inside of a skyscraper.

Thus it seems plausible that an algorithmic approach to avoid this kind of inefficiency may pay off in terms of speed. Such approaches go under the name of occlusion culling algorithms, since they try to cull away (avoid drawing) objects that are occluded, that is, inside the view frustum but not visible in the final image. The optimal occlusion culling algorithm would select only the objects that are visible. In a sense, the Z-buffer selects and renders only those objects which are visible, but not before all objects are sent through the pipeline. The idea behind efficient occlusion culling algorithms is to perform some simple tests early on and so avoid sending data through much of the pipeline.

Pseudocode for a general occlusion culling algorithm is shown in Figure 2, where the function isOccluded, often called the visibility test, checks whether an object is occluded. G is the set of geometrical objects to be rendered; OR is the occlusion representation.

OcclusionCullingAlgorithm(G)
  OR = empty
  for each object g in G
    if (isOccluded(g, OR))
      Skip(g)
    else
      Render(g)
      Update(OR, g)
    end
  end
Figure 2. Pseudocode for a general occlusion culling algorithm. G contains all the objects in the scene, and OR is the occlusion representation.

Depending on the particular algorithm, OR represents some kind of occlusion information. OR is set to be empty at the beginning. After that all objects (which passed the view-frustum culling test) are processed.

Consider a particular object. First we test whether the object is occluded with respect to the occlusion representation OR. If it is occluded, then it is not processed further, since we then know that it will not contribute to the image. If the object is determined not to be occluded, then that object has to be rendered, since it probably contributes to the image (at that point in the rendering). Finally OR is updated with that object.
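The loop of Figure 2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the original text: the screen is one-dimensional, each object covers a pixel interval at a single depth, and the occlusion representation is a per-pixel list of the nearest depths seen so far. The names Obj, is_occluded, update, and occlusion_cull are hypothetical.

```python
from typing import List, NamedTuple


class Obj(NamedTuple):
    name: str
    x0: int       # first covered pixel
    x1: int       # one past the last covered pixel
    depth: float  # distance from the viewer (smaller is nearer)


def is_occluded(g: Obj, occ: List[float]) -> bool:
    # Occluded only if every covered pixel already holds something at least as near.
    return all(occ[x] <= g.depth for x in range(g.x0, g.x1))


def update(occ: List[float], g: Obj) -> None:
    # Merge the object's depth into the occlusion representation.
    for x in range(g.x0, g.x1):
        occ[x] = min(occ[x], g.depth)


def occlusion_cull(objects: List[Obj], width: int) -> List[str]:
    occ = [float("inf")] * width                      # OR = empty
    rendered = []
    for g in sorted(objects, key=lambda o: o.depth):  # rough front-to-back order
        if is_occluded(g, occ):
            continue                                  # Skip(g)
        rendered.append(g.name)                       # stands in for Render(g)
        update(occ, g)                                # Update(OR, g)
    return rendered


# Ten "spheres" in a line, as in Figure 1: all cover the same pixels.
spheres = [Obj(f"sphere{i}", 2, 6, float(i + 1)) for i in range(10)]
visible = occlusion_cull(spheres, width=8)
```

With the ten objects stacked behind one another, only the nearest one is rendered; the other nine are skipped by the visibility test.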

For some algorithms, it is expensive to update the occlusion representation, so this is only done once (before the actual rendering starts) with the objects that are believed to be good occluders. This set is then updated from frame to frame.

A number of occlusion algorithms will be scrutinized in this section.


Hierarchical Z-Buffering and the Hierarchical Visibility Algorithm

One approach to occlusion culling is the hierarchical visibility (HV) algorithm [Greene93]. This algorithm maintains the scene model in an octree, and a frame's Z-buffer as an image pyramid, which we call a Z-pyramid. The octree enables hierarchical culling of occluded regions of the scene, and the Z-pyramid enables hierarchical Z-buffering of individual primitives and bounding volumes. The Z-pyramid is thus the occlusion representation of this algorithm. Examples of these data structures are shown in Figure 3.

Any method can be employed for organizing scene primitives in an octree, although Greene et al. [Greene93] recommend a specific algorithm that avoids assigning small primitives to large octree nodes. In general, an octree is constructed by enclosing the entire scene in a minimal axis-aligned box. The rest of the procedure is recursive in nature, and starts by checking whether the box contains fewer than a threshold number of primitives. If it does, the algorithm binds the primitives to the box and then terminates the recursion. Otherwise, it subdivides the box along its main axes using three planes, thereby forming eight boxes (hence the name octree). Each new box is tested and possibly subdivided again into 2x2x2 smaller boxes. This process continues until each box contains fewer than the threshold number of primitives, or until the recursion has reached a specified deepest level [Samet89a,Samet89b]. This is illustrated in two dimensions, where the data structure is called a quadtree, in Figure 4.
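The general recursive construction described above might look as follows. This is a minimal sketch, not the specific algorithm recommended by Greene et al.: the primitives are assumed to be points so that each one falls into exactly one child box, and the names Node, build, and leaves, as well as the default threshold and depth limit, are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


class Node:
    def __init__(self, lo: Point, hi: Point):
        self.lo, self.hi = lo, hi            # corners of the axis-aligned box
        self.points: List[Point] = []        # primitives bound to this box (leaves only)
        self.children: List["Node"] = []     # empty for a leaf, otherwise 8 boxes


def build(points, lo, hi, threshold=4, max_depth=8, depth=0) -> Node:
    node = Node(lo, hi)
    # Stop when the box holds fewer than the threshold number of primitives,
    # or when the deepest allowed level has been reached.
    if len(points) < threshold or depth == max_depth:
        node.points = list(points)
        return node
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    buckets = [[] for _ in range(8)]
    for p in points:
        # One bit per axis: set if the point lies in the upper half of the box.
        idx = sum((p[a] >= mid[a]) << a for a in range(3))
        buckets[idx].append(p)
    for idx, bucket in enumerate(buckets):
        c_lo = tuple(mid[a] if (idx >> a) & 1 else lo[a] for a in range(3))
        c_hi = tuple(hi[a] if (idx >> a) & 1 else mid[a] for a in range(3))
        node.children.append(
            build(bucket, c_lo, c_hi, threshold, max_depth, depth + 1))
    return node


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


# One point per octant of the unit cube; a threshold of 2 forces one subdivision.
pts = [(x, y, z) for x in (0.25, 0.75) for y in (0.25, 0.75) for z in (0.25, 0.75)]
root = build(pts, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), threshold=2)
```

Real primitives such as polygons can straddle the splitting planes, which is one reason a more careful assignment scheme is recommended in the original paper.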

Figure 3. Example of occlusion culling with the hierarchical visibility algorithm [Greene95], showing a complex scene (lower right) with the corresponding Z-pyramid (on the left), and octree subdivision (upper right). By traversing the octree from front to back and culling occluded octree nodes as they are encountered, this algorithm only visits visible octree nodes and their children (the nodes portrayed at the upper right) and only renders the polygons in visible boxes. In this example, culling of occluded octree nodes reduces the depth complexity of the polygons that need to be rendered from 84 to 2.5. (Courtesy of Ned Greene/Apple Computer.)

The construction of an octree takes too much time to be done at runtime, so this method is best suited for static models.

Once the octree has been created, each frame is rendered in approximately front-to-back order by calling the procedure ProcessOctreeNode (outlined in Figure 5) with the root node of the octree. Octree nodes that are outside the view frustum are culled away. The first step determines whether the node's bounding box is visible with respect to the Z-pyramid, using a procedure that will be described later. In this case, a node's bounding box is a box in the octree. If the node is occluded, we do not need to process the contents of that box further, since its contents do not contribute to the final image. Otherwise, we render the primitives associated with the node into the Z-pyramid (tileInto in the pseudocode) and then process each of the node's children (if it has any) in front-to-back order using this same recursive procedure. When recursion finishes, all visible primitives have been tiled into the Z-pyramid, and a standard Z-buffer image of the scene has been created.

Figure 4. The construction of a quadtree (which is the two-dimensional version of an octree). The construction starts from the left by enclosing all objects in a bounding box. Then the boxes are recursively divided into four equal-sized boxes until each box (in this case) is empty or contains one object.

The HV algorithm performs occlusion culling very efficiently because it only traverses visible octree nodes and their children, and it only renders the primitives in visible nodes. This can save much of the work in scenes that are densely occluded. For example, in the scene pictured in Figure 3, more than 99% of on-screen polygons are inside occluded octree nodes, which are therefore culled by the Z-pyramid [Greene95].

ProcessOctreeNode(OctreeNode N)
  if (isOccluded(NBV, ZP)) then return
  for each primitive p in N
    tileInto(p, ZP)
  end
  for each child node C in N in front-to-back order
    ProcessOctreeNode(C)
  end
Figure 5. Pseudocode for the hierarchical visibility algorithm. To render a frame this procedure is called with the root node of the octree. NBV is the bounding volume of the octree node N, and ZP is the Z-pyramid that is the occlusion representation of this algorithm. The operation tileInto renders a primitive p into the Z-pyramid, ZP, and this also updates the entire Z-pyramid.

Now we will describe how the Z-pyramid is maintained and how it is used to accelerate culling. The finest (highest-resolution) level of the Z-pyramid is simply a standard Z-buffer. At all other levels, each z-value is the farthest z in the corresponding 2x2 window of the adjacent finer level. Therefore each z-value represents the farthest z for a square region of the screen. To maintain a Z-pyramid, whenever a z-value is overwritten in the Z-buffer it is propagated through the coarser levels of the Z-pyramid. This is done recursively until the top of the image pyramid is reached, where only one z-value remains (this is illustrated in Figure 6).

Figure 6. On the left, a 4x4 piece of the Z-buffer is shown. The numerical values are the actual z-values. This is downsampled to a 2x2 region where each value is the farthest (largest) of the four 2x2 regions on the left. Finally, the farthest value of the remaining four z-values is computed. These three maps compose an image pyramid which is called the hierarchical Z-buffer.
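Building and maintaining such a pyramid can be sketched as follows, assuming a square Z-buffer whose side is a power of two and in which larger values are farther away. The function names and the 4x4 sample values are made up for illustration and are not the values shown in Figure 6.

```python
def farthest_of_block(fine, x, y):
    # Farthest (largest) z in the 2x2 block of the finer level under cell (x, y).
    return max(fine[2 * y][2 * x], fine[2 * y][2 * x + 1],
               fine[2 * y + 1][2 * x], fine[2 * y + 1][2 * x + 1])


def build_z_pyramid(zbuf):
    """Return the levels [finest, ..., 1x1]; level 0 is the Z-buffer itself."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        levels.append([[farthest_of_block(fine, x, y) for x in range(half)]
                       for y in range(half)])
    return levels


def write_z(levels, x, y, z):
    """Overwrite one z-value and propagate the change toward the top."""
    levels[0][y][x] = z
    for k in range(1, len(levels)):
        x //= 2
        y //= 2
        new = farthest_of_block(levels[k - 1], x, y)
        if levels[k][y][x] == new:
            break                 # coarser levels are unaffected
        levels[k][y][x] = new


zbuf = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [9, 8, 2, 1],
        [7, 6, 4, 3]]
pyramid = build_z_pyramid(zbuf)
```

Here the 2x2 level is [[4, 8], [9, 4]] and the top value is 9. Overwriting the 9 in the Z-buffer with a nearer value changes both coarser levels, while most other writes stop propagating after one step.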

Next, we describe how hierarchical culling of octree nodes is done. To determine whether a node is visible, the front faces of its bounding box are tested against the Z-pyramid. The node is occluded if all of its front faces are occluded by the Z-pyramid. To establish whether an individual face is occluded, we begin at the coarsest Z-pyramid cell that encloses the face's screen projection. The face's nearest depth within the cell (znear) is then compared to the Z-pyramid value, and if znear is farther, the face is known to be occluded. For densely occluded scenes, this procedure often culls an occluded face with a single depth comparison. When this initial test fails to cull a face, its visibility can be definitively established by recursively traversing from the initial Z-pyramid cell down to finer levels in the Z-pyramid. Additional depth comparisons at the encountered child cells must be performed. If this subdivision procedure does not ultimately find a visible sample on the face, the face is occluded. In scenes where the bounding boxes of octree nodes overlap deeply on the screen, this hierarchical procedure can establish visibility much more efficiently than can conventional Z-buffering. For example, in the scene pictured in Figure 3, hierarchical culling of octree nodes with the Z-pyramid generates roughly one hundred times fewer depth comparisons than visibility testing with conventional Z-buffering.
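The recursive test can be sketched as below. Several simplifications are assumptions of this sketch rather than part of the published algorithm: the face is replaced by its screen-space bounding rectangle (inclusive pixel coordinates), a single conservative znear is used for the whole face instead of the nearest depth within each cell, and the traversal starts at the top of the pyramid, whose single cell always encloses the projection. The names build_z_pyramid and face_occluded are hypothetical.

```python
def build_z_pyramid(zbuf):
    """Return [finest, ..., 1x1]; each coarser value is the farthest z of a 2x2 block."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        levels.append([[max(fine[2 * y][2 * x], fine[2 * y][2 * x + 1],
                            fine[2 * y + 1][2 * x], fine[2 * y + 1][2 * x + 1])
                        for x in range(half)] for y in range(half)])
    return levels


def face_occluded(levels, rect, znear, k=None, cx=0, cy=0):
    """True if no sample of rect = (x0, y0, x1, y1) can be visible."""
    x0, y0, x1, y1 = rect
    if k is None:
        k = len(levels) - 1          # start at the top (coarsest) cell
    size = 1 << k                    # side of a level-k cell in pixels
    # Cells that do not overlap the rectangle cannot reveal the face.
    if (cx * size > x1 or (cx + 1) * size - 1 < x0 or
            cy * size > y1 or (cy + 1) * size - 1 < y0):
        return True
    # The face is farther than the farthest z stored for this region.
    if znear > levels[k][cy][cx]:
        return True
    if k == 0:
        return False                 # found a potentially visible sample
    # Inconclusive: descend to the four child cells of the finer level.
    return all(face_occluded(levels, rect, znear, k - 1, 2 * cx + dx, 2 * cy + dy)
               for dy in (0, 1) for dx in (0, 1))


# Left half of the screen holds a near occluder (z = 2), right half is far (z = 10).
zp = build_z_pyramid([[2, 2, 10, 10] for _ in range(4)])
hidden = face_occluded(zp, (0, 0, 1, 3), 5.0)     # entirely behind the occluder
peeking = face_occluded(zp, (1, 0, 2, 0), 5.0)    # sticks out to the right
```

A face with znear beyond every stored depth is rejected by the very first comparison, which is the single-test case mentioned above.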

As we have seen, the original HV algorithm [Greene93] used a Z-pyramid only for occlusion tests and employed traditional Z-buffer scan conversion to render the polygons in visible octree nodes. Subsequently, a more efficient method, called hierarchical polygon tiling [Greene96], was developed. This algorithm adapts the basic screen subdivision procedure described above to polygon "tiling," that is, finding the visible samples on a polygon. When tiling a polygon into the Z-pyramid, it must be determined, at each level of subdivision where the polygon is compared to a Z-pyramid cell, whether the polygon overlaps the cell and if so, whether the polygon is occluded by the cell. Overlap tests are accelerated with coverage masks that indicate which subcells within a Z-pyramid cell are covered by the polygon, and occlusion tests are performed by comparing the polygon's nearest z-value within the cell to the Z-pyramid value. This procedure for hierarchical Z-buffering is very efficient, because it only traverses regions of the screen where a polygon is visible or nearly visible.

Hierarchical polygon tiling can be performed even more efficiently if the polygons in a scene are organized into a BSP tree or an "octree of BSP trees" [Greene96]. The reason is that this enables traversing polygons in strict front-to-back order, which eliminates the need to maintain depth information. Rather, the only occlusion information required is whether or not each image sample has been written, and this information can be maintained in an image pyramid of coverage masks called a coverage pyramid. This is the data structure maintained by hierarchical polygon tiling with coverage masks [Greene96], which is similar to hierarchical Z-buffering except that occlusion tests are performed with coverage-mask operations instead of depth comparisons, which accelerates tiling considerably.

The HV algorithm implemented with hierarchical tiling may well be the most efficient method known for software rendering of complex scenes composed of polygons, but it is not fast enough for real-time rendering of complex scenes on today's microprocessors. To enable real-time rendering, Greene et al. [Greene93] suggest modifying hardware Z-buffer pipelines to support HV, which requires substituting a Z-pyramid for the Z-buffer, and including a fast feedback path to report visibility of bounding volumes. It is likely that these modifications would extend the domain of real-time rendering to much more complex scenes, such as the scene in Figure 3.

In the absence of this kind of hardware support, the HV algorithm can be accelerated on systems having conventional Z-buffer hardware by exploiting frame-to-frame coherency [Greene93]. The idea is that octree nodes that were visible in one frame tend to be visible in the next. With this variation, the first frame of an animation sequence is generated with the standard HV algorithm, except that after completing the frame, a list of octree nodes that were visible in that frame (the visible node list) is created by testing nodes for visibility against the Z-pyramid. Subsequent frames are generated with the following two-pass algorithm. In the first rendering pass, primitives associated with nodes on the visible node list are rendered by Z-buffer hardware. Then, the Z-buffer of the partially rendered scene is read back from the hardware, and a Z-pyramid is built from this Z-buffer. In the second rendering pass, the standard HV algorithm is run in software, traversing the octree from front to back but skipping nodes which have already been rendered. This second pass fills in any missing parts of the scene. The final step in processing a frame is to update the visible node list. Typically, this variation of the HV algorithm runs considerably faster than the all-software version, because nearly all visible polygons are rendered with Z-buffer hardware.

Greene and Kass [Greene94b] have developed an extension to hierarchical Z-buffering which renders antialiased scenes with error bounds. Another interesting algorithm for occlusion culling is the visibility skeleton developed by Durand et al. [Durand97,Durand97b].


The HOM Algorithm

The hierarchical occlusion map (HOM) algorithm [Zhang97] is another way of enabling hierarchical image-space culling (similar to the hierarchical Z-buffering algorithm). However, the HOM algorithm can be used on systems that have graphics hardware but not a hardware Z-pyramid, and it can also handle dynamic scenes. The HOM algorithm is described in detail in Zhang's Ph.D. thesis [Zhang98].

We start by describing how the function isOccluded works. This function, used in the pseudocode in Figure 2, is a key part of the algorithm. This occlusion test takes place after the view transform, so the viewer is located at the origin looking down the negative z-axis, with the x-axis going to the right, and the y-axis going upwards. The test is then divided into two parts: a one-dimensional depth test in the z-direction and a two-dimensional overlap test in the xy plane, i.e., the plane onto which the image is projected. The overlap test supports approximate visibility culling, where objects that "shine through" small holes in the occluders can be culled away using an opacity threshold parameter.

For both tests, a set of potentially good occluders is identified before the scene is rendered, and the occlusion representation is built from these. This step is followed by the rendering of the scene, where the occluders are rendered without an occlusion test. Then the rest of the scene is processed by having each object tested against the occlusion representation. If the object is occluded by the occlusion representation, it is not rendered.

For the two-dimensional overlap test, the occluders are first rendered into the color buffer with a white color on a black background. Therefore, texturing, lighting, and Z-buffering can be turned off. An advantage of this operation is that a number of small occluders can be combined into a large occluder. The rendered image, which is called an occlusion map, is read back into the main memory of the computer. For simplicity, we assume that this image has the resolution of 2^n x 2^n pixels. It is used as the base for the occlusion representation. Then a hierarchy of occlusion maps (HOM), i.e., an image pyramid of occlusion maps, is created by averaging over 2x2 pixel blocks to form an image of 2^(n-1) x 2^(n-1) pixels. This is done recursively until a minimum size is reached (for example 4x4 pixels). The highest-resolution level of the HOM is numbered 0, with increasing numbers having decreasing resolution. The gray-scale values in the HOM are said to be the opacity of the pixels. A high opacity value (near white) for a pixel at a level above 0 means that most of the pixels it represents are covered by occluders.

The creation of the HOM can be implemented either on the CPU or by texture mapping, with bilinear interpolation used as a minification filter. For large image sizes, the texture filtering approach was found to be faster, and for small image sizes, the CPU was faster. Of course, this varies with CPUs and graphics hardware. For a 1024x1024 image, Zhang et al. [Zhang97] used a 256x256 image as the base for the HOM. An example of a HOM is shown in Figure 7.

Figure 7. On the left is an image of 956x956 pixels. Since this covers many pixels on the screen and is rather close to the viewer, it is a good candidate for an occluder. Its HOM is created by rendering this object in white against a black background in 256x256 pixels, an image which is called occlusion map 0. This image is subsampled into 128x128 pixels by averaging over 2x2 pixels. This is done recursively down to 8 x 8 pixels. (Model is reused courtesy of Nya Perspektiv Design AB.)
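The CPU variant of this construction amounts to repeated 2x2 averaging. The following sketch assumes a square level-0 occlusion map whose side is a power of two, with opacity 1.0 for white (covered) and 0.0 for black; the function name build_hom, the min_size parameter, and the tiny 4x4 map are illustrative assumptions.

```python
def build_hom(level0, min_size=1):
    """Return the occlusion maps [level 0, level 1, ...] down to min_size x min_size."""
    levels = [level0]
    while len(levels[-1]) > min_size:
        fine = levels[-1]
        half = len(fine) // 2
        # Each coarser pixel is the average opacity of a 2x2 block of the finer map.
        levels.append([[(fine[2 * y][2 * x] + fine[2 * y][2 * x + 1] +
                         fine[2 * y + 1][2 * x] + fine[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(half)] for y in range(half)])
    return levels


# An occluder covering the three leftmost columns of a 4x4 occlusion map.
occlusion_map = [[1.0, 1.0, 1.0, 0.0] for _ in range(4)]
hom = build_hom(occlusion_map)
```

In this example the 2x2 level holds opacities 1.0 (fully covered) and 0.5 (half covered), and the single top pixel has opacity 0.75.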

The overlap test against the HOM starts by projecting the bounding volume of the object to be tested onto the screen (Zhang et al. [Zhang97] used oriented bounding boxes). This projection is then bounded by a rectangle, which then covers more pixels than the object enclosed in the bounding volume. So this test is a conservative test, meaning that even if the test results show that the object is not occluded, it may still be so. This rectangle is then compared against the HOM for overlap. The overlap test starts at the level in which the size of the pixel in the HOM is approximately the size of the rectangle. If all pixels in the rectangle are opaque (which means fully white for non-approximate culling), then the rectangle is occluded in the xy plane and the object is said to pass the test. On the other hand, if a pixel is not opaque, then the test for that pixel continues recursively to the subpixels in the HOM which are covered by the rectangle, meaning that the resolution of the occlusion maps increases with each test.
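A sketch of the overlap test for the non-approximate case follows. It assumes the HOM is a list of square maps built by 2x2 averaging with level 0 first, and that the bounding rectangle is given in inclusive level-0 pixel coordinates. The helper names and the way the starting level is chosen from the rectangle's extent are assumptions of this sketch.

```python
def build_hom(level0):
    levels = [level0]
    while len(levels[-1]) > 1:
        f = levels[-1]
        h = len(f) // 2
        levels.append([[(f[2 * y][2 * x] + f[2 * y][2 * x + 1] +
                         f[2 * y + 1][2 * x] + f[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(h)] for y in range(h)])
    return levels


def pixel_opaque(hom, k, px, py, rect, opaque):
    x0, y0, x1, y1 = rect
    size = 1 << k                            # side of a level-k pixel in level-0 pixels
    if (px * size > x1 or (px + 1) * size - 1 < x0 or
            py * size > y1 or (py + 1) * size - 1 < y0):
        return True                          # not covered by the rectangle
    if hom[k][py][px] >= opaque:
        return True                          # everything below this pixel is covered
    if k == 0:
        return False                         # a hole at the finest level
    # Not opaque: continue with the subpixels at the next finer level.
    return all(pixel_opaque(hom, k - 1, 2 * px + dx, 2 * py + dy, rect, opaque)
               for dy in (0, 1) for dx in (0, 1))


def overlap_test(hom, rect, opaque=1.0):
    """True if the rectangle is fully covered by occluders in the xy plane."""
    x0, y0, x1, y1 = rect
    extent = max(x1 - x0, y1 - y0) + 1
    # Start where one HOM pixel is roughly the size of the rectangle.
    k = min(len(hom) - 1, extent.bit_length() - 1)
    size = 1 << k
    return all(pixel_opaque(hom, k, px, py, rect, opaque)
               for py in range(y0 // size, y1 // size + 1)
               for px in range(x0 // size, x1 // size + 1))


hom = build_hom([[1.0, 1.0, 1.0, 0.0] for _ in range(4)])
covered = overlap_test(hom, (0, 0, 1, 1))    # inside the white area
leaking = overlap_test(hom, (2, 0, 3, 1))    # touches the black column
```

The first rectangle is accepted with a single lookup at level 1, whereas the second requires descending to level 0 before a hole is found.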

For approximate visibility culling, the pixels in the HOM are not compared to full opacity, i.e., white, but rather against an opacity threshold value, a gray-scale value. The lower the threshold value, the more approximate the culling. The advantage here is that if a pixel is not fully opaque (white) but still higher than the threshold, then the overlap test can terminate earlier. The penalty is that some object may be omitted from rendering even though it is (partially) visible. The opacity values are not constant from one level to another in the HOM, as shown in the following example.

Example: Computation of opacity threshold values

Assume the rendered image is 1024x1024 pixels and that the lowest level in the HOM (i.e., the one with the largest resolution) has a resolution of 128x128 pixels. A pixel in this level-zero occlusion map corresponds to an 8x8-pixel region in the rendered image. Also assume that a 2x2 region of black pixels in an 8x8 region can pass as a negligible hole. This would give an opacity value O=1-2^2/8^2=0.9375. The next level in the HOM would then have a 64x64 resolution, and a pixel at this level would correspond to 16x16 pixels in the rendered image. So the opacity threshold at this level would be O=1-2^2/16^2 which is approximately 0.984.

We will now derive a recursive formula for computing the opacity values of the different levels in the HOM. The opacity of the level with the highest resolution in the HOM is O_0 = 1 - n/m, where n is equal to the number of black pixels that can be considered a negligible hole, and m is the number of pixels in the rendered image represented by one pixel in this occlusion map (m = 8x8 in the example above). The next level in the HOM has a threshold of O_1 = 1 - n/(4m) = 1 - (1 - O_0)/4 = (3 + O_0)/4. This reasoning can be generalized to the formula below, which gives the threshold of level k+1 in the HOM from that of level k.

O_{k+1} = (3 + O_k)/4
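The recurrence is easy to check numerically against the example. In the sketch below, the function name and its parameters are hypothetical; the inputs are the 2x2-pixel hole and the 8x8-pixel footprint assumed in the example.

```python
def opacity_thresholds(hole_pixels, pixels_per_map_pixel, num_levels):
    """Thresholds for levels 0 .. num_levels-1, starting from 1 - n/m."""
    o = 1.0 - hole_pixels / pixels_per_map_pixel
    thresholds = [o]
    for _ in range(num_levels - 1):
        o = (3.0 + o) / 4.0          # threshold of the next coarser level
        thresholds.append(o)
    return thresholds


# n = 2x2 = 4 black pixels, m = 8x8 = 64 image pixels per level-0 map pixel.
thresholds = opacity_thresholds(4, 64, 3)
```

This yields 0.9375 for level 0, 0.984375 for level 1 (the 0.984 of the example), and 0.99609375 for level 2, so the threshold approaches full opacity as the levels get coarser.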

For more details on this topic, consult Zhang's Ph.D. thesis [Zhang98].

For the one-dimensional z-depth test, we must be able to determine whether an object is behind the selected occluders. Zhang [Zhang98] describes a number of methods, and we choose to describe the depth estimation buffer, which provides reasonable estimation and does not require a Z-buffer. It is implemented as a software Z-buffer that divides the screen into a number of rectangular regions that are rather large in relation to the pixel size. The selected occluders are inserted into this buffer. For each region the farthest z-value is stored. This is in contrast to a normal Z-buffer, which stores the nearest z-value at each pixel. An estimation is used to obtain a far value for an occluder quickly. The z-value of the farthest vertex of the bounding box is used to estimate the farthest z-value of an occluder. An example of a depth estimation buffer is shown in Figure 8.

Figure 8. Illustration of the depth estimation buffer. Illustration after Zhang [Zhang98].

The depth estimation buffer is built for each frame. During rendering, to test whether an object passes the depth test (i.e., whether it is behind the occluders) the z-value of the nearest vertex of its bounding box is computed. This value is compared against the z-values of all regions in the depth estimation buffer that the bounding box rectangle covers in screen space. If the near value of the bounding box is larger than the stored z-depth in all regions, then the object passes the depth test, and is thus occluded in the depth direction. A resolution of 64x64 regions in the depth estimation buffer was used by Zhang et al. [Zhang97].
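A depth estimation buffer along these lines might be sketched as follows. The sketch assumes that z is a positive distance from the viewer (larger is farther) and that screen-space rectangles have already been converted to inclusive region indices; the class and method names are hypothetical. Regions that no occluder covers are marked as empty, so that nothing can pass the depth test there.

```python
class DepthEstimationBuffer:
    def __init__(self, regions_x, regions_y):
        # None marks a region that no selected occluder covers.
        self.far = [[None] * regions_x for _ in range(regions_y)]

    def insert_occluder(self, rx0, ry0, rx1, ry1, z_far):
        """z_far: depth of the farthest bounding-box vertex of the occluder."""
        for y in range(ry0, ry1 + 1):
            for x in range(rx0, rx1 + 1):
                cur = self.far[y][x]
                # Keep the farthest z of all occluders touching the region.
                self.far[y][x] = z_far if cur is None else max(cur, z_far)

    def passes_depth_test(self, rx0, ry0, rx1, ry1, z_near):
        """z_near: depth of the nearest bounding-box vertex of the tested object."""
        return all(self.far[y][x] is not None and z_near > self.far[y][x]
                   for y in range(ry0, ry1 + 1)
                   for x in range(rx0, rx1 + 1))


deb = DepthEstimationBuffer(4, 4)
deb.insert_occluder(0, 0, 1, 1, z_far=5.0)
deb.insert_occluder(1, 0, 2, 1, z_far=8.0)
behind_first = deb.passes_depth_test(0, 0, 0, 1, z_near=6.0)
behind_both = deb.passes_depth_test(0, 0, 1, 1, z_near=6.0)
```

The second query fails because one of its regions is shared with the farther occluder, whose stored depth of 8.0 is beyond the object's near value.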

For an object to be occluded, it must thus first pass the overlap test; i.e., the rectangle of the projected bounding volume of the object must pass the HOM test. Then it must pass the depth test, i.e., it must be behind the occluders. If an object passes both tests, the object is occluded and is not rendered.

Before the algorithm starts, an occluder database is built, where a few criteria are used to exclude certain objects [Zhang98]. First, small objects do not work well in the occluder database, since they usually cover a small portion of the image unless the viewer is very close to them. Second, objects with a high polygon count are also avoided, as these may negatively affect the performance of the rendering of the occlusion map. Third, objects with large or ill-shaped bounding boxes (e.g., a bounding box for a skinny polygon) should be avoided, as they may cause the depth estimation buffer to be too conservative. Finally, redundant objects are avoided: for example, a clock on a wall does not contribute as much to occlusion as the wall itself.

At runtime, occluders are selected from the database. To avoid allowing the creation of the HOM to become a bottleneck, there is a limit to the number of occluders that can be selected. These are selected with respect to their distance from the viewer and to their size. Only objects inside the view frustum are selected. A case when this does not work well is shown in Figure 9.

Figure 9. Scenario where the occluder selection for the HOM algorithm may be unsuccessful. The algorithm may select the dark gray objects as occluders, but when the light gray objects (which also are good occluders) are considered, the occluder count budget may have already been reached.

The number of occluders can vary during runtime using an adaptive scheme. If the portion of the scene that is culled away is low, then this number may be increased. On the other hand, if a high portion is culled away but the frame rate is low, then this number may be decreased [Zhang97]. Another technique that can be used here is to render simplified versions of the occluders. This works well as long as the occluders cover approximately the same pixels as they did before [Zhang98].

For extremely dense scenes, i.e., those with high depth complexity, the HOM algorithm was able to cull away between about 50% and 95% of the scene, with a speed-up of up to six times in some cases.


VISUALIZE fx's Hardware Implementation

Hewlett-Packard has implemented occlusion culling in the VISUALIZE fx graphics hardware [Scott98]. The algorithm works as follows. When an object is about to be rendered, each pixel covered by its bounding box is scan-converted and tested against the contents of the Z-buffer using special-purpose hardware. If all of these pixels are further away from the viewer than the contents already in the frame buffer, then the object is guaranteed to be obscured, and it is thus not necessary to process the model inside the bounding box. Otherwise, the model is processed and rendered into the frame buffer.

This means that if a complex object is obscured, then instead of drawing the whole object, only a bounding box (consisting of at most three quadrilaterals) is scan-converted (but not drawn into the frame buffer). In this case, we gain performance by avoiding sending the complex object through the rendering pipeline. Otherwise, the bounding box is scan-converted and the object is drawn, and we actually lose a bit of performance.

Note also that, as with most occlusion culling algorithms, the performance is dependent on the order in which objects are drawn. As an example, consider a car with a motor inside it. If the hood of the car is drawn first, then the motor will (probably) be culled away. On the other hand, if the motor is drawn first, then the hood of the car will not be culled. Therefore, performance can be improved by techniques such as rough front-to-back sorting of the objects by their approximate distance from the viewer and rendering in this order.

With such techniques, performance has been reported to be between 25% and 100% faster than rendering that does not use any occlusion culling [Scott98].

Shadow Culling

Here we describe briefly the work by Coorg and Teller [Coorg96,Coorg97] and Hudson et al. [Hudson97b]. These algorithms are quite similar, and the main idea is to select a small set of large occluders and discard the objects behind the occluders, i.e., objects that are shadowed by an occluder with respect to a certain viewpoint. This is done with some geometric calculations. The basic tests are shown in Figure 10, where the objects are represented by bounding boxes.

Figure 10. The left figure shows how Coorg and Teller's algorithm detects whether an object (the box) is occluded by a large polygon. If the viewer is in the region marked "occluded," then the object is occluded. The right figure shows how Hudson et al. detect occlusion - objects fully in shadow are occluded.

Coorg and Teller [Coorg97] make use of separating and supporting planes. A separating plane is formed by an edge of an occluder polygon and a vertex of the bounding box (BB), and in such a way that the objects are on different sides of the plane. A supporting plane is constructed in a similar way except that the occluder and the BB should be located on the same side of the plane. An object is occluded if the viewer is inside all of the supporting planes, which means that the object is in shadow with respect to the viewer and the occluder. Coorg and Teller also describe a way of using several polygons that share edges as an aggregate occluder, and also an efficient way of computing the separating/supporting planes using preprocessing and runtime table look-ups.

Hudson et al. [Hudson97b] use an algorithm that culls away AABBs or OBBs in much the same way as we have done for a view-frustum algorithm, but the frustum does not have a far plane and may have more than four side planes, i.e., more than a left, right, bottom, and top plane. The near plane is the plane in which the occluder lies.
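The shadow frustum test can be sketched as below. This is an illustration of the idea only, under assumptions not taken from the paper: the occluder is a single convex polygon, the bounding box is an AABB, and a box counts as occluded only when all eight corners lie inside every plane. All function names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def neg(a):
    return (-a[0], -a[1], -a[2])


def shadow_frustum(eye, polygon):
    """Planes as (normal, point) pairs whose positive side faces into the shadow."""
    n = len(polygon)
    centroid = tuple(sum(p[i] for p in polygon) / n for i in range(3))
    # Near plane: the plane of the occluder, oriented away from the viewer.
    normal = cross(sub(polygon[1], polygon[0]), sub(polygon[2], polygon[0]))
    if dot(normal, sub(eye, polygon[0])) > 0:
        normal = neg(normal)
    planes = [(normal, polygon[0])]
    # One side plane per occluder edge, through the viewpoint; no far plane.
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        side = cross(sub(a, eye), sub(b, eye))
        if dot(side, sub(centroid, eye)) < 0:
            side = neg(side)                 # make the interior the positive side
        planes.append((side, eye))
    return planes


def aabb_in_shadow(planes, lo, hi):
    corners = [(x, y, z) for x in (lo[0], hi[0])
               for y in (lo[1], hi[1]) for z in (lo[2], hi[2])]
    return all(dot(nrm, sub(c, pt)) >= 0 for nrm, pt in planes for c in corners)


eye = (0.0, 0.0, 0.0)
wall = [(-1.0, -1.0, -5.0), (1.0, -1.0, -5.0), (1.0, 1.0, -5.0), (-1.0, 1.0, -5.0)]
planes = shadow_frustum(eye, wall)
hidden = aabb_in_shadow(planes, (-0.5, -0.5, -8.0), (0.5, 0.5, -7.0))
partly_visible = aabb_in_shadow(planes, (0.5, -0.5, -8.0), (2.5, 0.5, -7.0))
```

A box in front of the wall fails the near-plane test, and a box that extends past the silhouette of the wall fails one of the side planes; only boxes fully in shadow are reported as occluded.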

To select a good occluder, Coorg and Teller [Coorg97] use the following metric, which estimates the solid angle that a polygon subtends:

g = -a dot(n, v) / dot(d, d)

Here, a is the area of the polygon, n is the normal of the polygon, v is the view direction vector, and d is the vector from the viewpoint to the center of the polygon. Both v and n are assumed to be normalized. The geometry involved is shown in Figure 11.

Figure 11. The geometry involved in the estimation of the solid angle.

The solid angle is the two-dimensional angle concept extended to three dimensions [Glassner95b]. In two dimensions, an angle of 2*pi radians covers the whole unit circle. If we extend this to three dimensions, the solid angle would cover the whole area of the unit sphere (4*pi steradians). The higher the value of g, the better the occluder. The solid angle approximation estimates the "usefulness" of an occluder because: a) the larger the area, the larger the value; b) the value is inversely proportional to the squared distance to the occluder, since dot(d, d) is the squared length of d; and c) the maximum value is reached when the viewer looks at a polygon "head-on," and the value decreases with an increasing angle between the polygon normal and the view direction [Coorg97].
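To make these three properties concrete, here is a small Python sketch that evaluates the metric directly from the formula. The function name occluder_metric and its parameter names are our own choices. The normal and the view direction are assumed to be normalized, as stated earlier.

```python
import math

def sub(p, q): return (p[0] - q[0], p[1] - q[1], p[2] - q[2])
def dot(p, q): return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]

def occluder_metric(area, normal, view_dir, viewpoint, center):
    """g = -a dot(n, v) / dot(d, d), with d pointing from viewpoint to center."""
    d = sub(center, viewpoint)
    return -area * dot(normal, view_dir) / dot(d, d)

view_dir = (0.0, 0.0, 1.0)    # viewer looks down the +z axis
facing = (0.0, 0.0, -1.0)     # polygon normal pointing back at the viewer
tilted = (math.sin(math.pi / 3), 0.0, -math.cos(math.pi / 3))  # turned by 60 degrees

# Same polygon (area 4) seen head-on from distance 5, from distance 10,
# and from distance 5 after tilting it.
near = occluder_metric(4.0, facing, view_dir, (0.0, 0.0, -5.0), (0.0, 0.0, 0.0))
far = occluder_metric(4.0, facing, view_dir, (0.0, 0.0, -10.0), (0.0, 0.0, 0.0))
turned = occluder_metric(4.0, tilted, view_dir, (0.0, 0.0, -5.0), (0.0, 0.0, 0.0))
```

Here near evaluates to 4/25 = 0.16, far to 4/100 = 0.04 (doubling the distance quarters the value), and turned to 0.08 (a 60-degree tilt halves it). A polygon facing away from the viewer yields a negative value.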

Hudson et al. use the same formula but also test with random sampling to determine whether a selected occluder is good in practice. They do this by choosing a number of random viewpoints and counting the number of objects occluded by the selected occluder. They also exploit coherence in that they assume that an occluder that was good (or bad) for one frame is probably good (or bad) the next frame too. The algorithm stores the objects that were culled by an occluder for each frame. Hudson et al. found that using about eight occluders was reasonable for their algorithm.
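The random-sampling check could be sketched in Python along the following lines. This is a hypothetical outline rather than the procedure of Hudson et al.: the sampling region, the sample count, the fixed seed, and the is_occluded callback (any occlusion test taking a viewpoint, an occluder, and an object) are all assumptions made for the illustration.

```python
import random

def sample_occluder_score(occluder, objects, region_lo, region_hi,
                          is_occluded, samples=32, seed=0):
    """Average number of objects hidden by occluder over random viewpoints
    drawn uniformly from the box (region_lo, region_hi)."""
    rng = random.Random(seed)  # fixed seed keeps the estimate repeatable
    hidden = 0
    for _ in range(samples):
        viewpoint = tuple(rng.uniform(a, b)
                          for a, b in zip(region_lo, region_hi))
        hidden += sum(1 for obj in objects
                      if is_occluded(viewpoint, occluder, obj))
    return hidden / samples
```

An occluder with a high value of g but a low sampled score would be a candidate for rejection.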

Both of these algorithms use a hierarchical data structure to represent the scene. Coorg and Teller use a k-d tree, and Hudson et al. use a bounding volume hierarchy of AABBs. At runtime the algorithms first perform view-frustum culling, and then identify good occluders. After that, the remaining objects are culled against the occluders.
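The order of these runtime steps can be outlined in Python as below. This is a sketch only: the Node class and the three callbacks (in_frustum, pick_occluders, hidden_by) are hypothetical placeholders for a view-frustum test, an occluder selection step, and an occlusion test. For brevity the occlusion pass runs over the surviving leaves rather than over the hierarchy.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    bounds: tuple                      # (lo, hi) corners of the node's box
    children: list = field(default_factory=list)

def frustum_cull(node, in_frustum, out):
    """Step 1: collect the leaves whose bounds pass the view-frustum test."""
    if not in_frustum(node.bounds):
        return out                     # whole subtree is outside the view
    if not node.children:
        out.append(node)
    for child in node.children:
        frustum_cull(child, in_frustum, out)
    return out

def visible_leaves(root, in_frustum, pick_occluders, hidden_by):
    candidates = frustum_cull(root, in_frustum, [])
    occluders = pick_occluders(candidates)   # step 2: identify good occluders
    visible = []
    for leaf in candidates:                  # step 3: cull against the occluders
        if any(occ is not leaf and hidden_by(occ, leaf.bounds)
               for occ in occluders):
            continue
        visible.append(leaf)
    return visible
```

Culling whole subtrees in the third step as well, as a hierarchical implementation would, requires only that hidden_by be applied to interior nodes before descending into them.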

Coorg and Teller's [Coorg97] preprocessing phase took only a few seconds, and the speed-up was a factor of approximately 2 to 5, depending on the architecture. Hudson et al. [Hudson97b] note a speed-up of 55%, on average. Remember, though, that these tests were done on different platforms and for different scenes.


References

[Coorg96] Coorg, S., and S. Teller, "Temporally Coherent Conservative Visibility", in the Twelfth Annual ACM Symposium on Computational Geometry, May 1996.

[Coorg97] Coorg, S., and S. Teller, "Real-Time Occlusion Culling for Models with Large Occluders", in Proceedings 1997 Symposium on Interactive 3D Graphics, pp. 83-90, April 1997.

[Cripe98] Cripe, Brian and Thomas Gaskins, "The DirectModel Toolkit: Meeting the 3D Graphics Needs of Technical Applications", Hewlett-Packard Journal, pp. 19-27, May 1998. http://www.hp.com/hpj/98may/ma98a3.htm

[Durand97] Durand, Frédo, George Drettakis, and Claude Puech, "The Visibility Skeleton: A Powerful and Efficient Multi-Purpose Global Visibility Tool", Computer Graphics (SIGGRAPH 97 Proceedings), pp. 89-100, August 1997. http://w3imagis.imag.fr/Membres/Fredo.Durand/PUBLI/siggraph97/index.htm

[Durand97b] Durand, Frédo, George Drettakis, and Claude Puech, "The 3D Visibility Complex: a unified data-structure for global visibility of scenes of polygons and smooth objects", in Canadian Conference on Computational Geometry, pp. 153-158, August 1997.

[Glassner95b] Glassner, Andrew S., Principles of Digital Image Synthesis, vol. 2, Morgan Kaufmann Publishers Inc., San Francisco, 1995.

[Greene93] Greene, Ned, Michael Kass, and Gavin Miller, "Hierarchical Z-Buffer Visibility", Computer Graphics (SIGGRAPH 93 Proceedings), pp. 231-238, August 1993.

[Greene94b] Greene, Ned, and Michael Kass, "Error-Bounded Antialiased Rendering of Complex Environments", Computer Graphics (SIGGRAPH 94 Proceedings), pp. 59-66, July 1994.

[Greene95] Greene, Ned, Hierarchical Rendering of Complex Environments, Ph.D. Thesis, University of California at Santa Cruz, Report No. UCSC-CRL-95-27, June 1995.

[Greene96] Greene, Ned, "Hierarchical Polygon Tiling with Coverage Masks", Computer Graphics (SIGGRAPH 96 Proceedings), pp. 65-74, August 1996.

[Hudson97b] Hudson, T., D. Manocha, J. Cohen, M. Lin, K. Hoff and H. Zhang, "Accelerated Occlusion Culling using Shadow Frusta", Thirteenth ACM Symposium on Computational Geometry, Nice, France, June 1997.

[Samet89a] Samet, Hanan, Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS, Addison-Wesley, Reading, Massachusetts, 1989.

[Samet89b] Samet, Hanan, The Design and Analysis of Spatial Data Structures, Addison-Wesley, Reading, Massachusetts, 1989.

[Scott98] Scott, N., D. Olsen, and E. Gannett, "An Overview of the VISUALIZE fx Graphics Accelerator Hardware", Hewlett-Packard Journal, pp. 28-34, May 1998. http://www.hp.com/hpj/98may/ma98a4.htm

[Zhang97] Zhang, H., D. Manocha, T. Hudson, and K.E. Hoff III, "Visibility Culling using Hierarchical Occlusion Maps", Computer Graphics (SIGGRAPH 97 Proceedings), pp. 77-88, August 1997. http://www.cs.unc.edu/~zhangh/hom.html

[Zhang98] Zhang, Hansong, Effective Occlusion Culling for the Interactive Display of Arbitrary Models, Ph.D. Thesis, Department of Computer Science, University of North Carolina at Chapel Hill, July 1998.

Tomas Möller has a MSc in Computer Science and Computer Engineering (1995) from Lund Institute of Technology, Sweden, and a PhD in Computer Graphics (1998) from Chalmers University of Technology, Sweden. He has worked in the graphics industry for five years, and is currently a project scientist at the Department of Computer Engineering, Chalmers University of Technology. During 2000, he will be a visiting scholar at the graphics group at the University of California at Berkeley.

Eric Haines has a B.S. in Computer Science from RPI (1980) and an M.S. in the area of Computer Graphics from Cornell University (1985). He has worked in the field of computer graphics for 16 years. During this time he has developed rendering software for Autodesk, Hewlett-Packard, Spatial, and 3D/EYE. He has contributed to a number of books on rendering, has taught courses at SIGGRAPH, and is an editor for the "journal of graphics tools," among other activities.
