[In this sponsored feature, Umbra Software discusses the pros and cons of various methods currently being used for occlusion culling and explains its own automated occlusion culling system that can take any type of polygon soup as input.
Founded in 2006, Umbra Software is a Finnish middleware company specializing in graphics rendering technology. The company was spun off from Hybrid Graphics, acquired the dPVS system and continued developing it into what became the Umbra occlusion culling middleware. The core engineering team at Umbra has worked in the field of occlusion culling since 2000. Their technology is being used in games from developers such as Bungie, BioWare, 38 Studios, Square Enix Co., IO Interactive, Remedy, Specular Interactive and many others who have helped shape the technology.]
When rendering 3D world views in a game, any resources spent processing elements that are invisible to the player are wasted. These resources could instead be used to increase the visual complexity of the visible elements or to decrease the time taken to produce a frame. To reclaim them, we must first identify the objects that are not visible to the player.
Determining the set of elements that are not visible from a particular viewpoint, due to being occluded by elements in front of them, is known as occlusion culling.
Image 1: The basics of occlusion culling. The red circle inside the view frustum is not visible to the camera because it is occluded by the blue rectangles. The upper red circle is not visible either, because it lies outside the view frustum.
In real-time interactive applications, occlusion culling is traditionally used as a rendering optimization technique, helping to produce frames at a rate high enough to create the perception of continuous movement. There is, however, a variety of uses for visibility information beyond pure rendering. Knowing whether an object is currently visible can be used to:
- Influence AI behavior
- Simplify or disable physics simulation and animation
- Reduce the amount of network traffic required to replicate player positions across the network
When assessing the value of an occlusion culling system, some of the desirable properties are:
It works automatically and universally
Ideally, an occlusion culling system automatically works with any kind of 3D content, from simple object viewers to massive virtual worlds, and requires no manual work from the artists building and modeling the game world. Furthermore, the system should not restrict the creativity of the artist. Finally, the system should not depend on any specific hardware features, rendering conventions, or authoring methods or tools.
It is conservatively correct
A system that sometimes classifies fully or partially visible objects as fully occluded is bound to produce rendering artifacts, whereas a system that sometimes reports fully occluded objects as visible usually still generates the correct visual output, because the depth test resolves the final image; it merely wastes some work.
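The difference between the two kinds of error can be stated as a set comparison. The following Python sketch is purely illustrative: the function name and the existence of a known ground-truth set of occluded objects are assumptions made for the example, not part of any Umbra API.

```python
def classify_culling(culled: set, truly_occluded: set) -> dict:
    """Compare a culling result against (hypothetical) ground-truth occlusion."""
    # Visible objects that were rejected: these show up as missing geometry.
    artifacts = culled - truly_occluded
    # Hidden objects that were not rejected: extra work, but the image stays correct.
    wasted = truly_occluded - culled
    return {"conservative": not artifacts, "artifacts": artifacts, "wasted": wasted}


result = classify_culling(culled={"crate"}, truly_occluded={"crate", "barrel"})
print(result["conservative"], sorted(result["wasted"]))
```

Here the barrel is hidden but still drawn, which costs time without harming the image, so the result counts as conservative.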
It adds value
For rendering purposes, occlusion culling must be judged against the reference solution of simply rendering everything in the view frustum. For example, a sports game set in an arena, where only a small fraction of the content is occluded at any given time, is not a good candidate for occlusion culling: the effort spent determining occlusion is wasted because there is little to cull. However, in visually complex, highly detailed 3D worlds, the benefits of an occlusion culling system grow significantly.
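As a rough mental model, assuming (unrealistically) that rendering cost scales linearly with the share of content drawn and that culling has a fixed per-frame cost, the trade-off reduces to one subtraction. The function and the figures below are hypothetical and chosen only to contrast the two cases described above.

```python
def net_saving_ms(render_all_ms: float, occluded_fraction: float, culling_cost_ms: float) -> float:
    """Frame time saved by occlusion culling under a simplified linear cost model.

    render_all_ms: cost of rendering everything in the view frustum (the reference solution).
    occluded_fraction: share of that cost spent on content that turns out to be hidden.
    culling_cost_ms: per-frame cost of determining occlusion.
    """
    return render_all_ms * occluded_fraction - culling_cost_ms


# Hypothetical figures: an arena where little is hidden vs. a dense, detailed world.
arena = net_saving_ms(render_all_ms=10.0, occluded_fraction=0.05, culling_cost_ms=1.0)
city = net_saving_ms(render_all_ms=10.0, occluded_fraction=0.70, culling_cost_ms=1.0)
print(f"arena: {arena:+.1f} ms, city: {city:+.1f} ms")
```

A negative result means the culling costs more than it saves, which is the arena case.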
In this article, we first briefly introduce the problem domain and look at the popular methods currently being used for occlusion culling, highlighting the challenges they pose in the context of game development. Then, we describe a novel approach to occlusion culling, developed to satisfy the needs of our partners and clients building the next generation of game engines.
This approach is called the Umbra 3 occlusion culling system.
The roots of occlusion culling in 3D graphics lie in hidden line and hidden surface determination algorithms. These algorithms are necessary to produce visually correct 2D projections of 3D scenes.
The problem is simple enough to grasp: of all the rays of light travelling from surfaces in the world towards your eye, only those that do not run into obstacles on the way contribute to the final image. This is an instance of the visibility problem, and the formulation readily suggests one possible solution for 3D rendering: trace light rays back from the eye and find the first surface that each ray intersects.
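The ray-tracing formulation can be sketched in a few lines of Python. To keep the intersection math short, the sketch assumes the scene consists only of spheres; the function and object names are illustrative.

```python
import math

# A sphere is (name, (cx, cy, cz), radius); spheres keep the intersection math short.
Sphere = tuple[str, tuple[float, float, float], float]


def ray_sphere_t(origin, direction, sphere: Sphere):
    """Return the smallest positive ray parameter t at which the ray hits the sphere, or None."""
    _, center, radius = sphere
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):  # nearer root first
        if t > 1e-9:
            return t
    return None


def first_visible(eye, direction, spheres):
    """Trace one ray back from the eye and return the name of the first surface it hits."""
    nearest = None
    for sphere in spheres:
        t = ray_sphere_t(eye, direction, sphere)
        if t is not None and (nearest is None or t < nearest[0]):
            nearest = (t, sphere[0])
    return None if nearest is None else nearest[1]


scene = [("far_wall", (0.0, 0.0, 10.0), 1.0), ("pillar", (0.0, 0.0, 5.0), 1.0)]
print(first_visible((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))
```

The pillar blocks the ray before it reaches the far wall, so only the pillar contributes to this sample.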
All modern polygon-rasterizing renderers, both software and hardware, instead solve the problem per sample: they track the smallest depth value seen so far for each sample and only update the sample when encountering a depth value smaller than the current minimum. This solution is known as Z-buffering, or depth buffering, after the additional buffer of Z values it maintains. Given the work already done for 2D projection and raster sampling, computing the Z component is relatively cheap, and it guarantees the correct visual result. Computation can be reduced further by submitting primitives in front-to-back order: if the nearest primitive for a given sample is rendered first, the contributions of all other primitives along the same ray are rejected by the depth test, so any computation needed to determine the final output of a hidden sample, such as interpolating vertex attributes or doing texture lookups, can be skipped.
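The following Python sketch shows the per-sample depth test and why submission order matters. The primitive representation (a name, one constant depth and a list of covered pixels) is a simplifying assumption; a real rasterizer interpolates depth across each triangle.

```python
import math


def render(primitives, width, height):
    """Z-buffer a list of primitives and count how many samples were shaded.

    Each primitive is (name, depth, covered_pixels); a single constant depth per
    primitive is a simplification so the depth test stays easy to follow.
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for name, z, pixels in primitives:
        for x, y in pixels:
            if z < depth[y][x]:  # depth test: only a nearer sample may replace the current one
                depth[y][x] = z
                color[y][x] = name  # stands in for attribute interpolation, texturing, etc.
                shaded += 1
    return color, shaded


full_screen = [(x, y) for y in range(2) for x in range(2)]
near, far = ("near", 1.0, full_screen), ("far", 2.0, full_screen)
image_btf, shaded_btf = render([far, near], 2, 2)  # back to front
image_ftb, shaded_ftb = render([near, far], 2, 2)  # front to back
print(image_btf == image_ftb, shaded_btf, shaded_ftb)
```

Both orders produce the same image, but front-to-back submission shades each sample only once.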
Z-buffering with a front-to-back primitive rendering order gets pretty close to the ideal of only calculating a single value for each output pixel. Unfortunately, the culling takes place very late in the rendering pipeline of a 3D game application. By the time the renderer rejects an occluded sample, the sample has already passed through several pipeline stages: feeding the input geometry and dependent resources to the GPU, per-vertex processing, triangle setup and rasterization. Methods exist for early Z-culling in the rendering pipeline, but ultimately the largest savings come from culling the engine's largest renderable entities before they are fed into the rendering subsystem.
Traditionally, occlusion culling refers to this type of rendering optimization.
Most runtime occlusion culling strategies determine whether an object will be visible by performing the equivalent of per-sample depth testing on the transformed bounds of potentially occluded individual objects or object hierarchies. The challenge is to build an estimate of the view's depth buffer before the actual rendering takes place.
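A minimal sketch of such a test, assuming the depth buffer estimate already exists and the object's bounds have already been projected to a screen-space rectangle with a single nearest depth. Both approximations make the object look larger and closer than it really is, which keeps the test conservative. The function name is illustrative.

```python
import math


def bounds_occluded(depth_buffer, rect, nearest_depth):
    """Conservatively test an object's projected bounds against an estimated depth buffer.

    rect: (x0, y0, x1, y1), inclusive pixel range covered by the projected bounding box,
          already clamped to the screen.
    nearest_depth: smallest depth of any point of the bounding box.
    Returns True only when every covered sample already holds something strictly nearer.
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if depth_buffer[y][x] >= nearest_depth:
                return False  # this sample might show the object, so keep it
    return True


# 4x4 estimate: an occluder at depth 1.0 fills the left half, the right half is empty.
estimate = [[1.0, 1.0, math.inf, math.inf] for _ in range(4)]
print(bounds_occluded(estimate, (0, 0, 1, 3), nearest_depth=5.0))  # fully behind the occluder
print(bounds_occluded(estimate, (1, 0, 2, 3), nearest_depth=5.0))  # pokes into the empty half
```

Ties count as visible, so an object exactly at the occluder's depth is never culled.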
One widely used solution is to issue GPU occlusion queries against either a separate depth rendering pass or the depth results of the previous frame. An occlusion query returns the number of samples that potentially pass the depth test, without actually processing or writing pixel values. The challenge with this approach is the synchronization between the CPU and GPU required to submit the test data and read back the occlusion information. In practice, this makes the approach virtually impossible to use for anything other than rendering optimization.
The upside is that no extra work is needed to generate the depth buffer, and the buffer represents the exact final image for any kind of occluder geometry. To allow the CPU and GPU to operate asynchronously and to minimize the traffic between them, these systems typically use the previous frame's depth buffer results and therefore cannot guarantee conservative culling.
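Why reusing previous-frame results breaks conservativeness can be shown with a small simulation. The one-frame latency and the function below are assumptions for illustration and do not model any particular graphics API, although OpenGL (queries targeting GL_SAMPLES_PASSED) and Direct3D 11 (D3D11_QUERY_OCCLUSION) both expose the sample-count queries described above.

```python
def wrongly_culled_frames(truly_visible):
    """Simulate occlusion queries whose results arrive one frame late (an assumed latency).

    truly_visible[f] says whether the object's bounds really pass the depth test in frame f.
    The draw decision for frame f can only use the result of the query issued in frame f - 1.
    Returns the frames in which a visible object was not drawn.
    """
    errors = []
    last_result = True  # no query result yet: draw the object to be safe
    for frame, visible in enumerate(truly_visible):
        drawn = last_result
        if visible and not drawn:
            errors.append(frame)  # popping artifact: the object appears one frame late
        last_result = visible  # this frame's query is read back in the next frame
    return errors


# The object is hidden for two frames, then becomes visible again in frame 3.
print(wrongly_culled_frames([True, False, False, True, True]))
```

Frame 3 is the error: the object is already visible, but the only available result still says it is hidden.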
An alternative is to rasterize a simplified representation of the occluder geometry into a lower-resolution depth buffer on the CPU. To obtain conservative culling, the simplified occluder must never cover more of the raster than the real geometry would. Usually the content artists create these occlusion models manually, alongside the real models.
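A minimal sketch of this conservative rasterization rule, assuming occluders have already been projected to screen-space rectangles (real systems rasterize triangles). Each coarse pixel stores the occluder's farthest depth, and only tiles that the occluder covers completely are written, so the result never claims more occlusion than the real geometry provides. The function and parameter names are hypothetical.

```python
import math


def rasterize_occluders(occluders, full_w, full_h, tile):
    """Rasterize simplified occluders into a coarse depth buffer on the CPU.

    occluders: (x0, y0, x1, y1, far_depth) screen rectangles in full-resolution pixels,
               half-open [x0, x1) x [y0, y1); far_depth is the occluder's farthest depth.
    tile: how many full-resolution pixels one coarse pixel spans in each direction.
    A coarse pixel receives an occluder only if the occluder covers it completely,
    so the coarse coverage never exceeds the real coverage.
    """
    w, h = full_w // tile, full_h // tile
    buf = [[math.inf] * w for _ in range(h)]
    for x0, y0, x1, y1, far_depth in occluders:
        # Shrink the rectangle to whole tiles: round the start up and the end down.
        tx0, ty0 = max(0, -(-x0 // tile)), max(0, -(-y0 // tile))
        tx1, ty1 = min(w, x1 // tile), min(h, y1 // tile)
        for ty in range(ty0, ty1):
            for tx in range(tx0, tx1):
                # Keep the nearest of the occluders' farthest depths.
                buf[ty][tx] = min(buf[ty][tx], far_depth)
    return buf


# An 8x8 screen reduced to 2x2 tiles; the wall covers columns 0-5, i.e. only the left tiles fully.
coarse = rasterize_occluders([(0, 0, 6, 8, 3.0)], full_w=8, full_h=8, tile=4)
print(coarse)
```

The right-hand tiles stay empty even though the wall partially covers them, which is exactly what keeps the test conservative.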
Potentially visible sets (PVS)
When the runtime cost of early depth buffer generation and occlusion testing is not feasible and the occluder geometry is mostly static, a viable alternative is to determine and store the visibility relations between view cells and renderable entities in a preprocessing phase. The set of entities determined to be visible from a view cell is known as the potentially visible set. The runtime operation simply involves finding the view cell of the current camera location and looking up its set from memory. In simple cases, the level designer can construct the visibility relations manually, but the usual method is to sample the visibility from a view cell by ray casting or rasterizing in all directions. It is difficult to guarantee conservative culling in either case, because a finite number of samples can miss small gaps between occluders. Increasing the number of samples per view cell reduces this error at the expense of longer computation. In addition to static target objects, volumetric visibility information in the form of target cells can be stored in the set, making it possible to cull dynamic entities as well.
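The runtime side of a PVS is little more than a lookup. The sketch below assumes a uniform grid of view cells and stores each set as a frozenset of object names; both choices are illustrative, and a production system would more likely use compact bit sets and arbitrarily shaped cells.

```python
def view_cell(position, cell_size):
    """Map a camera position to a view cell on a uniform grid (an assumed cell layout)."""
    return tuple(int(c // cell_size) for c in position)


def potentially_visible(pvs, position, cell_size, all_objects):
    """Runtime PVS query: find the camera's view cell and look up its precomputed set.

    Positions outside every computed cell fall back to all objects, which stays conservative.
    """
    return pvs.get(view_cell(position, cell_size), all_objects)


# Precomputed offline, e.g. by ray casting in all directions from each cell.
pvs = {
    (0, 0, 0): frozenset({"hall", "door"}),
    (1, 0, 0): frozenset({"door", "courtyard"}),
}
everything = frozenset({"hall", "door", "courtyard", "tower"})
print(sorted(potentially_visible(pvs, (12.0, 3.0, 4.0), 10.0, everything)))
```

A camera at x = 12 falls into cell (1, 0, 0), so only the door and the courtyard are submitted for rendering.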
The generation of potentially visible sets can be automated, but a large amount of data must be generated to obtain reasonable culling results. The sampling time in the preprocessing phase slows down the content iteration cycle, and the sheer amount of data representing the visibility relations can quickly become unmanageable. This is particularly so because visibility relations are global in nature: a small change in the occluder geometry can alter visibility relations far away and on all sides of the original change, and therefore necessitate recomputing the potentially visible sets of a large area.
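A back-of-the-envelope calculation shows how quickly the data grows. The sketch assumes the simplest possible encoding, one uncompressed bit per view cell and object, and uses hypothetical counts; a practical system would compress this data.

```python
def pvs_size_bytes(num_view_cells, num_objects):
    """Uncompressed PVS size: one bit per (view cell, object) pair, rows padded to whole bytes."""
    bytes_per_cell = (num_objects + 7) // 8
    return num_view_cells * bytes_per_cell


# Hypothetical world: 100,000 view cells and 20,000 objects.
size = pvs_size_bytes(100_000, 20_000)
print(f"{size / 1e6:.0f} MB")
```

Because the size is the product of both counts, doubling the number of view cells and the number of objects quadruples the data.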
Portals and cells
A third category of occlusion culling systems is based on dividing the static game world into cells and capturing the visibility between adjacent cells with 2D portals. The runtime operation is to find the cell containing the camera and traverse the resulting cell graph, restricting visibility along the way by clipping the view frustum to each portal traversed. Objects are assigned to cells in a preprocessing phase, and an object is considered visible when the traversal visits the cell that contains it. This approach works best when the world has natural locations for the level designer to place portals, such as doors or windows connecting rooms in an indoor setting.
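A minimal 2D sketch of the traversal, assuming each portal has already been projected to an angular interval as seen from the camera, so that clipping the frustum to a portal becomes an interval intersection. A real implementation clips a 3D frustum against portal polygons; the cell names and the function are illustrative.

```python
def visible_cells(portals, start_cell, frustum):
    """Traverse a cell graph from the camera's cell, clipping the view to each portal.

    portals: {cell: [(neighbor, (lo, hi)), ...]}, where (lo, hi) is the angular extent
             of the portal as seen from the camera, in degrees (2D, no wrap-around assumed).
    frustum: (lo, hi) angular extent of the camera's view frustum.
    Returns every cell reached through a non-empty clipped frustum.
    """
    reached = set()

    def visit(cell, lo, hi, path):
        reached.add(cell)
        for neighbor, (p_lo, p_hi) in portals.get(cell, []):
            clip_lo, clip_hi = max(lo, p_lo), min(hi, p_hi)  # clip the frustum to the portal
            if clip_lo < clip_hi and neighbor not in path:  # path check prevents cycles
                visit(neighbor, clip_lo, clip_hi, path | {neighbor})

    visit(start_cell, frustum[0], frustum[1], frozenset({start_cell}))
    return reached


portals = {
    "room": [("hall", (-10.0, 10.0)), ("closet", (50.0, 70.0))],
    "hall": [("room", (-10.0, 10.0)), ("yard", (20.0, 30.0)), ("kitchen", (0.0, 5.0))],
}
print(sorted(visible_cells(portals, "room", (-45.0, 45.0))))
```

The closet's portal lies outside the frustum, and the yard's portal falls outside the narrowed view through the hall doorway, so only the room, the hall and the kitchen remain, and only objects assigned to those cells are rendered.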
The portals-and-cells data is a simplified occlusion model of the world that stores non-occluders (the empty space) and their connectivity rather than the occluders themselves. While accurate and conservative occlusion results can be obtained at a relatively low runtime cost, manually placing cells and portals is labor-intensive and error-prone, and it dramatically increases the cost of modifying content.