
A
Real-Time Procedural Universe, Part Two: Rendering Planetary Bodies
By
Sean O'Neil
Gamasutra
August
10, 2001
URL: http://www.gamasutra.com/features/20010810/oneil_01.htm
In Part One of
this series I explained how to use a function based on Perlin noise and fractal
Brownian motion (fBm) to dynamically generate any point on a planetary body
from a set of coordinates. This article will focus on how to render those planetary
bodies at real-time speeds using my own spherical version of the ROAMing (Real-time
Optimally Adapting Mesh) terrain algorithm. For those of you who read Part One
and downloaded the demo I provided, make sure you download the latest demo.
I've made some significant performance improvements since then, and they are
explained in this article.
There have been a number of articles published in the past few years on adaptive
mesh algorithms, which are also called dynamic or continuous LOD (Level-Of-Detail)
algorithms. Some good examples are "Real-Time Dynamic Level of Detail Terrain
Rendering with ROAM" and "Continuous LOD Terrain Meshing Using Adaptive
Quadtrees". However, all the articles I've read so far have drawn flat
landscapes based mainly on a pre-built 2D height map. Using a little creative
thinking, I managed to come up with a spherical version that dynamically generates
height values as needed.
As a side note, when I wrote the initial version of this spherical ROAM algorithm
(more than a year ago), I couldn't find anything on the Internet regarding spherical
DLOD algorithms. Since then, a number of projects have popped up using various
techniques. I still haven't seen anyone do it quite the same way I did, but
I'll provide a list of reference links at the end of the article and leave it
up to you to compare the pros and cons of various methods.
The Traditional ROAM
Algorithm
If you haven't read anything about ROAM, I recommend that you read the online
paper: "ROAMing Terrain: Real-time Optimally Adapting Meshes". For
those who want to skip the gory details, I'll go over the concept more briefly
here. The ROAM algorithm attempts to generate an optimal triangle mesh for quickly
rendering large-scale terrain maps. It starts with a single square, which consists
of two right triangles, covering the entire map. If more detail is needed, a
vertex is added to the center of the square, splitting the two right triangles
into four. Based on a view-dependent priority calculation, triangles are split
recursively in the same fashion until the desired level of detail is reached.
ROAM implementations typically implement two main functions that are called
every frame, one called Update()
to update the mesh and one called Draw()
or Render() to draw it.
To determine when more detail is needed in the mesh, each triangle is assigned
a split priority. A priority threshold is set, and any triangle that exceeds
the threshold is split. A triangle's priority is calculated by determining the
amount of visible error it contains. Visible error is essentially the actual
amount of error in the triangle divided by the distance from the camera to that
triangle. For any triangle that can be culled from the rendering process (i.e.
outside the view frustum or facing away from the camera), the visible error
should be 0 because the triangle isn't visible.
When splitting triangles in a 3D mesh, care must be taken to ensure that no
cracks or seams appear in the mesh. The ROAM algorithm handles this by following
one simple rule: except at the edges of the map, only a square can be split.
In this case a square is defined as two right triangles of the same size sharing
their longest edge. When you need to split a triangle whose neighbor along the
longest edge is not the same size, you must split its neighbor first. If the
neighbor isn't part of a square, then its neighbor must be split before that.
This check continues recursively until a square is reached or the edge of the
map is reached, and all triangles along the path are split as the recursion
unwinds. The figure below illustrates how this works:
![]() |
|
Figure
1: Splitting triangle T requires multiple splits to avoid creating cracks
in the mesh.
|
Attempting to split triangle T requires four new vertices to be added to the
mesh. Those vertices and the new edges created are highlighted in red and numbered
in order. The first vertex makes the bottom-right corner of the map into a square.
The second vertex makes a diagonal square in the bottom center. The third vertex
makes triangle T part of a square. Last but not least, the fourth vertex splits
triangle T.
Because the mesh starts with two triangles and each is split recursively into
two smaller triangles, a binary tree is often used to store the ROAM mesh in
memory. In this case, each node in the tree contains a triangle at a certain
level of detail, and each frame all the leaf nodes in the tree are rendered.
Along with containing pointers to its vertices, the triangle object usually
contains pointers to its neighbors and members dealing with its split priority.
To keep the mesh optimal as the camera moves around, we need to be able to remove
triangles that are no longer needed. The rule for merging triangles back into
their parent triangles is that, except at the edges of the map, only a diamond
can be merged. A diamond is defined as four right triangles with their 90-degree
angles sharing the same vertex, which is equates to a square that has just been
split. To merge the diamond, just remove the center vertex and turn it back
into a square. Unlike splitting a triangle through recursion, you must wait
until a triangle becomes part of a diamond before you can merge it.
In addition, it's not very efficient to figure out if a particular triangle
is part of a diamond and then figure out if that diamond needs to be merged.
The best way to handle this seems to be to maintain a list of current diamonds
at all times. Every split creates 1 diamond and destroys 0-2 diamonds, and every
merge does the opposite. The diamond structure should contain pointers to its
triangles and members dealing with its merge priority. Every frame merge priorities
get checked, and all diamonds that fall below the split threshold get merged.
One of the best things about ROAM is the amount of control you have over the
performance vs. visual quality trade-off. You can control it by fine-tuning
the priority calculations, changing the priority threshold, setting limits on
the frame rate, setting limits on the triangle count, or setting limits on the
number of split/merge operations to perform per frame.
Probably the largest problem with ROAM is that mesh changes from one frame to
the next might cause a visible "pop" in the terrain. This artifact
can be reduced by a process called "geomorphing", where you gradually
move vertices from the old position to the new position over a number of frames,
smoothing the pop and making the transition less noticeable. You can also slowly
change the vertex normal, color, and texture coordinates if necessary.
Mapping ROAM to
a Sphere
Because the traditional ROAM algorithm only allows you to split a square, you
have to start with a solid 3D shape that consists only of squares. To prevent
cracks from forming in the mesh, the edges of these squares must fit together
seamlessly in all directions. Each square in this shape will be treated like
a top-level square, and each triangle like the root of a separate binary tree.
Neighbor pointers have to be set up between these top-level triangles properly
so that splitting at the edge of one top-level square can split the corresponding
triangle on the adjacent edge.
The obvious 3D shape that consists of all squares with edges wrapping seamlessly
in all directions is a cube. Because a subdivided cube doesn't give a well-proportioned
sphere, I spent a lot of time trying to come up with a more appropriate shape,
but I couldn't find anything that fit the requirements. Later on, I realized
that I was wasting my time because the split priorities would quickly take care
of the sphere's proportions. If there is too much error in one triangle because
its longest edge is out of proportion, it will be split.
![]() |
|
Figure
2 - Screenshots of the spherical ROAM algorithm at different levels of
detail.
|
Once you set up the 12 root triangles of the cube with the proper neighbor pointers,
everything falls into place and the traditional ROAM algorithm works beautifully.
In fact, it's even easier to implement than the flat ROAM algorithm because
you don't have to test for conditions at the edges of the map. The only thing
you need to treat differently is how you implement your height values. Instead
of simply adjusting a vertex's y coordinate, you need to treat your height value
as an offset from the sphere's radius. Mathematically, treat each new vertex
as a direction vector from the center of the sphere. Normalize the vector and
multiply it by the sphere's radius plus the height offset.
One other simplification I made to the ROAM algorithm was to use a linked list
of triangles instead of a binary tree. The tree structure takes up more memory,
slows the routine down, and is more complicated to implement. A linked list
is sufficient as long as each triangle keeps a pointer to its edge neighbors
and the triangle it was split from (its parent). The list is initialized with
the 12 top-level triangles, which have their edge pointers set up properly and
their parent pointers set to NULL. When you split a square, add two new triangles
to the list with their parent pointers set to the two existing triangles, then
add a vertex and rearrange the vertices and edge pointers of the four triangles.
Do the reverse when merging a diamond.
To simplify the split and merge operations, it is a good idea to maintain the
same order of the vertices and neighbor pointers in the triangle object. I chose
to order my vertices (0, 1, 2) counter-clockwise with vertex 1 always being
opposite the longest edge. My edge neighbor pointers are ordered based on the
edge shared, 0-1, 1-2, and 2-0 respectively. Since vertex 1 is opposite the
longest edge, this means that the last edge pointer (i.e. array index 2) always
shares the longest edge.
![]() |
|
Figure
3: Vertex and edge pointer indexes (edges in red). When splitting, go
from left to right. When merging, go from right to left.
|
Note from Figure 3 that it's not that hard to determine whether a triangle is part of a square or diamond using edge pointers. If triangle1->edge[2]->edge[2] = triangle1, then triangle1 is part of a square. Otherwise, you need to split triangle1->edge[2] to make triangle1 part of a square. If triangle1->edge[0]->edge[0]->edge[0]->edge[0] = triangle1 (you can also replace those 0's with 1's), then the triangle is part of a diamond.
Optimizing the
Update() function
Now that you have the basic idea behind my spherical ROAM algorithm, you need
to know how to optimize it for a dynamic planet generation algorithm like the
one I explained in Part One. The first key to making it fast enough for real-time
rendering is to realize that the fractal function you're calling is very slow.
Even if you change it to use some other type of algorithm, it is relatively
safe to assume that it won't be a cheap function to call. Your only hope to
get it to run fast enough is to minimize the number of calls to that function.
Keep in mind that you have to update the priority of every triangle in the mesh
each frame to determine whether it needs to be split or not. That priority depends
on the height of the new vertex, which is determined by calling the fractal
function. So to keep from having to call the fractal function for every triangle
every frame, you need cache the new height value in your triangle object. So
as each triangle is created from a split, you should call the fractal function
to get the height value for the next potential split.
Once I had that implemented, I realized I was still calling the fractal routine
twice as often as necessary. Because each triangle shares its longest edge with
another triangle in a square, the second triangle to be created for that square
was making the same call to the fractal routine. It's easy enough to determine
whether a triangle is completing a square at the time of its creation, and if
so borrow the calculated height value from its neighbor. Also note that there
is also no need to call the fractal routine when you merge a diamond because
the vertex you are removing has the information you need. Just take the vertex's
height value and use it to compute the offset to put into the merged triangle
objects.
Next I focused on the priority calculations. The first thing to keep in mind
is that if the merge priority is not calculated in the same way the split priority
is calculated for a triangle's parent, you quickly get into a situation where
each frame you will split several triangles only to have them merged together
again right away. You can also get into a situation where unneeded triangles
are kept around long after they're needed. Because I'm not using a binary tree,
I'm not keeping track of the original state of the parent triangles. This was
easy enough to solve by adding a few members to my diamond object to keep track
of the necessary information.
There are a number of other changes I can think of off the top of my head that
could theoretically speed it up even more. The simplest way is to come up with
some logic to avoid doing a priority check on every triangle every frame. At
50,000 triangles, this can be quite a burden on the CPU. I recently read an
interesting article that described an improved split priority concept. In this
article, the priority calculations were pre-calculated to determine the distance
at which a triangle would be split or a diamond merged, and updates were driven
mainly off of distance checks. The author then grouped polygons together into
a tree using bounding spheres to greatly reduce the number of distance checks
that needed to be made each frame. I'm not clear on the details surrounding
their implementation, but they reported a dramatic improvement in their Update()
function.
Optimizing the Draw() function
Next I concentrated on the Draw()
function, and switched from direct calls to glVertex,
glNormal, and glTexCoord
to a vertex array. I created a dynamically sizing array class, and made it global
so that all ROAM objects use the same vertex array. This optimization gave a
noticeable boost in frame rate, which illustrated to me that my optimizations
to the Update() function had
paid off and the bottleneck had moved over to the rendering pipeline. If the
rest of the code had been poorly optimized, I would've seen a very small improvement
optimizing the rendering pipeline.
Then I looked at the Nvidia documentation to find the fastest way to push non-static
triangles to the video card. I found that because I had already implemented
a vertex array, it took very little effort to implement the GL_NV_vertex_array_range
and GL_NV_fence extensions.
These extensions allow you to copy vertices directly to AGP or video memory,
and when you render with the standard glDrawElements
function, it renders them extremely quickly. I believe the effect is the same
as using DirectX's vertex buffers stored in video memory.
Given all these optimizations, I get about 20 FPS rendering 50,000 triangles
using a 750 MHz Duron with a GeForce2 MX video card. When I toggle the mesh
updates (i.e. skipping the Update()
call) on the same machine, it speeds up to 45 FPS with 50,000 triangles, so
there's still room for improvement in the update routine. The only further optimizations
I can think of making to the Draw()
function would be to see if I could convert my triangle lists to a set of strips
or fans, but that would require even more changes to the Update()
function to maintain the list of strips and fans. Given the frame rates I'm
currently getting, I can't imagine it would improve performance much unless
I came up with a simple way to maintain them.
Looking at the
Code
I want to go over the classes I wrote for this ROAM algorithm so you can get
a good idea of how they work. To keep it concise, I will only list and explain
key classes, members, and methods here. This is just to explain some of the
finer points of what I did to make it easier for you to go through the demo
source and modify it or come up with your own implementation based on it.
CROAMVertex:
class CVertex
{
public:
static CVertexArray Array;
CVector m_vPosition;
// The vertex position
CVector m_vNormal; //
The vertex normal
float m_fTexCoord[4]; //
2 sets of texture coordinates
};
The CVertex class doesn't have
any ROAM-specific members or methods in it, it's just a regular vertex with
space for two sets of texture coordinates for multi-texturing. Note the static
CVertexArray object, which
is currently used as a global OpenGL vertex array shared by all ROAM instances.
This needs to be global because I'm only allocating room for 65536 vertices
in video memory (Nvidia-specific), and I need to make sure one ROAM object doesn't
step on another. If necessary I could make a second global array, perhaps to
share non-ROAM vertices, and it would allocate its own block of video memory
and work in a similar fashion.
CVertexArray
#define VERTEX_ARRAY_SIZE 65535
class CVertexArray
{
protected:
// Members for the vertex array
unsigned short m_nStackPosition; //
Index into the stack
unsigned short *m_pStack; //
A stack of free elements
unsigned char *m_pUpdated; //
An array of updated flags
CVertex *m_pBuffer;
// Contains the actual vertices
//
Members for GL_NV_vertex_array_range and GL_NV_fence extensions
bool m_bFence; //
true if extensions are enabled
CVertex *m_pArray; //
pointer to allocated video memory
GLuint m_nFence; //
utilize fence synchronization
//
Members for GL_ARB_multitexture extension
int m_nMaxTextureUnits; //
Is multi-texturing allowed?
public:
// Init and Cleanup methods
void Init(); //
Initializes extensions, allocates video memory
void Cleanup(); //
Frees video memory
unsigned
short GetElement(); //
Call to get a new array index
void UpdateElement(unsigned short); //
Call after updating a vertex
void ReleaseElement(unsigned short); //
Call to release a vertex
//
OpenGL vertex array methods
void InitRender(); //
Call before rendering each frame
void FinishRender(); //
Call after rendering each frame
int GetMaxTextureUnits();
void EnableVertexArray();
void DisableVertexArray();
void EnableNormalArray();
void DisableNormalArray();
void EnableTextureCoordArray(int, int);
void DisableTextureCoordArray(int);
};
The CVertexArray class implements
the OpenGL vertex array I use for my ROAM algorithm. To properly encapsulate
an OpenGL vertex array, it must handle the GL_NV_vertex_array_range
and GL_NV_fence extensions,
and it must contain wrappers to call functions like glVertexPointer(),
glNormalPointer(), and glTexCoordPointer().
As you can see here, it keeps a stack of free array indexes into the vertex
array. When the ROAM split function needs to add a new vertex to its mesh, it
calls GetElement() to get
a new index into the array. When the ROAM merge function needs to remove a vertex
from its mesh, it calls ReleaseElement()
to put that index back on the free stack.
CROAMTriangle
class CROAMTriangle : public CListNode<CROAMTriangle>
{
public:
// Triangle members
unsigned short m_nVertex[3];
// Indices into CVertex::Array
CROAMTriangle *m_pEdge[3]; //
Pointers to edge neighbors
CROAMTriangle *m_pParent; //
Pointer to the parent triangle
CROAMDiamond *m_pDiamond; //
Set if this triangle is in a diamond
//
Split priority members
CVector m_vMidpoint; //
The midpoint of the longest edge
float m_fOffset; //
The offset in height
float m_fLength; //
The length of the longest edge
unsigned char m_nFlags;
float
GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
void Draw(GLenum nMode, unsigned short *nArray,
unsigned short &nIndex);
};
This class requires a little more explanation. As I mentioned above, my ROAM
implementation uses a linked list of triangles instead of a binary tree. The
list is initialized with the 12 top-level triangles, each with its parent pointer
set to NULL. When a square is split I create two new triangles and set their
parent pointers to the two existing triangles in the square, then I rearrange
the vertices and edges of all four triangles. This means that the true parent
triangle doesn't really exist in the list anymore. But as long as I have a pointer
linking a triangle with the triangle it was split from, rebuilding the parent
is easy when it needs to be merged.
Every time I create a new triangle during a split operation, I cache the midpoint
and length of its longest edge (the edge to be split). I also cache the offset
from that midpoint to the vertex's correct location (i.e. the error). If the
new triangle is completing a square, then the values are copied from its neighbor.
Otherwise, the values are calculated. All three values are used in the priority
calculation, which also takes into account the camera position and heading and
the distance to the horizon.
Initially the GetPriority()
method returned 0 if the distance to the vertex was greater than the horizon
or it was out of the current angle of view, but this caused problems. Essentially,
if you turned the camera away from a planet while close to it, all polygons
would be merged and the sphere would be turned into a cube. When you turned
the camera back to face the planet, the cube would not always be split because
the vertices checked were so far away from the camera that they were either
beyond the horizon or outside the angle of view. So now a priority is always
calculated if the camera is closer to the vertex than the length of the edge
to be split.
The Draw() method simply fills
in an array of indices for a future call to
glDrawElements(). I've toyed with adding a check to skip drawing triangles
that aren't in the current view, but I've got that commented out right now so
I can turn off updates from a specific viewpoint, then fly around the planet
in wire-frame mode to see how well the mesh is optimized. You may get a small
improvement in performance by un-commenting the lines dealing with the TriangleDrawMask
flag.
CROAMDiamond
class CROAMDiamond : public CListNode<CROAMDiamond>
{
public:
// Triangles that make up the diamond
CROAMTriangle *m_pParent[2];
CROAMTriangle *m_pChild[2];
//
Merge priority members
CVector m_vMidpoint; //
The midpoint of the longest edge
float m_fOffset; //
The offset in height
float m_fLength; //
The length of the longest edge
unsigned char m_nFlags;
float
GetPriority(CVector &vPosition, CVector &vHeading, float fHorizon);
};
This class is a good bit simpler than CROAMTriangle,
but it is just as important. A diamond is made up of four triangles. Because
two of them are always parents of the other two, the parent and child pointers
are kept separately to make the merge routine simpler. Note how the priority
members stored in CROAMTriangle
are duplicated in this class. These values are what would've been cached in
the original parent triangle that gets destroyed during a split. Caching them
in CROAMDiamond allows us
to calculate a merge priority that matches the parent's split priority exactly.
Note that the diamonds are in a linked list, each contains a pointer to its
4 triangles, and a triangle contains a pointer to the diamond its currently
in. This is pretty redundant, but it is necessary to get good performance. When
splitting a triangle, you need to be able to find and destroy the diamond it's
in (if any) very quickly. I used to scan the list of diamonds, thinking that
the list would stay rather small, but this killed my performance when a lot
of splits needed to be done each frame.
CROAMSphere
class CROAMSphere : public C3DObject
{
protected:
CROAMTriangleList m_llTriangle; //
List of triangles in the mesh
CROAMDiamondList m_llDiamond; //
List of diamonds in the mesh
CROAMAlgorithm *m_pAlgorithm; //
A planet algorithm
int m_nTriangles; //
The current triangle count
void
Init(CROAMAlgorithm *pAlgorithm);
bool CollisionCheck(C3DBase *pCamera);
void Split(CROAMTriangle *pTriangle);
void Merge(CROAMDiamond *pDiamond);
bool Update(C3DBase *pCamera, float fMaxError);
void Draw(C3DBase *pCamera, bool bTexture=true);
};
We finally get to the main ROAM class. The key member variables should be about
what you expected at this point: a triangle list, a diamond list, and an algorithm
object for retrieving the height and texture coordinates of new vertices. The
Init() method
takes an initialized algorithm object, and builds the 12 top-level triangles
of the cube. The CollisionCheck()
method calls the fractal function to determine whether the camera's position
is inside the planet or not. This is a faster and more accurate way to do collision
detection than to check against the current mesh.
The Split()
method looks messy right now and could probably be cleaned up a bit, but it
is fairly easy to explain with some pseudo-code:
void CROAMSphere::Split(CROAMTriangle
*pTriangle)
{
if(pTriangle->neighbor does not make a square with pTriangle)}
Split(pTriangle->neighbor)
pOpposite = pTriangle->neighborpVertex = CVertex::Array.GetNewElement()
Initialize pVertex's position, normal, and texture coordinatesCreate a new triangle and add it to the triangle list
Split pTriangle into the new triangle
Create a new triangle and add it to the triangle list
Split pOpposite into the new triangleUpdate the pVertex's normal based on the new triangles
if(pTriangle was part of a diamond)
delete pTriangle->m_pDiamond;
if(pOpposite was part of a diamond)
delete pOpposite->m_pDiamond;
Add the new diamond created by the new split
Final Notes
Now that you've seen how to generate and render a full-size planet at real-time
speed with dynamic level of detail, I encourage you to play around with the
source code and see what you can come up with. It should run pretty well on
most systems with OpenGL-accelerated drivers, and it should fly on NVidia GeForce
cards. Because it only generates and uses memory for vertices the camera is
close to, it should be fairly easy to expand this demo to generate and render
an entire solar system, or an entire galaxy of star systems, without a significant
impact on memory or the CPU. The next article in the series will focus on doing
that and will explain how to fix some of the problems you will run into working
with such a large game world.
http://www.gamasutra.com/features/20010810/oneil_01.htm
Copyright © 2003 CMP Media Inc. All rights reserved.