|
Features

A
Real-Time Procedural Universe, Part Two: Rendering Planetary Bodies
Mapping ROAM to
a Sphere
Because the traditional ROAM algorithm only allows you to split a square,
you have to start with a solid 3D shape that consists only of squares.
To prevent cracks from forming in the mesh, the edges of these squares
must fit together seamlessly in all directions. Each square in this shape
will be treated like a top-level square, and each triangle like the root
of a separate binary tree. Neighbor pointers have to be set up between
these top-level triangles properly so that splitting at the edge of one
top-level square can split the corresponding triangle on the adjacent
edge.
The obvious 3D shape that consists of all squares with edges wrapping
seamlessly in all directions is a cube. Because a subdivided cube doesn't
give a well-proportioned sphere, I spent a lot of time trying to come
up with a more appropriate shape, but I couldn't find anything that fit
the requirements. Later on, I realized that I was wasting my time because
the split priorities would quickly take care of the sphere's proportions.
If there is too much error in one triangle because its longest edge is
out of proportion, it will be split.
Once you
set up the 12 root triangles of the cube with the proper neighbor pointers,
everything falls into place and the traditional ROAM algorithm works beautifully.
In fact, it's even easier to implement than the flat ROAM algorithm because
you don't have to test for conditions at the edges of the map. The only
thing you need to treat differently is how you implement your height values.
Instead of simply adjusting a vertex's y coordinate, you need to treat
your height value as an offset from the sphere's radius. Mathematically,
treat each new vertex as a direction vector from the center of the sphere.
Normalize the vector and multiply it by the sphere's radius plus the height
offset.
One other simplification I made to the ROAM algorithm was to use a linked
list of triangles instead of a binary tree. The tree structure takes up
more memory, slows the routine down, and is more complicated to implement.
A linked list is sufficient as long as each triangle keeps a pointer to
its edge neighbors and the triangle it was split from (its parent). The
list is initialized with the 12 top-level triangles, which have their
edge pointers set up properly and their parent pointers set to NULL. When
you split a square, add two new triangles to the list with their parent
pointers set to the two existing triangles, then add a vertex and rearrange
the vertices and edge pointers of the four triangles. Do the reverse when
merging a diamond.
To simplify the split and merge operations, it is a good idea to maintain
the same order of the vertices and neighbor pointers in the triangle object.
I chose to order my vertices (0, 1, 2) counter-clockwise with vertex 1
always being opposite the longest edge. My edge neighbor pointers are
ordered based on the edge shared, 0-1, 1-2, and 2-0 respectively. Since
vertex 1 is opposite the longest edge, this means that the last edge pointer
(i.e. array index 2) always shares the longest edge.
Note from
Figure 3 that it's not that hard to determine whether a triangle is part
of a square or diamond using edge pointers. If triangle1->edge[2]->edge[2]
= triangle1, then triangle1 is part of a square. Otherwise, you need to
split triangle1->edge[2] to make triangle1 part of a square. If triangle1->edge[0]->edge[0]->edge[0]->edge[0]
= triangle1 (you can also replace those 0's with 1's), then the triangle
is part of a diamond.
Optimizing
the Update() function
Now that you have the basic idea behind my spherical ROAM algorithm, you
need to know how to optimize it for a dynamic planet generation algorithm
like the one I explained in Part One. The first key to making it fast
enough for real-time rendering is to realize that the fractal function
you're calling is very slow. Even if you change it to use some other type
of algorithm, it is relatively safe to assume that it won't be a cheap
function to call. Your only hope to get it to run fast enough is to minimize
the number of calls to that function.
Keep in mind that you have to update the priority of every triangle in
the mesh each frame to determine whether it needs to be split or not.
That priority depends on the height of the new vertex, which is determined
by calling the fractal function. So to keep from having to call the fractal
function for every triangle every frame, you need cache the new height
value in your triangle object. So as each triangle is created from a split,
you should call the fractal function to get the height value for the next
potential split.
Once I had that implemented, I realized I was still calling the fractal
routine twice as often as necessary. Because each triangle shares its
longest edge with another triangle in a square, the second triangle to
be created for that square was making the same call to the fractal routine.
It's easy enough to determine whether a triangle is completing a square
at the time of its creation, and if so borrow the calculated height value
from its neighbor. Also note that there is also no need to call the fractal
routine when you merge a diamond because the vertex you are removing has
the information you need. Just take the vertex's height value and use
it to compute the offset to put into the merged triangle objects.
Next I focused on the priority calculations. The first thing to keep in
mind is that if the merge priority is not calculated in the same way the
split priority is calculated for a triangle's parent, you quickly get
into a situation where each frame you will split several triangles only
to have them merged together again right away. You can also get into a
situation where unneeded triangles are kept around long after they're
needed. Because I'm not using a binary tree, I'm not keeping track of
the original state of the parent triangles. This was easy enough to solve
by adding a few members to my diamond object to keep track of the necessary
information.
There are a number of other changes I can think of off the top of my head
that could theoretically speed it up even more. The simplest way is to
come up with some logic to avoid doing a priority check on every triangle
every frame. At 50,000 triangles, this can be quite a burden on the CPU.
I recently read an interesting article that described an improved split
priority concept. In this article, the priority calculations were pre-calculated
to determine the distance at which a triangle would be split or a diamond
merged, and updates were driven mainly off of distance checks. The author
then grouped polygons together into a tree using bounding spheres to greatly
reduce the number of distance checks that needed to be made each frame.
I'm not clear on the details surrounding their implementation, but they
reported a dramatic improvement in their Update()
function.
Optimizing the Draw()
function
Next I concentrated on the Draw()
function, and switched from direct calls to glVertex,
glNormal, and glTexCoord
to a vertex array. I created a dynamically sizing array class, and made
it global so that all ROAM objects use the same vertex array. This optimization
gave a noticeable boost in frame rate, which illustrated to me that my
optimizations to the Update()
function had paid off and the bottleneck had moved over to the
rendering pipeline. If the rest of the code had been poorly optimized,
I would've seen a very small improvement optimizing the rendering pipeline.
Then I looked at the Nvidia documentation to find the fastest way to push
non-static triangles to the video card. I found that because I had already
implemented a vertex array, it took very little effort to implement the
GL_NV_vertex_array_range
and GL_NV_fence extensions.
These extensions allow you to copy vertices directly to AGP or video memory,
and when you render with the standard glDrawElements
function, it renders them extremely quickly. I believe the effect is the
same as using DirectX's vertex buffers stored in video memory.
Given all these optimizations, I get about 20 FPS rendering 50,000 triangles
using a 750 MHz Duron with a GeForce2 MX video card. When I toggle the
mesh updates (i.e. skipping the Update()
call) on the same machine, it speeds up to 45 FPS with 50,000 triangles,
so there's still room for improvement in the update routine. The only
further optimizations I can think of making to the Draw()
function would be to see if I could convert my triangle lists to a set
of strips or fans, but that would require even more changes to the Update()
function to maintain the list of strips and fans. Given the frame rates
I'm currently getting, I can't imagine it would improve performance much
unless I came up with a simple way to maintain them.
______________________________________________________
|