
Building a Million-Particle System
Particle systems have long been recognized as an essential building block
for detail-rich and lively visual environments. Current CPU-based
implementations can handle up to 10,000 particles in real-time simulations
and are mostly limited by the transfer of particle data from the main
processor to the graphics hardware (GPU) for rendering.
This article introduces a full GPU implementation of both the simulation
and the rendering of a dynamically growing particle system. Such an
implementation can render up to 1 million particles in real time on recent
hardware. This dramatically increases the achievable level of detail and
allows much smaller particles to be simulated, which returns to the
original idea of a particle as a minimal geometry element.
The massively parallel simulation of particle physics on a GPU can flexibly
combine a multitude of motion and position operations, e.g., gravity, local
forces and collision with primitive geometry shapes or texture-based height
fields. Additionally, a parallel sorting algorithm is introduced that sorts
the particles by distance, as required for correct alpha-blended rendering.
Why Particle Systems?
Reality is full of motion, full of chaos and full of fuzzy objects.
Physically correct particle systems (PS) are designed to add these
essential properties to the virtual world. Over the last decades they have
become established as a valuable technique for a variety of volumetric
effects, both in real-time applications and in pre-rendered visual effects
for motion pictures and commercials.
Particle systems have a long history in video games and computer graphics.
Very early video games in the 1960s already used 2D pixel clouds to
simulate explosions. The first publication about the use of dynamic PS in
computer graphics was written after the completion of the visual effects
for the motion picture Star Trek II at Lucasfilm (cf. [Reeves1983]).
Reeves describes basic motion operations and the basic data representing a
particle; neither has been altered much since. [Sims1990] implemented a PS
on the parallel processors of a supercomputer. Sims and [McAllister2000]
also describe many of the velocity and position operations of the motion
simulation that are used below. The most recent description of CPU-based
PS for use in video games is given in [Burg2000].
Real-time PS are often limited by the fill rate or by the communication
between the CPU and the graphics hardware (GPU). The fill rate, the number
of pixels the GPU can draw per frame, becomes the limiting factor when
overdraw is high, i.e., when individual particles are relatively large and
many of them overlap each other. Since a particle system simulation looks
more realistic with smaller particles, and smaller particles cover fewer
pixels, the fill rate limitation loses importance. The second limitation,
the bandwidth for transferring particle data from the simulation on the CPU
to the rendering on the GPU, then dominates the system. Because the
graphics bus is shared with many other rendering tasks, CPU-based PS in
typical game applications achieve only up to 10,000 particles per frame.
It is therefore desirable to minimize the transfer of particle data, which
can be achieved by performing both parts of this visualization problem,
simulation and rendering, on the GPU.
To simulate particles on a GPU you can use stateless or state-preserving
PS. Stateless PS require a particle's data to be computed from its birth to
its death by a closed-form function, which is defined by a set of start
values and the current time. State-preserving PS allow numerical, iterative
integration methods to compute the particle data from previous values and a
changing description of the environment (e.g., moving collider objects).
Both simulation methods have their areas of application, and the choice
between them depends on the requirements of the desired effect.
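To make the state-preserving idea concrete, the following minimal Python sketch advances one particle from its stored position and velocity, reading the environment anew on every step. It runs on the CPU for illustration only; the names `step`, `GRAVITY`, `ground_y` and `restitution`, the semi-implicit Euler scheme and the horizontal ground plane are all assumptions chosen for brevity, not the integration method or colliders of the GPU implementation described later.

```python
GRAVITY = (0.0, -9.81, 0.0)  # assumed constant global acceleration (m/s^2)


def step(position, velocity, dt, ground_y, restitution=0.5):
    """Advance one particle by dt from its stored state (semi-implicit Euler).

    ground_y is supplied anew on every call, so a collider that moves between
    frames needs no closed-form description."""
    # Integrate velocity first, then position with the updated velocity.
    vx, vy, vz = (v + g * dt for v, g in zip(velocity, GRAVITY))
    x, y, z = (p + v * dt for p, v in zip(position, (vx, vy, vz)))
    # Collision with the horizontal plane y = ground_y: put the particle back
    # onto the plane and, if it was moving into it, reflect and damp vy.
    if y < ground_y:
        y = ground_y
        if vy < 0.0:
            vy = -vy * restitution
    return (x, y, z), (vx, vy, vz)


# Usage: a particle dropped onto a ground plane that rises slowly over 2 s.
pos, vel = (0.0, 2.0, 0.0), (1.0, 0.0, 0.0)
for frame in range(120):
    ground = 0.25 * frame / 120  # moving collider, changes every frame
    pos, vel = step(pos, vel, 1.0 / 60.0, ground)
```

Because each step only needs the previous state and the current environment, arbitrary combinations of forces and moving colliders can be added without deriving a new closed-form function.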
Stateless PS were introduced on the first generation of programmable PC
GPUs (cf. [NVIDIA2001]) and are described in "Stateless Particle Systems".
The state-preserving simulation introduced here is described in "Particle
simulation on graphics hardware". Besides the particle system itself,
additional innovations are the use of simulated pixel data as geometry
input (cf. "Transfer texture data to vertex data") and the sorting of this
data with a parallel sorting algorithm (cf. "Sort for alpha blending").
These innovations are applicable to other algorithms as well.
Several other forms of physical simulation have recently been developed for
modern GPUs. [Harris2003] used GPUs to perform fluid simulations and
cellular automata with similar texture-based iterative computation.
[Green2003] describes a cloth simulation using simple grid-aligned particle
physics, but does not discuss the problems of generic particle systems,
such as allocation, rendering and sorting. The photon mapping algorithm
described by [Purcell2003] uses a sorting algorithm similar to the odd-even
merge sort in "Sort for alpha blending". However, their algorithm lacks the
properties needed to exploit the high frame-to-frame coherence of the
particle system simulation.
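For readers unfamiliar with odd-even merge sort, the following minimal Python sketch builds Batcher's odd-even merge sort network for a power-of-two number of elements and uses it to order particle indices back to front by camera distance. It is a serial CPU illustration of the generic sorting network, not the GPU sorting passes of "Sort for alpha blending"; the function names are hypothetical.

```python
def odd_even_merge_stages(n):
    """Batcher's odd-even merge sort network for n elements (n a power of two).

    Returns a list of stages; each stage is a list of (low, high) index pairs
    to compare-exchange. The pairs depend only on n, never on the data."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            stage = []
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare elements inside the same 2p-sized block being merged.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        stage.append((i + j, i + j + k))
            stages.append(stage)
            k //= 2
        p *= 2
    return stages


def sort_back_to_front(distances):
    """Return particle indices ordered from farthest to nearest."""
    order = list(range(len(distances)))
    for stage in odd_even_merge_stages(len(order)):
        # Pairs within one stage touch disjoint indices, so they are independent.
        for lo, hi in stage:
            if distances[order[lo]] < distances[order[hi]]:
                order[lo], order[hi] = order[hi], order[lo]
    return order
```

For example, `sort_back_to_front([3.0, 1.0, 7.0, 5.0])` yields `[2, 3, 0, 1]`. Because the comparison pattern is fixed in advance and each stage consists of independent compare-exchanges, a whole stage can in principle be evaluated in parallel, which is what makes such networks attractive for pixel-shader passes.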
Prior Work
This section describes two basic techniques related to this work: stateless
particle simulation and general-purpose computation on graphics hardware.
Stateless Particle Systems
Some PS have been implemented with vertex shaders (also called vertex
programs) on programmable GPUs [NVIDIA2001]. These PS, however, are
stateless, i.e., they do not store the current positions and other
attributes of the particles. To determine a particle's position, you need a
closed-form function that computes the current position from the initial
values and the current time alone. As a consequence, such PS can hardly
react to a dynamic environment.
Particles that are not meant to collide with the environment and that are
influenced only by global gravity acceleration can be simulated quite
easily with a simple function. Even simple collisions or forces with only
local influence, however, lead to rather complex functions.
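For the gravity-only case, the closed-form function is the familiar ballistic equation below, with start position p_0, start velocity v_0, constant gravity g and particle age t. The second part is an illustrative example chosen here, not taken from [NVIDIA2001]: it shows how a single bounce on the plane y = h already turns the function into a piecewise one, with e an assumed coefficient of restitution.

```latex
\mathbf{p}(t) = \mathbf{p}_0 + \mathbf{v}_0\,t + \tfrac{1}{2}\,\mathbf{g}\,t^2

\mathbf{p}(t) =
\begin{cases}
  \mathbf{p}_0 + \mathbf{v}_0\,t + \tfrac{1}{2}\,\mathbf{g}\,t^2, & t < t_c,\\[2pt]
  \mathbf{p}(t_c) + \mathbf{v}'\,(t - t_c) + \tfrac{1}{2}\,\mathbf{g}\,(t - t_c)^2, & t \ge t_c,
\end{cases}
\qquad
\mathbf{v}(t_c) = \mathbf{v}_0 + \mathbf{g}\,t_c,
\quad
\mathbf{v}' = \bigl(v_x(t_c),\; -e\,v_y(t_c),\; v_z(t_c)\bigr)
```

Here t_c is the first positive root of p_{0,y} + v_{0,y} t + ½ g_y t² = h. Every further bounce adds another piece, and if the plane moves, t_c additionally depends on the collider's path, which is why stateless PS quickly become impractical for interactive environments.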
Particle attributes other than velocity and position, e.g., the particle's
orientation, size and texture coordinates, generally follow much simpler
computation rules. It is often sufficient to calculate them from a start
value and a constant rate of change over time, which makes them ideal for a
stateless simulation. This holds true even if the position is determined
with the state-preserving simulation described below (cf. "Particle
simulation on graphics hardware").
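As an illustration of such a rule, this minimal Python sketch evaluates size, rotation angle and an animation frame index purely from start values and the particle's age. The function and parameter names are hypothetical, and choosing one tile of a flipbook texture stands in for "texture coordinates" as an assumption; in a stateless PS this logic would typically run per particle in a vertex shader.

```python
import math


def stateless_attributes(age, lifetime, size0, size_rate, angle0, angular_speed, frame_count):
    """Evaluate non-positional particle attributes from start values and age only.

    Nothing is read from a previous frame, so the result for a given age is the
    same no matter how often, or in which order, it is evaluated."""
    size = size0 + size_rate * age                            # constant rate of growth
    angle = (angle0 + angular_speed * age) % (2.0 * math.pi)  # constant spin, wrapped
    # Hypothetical flipbook: pick one of frame_count texture tiles by relative age.
    frame = min(int(age / lifetime * frame_count), frame_count - 1)
    return size, angle, frame
```

For a particle of age 1.0 s with a 2.0 s lifetime, `stateless_attributes(1.0, 2.0, 1.0, 0.5, 0.0, 1.0, 8)` gives a size of 1.5, an angle of 1.0 rad and frame 4.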
These strengths make stateless PS ideal for simulating small and simple
effects that are not influenced by the local environment. In action video
games, examples are a weapon impact splash or the sparks of a collision.
Larger effects that require interaction with the environment are less
suitable for the technique.
General-Purpose Computation on Graphics Hardware
With the broad availability of programmable graphics hardware, much
research has explored non-graphical uses of graphics hardware. Besides the
work mentioned above, a good overview of recent research can be found at
[GPGPU2003].
A common abstraction of the programming model available in graphics
hardware is called "stream programming" (cf. [Buck2003]): an input data
stream is transformed by an autonomous processing kernel, which produces an
output data stream. The processing kernel has read-only access to the input
stream and to global data, but it can write only a single output data
record.
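The abstraction can be sketched in a few lines of Python. In this minimal CPU sketch, `run_kernel` and `apply_gravity` are hypothetical names; read-only access is approximated with a tuple for the stream and a `types.MappingProxyType` for the global data, and each kernel call returns exactly one output record.

```python
from types import MappingProxyType


def run_kernel(kernel, input_stream, global_data):
    """Stream-programming model: call `kernel` once per output record.

    The kernel receives read-only views of the whole input stream (a tuple)
    and of the global data (a read-only mapping). It may read any of them,
    but its only output is the single record it returns for index i."""
    stream = tuple(input_stream)
    constants = MappingProxyType(dict(global_data))
    return [kernel(i, stream, constants) for i in range(len(stream))]


def apply_gravity(i, velocities, constants):
    """Hypothetical kernel: integrate one velocity record by one time step."""
    vx, vy, vz = velocities[i]
    gx, gy, gz = constants["gravity"]
    dt = constants["dt"]
    return (vx + gx * dt, vy + gy * dt, vz + gz * dt)
```

For example, `run_kernel(apply_gravity, [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], {"gravity": (0.0, -10.0, 0.0), "dt": 0.5})` returns `[(0.0, -5.0, 0.0), (1.0, -3.0, 3.0)]`. Since no invocation depends on another's output, all of them could run in parallel.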
In graphics hardware terms, the input data stream can be represented by a
texture and the output data stream by a render target. Output data is often
reused as input in a further processing step; in that case, the data
streams serve as both textures and render targets. The processing kernel is
represented by a pixel shader (also called a fragment program). Drawing a
full-screen rectangle instructs the graphics hardware to call the pixel
shader once for each output data record, and the pixel shader reads from
the input stream.
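Reusing output as input is commonly handled with two buffers that swap roles after every pass ("ping-pong"). The following Python sketch emulates this on the CPU under stated assumptions: textures are nested lists, `draw_fullscreen_quad` stands in for rasterizing a full-screen rectangle, and `move_particle` is a hypothetical pixel shader that moves every particle with one assumed constant velocity.

```python
def make_texture(width, height, texel):
    """A 'texture' here is simply a height x width grid of texel values."""
    return [[texel for _ in range(width)] for _ in range(height)]


def draw_fullscreen_quad(pixel_shader, source, target, uniforms):
    """Emulate drawing a full-screen rectangle into `target`: the pixel shader
    runs once per output texel and reads only from `source`."""
    if source is target:
        # Reading from the texture currently bound as render target is
        # generally not supported, hence the two separate buffers.
        raise ValueError("source and render target must be different textures")
    for y, row in enumerate(target):
        for x in range(len(row)):
            row[x] = pixel_shader(x, y, source, uniforms)


def move_particle(x, y, positions, uniforms):
    """Hypothetical pixel shader: one texel holds one particle position."""
    px, py, pz = positions[y][x]
    vx, vy, vz = uniforms["velocity"]
    dt = uniforms["dt"]
    return (px + vx * dt, py + vy * dt, pz + vz * dt)


# Two position textures that swap roles every frame ("ping-pong").
front = make_texture(4, 4, (0.0, 0.0, 0.0))
back = make_texture(4, 4, (0.0, 0.0, 0.0))
uniforms = {"velocity": (1.0, 0.0, 0.0), "dt": 0.5}
for frame in range(3):
    draw_fullscreen_quad(move_particle, front, back, uniforms)
    front, back = back, front  # this frame's output is next frame's input
```

After three frames, every texel of `front` holds `(1.5, 0.0, 0.0)`. Swapping references rather than copying data mirrors how the two render targets would simply exchange roles on the GPU.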