Particle systems have long been recognized as an essential building block for detail-rich and lively visual environments. Current implementations can handle up to 10,000 particles in real-time simulations and are mostly limited by the transfer of particle data from the main processor to the graphics hardware (GPU) for rendering.
This article introduces a full GPU implementation of both the simulation and rendering of a dynamically growing particle system. Such an implementation can render up to 1 million particles in real-time on recent hardware. This dramatically increases the achievable level of detail and allows much smaller particles to be simulated, which returns to the original idea of a particle as a minimal geometry element.
The massively parallel simulation of particle physics on a GPU can be a flexible combination of a multitude of motion and position operations, e.g. gravity, local forces and collision with primitive geometry shapes or texture-based height fields. Additionally, a parallel sorting algorithm is introduced that can be used to perform a distance-based sorting of the particles for correct alpha-blended rendering.
Why Particle Systems?
Reality is full of motion, full of chaos and full of fuzzy objects. Physically correct particle systems (PS) are designed to add these essential properties to the virtual world. Over the last decades they have been established as a valuable technique for a variety of volumetric effects, both in real-time applications and in pre-rendered visual effects of motion pictures and commercials.
Particle systems have a long history in video games and computer graphics. Very early video games in the 1960s already used 2D pixel clouds to simulate explosions. The first publication about the use of dynamic PS in computer graphics was written after the completion of the visual effects for the motion picture Star Trek II at Lucasfilm (cf. [Reeves1983]). Reeves describes basic motion operations and the basic data representing a particle, neither of which has been altered much since. An implementation on the parallel processors of a supercomputer was described by [Sims1990]. He and [McAllister2000] also describe many of the velocity and position operations of the motion simulation that are used below. The latest description of CPU-based PS for use in video games is given by [Burg2000].
Real-time PS are often limited by the fill rate or the CPU to graphics hardware (GPU) communication. The fill rate, the number of pixels the GPU can draw for each frame, is often a limiting factor when there is a high overdraw, i.e. single particles are relatively large and a lot of them overlap each other. Since the realism of a particle system simulation increases when smaller particles are used, the fill rate limitation loses importance. The second limitation, the transfer bandwidth of particle data from the simulation on the CPU to the rendering on the GPU, now dominates the system. Sharing the graphics bus with many other rendering tasks allows CPU-based PS to achieve only up to 10,000 particles per frame in typical game applications. Therefore it is desirable to minimize the amount of communication of particle data. This can be achieved by integrating both parts, simulation and rendering, of this visualization problem on the GPU.
To simulate particles on a GPU you can use stateless or state-preserving PS. Stateless PS require a particle's data to be computed from its birth to its death by a closed form function which is defined by a set of start values and the current time. State-preserving PS allow using numerical, iterative integration methods to compute the particle data from previous values and a changing environmental description (e.g. moving collider objects). Both simulation methods have their areas of applications and are to be chosen based on the requirements of the desired effect.
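The state-preserving approach can be illustrated with a small CPU-side sketch. It is not the GPU implementation presented in this article; the function names, the explicit Euler integration and the ground-plane collider with a per-frame height are assumptions chosen only to show how each new state is derived from the previous state and a changing environment.

```python
# Illustrative sketch only: names, integrator and collider are assumptions.

def step(pos, vel, dt, floor_y, gravity=(0.0, -9.81, 0.0), restitution=0.5):
    """Advance one particle by one time step from its previous state."""
    # Velocity operation: apply global gravity.
    vel = tuple(v + g * dt for v, g in zip(vel, gravity))
    # Position operation: move along the updated velocity.
    pos = tuple(p + v * dt for p, v in zip(pos, vel))
    # Collision with a ground plane whose height may differ every frame.
    if pos[1] < floor_y:
        pos = (pos[0], floor_y, pos[2])
        vel = (vel[0], abs(vel[1]) * restitution, vel[2])
    return pos, vel


def simulate(pos, vel, floor_heights, dt=1.0 / 60.0):
    """Iterate over frames; floor_heights holds one collider height per frame."""
    for floor_y in floor_heights:
        pos, vel = step(pos, vel, dt, floor_y)
    return pos, vel
```

Because the collider height is read anew in every step, the particle reacts to an environment that is not known at its birth, which a closed-form function of start values and time cannot express directly.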
Stateless PS have been introduced on the first generation of programmable PC GPUs (cf. [NVIDIA2001]) and are described in "Stateless particle systems". The state-preserving simulation introduced here is described in "Particle simulation on graphics hardware". Besides the particle system itself additional innovations are the usage of simulated pixel data as geometry input (cf. "Transfer texture data to vertex data") and the sorting of this data with a parallel sorting algorithm (cf. "Sort for alpha blending"). These innovations are applicable to other algorithms as well.
Several other forms of physical simulation have recently been developed for modern GPUs. [Harris2003] has used GPUs to perform fluid simulations and cellular automata with similar texture-based iterative computation. [Green2003] describes a cloth simulation using simple grid-aligned particle physics, but does not discuss generic particle systems' problems, like allocation, rendering and sorting. The photon mapping algorithm described by [Purcell2003] uses a sorting algorithm similar to the odd-even merge sort in "Sort for alpha blending". However their algorithm does not show the necessary properties to exploit the high frame-to-frame coherence of the particle system simulation.
This section describes two basic techniques related to this work: stateless particle simulation and general-purpose computation on graphics hardware.
Stateless Particle Systems
Some PS have been implemented with vertex shaders (also called vertex programs) on programmable GPUs [NVIDIA2001]. These PS are however stateless, i.e. they do not store the current positions and other attributes of the particles. To determine a particle's position you need to find a closed form function for computing the current position only from initial values and the current time. As a consequence such PS can hardly react to a dynamic environment.
Particles that are not meant to collide with the environment and that are only influenced by global gravity acceleration can be simulated quite easily with a simple function. Even simple collisions or forces with local influence, however, lead to rather complex functions.
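For the gravity-only case, the closed-form function is the familiar ballistic equation p(t) = p0 + v0·t + ½·g·t². The following minimal sketch evaluates it on the CPU; the function names and the use of a birth time are assumptions made for illustration, and a vertex shader would evaluate the same expression per vertex.

```python
# Illustrative sketch only: function names and parameters are assumptions.

def position_at(p0, v0, t, gravity=(0.0, -9.81, 0.0)):
    """Closed form p(t) = p0 + v0*t + 0.5*g*t^2, evaluated per component."""
    return tuple(p + v * t + 0.5 * g * t * t for p, v, g in zip(p0, v0, gravity))


def particle_position(p0, v0, birth_time, now):
    """Position of a particle given only its start values and the current time."""
    return position_at(p0, v0, now - birth_time)
```

No per-particle state is kept between frames: every call recomputes the position from the initial values, which is why local forces or collisions cannot simply be added as another step.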
Particle attributes besides velocity and position, e.g. the particle's orientation, size and texture coordinates, generally have much simpler computation rules. It is often sufficient to calculate them from a start value and a constant factor of change over time, which makes them ideal for a stateless simulation. This holds true even if the position is determined with the state-preserving simulation as described below (cf. "Particle simulation on graphics hardware").
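Such an attribute rule reduces to a linear function of the particle's age. The sketch below shows one way to express it; the function name and the example values for size and rotation are hypothetical and not taken from the article.

```python
# Illustrative sketch only: the name and the example values are assumptions.

def attribute_at(start, rate, age):
    """Stateless attribute: start value plus a constant rate of change over time."""
    return start + rate * age


# Example: a particle that grows and spins at constant rates.
age = 2.0                                # seconds since the particle's birth
size = attribute_at(1.0, 0.5, age)       # start size 1.0, growing 0.5 units/s
rotation = attribute_at(0.0, 90.0, age)  # degrees, spinning 90 degrees/s
```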
The strengths of the stateless PS make it ideal for simulating small and simple effects without influence from the local environment. In action video games these might be a weapon impact splash or the sparks of a collision. Larger effects that require interaction with the environment are less suitable for the technique.
General-Purpose Computation on Graphics Hardware
With the broad availability of programmable graphics hardware, much research has been done to explore non-graphical uses of graphics hardware. Besides the work mentioned in "Introduction", a good overview of recent research can be found at [GPGPU2003].
A common abstraction of the programming model available in graphics hardware is called “stream programming” (cf. [Buck2003]): An input data stream is transformed by an autonomous processing kernel that then produces an output data stream. The processing kernel itself has read-only access to the input stream and global data, but it can only write one output data record.
In graphics hardware terms the input data stream can be represented by a texture, the output data stream by a render target. Output data is often re-used as input in a further processing step. In that case the data streams are textures as well as render targets. The processing kernel is represented by a pixel shader (also called fragment program). By drawing a full-screen rectangle, the graphics hardware is instructed to call the pixel shader once for each output data record, reading from the input stream in the pixel shader.
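The stream programming model can be emulated on the CPU to make the roles concrete. In the sketch below, Python lists stand in for the texture and the render target, and a plain function stands in for the pixel shader; all names, including `run_kernel`, `apply_gravity` and the `constants` dictionary, are assumptions for illustration and do not correspond to any graphics API.

```python
# Illustrative sketch only: a CPU emulation with hypothetical names.

def run_kernel(kernel, input_stream, constants):
    """Emulates drawing a full-screen rectangle: the kernel is called once per
    output record, may read any input record, and returns exactly one record."""
    return [kernel(i, input_stream, constants) for i in range(len(input_stream))]


def apply_gravity(index, velocities, constants):
    """Example kernel: reads one velocity record and writes the updated one."""
    vx, vy, vz = velocities[index]
    return (vx, vy + constants["gravity_y"] * constants["dt"], vz)


def iterate(stream, kernel, constants, steps):
    """Re-uses each output stream as the input of the next processing step."""
    source = stream
    for _ in range(steps):
        target = run_kernel(kernel, source, constants)  # write to "render target"
        source = target                                 # it becomes the next "texture"
    return source
```

The kernel never modifies its input in place; each pass writes a fresh output stream, which mirrors the separation between the texture that is read and the render target that is written.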