Floating point numbers permeate almost every area of game programming. They are used to represent everything from position, velocity, and acceleration, to fuzzy AI variables, texture coordinates, and colors. Yet, despite their ubiquitous role, few programmers really take the time to study the underlying mechanics of floating point numbers, their inherent limitations, and the specific problems these can bring to games.
This article explores some of the problems with floats, illustrating them with concrete examples in the hope that programmers will be somewhat less surprised when these problems crop up mid-project. With any luck, you will be better equipped to visualize and deal with these and other related problems.
The term "floating point number" can be used to describe many different kinds of number representation. But for game programmers, there are really only two that we need to be concerned with: single and double precision floating point numbers.
By far the most common is the single-precision 32-bit floating point number, commonly referred to by its C keyword "float." Due to the convenient size and the requirements of the hardware, this is the most popular format for storing and manipulating numbers on all modern gaming platforms (although some platforms use 24-bit floats in part of their hardware graphics pipeline, which can greatly magnify the problems discussed below).
A float consists of 32 bits: a sign bit, an 8-bit exponent (e), and a 23-bit significand (s). For precise details, see References.
To visualize the problems with floats, it's useful to visualize the differences between floats and integers. Consider how 32-bit integers represent space. There are 2^32 integers; each one can be thought of as representing a region between two points on a line. If each integer represents 1 millimeter, then you can represent any distance using integers from 1mm to 2^32mm. That's any distance up to about 4,295km, about 2,669 miles, with a resolution of 1mm.
Now picture how one might represent 2D space with integers. If you again consider a resolution of 1mm, you can represent any position in a 4,295x4,295 kilometer square area to a resolution of 1mm. Imagine zooming in closely and seeing the actual grid of integers.
Now take it one more step and use the same setup to represent 3D space. This time each individual position can be thought of as the space within tiny 1mm cubes, so full 3D space is made up of a grid of these identically sized cubes.
You can't represent anything smaller than 1mm, and objects that are only a few millimeters in size will have a blocky appearance. Figure 1 represents the general idea.
The important thing to remember about these integer-defined cubes is that they are all the same size. In 3D space, the cubes of space near the origin are the same as the cubes of space a mile away from the origin.
Let's compare the 3D integer arrangement to floats. First off, note that both integers and floats (in practice) are stored as 32-bit words. Since there are only 2^32 possible bit patterns, that means the number of possible floats is the same as the number of possible integers. Yet floating point numbers can represent numbers in a range from 0 to 2^128. [Note: There are actually a few fewer floats, as some float bit patterns are "not a number" (NaN), but we'll ignore that for simplicity's sake. For the purpose of this article, I will also simplify the treatment of signed quantities.]
How this larger range of numbers works is fairly obvious if you study the representation of a float. Still, it's useful to look into this to gain an understanding of what's going on.
The key thing to note is that there is the same number of floating point numbers between each power of two. So from 1 to 2 there are 8,388,608 (or 2^23) possible different floating point numbers, and from 2 to 4 there is the same total number. There's also the same number of possible floats between 32,768 and 65,536, or 0.03125 and 0.0625.
Here's another way of thinking about it: If you represent a position with a floating point number, then there are more possible points between the origin and a point 1mm away than there are possible points between the origin and a point on the other side of the planet. This means the precision of your floating point representation of a position depends on where you're standing and what units you're using.
If, again, a floating point value of 1.0 represents 1mm, then when you stand near the origin (meaning your represented position is close to 0,0,0) your position can be represented to an accuracy of about 0.0000001mm, which is incredibly precise.
However, as you move away from the origin, your accuracy begins to decrease. At only 1 kilometer away from the origin (1,000,000mm), the accuracy drops to 0.125mm, which is still pretty good. But if you move even farther to a distance of 64km from the origin, the accuracy drops precipitously to 4mm, which means you can only represent a position with an accuracy of 4mm: a quarter of the resolution the integers could detect.
It gets worse. If you travel farther out to the edge of the space that could be represented with integers, at 4,295km (roughly the distance from Los Angeles to New York), you are at 2^32mm; yet, since the significand provides only 23 bits of precision, our accuracy drops to 2^9mm, or 512mm: about half a meter.
So if you used 32-bit floats to represent positions in a game that spanned the continental U.S., then on one coast your positions could only be represented with an accuracy of half a meter (about 1.5 feet), and clearly, that is unacceptable.
Thibault Jochem 
7 Jan 2009 at 9:06 am PST

232 bit patterns in a 32bit ?? wtf ....
I guess something went wrong with the "^" 


AnnMarie Ratcliffe 
Ah, I see someone got there first, so I'll just second the comment.
Not sure if the formatting got corrupted by the Content Management, but all your '2 to the power of' numbers got squashed into one, so 2^32 became 232, and so on. 


Peter Freese 
This is a great article, reposted from Mick's blog: http://cowboyprogramming.com/2007/01/05/visualizingfloats/commentpage1/
One point of correction (which I noted on Mick's blog page) is that changing the scale of units doesn't solve (or change) the accuracy. Using the example of Los Angeles to New York, it doesn't make a bit of difference whether 1.0 represents 1 meter, 1 kilometer, or 1 parsec. Accuracy will still only be +/- 0.5 meters at the distance of New York to LA from the origin. The issue here is that accuracy is limited by precision at that range, regardless of the units. Another way of thinking about this is that changing from meters to kilometers may decrease the numerical error from 0.5 to 0.0005, but the units of the error change as well, i.e. 0.5 meters is still equivalent to 0.0005 kilometers.


Simon Carless 
Thanks, guys - this is actually originally reposted from Game Developer magazine (where it first appeared). We fixed the typo with 2^32, incidentally, and the other inaccuracy.



Raymond Grier 
I don't think this article properly explained the actual source of the problems with floating point arithmetic:
There are only a limited number of values that can be represented by 32 bits but an infinite number of decimal values between 0.0 and 1.0, so each representable value stands in for a range of real values below the next biggest one with the same bit combination in its 8-bit exponent section. To make things worse, the distance between 2 consecutive values is different depending on what the 8-bit combination is, which makes it hard to anticipate the amount of error when performing an operation on 2 floats. Everyone who studied science in school knows that the error at the end of an operation is at least as big as the biggest error of the 2 operands, if not bigger. If you then use that result in another operation with another float (and so on...), the error keeps compounding, and this is why floating point math has accuracy problems. The examples in the article are ultimately a result of this problem.


Vojislav Stojkovic 
Another typical problem with floats - one that stems from the issues addressed in this article - is comparison. If we're talking about results of a computation with several steps in it, then comparing two such floats directly is always risky business. Likewise, comparing the result of a floating point computation to zero is also risky. Instead of direct comparisons, programmers should calculate the error and compare it to a predefined epsilon value.



Matt Booth 
Great article. I realise I'm pretty late to the party, but I'm a little confused about how a couple of those accuracy values were calculated. As errors are bounded by the machine epsilon, it looks like it's being used to calculate the worst-case accuracy at different distances.
In my C++ distribution FLT_EPSILON is defined as 1.192092896e-07F. Applying that as relative error to the values in the article... Max error @: 1.0f = FLT_EPSILON, which matches 0.0000001mm; 1e6f = 0.119f, which is pretty close to 0.125mm; 64e6f = 7.629f, which is nearly double 4mm; std::powf(2.0f,32.0f) = 512.f, which matches 512mm. Am I missing something relevant, or is the 4mm value incorrect?

