Writing computer and video games is an exciting job. Apart from the parties, flexible working conditions, freebie trips to Japan and the money, the work itself can be very satisfying. At its best it involves solving problems in groundbreaking new ways, more efficient in terms of time or space than was previously thought possible. But a good grounding in the fields of mathematics and science relevant to gaming is useful if you want to build on the knowledge that has been built up over the years. There are many techniques picked up by game coders over the years that have proved useful. If you look into them more deeply, you may even find new ways to apply them!
Many books have been written about most areas of math, science and computer science, but all too many of the advanced books have this in common: they were written by academics to get tenure. As such, the authors seemingly do not want normal people to read their work! Many of the introductory books are well written, but do not focus on using their principles in real life programs. Fortunately there are exceptions to this rule and a few books I found interesting are listed at the end of this article.
The growing power of computers has meant that mathematics and computer science are living subjects, with new ways of looking at problems being discovered each year. But still the ideas you use most often in games (which I am covering here) were written down decades (if not centuries ago).
What turns game coders off about mathematics (apart from a difficult writing style) is the apparent lack of relevance to games. This doesn't have to be the case! In this article I will try to show how some simple mathematical ideas are applied to game coding.
When I started writing computer games on the school computer in Fiji back in 1982 the games available commercially were very rudimentary, and didn't use most of these simple ideas. As a result, they could look jerky and simplistic.
In 1989 when I started writing games commercially most of these methods were being used by good programmers at the time. In fact that is how I learned some of them - from other coders rather than out of a textbook.
But now everything is in 3D. People are writing mainly in C instead of assembly language and the production values of games are a lot higher. A foundation in maths is almost taken for granted in new games programmers and yet some of the basic tricks we used to play around with in the 1980s seem to have been forgotten in the courses these bright young sparks have taken.
Enough history already! Let's get on to explaining how that stuff you fell asleep in class over is relevant to helping you write classy efficient game code. The material in one topic is often related to another topic, so you may find it useful to read through the article as a whole. Some topics are so huge they have had whole books written about them, so I have had to give them only a brief discussion.
Numbers are the foundation of mathematics - you can't do much of any practical use without them. And everyone knows computers are great with numbers. So why are there so many different types of them and what are they useful for?
Integers: Whole numbers are easy
The numbers which range in steps of one unit both up and down from zero (·,-5,-4,-3,-2,-1,0,1,2,3,4,5,·) are called integers. Computers love integers. They can be stored exactly in binary and have efficient methods for addition and subtraction, as well as slightly less efficient methods of multiplication and division.
When used in computers, integers have upper and lower limits. For 16 bit signed integers stored in twos complement format (the format used for nearly all signed integers these days), the limit is -32768 to 32767. This limit is important when you are performing certain operations as overflow can occur (see the section on Pitfalls for details).
For many years integers have been favoured for storing variables used in games because operations on them (such as addition) were much faster than on other types of numbers. However these days the speed advantage of integers over floating point numbers (described later) has been somewhat diminished by fast co-processors on modern chips such as the Intel Pentium.
Scaling: Making the most of integers
If you store object co-ordinates or other values as integers, you may find that they lack precision. You may want to animate a sprite at less than one per frame, or move it by less than one pixel per frame. Do not do this by putting pauses in your code! You can allow much more flexibility by scaling the system of numbers you are using. For instance, you could divide the horizontal position of your main game sprite by 100 before plotting it on the screen. This allows you to give a horizontal velocity of 50 to the sprite to get it to move one pixel every second frame. It also allows you to give a wide range of velocities to it that cannot be emulated simply by pausing the movement for a number of frames.
Fixed point numbers: Efficient scaling
Dividing an integer by a scale factor is an expensive operation! So a more efficient way of doing this should be used. When the scaling factor is a power of two the division can be easily performed by using binary shifts (the >> operation in C). When a number is stored in this manner it is called a fixed point number, as the integer represents a number with both integer and fractional parts, separated by a binary point (like a decimal point but in binary) at a fixed position.
When games programmers say they use fast integer maths throughout their graphics routines, they often mean they are using fixed point numbers.
The most commonly used powers of two are 256 and 65536, but other ones are in use in many games (and in the operating systems of some next generation consoles). They can prove to be particularly efficient running on some machines.
Naming conventions for fixed point numbers specify the number of bits on both sides of the binary point. For example a 12.4 fixed point number has 12 bits of integral value and 4 bits of fractional value. If it is unsigned it can range from 0 to 4095.9375 in steps of 0.0625 (1/16 or 2-4).
Addition and subtraction are simple on fixed point numbers: just perform the operation on them as if they were integers (of course, they must be compatible by having the same scale factor). Multiplication on the other hand requires you to shift the result to the right before using it.
Floating point numbers: Flexible ranges
The main disadvantage of fixed point numbers is that they lack precision over a large range: they cannot hold very large or very small numbers, and when they do hold small numbers these are not very accurate. The way to get around this is to use floating point numbers. These are formed of two parts: a signed exponent (the range shift) and signed mantissa (the raw value). You can turn a floating point number into a fixed point number by shifting the mantissa by the number of bits specified in the exponent: left for positive exponents and right for negative exponents.
Floating point numbers are supported in hardware on some new processor chips like the Intel Pentium. The C and C++ languages support them as primitive types: float (32 bit), double (64 bit) and long double (80 bit). In machines which do not support them in hardware their operations have to be emulated in software, which is much slower.
Multiplication on floating point numbers is easier than addition! This is the opposite to the situation with integer and fixed point numbers. So when you are using floating point maths you may find algorithms different from the ones you use with integers appropriate.
Discreteness is a way of saying that values in computer games often have integer values. Time is not continuous - it is split up into frames of 1/60th of a second (for NTSC video consoles). Sprite positions are not continuous - they are displayed on pixel boundaries. This can be very useful as an approximation to a smooth universe.
Spatial discreteness: Aligning objects to grid boundaries
On nearly all video games consoles sprites must be placed on integral pixel boundaries. The exception is the Nintendo 64, which has hardware anti-aliasing built-in. This should allow it to display sprites on half-pixel or quarter-pixel boundaries. We will need to wait and see if 2D sprite based games for that machine are able to take advantage of this.
The use of fixed point (or floating point) numbers for game objects means that the objects are held with greater precision than is shown on the screen. When you are smooth scrolling a screen you need to stop its motion if it gets too small (e.g. less than 1/8th pixel per frame) to avoid jitter. In this case, to take account of spatial discreteness you should halt the scroll at the half-way point of a pixel rather than the edge. When the scroll restarts it will then have a fair chance of moving in either direction.
Temporal discreteness: Game frames
The concept of game frames is fundamental to traditional games programming. While it is possible to have an asynchronous event driven game, most games are based on the ticking of a clock which happens a fixed number of times per second.
Every time this clock signal is processed the game executes one cycle of thought. Each game object may move by a slight amount, some decisions may be made, and the display may be updated. Operations which you may have been taught to calculate all at once (such as the trajectory of a rocket) have to be done one step at a time.
What happens when you "miss a frame" and the game takes longer than expected to execute a game frame cycle (this is usually the fault of the graphics routines being overloaded by the current view being too complex)? How do you catch up and avoid the game slowing down? There are two main methods: running the game engine multiple times and scaling velocities.
The "multiple times" method is simple: you time how many frames you missed and then run the game code again that many times to catch up. This can work only because the game code is usually much faster than the display code - the display code does not need to be run multiple times. If the game code is very complex or the game is running on a slow machine, this method does not work. This method is often used on console games.
The other method (which works on slow machines) is to time how long each game frame is taking to process and then multiply each game speed by the appropriate number. This is rather harder for accelerations than velocities. This is the method most often used on PC games, because the machines have such a wide range of speeds.
In modern games the game frame is not the same as the display frame. The game is run in a separate "thread" from the display code, and so should be immune to the display code being bogged down by too many objects or a scene being too complex. This is a variation of the "run the game multiple times" solution, but requires some more thought to co-ordinate the game and the display thread. The shared variables that are written by the game code and read by the display code (such as sprite position and animation requests) should be double-buffered to prevent synchronisation problems.
Classic or Euclidean geometry (named after the Greek mathematician, Euclid), is based on right angled triangles. It is customary to use Greek letters for angles, and in the following example I use q (theta) for the main angle and f (phi) for the other angle.
There are two points named, A and B. There are three sides, the adjacent, the opposite and the hypotenuse. The hypotenuse is the longest side in the triangle. The opposite is the side which is not near the angle you are interested in. The adjacent is the other side. There are two angles named in the triangle, f and q , as well as the 90¡ right angle. The sides are labelled with q being the angle we are looking at. In any triangle, the sums of the interior angles adds up to 180¡ . Therefore, q + f + 90¡ = 180¡ , so f = 90¡ - q . This means that one angle can be calculated from the other. The triangle can be defined by an angle and the length of one of the sides, or by the length of two of the sides. This leads to polar and Cartesian co-ordinates, but we will discuss those later.
Trigonometry: Those sine and cosine functions
What do the trigonometric functions mean? Well, sin q is the length of the opposite side divided by the length of the hypotenuse. Cos q is the length of the adjacent side divided by the hypotenuse. Tan q is the length of the opposite side divided by the length of the adjacent side. Since these functions take a lot of time to calculate, in a computer game you would generally store them in a look-up table. They are useful in converting between polar and Cartesian co-ordinate systems (this will be covered later in the article).
Pythagoras: Calculating distances
The Pythagorean theorem states that the square of the hypotenuse of a right angled triangle is equal to the sums of the square of the other two sides. This is useful in computer games because it lets you calculate the distance between two points accurately.
Now back to games. Imagine that there is an object at point A, co-ordinates (x1, y1). There is also an object at point B, co-ordinates (x2, y2). By the Pythagorean theorem, the distance between A and B is Ö ( (x2-x1)2 + (y2-y1)2 ). If you know how to calculate square roots, you can use this method to accurately find the distance between two points.
It is messy to calculate the angle between two points because there are four cases to consider, depending on which quadrant the right angle is in. This is discussed later.
If you know the angle q (you may have calculated it for use by your intelligence routines), you can calculate the distance between the two points by dividing the length of the adjacent side by cos q (if q < 45¡ ) or dividing the length of the opposite side by sin q (if q > 45¡ ).
The concept of space in computer games requires numerical measures with which to set positions and orientations. There are many ways to specify co-ordinates, but only a few of them are in common use.
Cartesian co-ordinates: Grid lines across the world
This is the most often used system. Cartesian co-ordinates consist of a horizontal value (called x) a vertical value (called y) and for 3D games a depth value (called z). These three axes are orthogonal, which means they are at right angles to each other. They can be represented as a vector [x, y, z], which is displayed vertically when it is used in matrix multiplication (covered later).
Polar co-ordinates: Angles and distance
Polar co-ordinates consist of an angle and a scalar. They can be very useful for storing velocities of objects in two dimensional games (such as sports games). A person can be running in a particular direction and then change angle by a certain amount each game frame, while maintaining the same speed.
You can convert them into Cartesian co-ordinates by multiplying the scalar by the sine and negative cosine of the angle to get the horizontal and vertical co-ordinates respectively (this treats angle zero as being North and angle ¹ circle as being East. You can use other conventions in your code. The mathematical convention is for angle zero to be East and angle ¹ circle to be North).
Angles can be stored in several ways. Mathematicians say a circle is made up by 2p radians. Sailors use 360 degrees. French dictators rule that it is 400 gradients. But game coders generally use 256 units, as this is stored nicely in one byte.
Quaternions: 3d rotations that work
This is a great big topic of its own, so I must be brief. Quaternions are four dimensional numbers (in the same way that the complex numbers you learned about at school are two dimensional). They have a set of mathematical operations defined on them, such as multiplication and addition.
They are very useful for processing rotations in three dimensions. If you are controlling a spaceship like the one in Descent, you want a roll to be a roll whichever way you are facing. Quaternions allow you to do this. They are also useful for interpolating motion capture animations.
Frames of Reference: Screen (local) and World (global) co-ordinates
This is a fundamental idea that many programmers in the early 1980s had not heard of. Fortunately these days it is taught quite extensively because of its use in 3D graphics.
The basic idea is that the objects in the game have an existence in a game world which exists even when you can't see it. Their co-ordinates are all given relative to the game world, not the screen. In the old days some people would store all enemy creatures as screen offsets and change them directly when scrolling the screen. This was not a good way to code!
You should treat the view of the screen as a window into the game world. In 2D games this is a rectangle which covers part of the world. When a view is set to a particular area, those objects which lie within that region of the world will be displayed on screen. Those outside the area are invisible, as they are off-screen.
You can translate positions between world and screen co-ordinates by applying a transformation matrix to them. In the case of simple platform games with no scaling or rotation, this simplifies to adding a scroll offset to both co-ordinates.
Note that the "panel" objects can be treated differently, and referred to in screen co-ordinates, as they relate to your view rather than the game world itself.
Orthogonal systems: Treating co-ordinates independently
Cartesian co-ordinates are "orthogonal". The beauty of this is that they can be treated completely independently of each other. This means that often something we do (such as apply friction) in one dimension (such as x) we can do the same way to the others (y and z) without worrying how they interact.
Transformation matrices: Converting between frames of reference
In two dimensions, you can translate between frames of references by multiplying by a 2 by 2 matrix and adding a 2 element vector. This is not so hard as it looks, as this is just a mathematical way of showing two simple formulae.
screenx = a*x+b*y+e
screeny = c*x+d*y+f
What is interesting is the value of the constants a, b, c, d, e and f. For translation (just scrolling) a and c are one and b and d are zero (this is called an identity matrix) and e and f are the scroll offsets. For scaling, b and c are zero and a and d are the scale factors. For rotation the following formulae can be used, where angle is the angle of rotation of the screen relative to the world and scale is the relative scale of screen co-ordinates to world co-ordinates. E and f still represent scroll offsets, but you have to adjust them to take account of the rotation if you do not Îpre-adjust' the x and y co-ordinates before plugging them into the equation.
The nice thing about using matrices to transform co-ordinates is that various transformations can be concatenated (made to occur one after the other) by multiplying the two matrices. With three dimensional graphics, a four by four matrix can hold everything you want to know about the transformation, including the angle of the viewing frustum (i.e. whether or not the camera is fish-eyed) and the way the perspective is calculated, as well as the translation, scaling and rotation required to render the scene.