The famous Moore's law, which states in rough terms that every 18 months the speed of computers doubles, has an evil twin: every 18 months software becomes twice as slow. A similar relationship can be formulated for RAM and game data: no matter how big the memory budget of your nextgeneration game may seem, your art team can probably fill it up faster than you can say "disk thrashing." The appetite for art megabytes grows faster than the publisher's willingness to raise the minimum platform requirements.
Until
we start seeing games with a serious amount of geometry, the greatest
slice of the memory pie will belong to textures. Nobody wants to ship
a game with small, blurry, obviously tiling textures—and it's
up to the programmers to alleviate texture limitations. The hundreds
of megabytes of stuff coming from the art quarters must be compressed.
What Is Vector
Quantization?
Strictly speaking, quantization is the procedure of approximating
continuous with discrete values; in practice, the input values to
the quantization procedure are often also discrete, but with a much
finer resolution than that of the output values. The goal of quantization
usually is to produce a more compact representation of the data while
maintaining its usefulness for a certain purpose. For example, to
store color intensities you can quantize floatingpoint values in
the range [0.0, 1.0] to integer values in the range 0255, representing
them with 8 bits, which is considered a sufficient resolution for
many applications dealing with color. In this example, the spacing
of possible values is the same over the entire discrete set, so we
speak of uniform quantization; often, a nonuniform spacing is more
appropriate when better resolution is needed over some parts of the
range of values. Floatingpoint number representation is an example
of nonuniform quantization—you have the as many possible FP values
between 0.1 and 1 as you have between 10 and 100.
Both these are examples of scalar quantization—the input and output values are scalars, or single numbers. You can do vector quantization (VQ) too, replacing vectors from a continuous (or dense discrete) input set with vectors from a much sparser set (note that here by vector we mean an ordered set of N numbers, not just the special case of points in 3D space). For example, if we have the colors of the pixels in an image represented by triples of red, green, and blue intensities in the [0.0, 1.0] range, we could quantize them uniformly by quantizing each of the three intensities to an 8bit number; this leads us to the traditional 24bit representation.
By quantizing each component of the vector for itself, we gain nothing over standard scalar quantization; however, if we quantize the entire vectors, replacing them with vectors from a carefully chosen sparse nonuniform set and storing just indices into that set, we can get a much more compact representation of the image. This is nothing but the familiar paletted image representation. In VQ literature the "palette," or the set of possible quantized values for the vectors is called a "codebook," because you need it to "decode" the indices into actual vector values.
Why Does VQ Work?
It turns out that VQ is a powerful method for lossy compression of data such as sounds or images, because their vector representations often occupy only small fractions of their vector spaces. We can illustrate this distribution in the case of a simple representation of a grayscale image in a 2D vector space. The vectors will be composed by taking in pairs the values of adjacent pixels. If the input image has 256 shades of gray, we can visualize the vector space as the [0,0][255,255] square in the plane. We can then take the two components of the vectors as XY coordinates and plot a dot for each vector found in the input image.
Figure 2 shows the result of this procedure applied to a grayscale version of the famous "Lena" (Figure 1), a traditional benchmark for imagecompression algorithms.


FIGURE 1. Lena in grayscale. 
The diagonal line along which the density of the input vectors is concentrated is the x = y line; the reason for this clustering is that "Lena," like most photographic images, consists predominantly of smooth gradients. Adjacent pixels from a smooth gradient have similar values, and the corresponding dot on the diagram is close to the x = y line. The areas on the diagram which would represent abrupt intensity changes from one pixel to the next are sparsely populated.


FIGURE 2. Distribution of pairs of adjacent pixels from grayscale Lena. 
If we decide to reduce this image to 2 bits/pixel via scalar quantization, this would mean reducing the pixels to four possible values. If we interpret this as VQ on the 2D vector distribution diagram, we get a picture like Figure 3.


FIGURE 3. Scalar quantization to 2 bits/pixel interpreted as 2D VQ. 
The big red dots on the figure represent the 16 evenly spaced possible values of pairs of pixels. Every pair from the input image would be mapped to one of these dots during the quantization. The red lines delimit the "zones of influence," or cells of the vectors—all vectors inside a cell would get quantized to the same codebook vector.
Now we see why this quantization is very inefficient: Two of the cells are completely empty and four other cells are very sparsely populated. The codebook vectors in the six cells adjacent to the x = y diagonal are shifted away from the density maxima in their cells, which means that the average quantization error in these cells will be unnecessarily high. In other words, six of the 16 possible pairs of pixel values are wasted, six more are not used efficiently and only four are O.K.
Let's perform an equivalent (in terms of size of resulting quantized image) vector quantization. Instead of 2 bits/pixel, we'll allocate 4 bits per 2D vector, but now we can take the freedom to place the 16 vectors of the codebook anywhere in the diagram. To minimize the mean quantization error, we'll place all of these vectors inside the dense cloud around the x = y diagonal.


FIGURE 4. Vector quantization to 4 bits per 2Dvector. 
Figure 4 shows how things look with VQ. As in Figure 3, the codebook vectors are represented as big red dots, and the red lines delimit their zones of influence. (This partitioning of a vector space into cells around a predefined set of "special" vectors, such as for all vectors inside a cell the same "special" vector is closest to them, is called a Voronoi diagram; the cells are called Voronoi cells. You can find a lot of resources on Voronoi diagrams on the Internet, since they have some interesting properties besides being a good illustration of the merits of VQ.)
You can see that in the case of VQ the cells are smaller (that is, the quantization introduces smaller errors) where it matters the most—in the areas of the vector space where the input vectors are dense. No codebook vectors are wasted on unpopulated regions, and inside each cell the codebook vector is optimally spaced with regard to the local input vector density.
When you go to higher dimensions (for example, taking 4tuples of pixels instead of pairs), VQ gets more and more efficient—up to a certain point. How to determine the optimal vector size for a given set of input data is a rather complicated question beyond the scope of this article; basically, to answer it, you need to study the autocorrelation properties of the data. It suffices to say that for images of the type and resolution commonly used in games, four is a good choice for the vector size. For other applications, such as voice compression, vectors of size 4050 are used.