|
The
famous Moore's law, which states in rough terms that every 18 months
the speed of computers doubles, has an evil twin: every 18 months
software becomes twice as slow. A similar relationship can be formulated
for RAM and game data: no matter how big the memory budget of your
next-generation game may seem, your art team can probably fill it
up faster than you can say "disk thrashing." The appetite
for art megabytes grows faster than the publisher's willingness to
raise the minimum platform requirements.
Until
we start seeing games with a serious amount of geometry, the greatest
slice of the memory pie will belong to textures. Nobody wants to ship
a game with small, blurry, obviously tiling textures—and it's
up to the programmers to alleviate texture limitations. The hundreds
of megabytes of stuff coming from the art quarters must be compressed.
Conventional image-compression algorithms are not very well suited
to the specific requirements of art storage in games. They are designed
for relatively fast compression, which is not an issue, since art
assets are preprocessed offline; their decompression speed leaves
much to be desired. Also, it is usually hard to access a specific
portion of the image.
For fixed textures used in hardware-rendered games, the texture compression
schemes such as DXTn present a solution; however, for supporting older
hardware, for (gasp!) software renderers, and doing more complicated
stuff with textures they aren't perfect. Sure, you could decompress
DXTn in software and process it, but those formats aren't really meant
for this—it would probably be quite slow. There is a better solution
in terms of both decompression speed and image quality.
Image-compression algorithms based on vector quantization (VQ) techniques
have been researched for years. Recently, such algorithms have been
implemented in hardware by several graphics chip vendors. Unlike DXTn,
VQ decompression is as easy to do in software as it is in hardware,
and might be just what you need to slash the memory requirements of
your project in half.
This article provides an introduction to the field of VQ, presents
two algorithms for performing VQ, and goes into the details of a successful
real-world application for VQ texture compression.
What Is Vector
Quantization?
Strictly speaking, quantization is the procedure of approximating
continuous with discrete values; in practice, the input values to
the quantization procedure are often also discrete, but with a much
finer resolution than that of the output values. The goal of quantization
usually is to produce a more compact representation of the data while
maintaining its usefulness for a certain purpose. For example, to
store color intensities you can quantize floating-point values in
the range [0.0, 1.0] to integer values in the range 0-255, representing
them with 8 bits, which is considered a sufficient resolution for
many applications dealing with color. In this example, the spacing
of possible values is the same over the entire discrete set, so we
speak of uniform quantization; often, a nonuniform spacing is more
appropriate when better resolution is needed over some parts of the
range of values. Floating-point number representation is an example
of nonuniform quantization—you have the as many possible FP values
between 0.1 and 1 as you have between 10 and 100.
Both
these are examples of scalar quantization—the input and output
values are scalars, or single numbers. You can do vector quantization
(VQ) too, replacing vectors from a continuous (or dense discrete)
input set with vectors from a much sparser set (note that here by
vector we mean an ordered set of N numbers, not just the special case
of points in 3D space). For example, if we have the colors of the
pixels in an image represented by triples of red, green, and blue
intensities in the [0.0, 1.0] range, we could quantize them uniformly
by quantizing each of the three intensities to an 8-bit number; this
leads us to the traditional 24-bit representation.
By quantizing
each component of the vector for itself, we gain nothing over standard
scalar quantization; however, if we quantize the entire vectors, replacing
them with vectors from a carefully chosen sparse nonuniform set and
storing just indices into that set, we can get a much more compact
representation of the image. This is nothing but the familiar paletted
image representation. In VQ literature the "palette," or
the set of possible quantized values for the vectors is called a "codebook,"
because you need it to "decode" the indices into actual
vector values.
Why Does VQ
Work?
It turns
out that VQ is a powerful method for lossy compression of data such
as sounds or images, because their vector representations often occupy
only small fractions of their vector spaces. We can illustrate this
distribution in the case of a simple representation of a grayscale
image in a 2D vector space. The vectors will be composed by taking
in pairs the values of adjacent pixels. If the input image has 256
shades of gray, we can visualize the vector space as the [0,0]-[255,255]
square in the plane. We can then take the two components of the vectors
as XY coordinates and plot a dot for each vector found in the input
image.
Figure
2 shows the result of this procedure applied to a grayscale version
of the famous "Lena" (Figure 1), a traditional benchmark
for image-compression algorithms.
The
diagonal line along which the density of the input vectors is concentrated
is the x = y line; the reason for this clustering is that "Lena,"
like most photographic images, consists predominantly of smooth gradients.
Adjacent pixels from a smooth gradient have similar values, and the
corresponding dot on the diagram is close to the x = y line. The areas
on the diagram which would represent abrupt intensity changes from
one pixel to the next are sparsely populated.
If we
decide to reduce this image to 2 bits/pixel via scalar quantization,
this would mean reducing the pixels to four possible values. If we
interpret this as VQ on the 2D vector distribution diagram, we get
a picture like Figure 3.
The
big red dots on the figure represent the 16 evenly spaced possible
values of pairs of pixels. Every pair from the input image would be
mapped to one of these dots during the quantization. The red lines
delimit the "zones of influence," or cells of the vectors—all
vectors inside a cell would get quantized to the same codebook vector.
Now
we see why this quantization is very inefficient: Two of the cells
are completely empty and four other cells are very sparsely populated.
The codebook vectors in the six cells adjacent to the x = y diagonal
are shifted away from the density maxima in their cells, which means
that the average quantization error in these cells will be unnecessarily
high. In other words, six of the 16 possible pairs of pixel values
are wasted, six more are not used efficiently and only four are O.K.
Let's
perform an equivalent (in terms of size of resulting quantized image)
vector quantization. Instead of 2 bits/pixel, we'll allocate 4 bits
per 2D vector, but now we can take the freedom to place the 16 vectors
of the codebook anywhere in the diagram. To minimize the mean quantization
error, we'll place all of these vectors inside the dense cloud around
the x = y diagonal.
Figure
4 shows how things look with VQ. As in Figure 3, the codebook vectors
are represented as big red dots, and the red lines delimit their zones
of influence. (This partitioning of a vector space into cells around
a predefined set of "special" vectors, such as for all vectors
inside a cell the same "special" vector is closest to them,
is called a Voronoi diagram; the cells are called Voronoi cells. You
can find a lot of resources on Voronoi diagrams on the Internet, since
they have some interesting properties besides being a good illustration
of the merits of VQ.)
You
can see that in the case of VQ the cells are smaller (that is, the
quantization introduces smaller errors) where it matters the most—in
the areas of the vector space where the input vectors are dense. No
codebook vectors are wasted on unpopulated regions, and inside each
cell the codebook vector is optimally spaced with regard to the local
input vector density.
When
you go to higher dimensions (for example, taking 4-tuples of pixels
instead of pairs), VQ gets more and more efficient—up to a certain
point. How to determine the optimal vector size for a given set of
input data is a rather complicated question beyond the scope of this
article; basically, to answer it, you need to study the autocorrelation
properties of the data. It suffices to say that for images of the
type and resolution commonly used in games, four is a good choice
for the vector size. For other applications, such as voice compression,
vectors of size 40-50 are used.
|
To speed up the whole thing a lot, I found it possible to create a 16 dimensional kd-tree and use approximate nearest neighbor finding algorithms. This makes it feasible to use larger codebooks with large images and only wait a few minutes instead of hours.