Image Compression with Vector Quantization

April 16, 2001
 

The famous Moore's law, which states in rough terms that every 18 months the speed of computers doubles, has an evil twin: every 18 months software becomes twice as slow. A similar relationship can be formulated for RAM and game data: no matter how big the memory budget of your next-generation game may seem, your art team can probably fill it up faster than you can say "disk thrashing." The appetite for art megabytes grows faster than the publisher's willingness to raise the minimum platform requirements.

Until we start seeing games with a serious amount of geometry, the greatest slice of the memory pie will belong to textures. Nobody wants to ship a game with small, blurry, obviously tiling textures—and it's up to the programmers to alleviate texture limitations. The hundreds of megabytes of stuff coming from the art quarters must be compressed.

Conventional image-compression algorithms are not well suited to the specific requirements of art storage in games. They are designed for relatively fast compression, which is irrelevant here, since art assets are preprocessed offline; meanwhile, their decompression speed leaves much to be desired. It is also usually hard to access a specific portion of the compressed image.

For fixed textures used in hardware-rendered games, texture-compression schemes such as DXTn are one solution; however, they aren't ideal for supporting older hardware, for (gasp!) software renderers, or for doing more complicated processing on textures. Sure, you could decompress DXTn in software and then process the result, but those formats aren't really meant for that, and it would probably be quite slow. There is a better solution in terms of both decompression speed and image quality.

Image-compression algorithms based on vector quantization (VQ) techniques have been researched for years. Recently, such algorithms have been implemented in hardware by several graphics chip vendors. Unlike DXTn, VQ decompression is as easy to do in software as it is in hardware, and might be just what you need to slash the memory requirements of your project in half.

This article provides an introduction to the field of VQ, presents two algorithms for performing VQ, and goes into the details of a successful real-world application for VQ texture compression.

What Is Vector Quantization?

Strictly speaking, quantization is the procedure of approximating continuous values with discrete ones; in practice, the input values to the quantization procedure are often also discrete, but with a much finer resolution than that of the output values. The goal of quantization usually is to produce a more compact representation of the data while maintaining its usefulness for a certain purpose. For example, to store color intensities you can quantize floating-point values in the range [0.0, 1.0] to integer values in the range 0-255, representing them with 8 bits, which is considered sufficient resolution for many applications dealing with color. In this example, the spacing of possible values is the same over the entire discrete set, so we speak of uniform quantization; often, a nonuniform spacing is more appropriate when better resolution is needed over some parts of the range of values. Floating-point number representation is an example of nonuniform quantization: you have as many possible FP values between 0.1 and 1 as you have between 10 and 100.
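
As a concrete illustration, here is a minimal sketch of the uniform scalar quantization just described; the function names are ours, not from any shipping code:

    // Quantize an intensity in [0.0, 1.0] to one of 256 evenly spaced levels.
    unsigned char QuantizeIntensity(float v)
    {
        if (v < 0.0f) v = 0.0f;                      // clamp out-of-range input
        if (v > 1.0f) v = 1.0f;
        return (unsigned char)(v * 255.0f + 0.5f);   // round to the nearest level
    }

    // Recover an approximation of the original intensity.
    float DequantizeIntensity(unsigned char q)
    {
        return q / 255.0f;
    }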

Both these are examples of scalar quantization—the input and output values are scalars, or single numbers. You can do vector quantization (VQ) too, replacing vectors from a continuous (or dense discrete) input set with vectors from a much sparser set (note that here by vector we mean an ordered set of N numbers, not just the special case of points in 3D space). For example, if we have the colors of the pixels in an image represented by triples of red, green, and blue intensities in the [0.0, 1.0] range, we could quantize them uniformly by quantizing each of the three intensities to an 8-bit number; this leads us to the traditional 24-bit representation.

By quantizing each component of the vector by itself, we gain nothing over standard scalar quantization; however, if we quantize entire vectors, replacing them with vectors from a carefully chosen sparse nonuniform set and storing just indices into that set, we can get a much more compact representation of the image. This is nothing but the familiar paletted image representation. In VQ literature the "palette," or the set of possible quantized values for the vectors, is called a "codebook," because you need it to "decode" the indices into actual vector values.
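
Decompression of such a representation is trivially fast, which is a key attraction for games. A minimal sketch, with illustrative names and assuming one 8-bit index per pixel and a codebook of up to 256 entries:

    struct RGB { unsigned char r, g, b; };

    // Decoding a vector-quantized (paletted) image is a pure table lookup:
    // each pixel stores only an index into the codebook.
    void DecodeVQImage(const unsigned char* indices, int numPixels,
                       const RGB* codebook,          // up to 256 entries here
                       RGB* outPixels)
    {
        for (int i = 0; i < numPixels; ++i)
            outPixels[i] = codebook[indices[i]];
    }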

Why Does VQ Work?

It turns out that VQ is a powerful method for lossy compression of data such as sounds or images, because their vector representations often occupy only small fractions of their vector spaces. We can illustrate this distribution in the case of a simple representation of a grayscale image in a 2D vector space. The vectors will be composed by taking the values of adjacent pixels in pairs. If the input image has 256 shades of gray, we can visualize the vector space as the [0,0]-[255,255] square in the plane. We can then take the two components of each vector as XY coordinates and plot a dot for each vector found in the input image.
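
A sketch of how such a diagram can be built (our names, not from the article's accompanying source):

    // Count how often each (left, right) pair of adjacent pixels occurs.
    // Plotting a dot wherever counts[a][b] > 0 reproduces a diagram like
    // Figure 2. Assumes counts was zero-initialized by the caller.
    void CountPixelPairs(const unsigned char* image, int width, int height,
                         unsigned long counts[256][256])
    {
        for (int y = 0; y < height; ++y)
            for (int x = 0; x + 1 < width; x += 2)   // non-overlapping pairs
            {
                unsigned char a = image[y * width + x];
                unsigned char b = image[y * width + x + 1];
                ++counts[a][b];
            }
    }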

Figure 2 shows the result of this procedure applied to a grayscale version of the famous "Lena" (Figure 1), a traditional benchmark for image-compression algorithms.

FIGURE 1. Lena in grayscale.

The diagonal line along which the density of the input vectors is concentrated is the x = y line; the reason for this clustering is that "Lena," like most photographic images, consists predominantly of smooth gradients. Adjacent pixels from a smooth gradient have similar values, and the corresponding dot on the diagram is close to the x = y line. The areas on the diagram which would represent abrupt intensity changes from one pixel to the next are sparsely populated.

FIGURE 2. Distribution of pairs of adjacent pixels from grayscale Lena.

If we decide to reduce this image to 2 bits/pixel via scalar quantization, this would mean reducing the pixels to four possible values. If we interpret this as VQ on the 2D vector distribution diagram, we get a picture like Figure 3.

FIGURE 3. Scalar quantization to 2 bits/pixel interpreted as 2D VQ.

The big red dots on the figure represent the 16 evenly spaced possible values of pairs of pixels. Every pair from the input image would be mapped to one of these dots during the quantization. The red lines delimit the "zones of influence," or cells of the vectors—all vectors inside a cell would get quantized to the same codebook vector.

Now we see why this quantization is very inefficient: two of the cells are completely empty, and four other cells are very sparsely populated. The codebook vectors in the six cells adjacent to the x = y diagonal are shifted away from the density maxima in their cells, which means that the average quantization error in these cells will be unnecessarily high. In other words, six of the 16 possible pairs of pixel values are wasted, six more are not used efficiently, and only four are O.K.

Let's perform an equivalent (in terms of the size of the resulting quantized image) vector quantization. Instead of 2 bits per pixel, we'll allocate 4 bits per 2D vector, but now we are free to place the 16 vectors of the codebook anywhere in the diagram. To minimize the mean quantization error, we'll place all of them inside the dense cloud around the x = y diagonal.

FIGURE 4. Vector quantization to 4 bits per 2D-vector.

Figure 4 shows how things look with VQ. As in Figure 3, the codebook vectors are represented as big red dots, and the red lines delimit their zones of influence. (This partitioning of a vector space into cells around a predefined set of "special" vectors, such that for all vectors inside a cell the same "special" vector is the closest one, is called a Voronoi diagram; the cells are called Voronoi cells. You can find many resources on Voronoi diagrams on the Internet, since they have some interesting properties besides being a good illustration of the merits of VQ.)

You can see that in the case of VQ the cells are smaller (that is, the quantization introduces smaller errors) where it matters the most: in the areas of the vector space where the input vectors are dense. No codebook vectors are wasted on unpopulated regions, and inside each cell the codebook vector is optimally placed with regard to the local input vector density.
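
The quantization step itself is simply a nearest-neighbor search over the codebook; the Voronoi cells describe exactly which inputs map to which entry. A brute-force sketch (illustrative names, assuming the codebook is stored flat, dim floats per entry):

    // Map an input vector to the index of the nearest codebook vector,
    // using squared Euclidean distance. The returned index is what gets
    // stored in the compressed image.
    int FindNearestCode(const float* vec, int dim,
                        const float* codebook, int numCodes)
    {
        int best = 0;
        float bestDist = 1e30f;
        for (int c = 0; c < numCodes; ++c)
        {
            float dist = 0.0f;
            for (int i = 0; i < dim; ++i)
            {
                float d = vec[i] - codebook[c * dim + i];
                dist += d * d;
            }
            if (dist < bestDist)
            {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }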

When you go to higher dimensions (for example, taking 4-tuples of pixels instead of pairs), VQ gets more and more efficient—up to a certain point. How to determine the optimal vector size for a given set of input data is a rather complicated question beyond the scope of this article; basically, to answer it, you need to study the autocorrelation properties of the data. It suffices to say that for images of the type and resolution commonly used in games, four is a good choice for the vector size. For other applications, such as voice compression, vectors of size 40-50 are used.
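
For example, one common way to form 4-dimensional vectors from a grayscale image is to take 2x2 pixel blocks (a sketch under that assumption; other groupings, such as four pixels in a row, work too):

    // Build 4D input vectors from 2x2 pixel blocks of a grayscale image.
    // Assumes width and height are even; outVectors receives 4 floats
    // per block, (width / 2) * (height / 2) blocks in total.
    void BuildBlockVectors(const unsigned char* image, int width, int height,
                           float* outVectors)
    {
        int n = 0;
        for (int y = 0; y < height; y += 2)
            for (int x = 0; x < width; x += 2)
            {
                outVectors[n++] = image[ y      * width + x    ];
                outVectors[n++] = image[ y      * width + x + 1];
                outVectors[n++] = image[(y + 1) * width + x    ];
                outVectors[n++] = image[(y + 1) * width + x + 1];
            }
    }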



Comments


Adrian Pflugshaupt
Thanks for a great article! I believe I found a bug in the code. When the GLAQuant struct is initialized (GLAQuant::Init), the comment says "Fill with all-grey" vectors, but it really fills them with values going from 0.f to 1.f. I believe this should fill the vector with values from 0.f to 255.f. Changing that makes the process use far fewer iterations. I ended up using 0.f to 254.f because 255.f got rounded to 0 by the conversion to BYTE.



To speed up the whole thing a lot, I found it possible to create a 16 dimensional kd-tree and use approximate nearest neighbor finding algorithms. This makes it feasible to use larger codebooks with large images and only wait a few minutes instead of hours.

Leandro Wainberg
The link to the source code looks dead, can you re-upload it? Thanks!

Christian Nutt
Unfortunately, as the article was posted in 2001, some links may now be dead. Our apologies!

