Gamasutra: The Art & Business of Making Games
Asset Recovery: What to do When the Data is Gone!
August 3, 2001 (page 3 of 5)

Raw Audio Data Reading in Audio Tools

Most of the techniques I have mentioned so far have been for graphics, but what about audio data? Many popular audio programs also support a raw mode for reading and writing audio data. By looking at and listening to the output, you can determine where the audio data is located and what format it is in.

As you can see, there aren't too many options. The header and trailer aren't important when you are just listening to the whole file to find data. However, the Byte order is very important because the data will just sound like static if it is incorrect. Fortunately, you know the ordering for your platform. If the Format is wrong, you can still roughly make out the data if you listen to it, but it's really loud and obnoxious, and will take up the whole amplitude range.

Sample Type and Channels are the most difficult to determine. Fortunately, you probably have some idea of whether your data is mono or stereo. If you do not, there are a few tricks you can try. If 16-bit data is loaded as eight-bit stereo, one channel will be static, while the other may look O.K. If 16-bit data is loaded as eight-bit mono, you will see vertical stripes in the sample data when you zoom in.

Many popular audio programs also support a raw mode for reading and writing audio data.

Loading stereo data as mono has a similar appearance, even if the bit depth is correct. In general, a certain amount of trial and error is required, but if the audio data is stored in a raw format, it can be extracted. If your data is in a compressed format, these techniques generally won't help you.

A signed wav file loaded as unsigned.

16 bit stereo data loaded as eight-bit mono.
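The same trial and error can be roughed out in a few lines of script before opening an audio tool. The sketch below is only an illustration, not part of any audio program: `decode_pcm`, `split_stereo`, and `roughness` are hypothetical helper names, and the "roughness" score (average jump between neighbouring samples) is an assumed stand-in for listening for static. It handles only uncompressed 8-bit and 16-bit samples.

```python
import struct

def decode_pcm(raw, bits=16, signed=True, little_endian=True):
    """Interpret raw bytes as PCM samples under one guess about the format."""
    if bits == 8:
        code = "b" if signed else "B"
    elif bits == 16:
        code = "h" if signed else "H"
    else:
        raise ValueError("only 8- and 16-bit samples are handled here")
    width = bits // 8
    count = len(raw) // width            # ignore a trailing partial sample
    order = "<" if little_endian else ">"
    return list(struct.unpack(f"{order}{count}{code}", raw[:count * width]))

def split_stereo(samples):
    """Treat interleaved samples as left/right channels."""
    return samples[0::2], samples[1::2]

def roughness(samples):
    """Mean absolute jump between neighbouring samples; static scores high."""
    if len(samples) < 2:
        return 0.0
    jumps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(jumps) / len(jumps)

# A slowly rising 16-bit signed little-endian ramp stands in for real audio.
ramp = struct.pack("<8h", *range(0, 800, 100))
right = decode_pcm(ramp)                        # correct guess
swapped = decode_pcm(ramp, little_endian=False)  # wrong byte order
print(roughness(right), roughness(swapped))
```

Comparing scores only makes sense between guesses that use the same bit depth, since eight-bit and 16-bit samples have different ranges. Splitting a 16-bit ramp decoded as eight-bit data with `split_stereo` also reproduces the "one channel static, one channel O.K." effect described above.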

The Hex Dump

The best method for observing your data is the venerable hexadecimal dump. There are a number of programs that can give you both a hex and an ASCII dump, and many development editors and environments support a hex dump display. Visual Studio can open binary files as hex data, as can Multiedit and several other editors. I generally use an editing or browsing program. 4DOS's "list" command will display hex if the "X" key is pressed. However, my favorite utility is Breakpoint Software's Hex Workshop, which offers hex and ASCII editing, searches, Unicode support, and many other fine features.

General Rules for Identifying Data
Depending on what kind of data you have, there are a lot of different tricks you can use to identify the data in the hex dump. Of course, it is always important to use what you already know about the data from what source code and documentation you already have, but even when you have next to nothing, you may still be able to find what you need and change it. The important thing to remember is to have a goal in mind. Just staring at a hex dump won't help you unless you are looking for something in particular in the data. If you have expectations of what to find, you can consider the attributes of that data and determine whether the file you are looking at meets or breaks those expectations.

Data Files vs. Memory Structures
Often, you will have the structure definitions for the in-memory versions of the data, but the data in the file may differ and has to be parsed in order to fill the structures in RAM. First off, although structures are often allocated dynamically at load time using malloc/new, data in files is usually stored as arrays of like structures. Therefore, you might notice multiple occurrences of similar data in repeating structures. If you can set the width of the columns to match the structure size, this repetition becomes more apparent.

A hex dump at a width of 32 bytes shows table entries by repeating data elements in each column.
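If your editor cannot change the column width, a dump with an adjustable width is easy to script. The following is a minimal sketch, not a replacement for a real hex editor; `hex_dump` is a hypothetical helper name, and the layout (offset, hex bytes, ASCII) is just one common convention.

```python
def hex_dump(data, width=16, start=0):
    """Return hex/ASCII dump lines with `width` bytes per row."""
    pad = width * 3 - 1                  # width of a full row of "XX " pairs
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Show printable ASCII as-is and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + off:08X}  {hex_part.ljust(pad)}  {text}")
    return lines

# Try several widths; when the width matches the record size,
# repeated fields line up in vertical columns.
sample = bytes([1, 0, 0x41, 0x42] * 6)
for width in (4, 6):
    print("\n".join(hex_dump(sample, width)))
    print()
```

With the four-byte records in `sample`, the columns line up at a width of 4 and drift at a width of 6, which is the effect to look for when guessing a structure size.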

Another important difference between data files and memory structures is that data referring to other elements in the file will use indexes or offsets rather than memory pointers. An index is counted from the start of the array in units of the structure size, while an offset is measured from the head of the file, the head of the table, or some other point. An offset might be in bytes, two-byte words, or quad-words. Once you have identified the locations of tables using the above technique, you can search for offsets or indices that get you close to those locations in the file. This requires some eagle-eyed attention to the data and some trial and error work with a calculator, but it's not impossible if you know at least a little bit about the data you are trying to decode.
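The calculator work can be partly automated. The sketch below assumes that offsets are stored as 32-bit unsigned values on four-byte boundaries and are measured from the start of the file; both are assumptions you would adjust for your own data. `find_offset_candidates` is a hypothetical name, and `units` covers offsets counted in bytes, two-byte words, or four-byte words.

```python
import struct

def find_offset_candidates(data, target, little_endian=True,
                           units=(1, 2, 4), slack=0):
    """List (position, value, unit) where value * unit lands within
    `slack` bytes of `target`, measured from the start of the file."""
    fmt = "<I" if little_endian else ">I"
    hits = []
    for pos in range(0, len(data) - 3, 4):   # assumes 4-byte aligned fields
        (value,) = struct.unpack_from(fmt, data, pos)
        for unit in units:
            if abs(value * unit - target) <= slack:
                hits.append((pos, value, unit))
    return hits

# A made-up header followed by a table that starts at byte 16.
blob = struct.pack("<4I", 16, 8, 0xDEADBEEF, 4) + bytes(16)
for pos, value, unit in find_offset_candidates(blob, target=16):
    print(f"at {pos:#06x}: {value} x {unit}-byte units")
```

Each hit is only a candidate. A value that happens to match may be unrelated data, so check hits against what you already know about the file.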

Sometimes, the different types of data are identified by a chunk header ID, usually a four-letter ASCII identifier that is read as a long word. The ID might appear reversed in the file, depending on the byte ordering of the target platform and the compiler's support for string identifiers. This is where the ASCII dump portion of your hex editor is very useful: the chunk ID is quite easy to spot.
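Searching for chunk IDs in both orientations can also be scripted. This is a small sketch under the assumption that you already have a list of IDs you expect to find; `find_chunk_ids` is a hypothetical name, and the RIFF-style tags in the example are only an illustration, not something the data discussed here necessarily contains.

```python
def find_chunk_ids(data, known_ids):
    """Return sorted (position, tag, reversed?) for every occurrence of a
    known four-byte tag, stored either forwards or byte-reversed."""
    hits = []
    for tag in known_ids:
        variants = [(tag, False)]
        if tag[::-1] != tag:                 # skip palindromes like b"ABBA"
            variants.append((tag[::-1], True))
        for needle, is_reversed in variants:
            start = data.find(needle)
            while start != -1:
                hits.append((start, tag, is_reversed))
                start = data.find(needle, start + 1)
    return sorted(hits)

# Tags written as long words on a little-endian machine show up reversed.
blob = b"\x00\x00FFIR\x10\x00\x00\x00EVAW"
for pos, tag, is_reversed in find_chunk_ids(blob, [b"RIFF", b"WAVE"]):
    print(pos, tag, "reversed" if is_reversed else "forward")
```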

The Rule of Zeros—Finding the Data Size
When you are trying to identify data in a table or structure, there is a useful rule you can use to work out what size a particular data element is. The rule of zeros relies on the fact that most data doesn't use the entire range of values available for the number of bytes allotted to it, so the high bytes will often be zero, provided the value is not negative. On a system with Intel byte ordering, the zeros will be on the right side of the element, while on a system using Motorola ordering, the zeros will be on the left.

A look at the earlier hex dump example shows that the first ten values are four-byte Intel format values. Following that, we see a table of 32-byte structures made up of single-byte and double-byte values. An exception to the rule of zeros is when a byte or a number of bytes is used to store flags where each bit is significant.
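The rule of zeros can be turned into a rough statistic: for a table of fixed-size records, count how often each byte position is zero. The sketch below assumes you have already guessed the record size and where a field starts; `zero_fraction_by_column` and `guess_byte_order` are hypothetical names, and the byte-order guess is a heuristic that flag fields and negative values will defeat.

```python
import struct

def zero_fraction_by_column(data, record_size):
    """For each byte position in a record, the fraction of records
    in which that byte is zero."""
    count = len(data) // record_size
    if count == 0:
        return []
    zeros = [0] * record_size
    for r in range(count):
        record = data[r * record_size:(r + 1) * record_size]
        for i, byte in enumerate(record):
            if byte == 0:
                zeros[i] += 1
    return [z / count for z in zeros]

def guess_byte_order(fractions, field_start, field_width):
    """Zeros piling up at the end of a field suggest Intel ordering;
    zeros at the start suggest Motorola ordering."""
    first = fractions[field_start]
    last = fractions[field_start + field_width - 1]
    if last > first:
        return "intel"
    if first > last:
        return "motorola"
    return "unknown"

# Three small values stored as four-byte Intel (little-endian) integers.
table = struct.pack("<3I", 5, 300, 70000)
fractions = zero_fraction_by_column(table, record_size=4)
print(fractions, guess_byte_order(fractions, 0, 4))
```

A column that is almost always zero next to one that almost never is marks the high and low ends of a field. Runs of such columns hint at where one element stops and the next begins.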
