CD Images
Sometimes,
the data you need has been formatted as CD sectors. CD sectors have a
number of different flavors, but raw mode sectors generally have periodic
header information every 2336 (0x920) or 2352 (0x930) bytes (or some multiple
of 16 close to that). These sectors have the sector number encoded in
BCD as part of the header.
The purpose of all this is to let you determine whether you have CD
sector data (which may indicate a compressed video or audio file), and
which sections of that data to strip out in order to retrieve the real
data without gaps.
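As a rough sketch of that check, the Python below looks for the 12-byte
sync pattern that begins each raw 2352-byte sector, decodes the BCD
address bytes that follow it, and strips everything but the user data.
It assumes Mode 1 sectors (2048 bytes of user data starting at byte 16);
other modes place the data differently, and all of the function names
are my own invention for illustration.

```python
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"  # 12-byte sync pattern at the start of a raw sector
RAW_SIZE = 2352    # full raw sector
USER_SIZE = 2048   # user data per sector (Mode 1 assumed)
HEADER_SIZE = 16   # 12 sync bytes + 3 BCD address bytes + 1 mode byte


def bcd_to_int(b):
    """Decode one packed BCD byte, e.g. 0x59 -> 59."""
    return (b >> 4) * 10 + (b & 0x0F)


def looks_like_raw_sectors(data, count=4):
    """True if the first `count` 2352-byte blocks all start with the sync pattern."""
    if len(data) < RAW_SIZE * count:
        return False
    return all(data[i * RAW_SIZE:i * RAW_SIZE + 12] == SYNC for i in range(count))


def sector_address(sector):
    """Return the three BCD address bytes of one raw sector as decoded integers."""
    return tuple(bcd_to_int(b) for b in sector[12:15])


def strip_mode1(data):
    """Concatenate the user-data areas, dropping headers and trailing check bytes."""
    out = bytearray()
    for pos in range(0, len(data) - RAW_SIZE + 1, RAW_SIZE):
        out += data[pos + HEADER_SIZE:pos + HEADER_SIZE + USER_SIZE]
    return bytes(out)
```

If looks_like_raw_sectors() fails at a 2352-byte stride, it is worth
retrying with the other sector sizes mentioned above before giving up.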
A CD sector holds 2048 (0x800) bytes of user data if the headers are
omitted (so-called "cooked" sectors). In this case, there is usually no
additional data per sector. Cooked sectors are typically ISO 9660 data
tracks, whereas raw mode sectors are usually CDDA, video, or other
track data.
Archive files
On
consoles, where all data must be accessed from CD, it is common practice
to combine all data into one archive data file with each sub-file aligned
on sector size boundaries. In doing so, storing (or reading) the directory
can be eliminated and any file can be accessed directly by a single CD
seek to a given sector. However, you still have to know the sector number
in order to find the data. Sometimes, the sector numbers are hard-coded
into the source, but more often than not, there is a simple table loaded
that stores the offsets, and possibly the sizes of the data files. Archive
files are also used to reduce load times when you want all of the data
for a given game level to be loaded into memory at once, requiring no
decompression in the process.
Archive
files often have the offset data encoded in the first few sectors of the
file data, although they are sometimes in a completely separate file.
Some games are built with hard-coded sector offsets but keep a separate
table for external tool usage. Generally, the files in the offset table
are in the same order as in the data file. This means that a run of
steadily increasing numbers is a good sign that we are looking at offset
table data. The values may be packed oddly, though: one game I worked on
used 24-bit offset values.
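The "increasing numbers" test is easy to automate. The sketch below
splits a block of bytes into fixed-width integers (three bytes wide by
default, to match the 24-bit case) and reports how long the leading run
of increasing values is. The little-endian byte order and the function
names are assumptions for the example; try both byte orders on real data.

```python
def read_offsets(table, width=3, byteorder="little"):
    """Split `table` into unsigned integers of `width` bytes each."""
    usable = len(table) - len(table) % width  # ignore a trailing partial entry
    return [int.from_bytes(table[i:i + width], byteorder)
            for i in range(0, usable, width)]


def increasing_run(values):
    """Length of the leading run of strictly increasing values."""
    if not values:
        return 0
    n = 1
    while n < len(values) and values[n] > values[n - 1]:
        n += 1
    return n
```

A long run at some width and byte order, followed by zeros or garbage,
is a strong hint that you have found the table and its entry count.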
In addition, the units may not be obvious. Even if the data is aligned
to CD sectors, the offsets may be in other units. Common units for
offset tables are CD sectors, 1K, 16 bytes, four bytes, or one byte, but
other units are possible. There may be other data interleaved with the
offset data, such as the file size (which may itself be in different
units), file name, attribute data, compression header, etc. The best way
to determine the units is to guess and then look at the data to see if
it makes sense. Generally, the end of each file is padded with zeros or
0xFF bytes, so once you find the start of a file where the table says it
should be, you have identified the units.
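The guess-and-check step can be sketched as a simple scoring loop. For
each candidate unit, the code below converts the table entries to byte
positions and counts how many land just after a padding byte and on a
non-padding byte. This is only a heuristic under my own assumptions
(files that genuinely begin with 0x00 or 0xFF will be scored as misses),
and the names are hypothetical.

```python
CANDIDATE_UNITS = (2048, 1024, 16, 4, 1)  # the common units listed above, in bytes
PADDING = (0x00, 0xFF)


def score_unit(data, offsets, unit):
    """Fraction of offsets that look like the start of a file for this unit."""
    if not offsets:
        return 0.0
    hits = 0
    for off in offsets:
        pos = off * unit
        if pos == 0:
            hits += 1  # the very first file has nothing before it to check
        elif pos < len(data) and data[pos - 1] in PADDING and data[pos] not in PADDING:
            hits += 1
    return hits / len(offsets)


def guess_unit(data, offsets):
    """Return the candidate unit with the best score (largest unit wins ties)."""
    return max(CANDIDATE_UNITS, key=lambda u: score_unit(data, offsets, u))
```

Treat the result as a suggestion and still eyeball the data in a hex
editor at a few of the computed positions.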
Some files only have size data instead of offset data. In that case, you
can't count on the numbers always increasing. You have to do a bit more
math, adding up the sizes to get the offsets, but by guessing units and
checking them against the data, you can find the correct format.
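Here is a minimal sketch of that extra math: a running total of the
sizes, with each start rounded up to an alignment boundary. The 2048-byte
alignment and the parameter names are assumptions; set align=1 if the
files turn out to be packed back to back.

```python
def offsets_from_sizes(sizes, unit=1, align=2048, start=0):
    """Turn a list of sizes (in `unit` bytes) into byte offsets."""
    offsets = []
    pos = start
    for size in sizes:
        pos = -(-pos // align) * align  # round up to the next boundary
        offsets.append(pos)
        pos += size * unit
    return offsets
```

The resulting offsets can then be checked against the data the same way
as a real offset table.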
The data
file you find may have sub-files in it with their own offset data, sometimes
with a completely different format. Just remember to document everything
so that you can put the data back together again.
Compressed data
Most of these rules are only useful for uncompressed data. When the data
is compressed, your options are considerably more limited. Once again,
go back to what you have, and look to see if you have source code or
tools that you can use. If you have source code for a compressor, you
can work backward from it to write a decompressor.
Most companies use proprietary internal tools for data compression, but
they will use a third party tool or library for video or audio. Many PC
games actually use PKZip for their data compression. Compare your file
against file formats that might fit the bill. Check the file extension
against possible compressed file types. If you can determine that a commercially
available utility produced the data, get that utility—it's probably
less expensive than your time.
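Comparing against known formats usually starts with the first few bytes
of the file. The sketch below checks a handful of well-known signatures;
the table is deliberately tiny and the function name is made up, so
extend it with whatever formats are plausible for your project.

```python
# Leading "magic" bytes of a few common formats.
SIGNATURES = {
    b"PK\x03\x04": "ZIP (PKZip) archive",
    b"\x1f\x8b": "gzip stream",
    b"RIFF": "RIFF container (WAV, AVI)",
}


def identify(data):
    """Return a format name if `data` starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

Running this over the start of each sub-file you extract from an archive
is a quick way to spot the ones produced by an off-the-shelf tool.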
Putting it back together
Sometimes,
extracting the data isn't enough—you need to insert replacement data
in its place. Once again, it's important to keep your goals in mind. There
is no point in replacing all of the data in an archive if just one file
needs to change, but if all of the data needs to change, it's easier to
reconstruct it from scratch.
If you only
need to change some of the data, and the replacement data is staying the
same size, you can take a shortcut by extracting the data before and after
the data that needs to change, and then recombining the sections with
the new data. Extracting the data can be accomplished with a good hex
editor, but I wrote a simple custom utility that performs the same function
from the command line, and can be placed in a batch file. Once you have
the extracted sections, and your new replacement data file in the correct
format, recombining is easy. To recombine, simply use the DOS command
line: COPY /B before.bin+middle.bin+after.bin final.bin (the /B switch
makes COPY treat the files as binary instead of stopping at the first
Ctrl-Z character). You can also use cut and paste in your hex editor if
it's a one-off job you wish to do by hand. However, I like to use batch
files in case the data changes again later.
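The same before/middle/after splice can be sketched in a few lines of
Python, which is handy when you would rather script it than juggle three
temporary files. The function name is hypothetical, and it only covers
the same-size case described above.

```python
def splice(original, offset, replacement):
    """Return a copy of `original` with `replacement` written over it at `offset`."""
    end = offset + len(replacement)
    if offset < 0 or end > len(original):
        raise ValueError("replacement does not fit inside the original data")
    # before + new middle + after, so the total length never changes
    return original[:offset] + replacement + original[end:]
```

Read the archive and the new file with open(name, "rb"), call splice(),
and write the result back out with open(name, "wb").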
Sometimes you need to generate header table information with size data
in it. You could write a simple program to combine the files, but if you
are lazy like me, you can just use an assembler. Some assemblers have
the ability to
include binary files directly. Just insert labels at the beginning and
end of each included data file, add the appropriate alignment directives,
and create the data tables using pointer math. Then link to a non-executable
file and you have a nice binary file. Even if your assembler doesn't support
binary includes, you can convert the binary data to ASCII hex format,
and then use an include directive to incorporate it into your assembler
file. It may sound cumbersome, but it works, and it saves you writing
a custom tool to put together just one file.
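If you do end up writing the combining program rather than using an
assembler, it need not be long. The sketch below pads each file to a
sector boundary and puts a table of offsets and sizes in front. The
table layout (a 32-bit count followed by 32-bit offset/size pairs, all
little-endian and in bytes) is invented for the example; a real game
will expect whatever layout its loader was written for.

```python
import struct

SECTOR = 2048


def build_archive(files, sector=SECTOR):
    """Pack a list of bytes objects into one sector-aligned archive image."""
    def pad(blob):
        # Zero-fill up to the next multiple of `sector`.
        return blob + bytes(-len(blob) % sector)

    table_len = 4 + 8 * len(files)          # count + (offset, size) per file
    pos = len(pad(bytes(table_len)))        # data begins after the padded table
    entries = []
    body = bytearray()
    for blob in files:
        entries.append((pos, len(blob)))
        padded = pad(blob)
        body += padded
        pos += len(padded)

    table = struct.pack("<I", len(files))
    for offset, size in entries:
        table += struct.pack("<II", offset, size)
    return pad(table) + bytes(body)
```

Because every offset is computed from the data itself, rerunning the
script after a file changes size keeps the table correct.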
If all else
fails, you can always write a custom tool. Some companies have made special
libraries to make this task easier, which isn't such a bad idea if you
expect to do this kind of work.
Recommended Tools
Here are
a few commercial and shareware tools that I've found useful over the years:
- Hex Workshop
from Breakpoint Software is a great hex editor that supports cut and
paste, variable width listing, Motorola and Intel byte ordering, interpretation
of data as different size entries, and many other useful features.
- 4DOS/4NT/4OS2
from JP Software is an excellent replacement for "command.com".
It offers command line completion and history, directory coloring and
file comments, a great file list feature that supports hex output, and
a whole suite of functions and extended commands for writing batch files.
- Exe Scope
from Toshifumi Yamamoto is a nifty tool that allows you to edit the
resource data of Windows executables, including graphical dialog editing.
You can't beat this tool if you need to change menus for Windows programs
without the source.
- Adobe's
Photoshop and JASC's Paint Shop Pro are two of the best graphics editing
programs—Photoshop is better in general, but is more expensive.
Both programs have their strengths.
- Sonic
Foundry's Sound Forge and Syntrillium's Cool Edit are the two best
audio editing programs. Sound Forge is very powerful, but Cool Edit
is reasonable for most tasks, and is available as a free demo.
- NJ Star
and NJ Communicator from NJ Star Software Corp. If you need to display
text in Japanese, Chinese, or Korean, these programs are for you.
Conclusion
Asset recovery
will always be an unfortunate but necessary task. However, if you are
careful and keep focused on exactly what data needs to be recovered for
what purpose, and you make proper use of the resources at hand, the task
can be accomplished economically. Management may balk at buying tools
for your asset recovery task, but they are almost always worth it! How
much is your time as an engineer worth? How much does the delay in releasing
your product cost you? Recovery almost always beats abandoning a
moneymaking project! Remember, the best
recovery is the recovery you don't have to do, so keep good backups of
your own data!