Gamasutra: The Art & Business of Making Games
Asset Recovery: What to do When the Data is Gone!
August 3, 2001 (page 5 of 5)
 

CD Images
Sometimes the data you need has been formatted as CD sectors. CD sectors come in a number of different flavors, but raw mode sectors generally have periodic header information every 2336 (0x920) or 2352 (0x930) bytes (or some multiple of 16 close to that). These sectors have the sector number encoded in BCD as part of the header. In a full 2352-byte sector, it is stored as a minute, second, and frame address that follows a 12-byte sync pattern.

CD Sector header. Notice that 0x2DF0 is a multiple of 0x930 (it's 5 times 0x930). So the 5 at the end of that line is part of the BCD encoded sector number.

The purpose of all this is to let you determine whether you have CD sector data (which may indicate a compressed video or audio file), and which sections of that data to strip out in order to retrieve the real data without gaps.

With the headers omitted (so-called "cooked" sectors), a CD sector is typically 2048 (0x800) bytes. In this case, there is usually no additional data per sector. Cooked sectors are ISO 9660 tracks, whereas raw mode sectors are usually CDDA, video, or other track data.
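As a rough illustration of stripping raw sectors down to cooked data, here is a minimal Python sketch. It assumes full 2352-byte sectors that begin with the standard 12-byte sync pattern, followed by a 3-byte BCD address and a mode byte. The function names are hypothetical, and the data offsets cover only Mode 1 and Mode 2 Form 1 layouts, so treat it as a starting point and check the results against your own data.

```python
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"  # 12-byte sync pattern that opens a raw sector
RAW_SIZE = 2352    # 0x930 bytes per raw sector
USER_SIZE = 2048   # 0x800 bytes of user data per cooked sector


def bcd(byte):
    """Decode one packed-BCD byte, e.g. 0x25 -> 25."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def cook_sectors(raw):
    """Yield ((minute, second, frame), user_data) for each raw sector in `raw`."""
    for pos in range(0, len(raw) - RAW_SIZE + 1, RAW_SIZE):
        sector = raw[pos:pos + RAW_SIZE]
        if sector[:12] != SYNC:
            raise ValueError(f"no sync pattern at offset {pos:#x}")
        address = tuple(bcd(b) for b in sector[12:15])
        mode = sector[15]
        # Assumption: Mode 1 data follows the 16-byte header directly;
        # anything else is treated as Mode 2 Form 1, which has an
        # extra 8-byte subheader before the data.
        start = 16 if mode == 1 else 24
        yield address, sector[start:start + USER_SIZE]
```

Joining the yielded `user_data` pieces gives the data without the per-sector gaps. If the sync check fails on the first sector, the file is probably not raw 2352-byte sector data, or it does not start on a sector boundary.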

Archive files
On consoles, where all data must be accessed from CD, it is common practice to combine all data into one archive data file with each sub-file aligned on sector size boundaries. In doing so, storing (or reading) the directory can be eliminated and any file can be accessed directly by a single CD seek to a given sector. However, you still have to know the sector number in order to find the data. Sometimes, the sector numbers are hard-coded into the source, but more often than not, there is a simple table loaded that stores the offsets, and possibly the sizes of the data files. Archive files are also used to reduce load times when you want all of the data for a given game level to be loaded into memory at once, requiring no decompression in the process.

Archive files often have the offset data encoded in the first few sectors of the file data, although it is sometimes in a completely separate file. Occasionally a game is built with hard-coded sector offsets but keeps a separate table for use by external tools. Generally, the files in the offset table are in the same order as in the data file. This means that we can tell if we are looking at offset table data from the fact that the numbers are increasing. They may be packed oddly though: one game I worked on used 24-bit offset values.
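One way to hunt for such a table is to scan the file for runs of increasing integers. The sketch below is only a heuristic under stated assumptions: the entry width, byte order, and minimum run length are guesses you supply, and the function name is hypothetical. It tries every byte alignment because the table need not start on a multiple of the entry width.

```python
def find_increasing_runs(data, width=4, byteorder="little", min_count=8):
    """Return sorted (start_offset, count) pairs for runs of strictly
    increasing unsigned integers that are `width` bytes wide."""
    runs = []
    for phase in range(width):  # try each possible alignment of the table
        values = [int.from_bytes(data[i:i + width], byteorder)
                  for i in range(phase, len(data) - width + 1, width)]
        run_start = 0
        for i in range(1, len(values) + 1):
            # A run ends at the end of the data or when a value fails to increase.
            if i == len(values) or values[i] <= values[i - 1]:
                if i - run_start >= min_count:
                    runs.append((phase + run_start * width, i - run_start))
                run_start = i
    return sorted(runs)
```

Calling `find_increasing_runs(data, width=3)` looks for 24-bit entries. A misaligned read of a real table can also appear to increase, so expect a few neighboring candidates and confirm each one against the data it would point to.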

In addition, the units may not be obvious. Even if the data is aligned to CD sectors, the offsets may be in other units. Common units for offset tables are CD sectors, 1k, 16 bytes, four bytes, or one byte, but other units are possible. There may be other data interleaved with the offset data, such as the file size (which may be in different units again), file name, attribute data, compression header, etc. The best way to determine the units is to guess and then look at the data to see if it makes sense. Generally, the end of each file is padded with zeros or 0xFFs, so you can easily identify the units when you find the start of a file.
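The guess-and-check step can be partly automated. The following sketch scores each candidate unit by counting how many table entries land on a non-padding byte that is immediately preceded by padding. The candidate list, the padding values, and the function names are assumptions made for illustration, and the score is only a hint, not proof.

```python
CANDIDATE_UNITS = (2048, 1024, 16, 4, 1)   # CD sector, 1k, 16 bytes, 4 bytes, 1 byte
PAD_BYTES = (0x00, 0xFF)


def score_unit(archive, raw_offsets, unit):
    """Count offsets that land on a non-padding byte preceded by padding."""
    hits = 0
    for raw in raw_offsets:
        pos = raw * unit
        if 0 < pos < len(archive):
            if archive[pos - 1] in PAD_BYTES and archive[pos] not in PAD_BYTES:
                hits += 1
    return hits


def guess_unit(archive, raw_offsets, units=CANDIDATE_UNITS):
    """Return the candidate unit whose offsets most often look like file starts."""
    return max(units, key=lambda unit: score_unit(archive, raw_offsets, unit))
```

A file that happens to fill its last unit exactly has no padding in front of the next file, so a correct unit may still score below the number of entries. Look at the data around the best-scoring offsets before trusting the result.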

Some files have only size data instead of offset data. In that case, you can't count on the numbers always increasing. You have to do a bit more math to add up the sizes into offsets, but by guessing units and checking them against the data, you can find the correct format.
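The extra math is a running sum with alignment. Here is a minimal sketch, assuming each file starts at the next multiple of some alignment after the previous file ends. The parameter names are hypothetical, and `unit`, `align`, and `base` are exactly the values you would be guessing at.

```python
def offsets_from_sizes(sizes, unit=1, align=1, base=0):
    """Turn a table of sizes into byte offsets.

    sizes: size values as stored in the table
    unit:  bytes per size unit (a guess to be checked against the data)
    align: boundary each file starts on, e.g. 2048 for CD sectors
    base:  byte offset of the first file
    """
    offsets = []
    pos = base
    for size in sizes:
        offsets.append(pos)
        pos += size * unit
        pos = (pos + align - 1) // align * align  # round up to the next boundary
    return offsets
```

For example, `offsets_from_sizes([100, 3000, 10], align=2048, base=2048)` returns `[2048, 4096, 8192]`.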

The data file you find may have sub-files in it with their own offset data, sometimes with a completely different format. Just remember to document everything so that you can put the data back together again.

Compressed data
Most of these rules are only useful for uncompressed data. When the data is compressed, your options are considerably more limited. Once again, go back to what you have, and look to see if you have source code or tools that you can use. If you have source code for a compressor, you can reverse engineer a decompressor.

Most companies use proprietary internal tools for data compression, but they will use a third party tool or library for video or audio. Many PC games actually use PKZip for their data compression. Compare your file against file formats that might fit the bill. Check the file extension against possible compressed file types. If you can determine that a commercially available utility produced the data, get that utility—it's probably less expensive than your time.
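Comparing a file against known formats often starts with a search for signature bytes. The sketch below checks a handful of well-known signatures. The table is deliberately tiny and the function name is hypothetical, so extend it with whatever formats are plausible for your project.

```python
SIGNATURES = {
    b"PK\x03\x04": "ZIP local file header (PKZIP and compatibles)",
    b"\x1f\x8b\x08": "gzip stream (deflate)",
    b"BZh": "bzip2 stream",
    b"RIFF": "RIFF container (WAV, AVI)",
    b"\x00\x00\x01\xba": "MPEG program stream pack header",
}


def find_signatures(data):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, name in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Short signatures will turn up by chance in large binary files, so a hit is a lead to follow up with the real tool, not a confirmation. Hits that fall on sector or table-entry boundaries are the most promising.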

Putting it back together

Sometimes, extracting the data isn't enough—you need to insert replacement data in its place. Once again, it's important to keep your goals in mind. There is no point in replacing all of the data in an archive if just one file needs to change, but if all of the data needs to change, it's easier to reconstruct it from scratch.

If you only need to change some of the data, and the replacement data is staying the same size, you can take a shortcut by extracting the data before and after the section that needs to change, and then recombining those sections with the new data. Extracting the data can be accomplished with a good hex editor, but I wrote a simple custom utility that performs the same function from the command line and can be placed in a batch file. Once you have the extracted sections, and your new replacement data file in the correct format, recombining is easy. To recombine, simply use the DOS command line: COPY /B before.bin+middle.bin+after.bin final.bin. The /B switch forces a binary copy; without it, COPY treats files being combined as text and stops at the first Ctrl-Z (0x1A) byte. You can also use cut and paste in your hex editor if it's a one-shot job you wish to do by hand. However, I like to use batch files in case the data changes again later.

Sometimes you need to generate header table information with size data in it. You could write a simple program to combine the files, but if you are lazy like me, you can just use an assembler. Some assemblers have the ability to include binary files directly. Just insert labels at the beginning and end of each included data file, add the appropriate alignment directives, and create the data tables using pointer math. Then link to a non-executable file and you have a nice binary file. Even if your assembler doesn't support binary includes, you can convert the binary data to ASCII hex format, and then use an include directive to incorporate it into your assembler file. It may sound cumbersome, but it works, and it saves you writing a custom tool to put together just one file.
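For comparison with the assembler trick, here is what the "simple program" route might look like in Python. The layout is invented purely for illustration: a 32-bit little-endian file count, then an (offset, size) pair per file, with the header and every file padded to the alignment. A real archive must match whatever format the game expects.

```python
import struct

SECTOR = 2048  # CD sector size used as the default alignment


def pad_to(data, align, fill=b"\x00"):
    """Pad `data` with `fill` bytes up to the next multiple of `align`."""
    return data + fill * (-len(data) % align)


def build_archive(files, align=SECTOR):
    """Pack a list of bytes objects into one archive with a header table.

    Hypothetical layout: <count:u32> then <offset:u32, size:u32> per file,
    all little-endian, with offsets and sizes measured in bytes.
    """
    header_size = 4 + 8 * len(files)
    pos = header_size + (-header_size % align)  # first file follows the padded header
    entries = []
    body = b""
    for blob in files:
        entries.append((pos, len(blob)))
        padded = pad_to(blob, align)
        body += padded
        pos += len(padded)
    header = struct.pack("<I", len(files))
    for offset, size in entries:
        header += struct.pack("<II", offset, size)
    return pad_to(header, align) + body
```

Because the table is computed from the files themselves, rerunning the script after any asset changes keeps the offsets and sizes in step, which is the same benefit the labels and pointer math give you in the assembler version.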

If all else fails, you can always write a custom tool. Some companies have made special libraries to make this task easier, which isn't such a bad idea if you expect to do this kind of work.

Recommended Tools

Here are a few commercial and shareware tools that I've found useful over the years:

  • Hex Workshop from Breakpoint Software is a great hex editor that supports cut and paste, variable width listing, Motorola and Intel byte ordering, interpretation of data as different size entries, and many other useful features.
  • 4DOS/4NT/4OS2 from JP Software is an excellent replacement for "command.com". It offers command line completion and history, directory coloring and file comments, a great file list feature that supports hex output, and a whole suite of functions and extended commands for writing batch files.
  • Exe Scope from Toshifumi Yamamoto is a nifty tool that allows you to edit the resource data of Windows executables, including graphical dialog editing. You can't beat this tool if you need to change menus for Windows programs without the source.
  • Adobe's Photoshop and JASC's Paint Shop Pro are two of the best graphics editing programs—Photoshop is better in general, but is more expensive. Both programs have their strengths.
  • Sonic Foundry's Sound Forge and Syntrillium's Cool Edit are the two best audio editing programs. Sound Forge is very powerful, but Cool Edit is reasonable for most tasks, and is available as a free demo.
  • NJ Star and NJ Communicator from NJ Star Software Corp. are the programs to use if you need to display text in Japanese, Chinese, or Korean.

Conclusion

Asset recovery will always be an unfortunate but necessary task. However, if you are careful and keep focused on exactly what data needs to be recovered for what purpose, and you make proper use of the resources at hand, the task can be accomplished economically. Management may balk at buying tools for your asset recovery task, but they are almost always worth it! How much is your time as an engineer worth? How much does the delay in releasing your product cost you? Recovery almost always beats abandoning a moneymaking project! Remember, the best recovery is the recovery you don't have to do, so keep good backups of your own data!

 

 

 

