Gamasutra: The Art & Business of Making Games
Book Excerpt: The Game Asset Pipeline: Managing Asset Processing

February 21, 2005

Dependency-Based Processing

The idea behind this dependency-based strategy is quite simple. A dependency represents a link between a source asset and a processed output file, indicating that the latter contains data that is provided by (or affected by) the former. So, we say that the output file depends on the source asset, and that the asset is a prerequisite of the output file. This is a many-to-many relationship; one output file may depend on many source assets (consider, for example, a model file containing a mesh, textures, and animation data), and one source asset may generate many output files.

Dependencies are not limited to being simple links between pairs of files, either; if some files are built using intermediate files, or depend on other output files, then a dependency chain emerges, where each of the dependencies of a file may in turn have dependencies of its own. Figure 1 shows a simple dependency chain for a character model. If the entire asset pipeline were viewed, elements of this chain (for example, the run animation) might be used in other characters as well, and therefore have additional dependent resources.

Walking along the dependency chain for an output file, therefore, provides a list of all of the source (and intermediate) files that affect it, and hence may cause it to be rebuilt if they change. However, while this is a useful view conceptually, in practical terms it is usually more useful to look at dependency chains the other way around: for a given source asset, walking along the chain for its dependents will give a list of output files that must be updated if it is changed.
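
As a rough illustration of the two views described above, the following Python sketch stores each link in both directions and walks from a changed file to everything that depends on it. The class name, method names, and file names are hypothetical and chosen only for this example; they are not taken from any particular pipeline.

```python
from collections import defaultdict


class DependencyGraph:
    """Many-to-many links between files and the files built from them."""

    def __init__(self):
        self.prerequisites = defaultdict(set)  # file -> files it depends on
        self.dependents = defaultdict(set)     # file -> files that depend on it

    def add_dependency(self, output, prerequisite):
        # Record the link in both directions so either view can be walked.
        self.prerequisites[output].add(prerequisite)
        self.dependents[prerequisite].add(output)

    def all_dependents(self, changed):
        """Return every file that must be rebuilt if `changed` is modified."""
        found = set()
        stack = [changed]
        while stack:
            current = stack.pop()
            for dependent in self.dependents.get(current, ()):
                if dependent not in found:
                    found.add(dependent)
                    stack.append(dependent)
        return found


# Hypothetical file names, loosely modeled on a character model.
graph = DependencyGraph()
graph.add_dependency("knight.model", "knight.mesh")
graph.add_dependency("knight.model", "run.anim")
graph.add_dependency("knight.model", "texture_page.tex")
graph.add_dependency("texture_page.tex", "armor.png")

to_rebuild = graph.all_dependents("armor.png")
```

Here a change to the source texture reaches the model only through the intermediate texture page, which is the kind of chain the text describes.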

Figure 1: A simple asset dependency chain.

Dependency chains do not generally exist in isolation, either; chains frequently meet and overlap (for example, if one intermediate file or asset is used by many processes). This is actually another very useful property because in doing so, they provide all the information needed to minimize the amount of effort required to perform a single set of updates.

Consider the case where a number of source assets have all been changed. If each change is processed independently, and the individual dependents of the source asset updated, then some output and intermediate files may be updated several times. This is particularly problematic in the case where there are several "layers" of intermediate files depending on one another; in these cases, it is hard to remove the unnecessary updates because only the last update of any given asset is guaranteed to have a complete set of up-to-date intermediate files!

Figure 2 shows an example of this type of more complex dependency chain. The knight and paladin models share the same run animation, but have different base meshes. However, they both use the same texture page and therefore, the intermediate texture page file is shared between them.

Figure 2: Dependencies on shared assets.

The dependency chains contain the solution to this, as they store all of the necessary information about the relationships between the files to ensure that every file (both intermediate and output) is updated once only, but in the correct order to ensure that old data is never used. This is done by walking through all of the dependency chains simultaneously, and building a queue of the files that must be processed.

One very straightforward way to do this is to exploit the fact that the dependency chains themselves encode the order in which operations must be performed. Building the queue of operations is a simple iterative process, using a list of potentially modified files as the basis.

The first step of the procedure is to take every source file that has changed, and recursively walk down to all its dependents, adding each to the list (if it is not already present). After this step, the complete set of files that must be updated is stored on the list and the processing order can be determined.

The order is then determined by repeatedly walking through the list and checking each file to see if it is ready to be processed. A file is ready when none of the files it immediately depends on (that is, those prerequisites that are directly linked to it) is still on the list. If any of those prerequisites is still on the list, then the file cannot be processed yet, and is skipped. Otherwise, the file is removed from the list and added to the end of the queue. This process is repeated until there are no files left on the list. With this done, the queue contains an ordered list of the files for processing, so that every file is updated only once, and all of its prerequisite files are updated before it.
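
The two steps can be sketched in Python roughly as follows. This is a minimal illustration rather than a definitive implementation: the function names and the example files (modeled on the shared texture page of Figure 2) are assumptions, and the check for cyclic dependencies is an addition that the text does not discuss.

```python
def invert(prerequisites):
    """Build a file -> dependents map from a file -> prerequisites map."""
    dependents = {}
    for output, sources in prerequisites.items():
        for source in sources:
            dependents.setdefault(source, []).append(output)
    return dependents


def build_queue(changed_files, prerequisites):
    dependents = invert(prerequisites)

    # Step 1: walk down from every changed file, listing each dependent once.
    pending = []
    stack = list(changed_files)
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in pending:
                pending.append(dependent)
                stack.append(dependent)

    # Step 2: repeatedly pass over the list, moving a file to the queue
    # once none of its direct prerequisites is still on the list.
    queue = []
    while pending:
        progressed = False
        for item in list(pending):
            if not any(p in pending for p in prerequisites.get(item, ())):
                pending.remove(item)
                queue.append(item)
                progressed = True
        if not progressed:
            # Added safeguard (not from the text): a cycle would loop forever.
            raise ValueError("cyclic dependency among: %r" % pending)
    return queue


# Hypothetical dependency data.
PREREQUISITES = {
    "texture_page.tex": ["armor.png", "shield.png"],
    "knight.model": ["knight.mesh", "run.anim", "texture_page.tex"],
    "paladin.model": ["paladin.mesh", "run.anim", "texture_page.tex"],
}

queue = build_queue(["armor.png", "run.anim"], PREREQUISITES)
# queue == ['texture_page.tex', 'knight.model', 'paladin.model']
```

Although both the texture and the animation changed, each model appears in the queue only once, and only after the shared texture page.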

While in the majority of cases the files will be processed in a linear manner, and therefore this queue is all that is required for the operation to begin, it is also possible to produce output in a form suitable for processing many assets in parallel, for example, using a distributed network of machines, or a multi-CPU system. To do this, the same procedure is used, but with a marker added to the items on the list. When an item with no prerequisites remaining on the list is found, instead of being moved to the queue immediately, it is marked and left in place. Then, when the end of the list is reached, all of the marked files are moved into the queue as a "batch." Each of these batches consists of files that are ready for processing but are also guaranteed to be independent of each other, so they can all be handled simultaneously if needed.
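
A possible sketch of the batched variant is shown below. It assumes the list of affected files has already been gathered, and the function name and example data are hypothetical. Because marked files stay on the list until the end of the pass, a file whose prerequisite was marked in the same pass cannot join the same batch.

```python
def build_batches(affected, prerequisites):
    """Group affected files into batches that can each run in parallel."""
    pending = list(affected)
    batches = []
    while pending:
        # Mark files with no prerequisite still on the list, leaving them
        # in place until the whole pass is finished.
        marked = [
            item for item in pending
            if not any(p in pending for p in prerequisites.get(item, ()))
        ]
        if not marked:
            # Added safeguard (not from the text): guards against cycles.
            raise ValueError("cyclic dependency among: %r" % pending)
        batches.append(marked)
        pending = [item for item in pending if item not in marked]
    return batches


# Hypothetical dependency data.
PREREQUISITES = {
    "texture_page.tex": ["armor.png", "shield.png"],
    "knight.model": ["knight.mesh", "run.anim", "texture_page.tex"],
    "paladin.model": ["paladin.mesh", "run.anim", "texture_page.tex"],
}

batches = build_batches(
    ["knight.model", "texture_page.tex", "paladin.model"], PREREQUISITES
)
# batches == [['texture_page.tex'], ['knight.model', 'paladin.model']]
```

The two models land in the same batch because neither depends on the other, so they could be handed to separate machines or CPUs.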

While in many cases analyzing the dependency information once and then processing the resulting queue of output files is enough, in cases where there are large numbers of changes being made to the source assets, it may be desirable to update the processing queue as changes are made. This can be done very simply by taking the current outstanding queue entries and adding the dependents of the newly modified assets to them, creating a new list of files that need updating. Then, the dependency analysis procedure can be repeated using this input list to generate a new queue for continued processing, thereby ensuring that any changes caused by the new updates are correctly inserted into the processing order.
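
One way this re-analysis might look in Python is sketched below. For brevity it swaps in the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) for the ordering step instead of the list-walking procedure described earlier; the function name and file names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def requeue(outstanding, newly_changed, prerequisites, dependents):
    """Merge new changes into the outstanding queue and re-order it."""
    # Collect every dependent of the newly modified assets.
    affected = set()
    stack = list(newly_changed)
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)

    pending = set(outstanding) | affected

    # Only prerequisites that are themselves pending constrain the order.
    restricted = {
        item: {p for p in prerequisites.get(item, ()) if p in pending}
        for item in pending
    }
    return list(TopologicalSorter(restricted).static_order())


# Hypothetical dependency data.
PREREQUISITES = {
    "texture_page.tex": ["armor.png"],
    "knight.model": ["knight.mesh", "texture_page.tex"],
    "paladin.model": ["paladin.mesh", "texture_page.tex"],
}
DEPENDENTS = {
    "armor.png": ["texture_page.tex"],
    "texture_page.tex": ["knight.model", "paladin.model"],
    "knight.mesh": ["knight.model"],
    "paladin.mesh": ["paladin.model"],
}

# knight.model is still waiting when armor.png is modified.
new_queue = requeue(["knight.model"], ["armor.png"], PREREQUISITES, DEPENDENTS)
```

In this example the texture page is placed ahead of the knight model that was already waiting, so the model is not built against stale texture data.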

This technique can be very useful if the asset processing system allows multiple tasks to run concurrently, as it means that a single processing operation does not block the entire system until it completes; unrelated operations can still be executed in parallel with it.

Determining Asset Dependencies

One of the key problems faced when implementing a system of this nature is how to actually construct the dependency information for the assets in the first place. The mechanisms for doing this will depend to a large degree on the processing tools and files being used, but there are some general areas that most techniques fall into:

Explicitly Stored Dependency Information

In some systems, such as the make tool, which will be described in more detail later, the dependency information is stored as part of the script that describes all the desired processing operations. In general, this file is human generated, although dependencies can be specified for groups or types of files as well as individual assets, reducing the amount of maintenance required. This approach has the advantage that it is very easy to see and edit the dependency information, especially if it is necessary to add some special case entries for certain assets.

However, there are several fairly significant disadvantages of this system. Dependencies must be consistent across fairly large groups of files, otherwise a lot of manual editing is required. It is also impossible to encode dependency information that is based on the contents of the assets. So, for example, making a model file dependent on the textures it uses is impossible unless a human (or another tool) updates the dependency information by hand.
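
To make the trade-off concrete, here is a small Python sketch of explicitly stored dependencies: one table of rules per file type and one table of hand-written special cases. The rule format, suffixes, and file names are all assumptions made for this example and do not reflect the syntax of make or any other tool.

```python
from pathlib import PurePosixPath

# Hand-maintained rules: every output of a given type depends on
# same-named files with these suffixes (hypothetical suffixes).
TYPE_RULES = {
    ".model": [".mesh", ".skel"],
    ".tex": [".png"],
}

# Hand-maintained special cases for individual assets (hypothetical).
SPECIAL_CASES = {
    "knight.model": ["shared/run.anim"],
}


def explicit_prerequisites(output):
    """Look up the prerequisites of `output` from the explicit tables."""
    path = PurePosixPath(output)
    prerequisites = [
        str(path.with_suffix(suffix))
        for suffix in TYPE_RULES.get(path.suffix, [])
    ]
    prerequisites.extend(SPECIAL_CASES.get(output, []))
    return prerequisites


knight_prerequisites = explicit_prerequisites("knight.model")
# knight_prerequisites == ['knight.mesh', 'knight.skel', 'shared/run.anim']
```

Note that nothing here inspects the contents of the model, so the textures it actually uses would have to be listed by hand in the special cases table.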

Dependency Information Stored in Assets

Another approach is to store the dependencies of an asset file in the file itself. This way, the dependency information can be built by the exporter or tool that creates the file, based on the information it has about the contents. This makes the approach very suitable for handling assets such as models, which may be formed from several separate files. It is also generally quite straightforward to implement, although a unified format for storing this information (either as part of the asset file, or in a separate metadata file) is required.

The main disadvantage of this approach is that it is only suitable in circumstances where the dependency chain for an asset can be easily predicted ahead of the processing itself, and is not likely to change often. This is because the information is generally needed to form dependencies for files other than the one it is actually stored in. For example, storing a list of textures used in a model file does not actually define prerequisites for the model file itself, as it is a source asset and has none. Instead, this information is used to construct the prerequisites for the processed file(s) created from this model.
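
The following Python sketch illustrates the point about where the stored information ends up. It assumes a hypothetical JSON metadata format with a "uses" list written by the exporter; the list is read from the source asset's metadata but becomes part of the prerequisites of the processed file.

```python
import json


def prerequisites_from_metadata(source_name, metadata_text, output_name):
    """Turn exporter-written metadata into prerequisites for the output.

    The "uses" key is an assumed, illustrative metadata field.
    """
    metadata = json.loads(metadata_text)
    # The processed file depends on the source asset itself plus
    # everything the source asset says it uses.
    return {output_name: [source_name] + list(metadata.get("uses", []))}


# Hypothetical metadata as an exporter might write it alongside the mesh.
knight_metadata = '{"uses": ["armor.png", "shield.png"]}'

links = prerequisites_from_metadata(
    "knight.mesh", knight_metadata, "knight.model"
)
# links == {'knight.model': ['knight.mesh', 'armor.png', 'shield.png']}
```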

Dependency Information Generated by the Processing Code

The final approach is to generate the dependency information "on the fly" by using the processing code (or a subset of it) to read each asset and build the dependency tree. This approach has the major advantage that it can easily handle very complex interdependencies between assets based on their contents, and it is relatively easy to maintain, once the initial framework is in place. Also, by building the dependency information this way, changes in the structure can be easily implemented, without having to edit external files, re-export, or reprocess assets to update their stored data.

However, the process of building the dependency information can be quite slow, and must be repeated whenever an asset changes. It also means that the dependency information is not easily visible for debugging purposes, or editable in the event that a special case change is required.

Of course, there is no requirement that only one of these approaches is taken; it is not uncommon to use a combination, picking the most appropriate technique for different types of assets or processing requirements. Dependency information from a number of different sources can be easily integrated into a single dependency tree for processing, and it is even relatively straightforward to remove all of the dependency information for a given asset or assets and re-insert it if changes to the asset that affect its dependencies occur during processing.

Determining When Assets Have Changed

The procedure for actually determining when an asset has been modified depends largely on the structure of the asset management system in use. If a version control system of some description is employed, then it is simply a case of either comparing the version numbers of each asset in the database with the last processed copy, or just retrieving the list of modifications in every changelist since the last update was performed.

On a flat file system, it is slightly more difficult to detect changes, although there are some methods that work relatively well. The most commonly used system is simply to compare the "last modified" date on each file, and check if it is newer than the last version that was processed (or newer than the processed output file, in some systems). This is not particularly robust, though, as it can be easily confused by actions such as "rolling back" files (by copying a previous version over the top), or if a machine's internal clock is wrong! It does have the major advantage of being very fast, and requiring little or no external information about the files.
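
A minimal Python sketch of the timestamp test, using the variant that compares the source against the processed output file, might look like this. The function name is hypothetical, and treating a missing output as out of date is an assumption of the sketch.

```python
import os


def is_stale(source_path, output_path):
    """Return True if the output is missing or older than the source."""
    if not os.path.exists(output_path):
        # Assumption: an output that has never been built needs building.
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)
```

The weakness described above is visible in the comparison itself: copying an older version of the source over the top can leave its modification time earlier than that of the output, so the test reports that nothing needs to be done.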

Another, more stable method is to take a checksum of the files each time they are processed, and compare that against the stored value. If a strong checksum or hashing system (the MD5 algorithm is a popular choice for this) is used, then the possibility of a collision occurring, where two different files generate the same checksum value, is infinitesimally small. Therefore, the check is a very robust way to determine if a file has changed. However, using this system requires that the entire source asset be read and the checksum calculated every time it needs to be checked, a fairly slow procedure.
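
As an illustration, a checksum comparison could be sketched in Python with the standard `hashlib` module. The function names and the in-memory dictionary of stored checksums are assumptions; a real pipeline would keep the stored values in a database or metadata file.

```python
import hashlib


def file_checksum(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, stored_checksums):
    """Compare the file's current checksum with the one stored for it."""
    return stored_checksums.get(path) != file_checksum(path)
```

After an asset has been processed, the caller would record `stored_checksums[path] = file_checksum(path)` so that the next check compares against the version that was actually built. The whole file is read on every check, which is the cost the text refers to.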

If the file formats of the files being used are all under the control of the pipeline developer, or separate metadata storage is available, then one way to avoid this problem is to store the checksum in the file itself, thereby requiring only a handful of bytes to be read and compared to check for updates. However, it is comparatively rare that it is possible to do this for all types of asset files.

Another common compromise is to use both techniques, employing a simple timestamp-based test for day-to-day updates, but performing a full checksum comparison on an overnight or weekend basis. This way, any assets that become "stale" as a result of an invalid modification date will be caught and fixed the next time a complete update is performed.
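
A combined check might be sketched as follows, assuming each asset has a stored record holding the modification time and MD5 checksum from the last time it was processed. The record layout, function names, and `full_check` flag are hypothetical.

```python
import hashlib
import os


def checksum(path):
    """MD5 digest of the whole file (read in one go for brevity)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def needs_update(path, record, full_check=False):
    """Decide whether an asset must be reprocessed.

    `record` is an assumed dict with 'mtime' and 'md5' entries saved
    the last time the asset was processed.
    """
    if full_check:
        # Overnight or weekend pass: slow but robust.
        return checksum(path) != record["md5"]
    # Day-to-day pass: fast, but fooled by rolled-back files or bad clocks.
    return os.path.getmtime(path) > record["mtime"]
```

A file that was rolled back to an older copy can pass the day-to-day test unnoticed, but the full pass sees that its contents no longer match the stored checksum.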

A less widely employed approach, but one that is useful in some circumstances, is to delegate the task of checking asset versions to the specific tools that perform the processing (sometimes after first checking the timestamp or checksum as an "early out" test). This allows the tool to perform much more fine-grained checking on the file, and determine which sections, if any, need updating. For example, in the case of a game where levels are stored as a single large map file, it may be desirable for the map building tool to determine which sections have been modified and only update dependent files related to those, rather than the entire map.
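
As one possible illustration of such fine-grained checking, the sketch below assumes the map building tool can split the map into named sections and keeps a checksum for each. How the sections are extracted from the map file is specific to the tool and is not shown; the function name and section names are hypothetical.

```python
import hashlib


def changed_sections(sections, stored_hashes):
    """Return the names of sections whose contents differ from the record.

    `sections` maps a section name to its raw bytes; `stored_hashes` maps
    a section name to the MD5 digest saved when it was last processed.
    """
    changed = []
    for name, data in sections.items():
        if hashlib.md5(data).hexdigest() != stored_hashes.get(name):
            changed.append(name)
    return changed


# Hypothetical map with two sections, one edited since the last build.
stored = {
    "courtyard": hashlib.md5(b"courtyard v1").hexdigest(),
    "dungeon": hashlib.md5(b"dungeon v1").hexdigest(),
}
current = {"courtyard": b"courtyard v1", "dungeon": b"dungeon v2"}

to_update = changed_sections(current, stored)
# to_update == ['dungeon']
```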

