The Game Asset Pipeline: Managing Asset Processing
Dependency-Based Processing
The idea behind this dependency-based strategy is
quite simple. A dependency represents a link between a source asset
and processed output file, indicating that the latter contains data
that is provided by (or affected by) the former. So, we say that the
output file depends on the source asset, and that the asset
is a prerequisite of the output file. This is a many-to-many
relationship; one output file may depend on many source assets (consider,
for example, a model file containing a mesh, textures, and animation
data), and one source asset may generate many output files.
Dependencies
are not limited to being simple links between pairs of files, either;
if some files are built using intermediate files, or depend on other
output files, then a dependency chain emerges, where each
of the dependencies of a file may in turn have dependencies of their
own. Figure 1 shows a simple dependency chain for a character model.
If the entire asset pipeline were viewed, elements of this chain
(for example, the run animation) might be used in other characters
as well, and therefore have additional dependent resources.
Figure 1: A simple asset dependency chain.
Walking
along the dependency chain for an output file, therefore, provides
a list of all of the source (and intermediate) files that affect
it, and hence may cause it to be rebuilt if they change. However,
while this is a useful view conceptually, in practical terms it
is usually more useful to look at dependency chains the other way
around: for a given source asset, walking along the chain for its
dependents will give a list of output files that must be updated
if it is changed.
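Both directions of traversal can be sketched in a few lines of Python. This is only an illustration: the `DependencyGraph` class, its method names, and the file names (loosely based on a character model) are assumptions made for the example, not part of any particular pipeline.

```python
from collections import defaultdict


class DependencyGraph:
    """Many-to-many links between files (hypothetical helper class)."""

    def __init__(self):
        self.prerequisites = defaultdict(set)  # file -> files it depends on
        self.dependents = defaultdict(set)     # file -> files that depend on it

    def add_dependency(self, output, prerequisite):
        """Record that `output` depends on `prerequisite`."""
        self.prerequisites[output].add(prerequisite)
        self.dependents[prerequisite].add(output)

    @staticmethod
    def _walk(start, links):
        # Follow the chain transitively, visiting each file only once.
        found, stack = set(), [start]
        while stack:
            for linked in links.get(stack.pop(), ()):
                if linked not in found:
                    found.add(linked)
                    stack.append(linked)
        return found

    def all_prerequisites(self, output):
        """Every source or intermediate file that affects `output`."""
        return self._walk(output, self.prerequisites)

    def all_dependents(self, source):
        """Every file that must be updated if `source` changes."""
        return self._walk(source, self.dependents)


graph = DependencyGraph()
graph.add_dependency("texture_page.tmp", "skin.tga")
graph.add_dependency("character.model", "character_mesh.src")
graph.add_dependency("character.model", "run.anim")
graph.add_dependency("character.model", "texture_page.tmp")

print(sorted(graph.all_prerequisites("character.model")))
print(sorted(graph.all_dependents("skin.tga")))
```

Storing the links in both directions costs a little extra memory, but it makes the "what must be rebuilt if this changes?" question as cheap to answer as the "what does this depend on?" question.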
Dependency
chains do not generally exist in isolation, either; chains frequently
meet and overlap (for example, if one intermediate file or asset
is used by many processes). This is actually another very useful
property because in doing so, they provide all the information needed
to minimize the amount of effort required to perform a single set
of updates.
Consider
the case where a number of source assets have all been changed.
If each change is processed independently, and the individual dependents
of the source asset updated, then some output and intermediate files
may be updated several times. This is particularly problematic in
the case where there are several "layers" of intermediate
files depending on one another; in these cases, it is hard to remove
the unnecessary updates because only the last update of any
given asset is guaranteed to have a complete set of up-to-date intermediate
files!
Figure
2 shows an example of this type of more complex dependency chain.
The knight and paladin models share the same run animation, but
have different base meshes. However, they both use the same texture
page and therefore, the intermediate texture page file is shared
between them.
Figure 2: Dependencies on shared assets.
The
dependency chains contain the solution to this, as they store all
of the necessary information about the relationships between the
files to ensure that every file (both intermediate and output) is
updated once only, but in the correct order to ensure that old data
is never used. This is done by walking through all of the dependency
chains simultaneously, and building a queue of the files that must
be processed.
One
very straightforward way to do this is to exploit the fact that
the dependency chains themselves encode the order that operations
must be performed in. Building the queue of operations is a simple
iterative process, using a list of potentially modified files as
the basis.
The
first step of the procedure is to take every source file that has
changed, and recursively walk down to all its dependents, adding
each to the list (if it is not already present). After this step,
the complete set of files that must be updated is stored on the
list and the processing order can be determined.
The order is found by repeatedly walking through the list and checking each
file to see if it is ready to be processed. A file is ready when none of
the files it immediately depends on (that is, those prerequisites
that are directly linked to it) is still on the list. If any of them is,
the file cannot be processed yet and is skipped; otherwise, it is removed
from the list and added to the end of the queue.
This process is then repeated until there are no files left on the
list. With this done, the queue contains an ordered list of the
files for processing, so that every file is only updated once, and
all of the prerequisite files are updated before each.
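The two steps described above might look something like the following in Python. Treat it as a minimal sketch under stated assumptions: the function name `build_queue`, the dictionary layout, and the knight and paladin file names are invented for the example, and the cycle check is an addition that the description does not mention.

```python
def build_queue(changed, prerequisites, dependents):
    """Return the files to rebuild, ordered so prerequisites come first."""
    # Step 1: walk down from every changed file, listing each dependent once.
    pending = []
    listed = set()

    def add_dependents(path):
        for dependent in sorted(dependents.get(path, ())):
            if dependent not in listed:
                listed.add(dependent)
                pending.append(dependent)
                add_dependents(dependent)

    for path in changed:
        add_dependents(path)

    # Step 2: repeatedly sweep the list, moving ready files to the queue.
    queue = []
    while pending:
        skipped = []
        for path in pending:
            if any(p in listed for p in prerequisites.get(path, ())):
                skipped.append(path)   # a direct prerequisite is still listed
            else:
                listed.discard(path)   # ready: remove it from the list ...
                queue.append(path)     # ... and add it to the end of the queue
        if len(skipped) == len(pending):
            raise ValueError("circular dependency among %s" % skipped)
        pending = skipped
    return queue


# Hypothetical data loosely following Figure 2.
prerequisites = {
    "texture_page.tmp": {"armor.tga"},
    "knight.model": {"knight_mesh.src", "run.anim", "texture_page.tmp"},
    "paladin.model": {"paladin_mesh.src", "run.anim", "texture_page.tmp"},
}
dependents = {}
for output, sources in prerequisites.items():
    for source in sources:
        dependents.setdefault(source, set()).add(output)

queue = build_queue(["run.anim", "armor.tga"], prerequisites, dependents)
print(queue)  # ['texture_page.tmp', 'knight.model', 'paladin.model']
```

In this run both models are listed before the texture page, so the first sweep skips them and queues only the texture page; the second sweep then releases the two models, each exactly once.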
While
in the majority of cases the files will be processed in a linear
manner, and therefore this queue is all that is required for the
operation to begin, it is also possible to produce output in a form
suitable for processing many assets in parallel, for example, using
a distributed network of machines, or a multi-CPU system. To do
this, the same procedure is used, but with a marker added to the
items on the list. When an item with no prerequisites left on the list is found,
instead of being moved to the queue immediately it is marked and
left in place. Then, when the end of the list is reached, all of
the marked files are moved into the queue as a "batch."
Each of these batches consists of files that are ready for processing
but are also guaranteed to be independent of each other, so they
can all be handled simultaneously if needed.
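A sketch of the batched variant follows, again in Python and again with invented names (`build_batches` and the sample files are assumptions for illustration). The important detail is that marked files stay on the list until the sweep ends, so nothing that depends on them can slip into the same batch.

```python
def build_batches(pending, prerequisites):
    """Group the pending files into batches that can be built in parallel."""
    remaining = list(pending)
    batches = []
    while remaining:
        on_list = set(remaining)
        # Mark ready files but leave them on the list for this sweep.
        marked = [
            path for path in remaining
            if not any(p in on_list for p in prerequisites.get(path, ()))
        ]
        if not marked:
            raise ValueError("circular dependency among %s" % remaining)
        # End of the sweep: move every marked file into the queue as a batch.
        batches.append(marked)
        marked_set = set(marked)
        remaining = [path for path in remaining if path not in marked_set]
    return batches


# Hypothetical data loosely following Figure 2.
prerequisites = {
    "texture_page.tmp": {"armor.tga"},
    "knight.model": {"knight_mesh.src", "run.anim", "texture_page.tmp"},
    "paladin.model": {"paladin_mesh.src", "run.anim", "texture_page.tmp"},
}
pending = ["knight.model", "paladin.model", "texture_page.tmp"]

batches = build_batches(pending, prerequisites)
print(batches)  # [['texture_page.tmp'], ['knight.model', 'paladin.model']]
```

Each inner list could be handed to a pool of worker processes or machines, with the next batch started only once the previous one has finished.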
While in many cases analyzing the dependency information once and then
processing the resulting queue of output files is enough, in cases
where there are large numbers of changes being made to the source
assets, it may be desirable to update the processing queue as changes
are made. This can be done very simply by taking the current outstanding
queue entries and adding the dependents of the newly-modified assets
to them, creating a new list of files that need updating. Then,
the dependency analysis procedure can be repeated using this input
list to generate a new queue for continued processing, thereby ensuring
that any changes caused by the new updates are correctly inserted
into the processing order.
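As a rough sketch of this re-queuing step, the Python below merges the outstanding entries with the dependents of the newly modified assets and then re-orders the result. For brevity the ordering is delegated to the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) rather than the list-sweeping procedure described earlier; the function names and file names are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def downstream(files, dependents):
    """Every file reachable by walking down from the given files."""
    found, stack = set(), list(files)
    while stack:
        for dependent in dependents.get(stack.pop(), ()):
            if dependent not in found:
                found.add(dependent)
                stack.append(dependent)
    return found


def rebuild_queue(outstanding, newly_modified, prerequisites, dependents):
    """Merge outstanding work with new changes and re-derive the order."""
    pending = set(outstanding) | downstream(newly_modified, dependents)
    # Only prerequisites that are themselves pending constrain the order.
    sorter = TopologicalSorter({
        path: pending & set(prerequisites.get(path, ()))
        for path in sorted(pending)
    })
    return list(sorter.static_order())


# Hypothetical data loosely following Figure 2.
prerequisites = {
    "texture_page.tmp": {"armor.tga"},
    "knight.model": {"knight_mesh.src", "run.anim", "texture_page.tmp"},
    "paladin.model": {"paladin_mesh.src", "run.anim", "texture_page.tmp"},
}
dependents = {}
for output, sources in prerequisites.items():
    for source in sources:
        dependents.setdefault(source, set()).add(output)

# The paladin model is still waiting when the armor texture is edited.
queue = rebuild_queue(["paladin.model"], ["armor.tga"], prerequisites, dependents)
print(queue)
```

Here the texture page is placed ahead of both models in the new queue, and the knight model, which had already been processed, is scheduled again because the edit affects it.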
This technique can be very useful if the asset processing system allows
multiple tasks to run concurrently, as it means that a single processing
operation does not block the entire system until it completes;
unrelated operations may still be executed in parallel with it.
Determining Asset Dependencies
One
of the key problems faced when implementing a system of this nature
is how to actually construct the dependency information for the
assets in the first place. The mechanisms for doing this will depend
to a large degree on the processing tools and files being used,
but there are some general areas that most techniques fall into:
Explicitly Stored Dependency Information
In some systems, such as the make tool, which will be described
in more detail later, the dependency information is stored as part
of the script that describes all the desired processing operations.
In general, this file is human generated, although dependencies
can be specified for groups or types of files as well as individual
assets, reducing the amount of maintenance required. This approach
has the advantage that it is very easy to see and edit the dependency
information, especially if it is necessary to add some special case
entries for certain assets.
However,
there are several fairly significant disadvantages of this system.
Dependencies must be consistent across fairly large groups of files,
otherwise a lot of manual editing is required. It is also impossible
to encode dependency information that is based on the contents
of the assets. So, for example, making a model file dependent on
the textures it uses is impossible unless a human (or another tool)
updates the dependency information by hand.
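To make the trade-off concrete, here is a small Python sketch of explicitly stored, per-type rules. It is not make syntax; the `RULES` table, the file extensions, and the `prerequisites_for` function are hypothetical and exist only to illustrate the idea.

```python
# Hand-written rules: output suffix -> suffixes of the files it is built from.
# (Hypothetical extensions; a real project would define its own.)
RULES = [
    (".model", [".mesh", ".skeleton"]),
    (".texpage", [".tga"]),
]


def prerequisites_for(output):
    """Derive prerequisites for an output file purely from its name."""
    for output_suffix, source_suffixes in RULES:
        if output.endswith(output_suffix):
            stem = output[: -len(output_suffix)]
            return [stem + suffix for suffix in source_suffixes]
    return []  # no rule covers this file


print(prerequisites_for("knight.model"))   # ['knight.mesh', 'knight.skeleton']
print(prerequisites_for("readme.txt"))     # []
```

One rule covers every model in the project, which keeps maintenance low, but nothing here can discover that `knight.model` also uses a particular texture: the rule only ever sees the file name, never the contents.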
Dependency Information Stored in Assets
Another
approach is to store the dependencies of asset files in the file
itself. This way, the dependency information can be built by the
exporter or tool that creates the file, based on the information
it has about the contents. This makes this approach very suitable
for handling assets such as models which may be formed from several
separate files. It is also generally quite straightforward to implement,
although a unified format for storing this information (either as
part of the asset file, or in a separate metadata file) is required.
The
main disadvantage of this approach is that it is only suitable in
circumstances where the dependency chain for an asset can be easily
predicted ahead of the processing itself, and is not likely to change
often. This is because the information is generally needed to form
dependencies for files other than the one it is actually
stored in. For example, storing a list of textures used in a model
file does not actually define prerequisites for the model file itself,
as it is a source asset and has none. Instead, this information
is used to construct the prerequisites for the processed file(s)
created from this model.
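One possible shape for this, sketched in Python, is a small metadata file written by the exporter next to each source asset. The `.deps.json` naming scheme, the `uses` key, and the function name are all assumptions made for the example. Note how the stored list becomes prerequisites of the processed file, not of the source asset it sits beside.

```python
import json
import os
import tempfile


def output_prerequisites(source_path, output_path):
    """Prerequisites of the processed file: the source plus what it uses."""
    prerequisites = {source_path}
    sidecar_path = source_path + ".deps.json"  # hypothetical naming scheme
    if os.path.exists(sidecar_path):
        with open(sidecar_path) as handle:
            prerequisites.update(json.load(handle).get("uses", []))
    return {output_path: prerequisites}


with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "knight.mesh")
    open(source, "w").close()
    # The exporter would normally write this file when saving the mesh.
    with open(source + ".deps.json", "w") as handle:
        json.dump({"uses": ["armor.tga", "run.anim"]}, handle)

    links = output_prerequisites(source, "knight.model")

print(links)
```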
Dependency Information Generated by the Processing Code
The
final approach is to generate the dependency information "on
the fly" by using the processing code (or a subset of it) to
read each asset and build the dependency tree. This approach has
the major advantage that it can easily handle very complex interdependencies
between assets based on their contents, and it is relatively easy
to maintain, once the initial framework is in place. Also, by building
the dependency information this way, changes in the structure can
be easily implemented, without having to edit external files, re-export,
or reprocess assets to update their stored data.
However,
the process of building the dependency information can be quite
slow, and must be repeated whenever an asset changes. It also means
that the dependency information is not easily visible for debugging
purposes, or editable in the event that a special case change is
required.
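As a toy illustration of generating dependencies from asset contents, the Python sketch below scans a made-up text model format for the files it references. The format, the keywords, and the function name are invented for the example; a real pipeline would reuse its own loading code here.

```python
def scan_references(model_text):
    """Return the files named by 'texture' and 'animation' lines."""
    references = []
    for line in model_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("texture", "animation"):
            references.append(fields[1])
    return references


# Contents of a hypothetical source asset, knight.mesh.
KNIGHT_SOURCE = """\
mesh knight_body
texture armor.tga
texture shield.tga
animation run.anim
"""

# The scan result becomes the prerequisite set of the processed file.
prerequisites = {
    "knight.model": {"knight.mesh", *scan_references(KNIGHT_SOURCE)},
}
print(sorted(prerequisites["knight.model"]))
```

Because the scan has to read every asset, caching its results and re-scanning only the assets that have changed is one way to limit the cost noted above.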
Of
course, there is no requirement that only one of these approaches
is taken; it is not uncommon to use a combination, picking the most
appropriate technique for different types of assets or processing
requirements. Dependency information from a number of different
sources can be easily integrated into a single dependency tree for
processing, and it is even relatively straightforward to remove
all of the dependency information for a given asset or assets and
re-insert it if changes to the asset that affect its dependencies
occur during processing.
Determining When Assets Have Changed
The
procedure for actually determining when an asset has been modified
depends largely on the structure of the asset management system
in use. If a version control system of some description is employed,
then it is simply a case of either comparing the version numbers
of each asset in the database with the last processed copy, or just
retrieving the list of modifications in every changelist since the
last update was performed.
On
a flat file system, it is slightly more difficult to detect changes,
although there are some methods that work relatively well. The most
commonly used system is simply to compare the "last modified"
date on each file, and check if it is newer than the last version
that was processed (or newer than the processed output file, in
some systems). This is not particularly robust, though, as it can
be easily confused by actions such as "rolling back" files
(by copying a previous version over the top), or if a machine's
internal clock is wrong! It does have the major advantage of being
very fast, and requiring little or no external information about
the files.
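A timestamp test of this kind takes only a few lines of Python. The sketch below compares a source asset against its processed output (one of the two variants mentioned above); the function name and file names are assumptions, and the timestamps are set by hand purely to show both the normal case and the "rolled back" failure.

```python
import os
import tempfile


def is_stale(source_path, output_path):
    """True if the output is missing or older than its source asset."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)


with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "knight.mesh")
    output = os.path.join(folder, "knight.model")
    for path in (source, output):
        open(path, "w").close()

    os.utime(source, (2000, 2000))   # source edited after the last build
    os.utime(output, (1000, 1000))
    edited = is_stale(source, output)

    os.utime(source, (500, 500))     # an older copy is restored over the top
    rolled_back = is_stale(source, output)

print(edited)       # True
print(rolled_back)  # False: the contents changed, but the test misses it
```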
Another, more reliable method is to take a checksum of each file every time
it is processed, and later compare the file's current checksum against that
stored value. If a strong checksum or hashing system (the MD5 algorithm is a
popular choice for this) is used, then the chance of an accidental collision,
where two different files generate the same checksum value, is
infinitesimally small. Therefore, the check is a very robust
way to determine if a file has changed. However, using this system
requires that the entire source asset be read and the checksum calculated
every time it needs to be checked, a fairly slow procedure.
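A checksum comparison could be sketched as follows, using MD5 from Python's standard `hashlib` module. The function names and the in-memory `stored` dictionary are assumptions for the example (a real system would persist the values), and for brevity the stored checksum is refreshed on every check rather than only when the asset is processed.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """MD5 of the file contents, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, stored_checksums):
    """Compare against the stored checksum, then record the new value."""
    current = file_checksum(path)
    changed = stored_checksums.get(path) != current
    stored_checksums[path] = current
    return changed


with tempfile.TemporaryDirectory() as folder:
    asset = os.path.join(folder, "armor.tga")
    stored = {}

    with open(asset, "wb") as handle:
        handle.write(b"version 1")
    first = has_changed(asset, stored)     # never seen before

    os.utime(asset, (500, 500))            # timestamp altered, contents not
    touched = has_changed(asset, stored)

    with open(asset, "wb") as handle:
        handle.write(b"version 2")
    edited = has_changed(asset, stored)

print(first, touched, edited)  # True False True
```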
If
the file formats of the files being used are all under the control
of the pipeline developer, or separate metadata storage is available,
then one way to avoid this problem is to store the checksum in the
file itself, thereby requiring only a handful of bytes to be read
and compared to check for updates. However, it is comparatively
rare that it is possible to do this for all types of asset files.
Another
common compromise is to use both techniques, employing a simple
timestamp based test for day-to-day updates, but performing a full
checksum comparison on an overnight or weekend basis. This way,
any assets that become "stale" as a result of an invalid
modification date will be caught and fixed the next time a complete
update is performed.
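The compromise might be sketched like this in Python: a cheap timestamp test by default, with a `full_check` flag for the overnight pass. The record layout and all names are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def checksum(path):
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def make_record(path):
    """What is remembered about an asset after it has been processed."""
    return {"mtime": os.path.getmtime(path), "checksum": checksum(path)}


def needs_update(path, record, full_check=False):
    """Quick timestamp test by default; full checksum test on request."""
    if full_check:
        return checksum(path) != record["checksum"]
    return os.path.getmtime(path) > record["mtime"]


with tempfile.TemporaryDirectory() as folder:
    asset = os.path.join(folder, "armor.tga")
    with open(asset, "wb") as handle:
        handle.write(b"version 2")
    os.utime(asset, (1000, 1000))
    record = make_record(asset)            # state at the last processing run

    with open(asset, "wb") as handle:      # version 1 restored over the top,
        handle.write(b"version 1")
    os.utime(asset, (500, 500))            # keeping its older modified date

    daytime = needs_update(asset, record)
    overnight = needs_update(asset, record, full_check=True)

print(daytime)    # False: the stale asset slips through the quick test
print(overnight)  # True: the full comparison catches it
```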
A less widely employed approach, though useful in some circumstances,
is to delegate the task of checking asset versions to the specific
tools that perform the processing (sometimes after first checking
the timestamp or checksum as an "early out" test). This
allows the tool to perform much more fine-grained checking on the
file, and determine which sections, if any, need updating. For example,
in the case of a game where levels are stored as a single large
map file, it may be desirable for the map building tool to determine
which sections have been modified and only update dependent files
related to those, rather than the entire map.
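A map building tool could do this by keeping one checksum per section rather than one per file. The Python sketch below assumes the map has already been split into named byte blocks; the section names, the `stored` dictionary, and the function name are hypothetical.

```python
import hashlib


def changed_sections(sections, stored_checksums):
    """Return the names of sections that differ from the last build."""
    changed = []
    for name, data in sections.items():
        digest = hashlib.md5(data).hexdigest()
        if stored_checksums.get(name) != digest:
            changed.append(name)
            stored_checksums[name] = digest  # remember the new contents
    return changed


stored = {}
level = {
    "courtyard": b"courtyard geometry v1",
    "dungeon": b"dungeon geometry v1",
}

first_build = changed_sections(level, stored)

level["dungeon"] = b"dungeon geometry v2"   # a designer edits one area
second_build = changed_sections(level, stored)

print(first_build)   # ['courtyard', 'dungeon']
print(second_build)  # ['dungeon']
```

Only the output files that depend on the sections returned would then need to be added to the processing list.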