
The Game Asset Pipeline: Managing Asset Processing
Building Robust Tools
One of the key requirements of any asset processing system is that it must be robust under as many conditions as possible. Various mechanisms for dealing with broken source assets were discussed previously, but little was said about the steps the tools themselves can take to ensure that they fail as infrequently as possible, and that failures are handled sensibly.
Be Lenient in What You Accept, but Strict in What You Output
When writing any system that must interoperate with others outside your control, this is a good mantra to adopt. Your internal file formats will only be seen by a small number of programs, quite likely all written by one person or based on the same source code and libraries. When handling files created by or for external applications, however, you must allow for wide variation in how the format specifications have been interpreted.
Most common file formats are reasonably well documented, but even the best documentation leaves vague areas, or places where the precise behavior is deliberately left undefined. In some cases there are several (often conflicting) sets of documentation or, even worse, none at all. In these cases, a useful addendum to the above is "expect the unexpected": if something is at all possible within the basic structure of the format, chances are someone will have done it.
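As a minimal sketch of "lenient in, strict out," the Python below reads and writes a hypothetical plain-text header made of key = value lines (the format and the names read_header() and write_header() are illustrative assumptions, not from any real tool). The reader tolerates a byte-order mark, any line-ending convention, # comments, stray whitespace, either = or : as the separator, and mixed-case keys; the writer only ever produces one canonical form.

```python
import re

# Key, then the first '=' or ':' as the separator, then the value.
_FIELD = re.compile(r"\s*([^=:]+?)\s*[=:]\s*(.*)$")


def read_header(data: bytes) -> dict[str, str]:
    """Leniently parse 'key = value' lines from a hypothetical text header."""
    # 'utf-8-sig' drops a byte-order mark if one is present; undecodable
    # bytes become U+FFFD instead of aborting the read.
    text = data.decode("utf-8-sig", errors="replace")
    fields: dict[str, str] = {}
    # splitlines() accepts \n, \r\n and lone \r line endings alike.
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop '#' comments and padding
        if not line:
            continue
        match = _FIELD.match(line)
        if match:  # lines without a separator are skipped, not treated as fatal
            fields[match.group(1).lower()] = match.group(2).strip()
    return fields


def write_header(fields: dict[str, str]) -> bytes:
    """Strictly emit one canonical form: sorted lowercase keys, ' = ', LF endings."""
    lines = [f"{key.lower()} = {value}" for key, value in sorted(fields.items())]
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Round-tripping a file through read_header() and write_header() normalizes it, so tools further down the pipeline only ever see the canonical form.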
In recent years, many specification documents have adopted the style of RFCs ("Request For Comments" documents, a set of publicly available technical notes that mostly define protocols and standards for Internet use) when describing the behavior expected from applications. Many RFCs use a common set of strict definitions of the words "must," "should," and "may" (codified in RFC 2119) to avoid any possibility of misunderstanding. These definitions are as follows:
MUST: indicates an absolute requirement. For example, "the index field MUST be an unsigned 32-bit integer."
SHOULD: indicates that there may be valid reasons to ignore this requirement, but applications should not do so without first considering the consequences. For example, "the header SHOULD include the name of the source file."
MAY: indicates that this requirement is optional, and it is up to the application to decide whether to implement it. For example, "this block MAY be followed by one containing additional metadata."
From the perspective of an application reading a file specified in this manner, it is "safe" to assume that any compliant writer will have met every "MUST" requirement, but the possibility that "SHOULD" or "MAY" requirements have not been met must be taken into account. Conversely, as the definition states, when writing a file, "SHOULD" requirements should be met unless there is a very good reason not to. This maximizes the chance that another application (which may itself have ignored these rules, and so may not cope with missing optional data) can read the file correctly.
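The sketch below applies these rules to a hypothetical binary asset header built from the three example requirements above: the unsigned 32-bit index is treated as MUST, the source file name as SHOULD, and a trailing metadata block as MAY. The layout, the magic number ASET, and all names are assumptions made for illustration. Note the asymmetry: the reader substitutes a placeholder for a missing name, while the writer refuses to omit one.

```python
import struct
from typing import Optional

MAGIC = b"ASET"                   # hypothetical 4-byte magic number
HEADER = struct.Struct("<4sIH")   # little-endian: magic, uint32 index, uint16 name length


class FormatError(ValueError):
    """Raised when a structural (MUST-level) requirement is violated."""


def read_asset(data: bytes) -> dict:
    # MUST: the fixed header, including the unsigned 32-bit index, is always present.
    if len(data) < HEADER.size:
        raise FormatError("file too short for the mandatory header")
    magic, index, name_len = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise FormatError(f"bad magic number {magic!r}")
    start = HEADER.size
    name = data[start:start + name_len]
    if len(name) != name_len:
        raise FormatError("source name runs past the end of the file")
    # SHOULD: a writer may have left the source name empty, so supply a fallback.
    source = name.decode("utf-8", errors="replace") or "<unknown source>"
    # MAY: any remaining bytes form an optional metadata block.
    metadata = data[start + name_len:] or None
    return {"index": index, "source": source, "metadata": metadata}


def write_asset(index: int, source: str, metadata: Optional[bytes] = None) -> bytes:
    if not 0 <= index <= 0xFFFFFFFF:
        raise FormatError("index MUST fit in an unsigned 32-bit integer")
    name = source.encode("utf-8")
    # SHOULD: this writer simply refuses to omit the source name.
    if not name or len(name) > 0xFFFF:
        raise FormatError("source name is empty or too long")
    return HEADER.pack(MAGIC, index, len(name)) + name + (metadata or b"")
```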
Regardless of these specifications, it is good practice to sanity-check any value read from a file if it has the potential to cause significant harm (for example, indices used to reference arrays, or the sizes of structures). One point worthy of special mention is that many formats do not explicitly state whether values are signed or unsigned (and even when they do, this is often ignored). This can lead to serious problems if a negative value is written, as it will appear to be a very large positive number when read as unsigned (and, indeed, vice versa). Performing bounds checking on input values can help catch these problems quickly.
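A minimal bounds check along these lines, assuming a little-endian 32-bit index that a (hypothetical) specification declares unsigned. When the value is out of range, the same four bytes are reinterpreted as signed, because an enormous index is very often a negative number written by a tool that used a signed type.

```python
import struct


def read_index(raw: bytes, array_len: int) -> int:
    """Read a little-endian uint32 index and check that it can safely address an array."""
    (index,) = struct.unpack("<I", raw)  # read as unsigned, as the spec says
    if index >= array_len:
        # Reinterpret the same bytes as signed to make the error message more useful.
        (as_signed,) = struct.unpack("<i", raw)
        hint = f" (looks like signed {as_signed})" if as_signed < 0 else ""
        raise ValueError(f"index {index} out of range 0..{array_len - 1}{hint}")
    return index
```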
Handling Tool Failure
In addition to input validation, tools also need some mechanism for reporting failure, to handle situations where the data is so badly broken that recovery is impossible. Ideally, this should allow the system to respond sensibly to the broken data, for example by rolling back to a previous version of the source asset file and retrying the processing step.
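The rollback-and-retry response might be sketched as follows. The ToolError exception, the version labels, and the run_tool callback are hypothetical stand-ins for whatever the pipeline uses to invoke a tool on a particular revision of a source asset.

```python
from typing import Callable, Sequence


class ToolError(Exception):
    """Hypothetical error raised when a tool reports unrecoverable input."""


def process_with_rollback(versions: Sequence[str], run_tool: Callable[[str], None]) -> str:
    """Try the newest source revision first, falling back to older ones on failure.

    `versions` is ordered newest first; returns the revision that processed cleanly.
    """
    errors = []
    for version in versions:
        try:
            run_tool(version)
            return version
        except ToolError as exc:
            errors.append(f"{version}: {exc}")  # remember why this revision was rejected
    raise ToolError("no usable revision:\n" + "\n".join(errors))
```

If no revision processes cleanly, the final exception lists every failure, so nothing is lost from the log.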
When reporting an error, it is generally best for the tool to supply as much descriptive information about the problem as possible. The system can then log this and use it to diagnose the fault. If an e-mail server is available, e-mailing this information to the maintainer of the pipeline or tool is often a good idea, as they can then often see the cause of the problem immediately, without having to track down the offending file's log entry.
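A sketch of such a notification using Python's standard email and smtplib modules. The sender address, subject format, and SMTP host are placeholders; the send step deliberately catches mail errors (after printing them), so that a mail outage never hides the original tool failure.

```python
import smtplib
import sys
from email.message import EmailMessage


def build_failure_email(tool: str, asset: str, details: str, maintainer: str) -> EmailMessage:
    """Compose a failure notification for the tool's maintainer."""
    msg = EmailMessage()
    msg["Subject"] = f"[asset pipeline] {tool} failed on {asset}"
    msg["From"] = "asset-pipeline@example.com"  # placeholder sender address
    msg["To"] = maintainer
    msg.set_content(details)
    return msg


def send_failure_email(msg: EmailMessage, host: str = "localhost") -> bool:
    """Try to send the message; a mail problem must never mask the original error."""
    try:
        with smtplib.SMTP(host, timeout=10) as smtp:
            smtp.send_message(msg)
        return True
    except OSError as exc:  # smtplib.SMTPException is a subclass of OSError
        print(f"could not send failure e-mail: {exc}", file=sys.stderr)
        return False
```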
The error report should also include, in machine-readable form, the names of the input files that caused the problem and the state of any output files that were altered by the processing. This enables the pipeline to perform the necessary recovery actions: removing or replacing the (potentially) corrupted output, and finding an alternative set of input files if one exists. It is particularly important to do this whenever possible for more complex processing operations, as they may involve a large number of files, and in the absence of this information the pipeline may have to assume that all of the input or output files are potentially invalid.
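One way to make the report machine-readable is a small JSON file, sketched below. The schema (the status, message, bad_inputs, and outputs keys, and the untouched/partial/complete states) is an assumption for illustration. On the pipeline side, a missing or unreadable report is treated exactly as described above: every declared output is assumed to be invalid.

```python
import json
from pathlib import Path

# Output states that do not require recovery (a hypothetical vocabulary).
SAFE_STATES = {"untouched", "complete"}


def write_error_report(path: Path, message: str, bad_inputs: list[str],
                       outputs: dict[str, str]) -> None:
    """Tool side: record the failure; `outputs` maps each output file to
    'untouched', 'partial' or 'complete'."""
    report = {"status": "error", "message": message,
              "bad_inputs": bad_inputs, "outputs": outputs}
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")


def outputs_to_discard(report_path: Path, declared_outputs: list[str]) -> list[str]:
    """Pipeline side: list the outputs that must be removed or replaced."""
    try:
        states = json.loads(report_path.read_text(encoding="utf-8"))["outputs"]
    except (OSError, ValueError, KeyError, TypeError):
        states = None
    if not isinstance(states, dict):
        # No usable report (e.g. the tool crashed): every output is suspect.
        return list(declared_outputs)
    # Outputs that are missing from the report count as suspect too.
    return [out for out in declared_outputs if states.get(out) not in SAFE_STATES]
```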
When logging the error, the pipeline can add to this information; the most useful additions are those that allow the problem to be recreated. In general, this means the names and versions of all of the tool's input files (and of the tool itself), the command line it was executed with, and any other relevant system information, such as environment variable values and available memory. With this, there is a good chance that a fault can easily be recreated in a controlled environment (such as under a debugger) when a problem occurs.
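The pipeline-side additions might be gathered like this. The sketch assumes that content hashes (SHA-256) are an acceptable stand-in for version numbers, and leaves the choice of environment variables to record to the caller. Available memory is omitted because the Python standard library has no portable way to query it.

```python
import hashlib
import os
import platform
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reproduction_context(command: list[str], inputs: list[Path], tool: Path,
                         env_keys: tuple[str, ...] = ("PATH",)) -> dict:
    """Collect what is needed to re-run a failed step later."""
    return {
        "command_line": command,
        "cwd": os.getcwd(),
        # Content hashes stand in for version numbers here.
        "tool": {"path": str(tool), "sha256": file_digest(tool)},
        "inputs": {str(p): file_digest(p) for p in inputs},
        "environment": {key: os.environ.get(key) for key in env_keys},
        "host": {"platform": platform.platform(), "machine": platform.machine()},
    }
```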
In circumstances where tools are frequently called with ephemeral intermediate input files, it can even be useful to store a copy of all the input data for the failed task along with the report. This way, it is not necessary to re-run all of the preceding processing steps to recreate the problem, and if the failure was due to a fault in the tool that produced the intermediate data, the broken data will be available to examine.
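Preserving the inputs can be as simple as copying them into a per-incident directory, as in this sketch; the directory naming scheme is a hypothetical choice.

```python
import shutil
import tempfile
import time
from pathlib import Path


def archive_failed_inputs(inputs: list[Path], incident_root: Path, step: str) -> Path:
    """Copy every input of a failed step into a new, uniquely named incident directory."""
    incident_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # mkdtemp guarantees a fresh directory even if two failures share a timestamp.
    incident = Path(tempfile.mkdtemp(prefix=f"{step}-{stamp}-", dir=incident_root))
    for position, src in enumerate(inputs):
        # Prefix with the position so same-named inputs from different folders don't collide.
        shutil.copy2(src, incident / f"{position:03d}_{src.name}")
    return incident
```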
Redirecting Output
Another technique that can make debugging asset problems much simpler is to archive all of the output from the tools in a consistent location, for example, in a database or in a directory structure that mirrors the layout of the files in the pipeline. This way, the debug output for any given intermediate or output file can be located quickly, even if the pipeline detected no errors. This can be very useful when trying to diagnose problems the pipeline has missed, such as "why is this model ten times smaller than it should be?" With some care, the archive can also be used to gather detailed statistics about various parts of the pipeline, such as the average performance of triangle stripification or the distribution of mesh compression schemes.
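For the directory-structure variant, the log location can be derived purely from the output file's path, so no lookup table is needed. The layout sketched here (one log per output per tool, named after both) is an assumption.

```python
from pathlib import Path


def log_path_for(output_file: Path, data_root: Path, log_root: Path, tool: str) -> Path:
    """Map an output file to its debug log in a tree mirroring the data layout.

    For example, data/levels/town/rock.mesh -> logs/levels/town/rock.mesh.stripify.log
    """
    relative = output_file.resolve().relative_to(data_root.resolve())
    return log_root / relative.parent / f"{relative.name}.{tool}.log"
```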
This output redirection can be done at the level of individual tools, but it is generally more useful to implement it as part of the overall pipeline functionality. This is easily done by redirecting the standard I/O streams and, if necessary, hooking the debug output functions (such as OutputDebugString() on Windows). Handling the redirection at this high level both reduces the amount of code required in each tool and provides redirection for third-party utilities and other "black box" components.
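At the pipeline level, redirecting the standard streams requires no cooperation from the tool at all, as this sketch using Python's subprocess module shows. It covers only stdout and stderr; capturing OutputDebugString() output is Windows-specific and is not shown.

```python
import subprocess
from pathlib import Path


def run_with_captured_output(command: list[str], log_file: Path) -> int:
    """Run a tool (in-house or a third-party 'black box') with stdout and stderr
    redirected into one log file; the tool itself needs no logging code."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("wb") as log:
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT,
                                   stdin=subprocess.DEVNULL)
    return completed.returncode
```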
Fatal Errors
However much protection is in place, there will always be cases where, either as the result of an invalid file or simply due to a bug, one of the processing tools crashes completely. These situations can be very difficult to deal with, because in the absence of detailed debugging information, the only solution is to run the offending tool under a debugger to find out where the crash occurred. This is quite time-consuming, and in mature toolsets it often reveals not an actual bug, but rather an unexpected set of conditions in the input data. Having as much debugging information as possible can help considerably here.
Infinite Loops
One particularly nasty class of fatal error is the infinite loop. This is quite hard to detect, as no actual error occurs; the processing simply never completes.
The basic mechanism for detecting infinite loops is a timeout, whereby the tool is forcibly terminated if processing takes more than a specified amount of time. However, the appropriate limit can vary wildly between tools. For example, if a texture resizing operation takes more than a few minutes it has almost certainly gone wrong, but calculating lighting information for a large level may easily take a few hours to complete normally. Therefore, some manual tweaking of timeouts will usually be necessary to avoid terminating tools prematurely.
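A per-tool timeout can be sketched with the timeout parameter of subprocess.run(), which kills the child once the limit expires. The tool names and limits in the table are hypothetical values of the kind that end up being tuned by hand.

```python
import subprocess

# Hypothetical per-tool limits in seconds, tuned by hand.
TIMEOUTS = {"texture_resize": 5 * 60, "lightmap_bake": 6 * 60 * 60}
DEFAULT_TIMEOUT = 30 * 60


def run_with_timeout(tool: str, command: list[str]) -> int:
    """Run a tool, killing it if it exceeds its configured wall-clock limit."""
    limit = TIMEOUTS.get(tool, DEFAULT_TIMEOUT)
    try:
        return subprocess.run(command, timeout=limit).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child at this point.
        raise RuntimeError(f"{tool} exceeded {limit}s; assuming it is hung") from None
```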
Another mechanism that can help detect infinite loops is to allow the tool to expose a "progress meter" to the pipeline. Essentially, this is just a value between 0 and 100 (or any other arbitrary range) that indicates how far through the processing the tool has progressed. The pipeline can then implement a timeout that triggers if no progress has been made for a certain period of time, or if the progress meter goes backwards (a fairly sure sign of a bug!). This approach is more efficient, both because it requires less manual tweaking of timeouts and because it can detect failures that occur early in long processing operations without waiting for the full time limit to expire.
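A stall-and-regression watchdog might look like the following sketch. It assumes a hypothetical protocol in which the tool prints lines of the form "PROGRESS <value>" to its standard output; any other output is ignored. A background thread reads the tool's output so that the main loop can check the stall timer even while the tool prints nothing.

```python
import queue
import subprocess
import threading
import time


def run_with_progress_watchdog(command: list[str], stall_limit: float) -> int:
    """Kill the tool if its reported progress stalls for `stall_limit` seconds
    or ever goes backwards; otherwise return its exit code."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    lines: queue.Queue = queue.Queue()

    def reader() -> None:
        for line in proc.stdout:  # blocks until the tool writes or closes stdout
            lines.put(line)
        lines.put(None)           # sentinel: the stream has closed

    threading.Thread(target=reader, daemon=True).start()
    best = -1.0
    last_advance = time.monotonic()
    while True:
        try:
            line = lines.get(timeout=0.1)
        except queue.Empty:
            line = ""
        if line is None:
            return proc.wait()    # stdout closed: collect the exit status
        if line.startswith("PROGRESS "):
            try:
                value = float(line.split()[1])
            except (IndexError, ValueError):
                value = best      # malformed report: ignore it
            if value < best:
                proc.kill()
                proc.wait()
                raise RuntimeError(f"progress went backwards ({best} -> {value})")
            if value > best:
                best, last_advance = value, time.monotonic()
        if time.monotonic() - last_advance > stall_limit:
            proc.kill()
            proc.wait()
            raise RuntimeError(f"no progress for {stall_limit}s (stuck at {best})")
```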
A progress meter is also useful for other purposes, such as judging how long a process is likely to take, and preventing human-induced crashes when someone decides that a process has taken "too long" and kills it manually!
Debugging the Pipeline
If detailed logs of tool execution are kept, then many problems can be diagnosed simply by examining them, especially if the failure was caused by an assert() statement or some other situation that the code was able to catch and respond to. However, there will always be cases where a crash in a tool must be debugged "directly," by examining the code executed up to the point of the failure.
Regardless of the circumstances, whenever a crash in a tool is detected, either the tool itself or the pipeline should, if at all possible, attempt to write out a stack trace, a register list, and possibly a memory dump; most operating systems provide relatively straightforward functions for doing this. This can be absolutely invaluable in debugging hard-to-recreate problems, because all the information that could be obtained by viewing the crash in a normal debugger can be gleaned (with a greater or lesser degree of effort) from the dump. Some debuggers even allow crash dumps to be loaded and viewed directly as though the crash had occurred locally, making the process even more efficient.
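Before any dump can be collected, the pipeline has to recognize that a tool crashed rather than exiting with an ordinary error code. The sketch below classifies exit statuses as reported by Python's subprocess module: a negative value on POSIX systems means the process was killed by a signal, and on Windows an unhandled exception typically surfaces as an NTSTATUS code such as 0xC0000005 (access violation). Only a few common codes are listed. Writing the dump itself is platform-specific (for example MiniDumpWriteDump() from DbgHelp on Windows, or core files on POSIX systems) and is not shown.

```python
import signal
from typing import Optional

# A few common Windows exception codes (NTSTATUS values) seen as process exit codes.
WINDOWS_CRASH_CODES = {
    0xC0000005: "access violation",
    0xC00000FD: "stack overflow",
    0xC0000094: "integer divide by zero",
}


def describe_crash(returncode: int) -> Optional[str]:
    """Describe an exit status that indicates a crash, or return None."""
    if returncode < 0:
        # POSIX: subprocess reports "killed by signal N" as -N.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    if returncode in WINDOWS_CRASH_CODES:
        return f"{WINDOWS_CRASH_CODES[returncode]} (0x{returncode:08X})"
    return None
```

For tools that are themselves written in Python, calling faulthandler.enable() at startup (optionally passing a file object for the tool's log) makes a fatal signal write a Python traceback before the process dies.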
If a crash dump is not available, or the problem cannot be diagnosed from it, then it will be necessary to recreate the circumstances that led to the failure. This is where the detailed execution environment information that the pipeline should record in the log file becomes useful. By retrieving the specified versions of the input files and re-running the tool with the same command line and options, it should be possible to make the crash happen again. This is essential both for diagnosing the problem and for verifying that it has indeed been fixed.
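Assuming the log records the command line and a SHA-256 digest per input file (the record layout here is hypothetical), a reproduction helper can refuse to re-run until the inputs on disk match those of the failing run, which avoids chasing a crash with the wrong data.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reproduce(record: dict) -> int:
    """Re-run a failed step from its log record once the inputs match.

    `record` is assumed to hold 'command_line' (list of arguments) and
    'inputs' (path -> sha256 hex digest) captured at failure time."""
    mismatched = [p for p, digest in record["inputs"].items()
                  if not Path(p).is_file() or sha256_of(Path(p)) != digest]
    if mismatched:
        raise RuntimeError("restore these inputs to the logged versions first: "
                           + ", ".join(mismatched))
    return subprocess.run(record["command_line"]).returncode
```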
Un-reproducible Bugs
As with any complex system, every asset pipeline will exhibit a few bugs that cannot be reproduced in a controlled environment, or that even disappear when the pipeline runs exactly the same processing operation a second time! These are often due to the precise timing of events (particularly if multiple processing tasks are being executed simultaneously), the layout of memory at the time of the failure, or simply hardware or OS faults.
Since recreating them is nearly impossible, such problems can almost always be debugged only with detailed logs and crash dump information. Even worse, it can be very hard to prove that such a bug has been fixed: sometimes changing an unrelated section of the code can make it disappear, simply because the sequence of events that revealed the problem now occurs more rarely.
There is little that can be done to mitigate these problems, except for ensuring that all of the tool code is as robust as possible, and that the maximum amount of information is gathered when a crash does happen. In the worst case, it may be necessary to run the entire pipeline in debug mode or under a debugger to find the problem. It is worth noting, however, that in some rare cases this additional instrumentation can itself prevent the fault from occurring!
Maintaining Data Integrity
Aside from producing as much information as possible to help locate the problem, the other main task of the pipeline when a crash occurs is to recover safely and continue as normally as possible. A critical part of this process is ensuring that any data files modified by the crashed tool are safely removed or reverted to known good versions. Otherwise, a single error can cause a cascade of failures as each successive step in the pipeline tries to use the corrupt data output by the first tool!
This clean-up process mainly involves removing any temporary files that were created, and deleting or invalidating output data that may be truncated or corrupt. Having a dirty flag in the file headers can be a big help here, as it allows partially written files to be detected easily. If file checksums are being stored for change detection, these can also be used to identify which files were modified.
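A dirty flag can be sketched as follows, assuming a hypothetical 8-byte header holding a magic number and a flag byte. The flag is set before any data is written and cleared only after the payload has been flushed to disk, so a tool that dies mid-write leaves a file that is_complete() rejects.

```python
import os
import struct
from pathlib import Path

MAGIC = b"ASET"                         # hypothetical magic number
DIRTY_HEADER = struct.Struct("<4sB3x")  # magic, dirty flag byte, 3 padding bytes


def write_with_dirty_flag(path: Path, payload: bytes) -> None:
    with path.open("wb") as f:
        f.write(DIRTY_HEADER.pack(MAGIC, 1))  # mark the file dirty before writing data
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())                  # force the payload to disk first...
        f.seek(0)
        f.write(DIRTY_HEADER.pack(MAGIC, 0))  # ...then clear the flag in place
        f.flush()
        os.fsync(f.fileno())


def is_complete(path: Path) -> bool:
    """False if the file is missing, shorter than the header, foreign, or still dirty."""
    try:
        with path.open("rb") as f:
            header = f.read(DIRTY_HEADER.size)
    except OSError:
        return False
    if len(header) < DIRTY_HEADER.size:
        return False
    magic, dirty = DIRTY_HEADER.unpack(header)
    return magic == MAGIC and dirty == 0
```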
Modified intermediate files can either be deleted and then recreated by re-running the tool on the previous set of input data, or be replaced directly with their last versions (assuming these are stored somewhere). Either approach works well, although the latter is generally preferable where possible, as it reduces the time needed for the recovery operation.
Another way to ensure the integrity of data in the pipeline is to "sandbox" each tool's execution. In this case, the files the tool may modify are copied before it runs, and the tool operates on those copies. Only once the task has completed successfully are the original files overwritten with the updated versions.
This approach ensures that an errant tool cannot corrupt files when it fails (clearly, no such guarantee can be made if the tool claims to have executed successfully). For further safety, all of the tool's input and output files can be moved to another directory before processing, ensuring that no files other than those specified as outputs can be accidentally modified. The truly paranoid can even make the rest of the pipeline data unwritable by the tools. Sandboxing execution in this manner is a very effective safeguard, but it does add overhead to each processing step.
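A minimal sandboxing sketch: the tool runs with its working directory set to a temporary scratch directory holding copies of its files, and each real output is replaced only after the tool exits successfully and every declared output exists. Each replacement is staged next to its target and swapped in with os.replace(), which is atomic when both paths are on the same file system. The assumption that the tool addresses its files by bare name is hypothetical and would depend on the tool's command line.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_sandboxed(command: list[str], inputs: list[Path], outputs: list[Path]) -> None:
    """Run a tool on copies in a scratch directory; originals change only on success.

    Assumes the tool refers to its files by bare name relative to its working
    directory, and that no two inputs/outputs share a file name."""
    with tempfile.TemporaryDirectory() as scratch_name:
        scratch = Path(scratch_name)
        for src in inputs:
            shutil.copy2(src, scratch / src.name)
        for out in outputs:
            if out.exists():  # tools that update in place need the previous version
                shutil.copy2(out, scratch / out.name)
        result = subprocess.run(command, cwd=scratch)
        if result.returncode != 0:
            raise RuntimeError(f"tool failed (exit code {result.returncode}); originals untouched")
        missing = [out.name for out in outputs if not (scratch / out.name).is_file()]
        if missing:
            raise RuntimeError(f"tool reported success but did not write {missing}")
        for out in outputs:
            staged = out.with_name(out.name + ".staged")
            shutil.copy2(scratch / out.name, staged)  # copy beside the target first...
            os.replace(staged, out)                   # ...then swap it in atomically
```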
Conclusion
Dependency analysis plays a vital role in improving the efficiency of asset processing operations, by ensuring that only the files directly affected by each change to the source assets are updated. As this is a common problem, particularly in source code compilation, there are many existing tools that perform this task well, the make utility being the most popular.
Another important prerequisite for building an effective asset pipeline is a strong framework for tools, together with well-defined file formats for interchanging information. The effort spent getting these aspects of the system right is well worth it, as they affect virtually every stage of the process. Wherever possible, common functionality should be integrated into this framework, speeding up development and improving the robustness of every tool built on it. Isolating tools from each other as much as possible is also a useful technique for ensuring that failures in one section of the pipeline do not affect others.
--
This article is excerpted from The Game Asset Pipeline (ISBN 1-58450-342-4). For more information about the book, please visit http://www.charlesriver.com/Books/BookDetail.aspx?productID=88993.