Book Excerpt: The Game Asset Pipeline: Managing Asset Processing


February 21, 2005
 

Building Robust Tools

One of the key requirements of any asset processing system is that it must be robust under as many conditions as possible. Various mechanisms for dealing with broken source assets were discussed previously, but little mention was made of the steps the tools themselves can take to make sure that they fail as infrequently as possible, and that failures are handled sensibly.

Be Lenient in What You Accept, but Strict in What You Output

When writing any system that must interoperate with others outside your control, this is a good mantra to adopt. Your internal file formats will only be seen by a small number of programs, quite likely all written by one person or based on the same source code and libraries. When handling files created by or for external applications, however, it is necessary to allow for wide variation in how the format specifications are interpreted.

Most common file formats have been reasonably well documented, but even the best documentation still leaves vague areas or places where the precise behavior is deliberately left undefined for some reason. In some cases there are several sets of (often conflicting) documentation, or even worse, none at all. In these cases, a useful addendum to the above is "expect the unexpected." If it's at all possible within the basic structure of the format, chances are someone will have done it.

In recent years, many specification documents have adopted an "RFC-like" ("Request For Comments" documents are a set of publicly available technical notes, mostly defining protocols and standards for Internet use) style when describing the behavior expected from applications. Many RFC notes use a common set of strict definitions of the words "must," "should," and "may" to avoid any possibility of misunderstandings. These definitions are as follows:

MUST: indicates something that is an absolute requirement. For example, "the index field MUST be an unsigned 32-bit integer."
SHOULD: indicates that there may be valid reasons to ignore this requirement, but applications should not do so without first considering the consequences. For example, "the header SHOULD include the name of the source file."
MAY: indicates that this requirement is optional, and it is up to the application to decide whether to implement it. For example, "this block MAY be followed by one containing additional metadata."

From the perspective of an application reading a file that has been specified in such a manner, it is "safe" to assume that any compliant application will have implemented any "MUST" requirement, but the possibility that "SHOULD" or "MAY" requirements have not been met must be taken into account. However, as the description states, when writing to a file, unless there is a very good reason to do otherwise, "SHOULD" requirements should be met. This ensures the maximum possibility that another application (which potentially may have ignored these rules) can read the file correctly.
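As an illustration of this principle, the sketch below (in Python, using a hypothetical chunk header with an optional source-name field; the format, magic number, and field names are assumptions for illustration, not taken from the book) reads leniently but always writes the optional field:

import struct

MAGIC = b"ASST"  # hypothetical magic number for an illustrative asset format

def read_header(f):
    # Lenient reader: reject only violations of MUST-level requirements,
    # and cope gracefully if the SHOULD-level source name was omitted.
    if f.read(4) != MAGIC:
        raise ValueError("not an asset file")
    version, name_len = struct.unpack("<HH", f.read(4))
    source_name = f.read(name_len).decode("utf-8", errors="replace") if name_len else ""
    return version, source_name

def write_header(f, version, source_name):
    # Strict writer: always emit the SHOULD-level source name, maximizing
    # the chance that a less careful reader handles the file correctly.
    name_bytes = source_name.encode("utf-8")
    f.write(MAGIC)
    f.write(struct.pack("<HH", version, len(name_bytes)))
    f.write(name_bytes)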

Regardless of these specifications, it is good practice to perform sanity checks on values read in from any file if there is the potential for them to cause significant harm (for example, indices that are used to reference arrays, or the sizes of structures). One point worthy of special mention is that many formats do not explicitly state whether values are signed or unsigned (and even when they do, this is often ignored). This can lead to serious problems if a negative value is inserted, as it will appear to be a very large positive number when read in an unsigned fashion (and, indeed, vice versa). Performing bounds checking on input values can help catch these problems quickly.
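A minimal sketch of this kind of defensive reading, in Python (the block layout and the size limit are assumptions chosen purely for illustration):

import struct

MAX_REASONABLE_COUNT = 1 << 20  # assumed upper bound; tune per format

def read_index_block(data, offset, vertex_count):
    # The format does not say whether the count is signed, so read it as
    # unsigned and sanity-check the result before trusting it.
    (index_count,) = struct.unpack_from("<I", data, offset)
    if index_count > MAX_REASONABLE_COUNT:
        # A "negative" count written by a sloppy exporter shows up here as a
        # huge unsigned value (e.g. -1 becomes 4294967295).
        raise ValueError(f"implausible index count: {index_count}")
    indices = struct.unpack_from(f"<{index_count}I", data, offset + 4)
    for i in indices:
        if i >= vertex_count:  # bounds-check every array reference
            raise ValueError(f"index {i} out of range (vertex count {vertex_count})")
    return indices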

Handling Tool Failure

In addition to performing input validation, some mechanism for reporting failure in tools is also required, to handle situations where the data is sufficiently broken that recovery is impossible. Ideally, this should allow the system to respond appropriately to the broken data, by rolling back to a previous version of the source asset file and retrying the processing step.

When reporting an error, it is generally best for the tool to supply as much descriptive information as possible about the problem. This can then be logged by the system and used to diagnose the fault. If an e-mail server is available, e-mailing this information to the pipeline or tool's maintainer is often a good idea: they can frequently see the cause of the problem immediately, without having to track down the offending file's log entry.

The error report should also include, in machine-readable form, the names of the input files that caused the problem, and the state of any output files that were altered by the processing. This will enable the pipeline to perform the necessary recovery actions, removing or replacing the (potentially) corrupted output, and finding an alternative set of input files if they exist. It is particularly important that this is done whenever possible on more complex processing operations, as they may involve a large number of files, and in the absence of this information the pipeline may have to assume that all of the input or output files are potentially invalid.
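One way a tool might emit such a report, sketched in Python (the JSON layout and field names are illustrative assumptions, not a prescribed format):

import json
import traceback

def write_error_report(report_path, input_files, modified_outputs, exc):
    # Intended to be called from the tool's top-level except block: a
    # human-readable description plus machine-readable file lists, so the
    # pipeline can roll back or replace exactly the files involved.
    report = {
        "error": str(exc),
        "traceback": traceback.format_exc(),
        "input_files": input_files,            # files that triggered the failure
        "modified_outputs": modified_outputs,  # outputs that may now be corrupt
    }
    with open(report_path, "w") as f:
        json.dump(report, f, indent=2)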

When the error is actually logged, the pipeline can add to this information; the most useful additions are those that allow the problem to be recreated. In general, this will include the names and versions of all of the input files for the tool (and the tool itself), the command line it was executed with, and any other relevant system information such as environment variable values, available memory, and so on. With this, when a problem occurs there is a good chance that the fault can easily be recreated in a controlled environment (such as under a debugger).

In circumstances where tools are frequently being called with ephemeral intermediate input files, it can even be useful to store a copy of all the input data for the task that caused the error along with the report. This way, it is not necessary to perform all of the preceding processing steps to recreate the problem, and if the failure was due to a fault in an intermediate tool the broken data will be available to examine.
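On the pipeline side, the information and files needed to reproduce a failure might be captured with something like the following sketch (the directory layout and field names are assumptions for illustration):

import json
import os
import shutil
import sys
import time

def archive_failure(archive_root, task_name, cmdline, input_files):
    # Snapshot everything needed to reproduce the failed task later.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(archive_root, f"{task_name}-{stamp}")
    os.makedirs(dest, exist_ok=True)
    # Copy the (possibly ephemeral) intermediate inputs so the failure can be
    # re-run without repeating the earlier pipeline stages.
    for path in input_files:
        if os.path.exists(path):
            shutil.copy2(path, dest)
    # Record the execution environment needed to recreate the run.
    context = {
        "command_line": cmdline,
        "environment": dict(os.environ),
        "python_version": sys.version,
    }
    with open(os.path.join(dest, "context.json"), "w") as f:
        json.dump(context, f, indent=2)
    return dest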

Redirecting Output

Another technique that can make debugging asset problems much simpler is to archive all the output from the tools in a consistent location, for example, in a database or directory structure that mirrors the layout of the files in the pipeline. This way, for any given intermediate or output file (even if the pipeline detected no errors), the debug output can be quickly located. This can be very useful when trying to diagnose problems the pipeline has missed, such as "why is this model ten times smaller than it should be?" With some care, it can also be used to gather detailed statistics about various parts of the pipeline, such as the average performance of the triangle stripification or the distribution of mesh compression schemes.

This output redirection can be done on an individual tool level, but it is generally more useful to implement it as part of the overall pipeline functionality. This can be easily done by redirecting the standard I/O streams, and, if necessary, hooking the debug output functions (OutputDebugString() on Windows systems). Handling the redirection in this high-level manner both reduces the amount of code required in each tool, and provides redirection for third-party utilities or other similar "black box" components.
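A pipeline-level sketch of this redirection in Python, assuming the log directory mirrors the asset layout and the asset path is relative to the pipeline root (both assumptions for illustration):

import os
import subprocess

def run_tool(cmd, asset_path, log_root):
    # Archive the tool's console output next to a path mirroring the asset's
    # location in the pipeline, so it can be found quickly later.
    log_path = os.path.join(log_root, asset_path) + ".log"
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    with open(log_path, "w") as log:
        # Redirecting at this level captures output from third-party
        # "black box" tools as well as our own.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode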

Fatal Errors

Regardless of the amount of protection in place, however, there will always be cases where either as the result of an invalid file or simply due to a bug, one of the processing tools crashes completely. These situations can be very difficult to deal with, because in the absence of detailed debugging information, the only solution is to run the offending tool under a debugger to find out where the crash occurred. This is quite time consuming, and in mature toolsets often leads not to an actual bug, but rather to an unexpected set of conditions in the input data. Having as much debugging information as possible can help considerably here.

Infinite Loops

One particularly nasty class of fatal error is that where a tool enters an infinite loop. This is quite hard to detect, as no actual error occurs, but instead the processing never completes.

The basic mechanism for detecting infinite loops is to implement a timeout, whereby the tool is forcibly terminated if the processing takes more than a specified amount of time. However, this time can vary wildly among different tools. For example, if a texture resizing operation takes more than a few minutes it has almost certainly crashed, but calculating lighting information for a large level may easily take a few hours to complete normally. Therefore, some amount of manual tweaking of timeouts will usually be necessary to avoid terminating tools prematurely.
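A simple per-tool timeout might look like the following sketch (the tool names and limits are illustrative):

import subprocess

# Rough per-tool limits, in seconds; as noted above, these need manual tuning
# (a texture resize should finish in minutes, a lighting bake may not).
TOOL_TIMEOUTS = {"resize_texture": 5 * 60, "bake_lighting": 4 * 60 * 60}

def run_with_timeout(tool_name, cmd):
    try:
        return subprocess.run(cmd, timeout=TOOL_TIMEOUTS.get(tool_name, 30 * 60))
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; report it as a failure
        # so the pipeline can roll back the task's outputs.
        raise RuntimeError(f"{tool_name} exceeded its time limit; assuming an infinite loop")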

Another mechanism that can be used to assist in detecting infinite loops is to allow the tool to expose a "progress meter" to the pipeline. Essentially, this is just a value between 0 and 100 (or any other arbitrary range) that indicates how far through the processing the tool has got. The pipeline can then implement a timeout that triggers if no progress has been made for a certain period of time, or if the progress meter goes backwards (a fairly sure sign of a bug!). This approach is more efficient, both because less manual tweaking of timeouts is required and because it can detect crashes that occur early in long processing operations without having to wait until the full time limit expires.
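One possible implementation, assuming the tool periodically writes its current progress value to a small file that the pipeline polls (the file-based protocol and polling interval are assumptions; a pipe or shared memory would work equally well):

import subprocess
import time

def run_with_progress_watchdog(cmd, progress_file, stall_limit=300):
    proc = subprocess.Popen(cmd)
    last_progress, last_change = -1, time.time()
    while proc.poll() is None:
        time.sleep(5)
        try:
            with open(progress_file) as f:
                progress = int(f.read().strip())
        except (OSError, ValueError):
            progress = last_progress          # no update yet; not an error in itself
        if progress < last_progress or time.time() - last_change > stall_limit:
            proc.kill()                       # stalled or gone backwards: assume a hang
            raise RuntimeError("no progress made; tool terminated")
        if progress > last_progress:
            last_progress, last_change = progress, time.time()
    return proc.returncode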

A progress meter can also be a useful tool for other purposes, such as judging how long a process is likely to take, and as a means of preventing human-induced crashes when someone decides that a process has taken "too long" and kills it manually!

Debugging the Pipeline

If detailed logs are kept of tool execution, then many problems can be diagnosed simply by examining these, especially if the failure was caused by an assert() statement or some other situation that the code was able to catch and respond to. However, there will always be cases where a crash in a tool must be debugged "directly," by examining the code executed up to the point of the failure.

Regardless of the circumstances, whenever a crash in a tool is detected, if at all possible, either the tool itself or the pipeline should attempt to write out a stack trace, register list, and possibly a memory dump; most operating systems provide relatively straightforward functions for doing this. This can be absolutely invaluable in debugging hard-to-recreate problems, because all the information that can be obtained from viewing the crash in a normal debugger can be gleaned (with a greater or lesser degree of effort) from the dump information. Some debuggers even allow crash dumps to be loaded and viewed directly as though the crash had occurred locally, making the process even more efficient.
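For tools written in Python, a rough approximation of this is the standard faulthandler module, which dumps a stack trace for every thread when the process receives a fatal signal; native tools would instead call the operating system's own facilities (for example, MiniDumpWriteDump on Windows). A minimal sketch:

import faulthandler

def install_crash_handler(dump_path):
    # Keep the file object alive for the lifetime of the process; faulthandler
    # writes to it directly from the signal handler when a crash occurs.
    dump_file = open(dump_path, "w")
    faulthandler.enable(file=dump_file, all_threads=True)
    return dump_file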

If a crash dump is not available, or the problem cannot be diagnosed from it, then it will be necessary to recreate the circumstances that led to the failure. This is where the detailed execution environment information the pipeline should report in the log file comes in useful. By retrieving the versions of the input files specified, and re-running the tool with the same command line and options, it should be possible to cause the crash to happen again. This is essential both for diagnosing the problem and then verifying that it has indeed been fixed.

Un-reproducible Bugs

As with any complex system, any asset pipeline will always exhibit a few bugs that cannot be reproduced in a controlled environment, or may even disappear when the pipeline runs exactly the same processing operation a second time! These are often due to the precise timing between events (this is particularly an issue if multiple processing tasks are being executed simultaneously), the layout of memory at the time of the failure, or simply hardware or OS faults.

Since recreating them is nearly impossible, debugging such problems almost always relies on detailed logs and crash dump information. Even worse, it can be very hard to prove that such a bug has been fixed: sometimes changing another unrelated section of the code can cause it to disappear, simply because the sequence of events that revealed the problem now occurs more rarely.

There is little that can be done to mitigate these problems, except for ensuring that all of the tool code is as robust as possible, and that the maximum amount of available information is gathered when a crash does happen. In the worst case, it may be necessary to run the entire pipeline in debug mode or under a debugger to find the problem, although it is worth noting that in some rare cases this additional instrumentation can prevent the fault from occurring!

Maintaining Data Integrity

Aside from producing as much information as possible to help locate the problem, the other main task of the pipeline when a crash occurs is to recover safely and continue operating as normally as possible. A critical part of this process is ensuring that any data files that were modified by the tool that crashed are safely removed or reverted to known good versions. Otherwise, a single error can cause a cascade of failures as each successive step in the pipeline tries to use the corrupt data output by the first tool!

This "clean up" process mainly involves removing any temporary files that were created and deleting or invalidating output data that may be truncated or corrupt. Having a dirty flag in the file headers can be a big help here, as it allows partially written files to be easily detected. If checksums of files are being stored for the purposes of detecting changes, then these, too, can be used to detect modifications.

Modified intermediate files can either be deleted entirely and then recreated by re-running the tool with the previous set of input data, or replaced directly with the last known good versions (assuming that these are stored somewhere). Either approach works well, although the latter is generally preferable where possible, as it reduces the amount of time needed for the recovery operation.

Another possible approach to take to ensure the integrity of data in the pipeline is to "sandbox" each tool's execution. In this case, the files the tool may modify are copied prior to its execution, and the tool operates on those copies. Only once the task has been successfully completed do the original files get overwritten with the updated versions.

This approach makes sure that an errant tool cannot corrupt files when it fails (clearly, no such guarantee can be made if the tool claims to have executed successfully), and for further safety all of the tool's input and output files can be moved to another directory before processing, thereby ensuring that no files other than those specified as outputs can be accidentally modified. In this case, the truly paranoid can even take the step of making the rest of the pipeline data unwritable to the tools if desired. Sandboxing the execution in this manner is a very effective safeguard, but it does introduce additional overheads in the execution of each processing step.
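A simplified sandboxing sketch in Python, assuming each task declares its input and output files up front and that the tool writes its outputs into its working directory (the commit-on-success policy is the point; the details are illustrative):

import os
import shutil
import subprocess
import tempfile

def run_sandboxed(cmd, inputs, outputs):
    # Copy the task's files into a scratch directory, run the tool there,
    # and only copy the outputs back if it reports success.
    sandbox = tempfile.mkdtemp(prefix="asset_task_")
    try:
        for path in inputs:
            shutil.copy2(path, sandbox)
        result = subprocess.run(cmd, cwd=sandbox)
        if result.returncode != 0:
            raise RuntimeError("tool failed; original outputs left untouched")
        # Commit: overwrite the real outputs only after a successful run.
        for path in outputs:
            shutil.copy2(os.path.join(sandbox, os.path.basename(path)), path)
    finally:
        shutil.rmtree(sandbox, ignore_errors=True)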

Conclusion

Dependency analysis plays a vital role in improving the efficiency of asset processing operations, by ensuring that only the files directly affected by each change to the source assets are updated. As this is a common problem, particularly in source code compilation, there are many existing tools available that perform this task well, the make utility being the most popular.

Another important prerequisite for building an effective asset pipeline is a strong framework for tools, and well-defined file formats for interchange of information. The effort expended on getting these aspects of the system right is well worth it, as they will have an effect on virtually every stage of the process. Wherever possible, common functionality should be integrated into this framework, speeding the development and improving the robustness of every tool based on it. Isolating tools from each other as much as possible is also a useful technique for ensuring that failures in one section of the pipeline do not affect others.

--

This article is excerpted from The Game Asset Pipeline. (ISBN # 1-58450-342-4). For more information about the book, please visit http://www.charlesriver.com/Books/BookDetail.aspx?productID=88993.
