Many
game projects are either significantly delayed or shipped in a rather
buggy state. Certainly, this situation isn't unique to the games
industry - for instance, according to the infamous "Extreme
Chaos" report released by The Standish Group in 2001, more
than 70% of all software projects are either cancelled or significantly
exceed their planned development time and budget. However, since
games represent a very complex case of software development where
people skilled in rather different disciplines have to cooperate,
one might argue that the development risks inherent in game projects
are particularly high.
The
reasons for delayed, bug-infested or even failed software projects
are manifold, but it seems that, besides feature creep and shifting
priorities, testing and quality assurance are recurring themes.
In our experience, a large number of development studios rely entirely
on manual testing of the underlying game engine, their production
tools, and the game code itself, while automated processes are only
rarely adopted. Similarly, in the 2002 GDC roundtable "By the
Books: Solid Software Engineering for Games", only 18 percent
of the attendees said that the projects they were working on employed
automated tests.
We
were first confronted with the notion of automated testing when,
in 2000, customers of our then still very young middleware
company complained about stability issues and bugs in our 3D engine.
Until that time, we had relied on manual tests performed by the
developers after implementing a new feature, and on the reports
of our internal demo writers who were using these features to create
technology demos for marketing purposes. After thoroughly analyzing
the situation, we came to the conclusion that our quality problems
were mostly related to the way we were testing our software:
- Manual
testing wasn't performed thoroughly enough, because it simply
took too much time. Whenever some code was changed, or new code
was added, it would have been necessary to execute a defined set
of manual tests to make sure the modifications hadn't introduced
problems anywhere else. Manual testing took more and more time,
which led to frustration among the developers and reduced
their motivation to actually execute the tests. Additionally,
the amount of work involved in testing made developers reluctant
to improve or optimize existing code.
-
When developers manually tested their own code, they often showed
a certain (perhaps subconscious) tendency to avoid the most critical
test cases, so the scenarios in which a problem was most likely to occur
were also the ones least likely to be tested.
As
a result, we decided to adopt automated testing, starting with a
new component of our SDK which we had just started to develop. The
results were encouraging, so we eventually expanded our practice of
automated testing to all SDK components.
Test
Frameworks
Automated
tests have become popular with eXtreme Programming, a collection
of methodologies and best practices popularized by Kent Beck and Martin
Fowler. Generally, automated tests refer to code or data that is
used to verify the functionality of subsets of a software product
without any user interaction. This may range from tests for individual
methods of a specific class (commonly called unit tests) to integrated
tests for the functionality of a whole program (functional tests).
In
order to facilitate the creation of automated tests, there are a
number of open-source unit testing frameworks, such as CppUnit (for
C++ code) or NUnit (for .NET code). These testing frameworks provide
a GUI for selecting the tests to run and for giving feedback about the
test results. Depending on your project, it might be necessary to
extend these frameworks with additional functionality required for
your game, such as support for multiple target platforms.
In
the context of such a testing framework, a single unit test corresponds
to one function, and multiple unit tests are aggregated in test
classes along with methods for initializing and de-initializing
a test (e.g. loading and unloading a map). These test classes can
in turn either be located in a separate executable - for instance,
when the code to be tested resides in its own DLL - or in the main
project itself. Regardless of this, test classes should always be
stored in separate files from your production code, so they can
conveniently be removed from builds intended for deployment.
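To make this more concrete, here is a minimal sketch of such a test class using CppUnit-style macros. The map handling in setUp() and tearDown() is a placeholder for whatever per-test initialization your own engine requires, and the test names are purely illustrative.

// Minimal sketch of how unit tests are typically grouped in such a
// framework, using CppUnit. The map handling is a placeholder for
// whatever per-test initialization your engine needs; it is not part
// of any real API.
#include <cppunit/extensions/HelperMacros.h>

class MapLoadingTests : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(MapLoadingTests);   // aggregates the individual unit tests
    CPPUNIT_TEST(testEntityCountAfterLoad);
    CPPUNIT_TEST(testReloadKeepsSettings);
    CPPUNIT_TEST_SUITE_END();

public:
    void setUp()    { /* e.g. load the test map */ }
    void tearDown() { /* e.g. unload the map and free resources */ }

    // One unit test corresponds to one method; bodies omitted in this sketch.
    void testEntityCountAfterLoad() {}
    void testReloadKeepsSettings()  {}
};

// Makes the test class discoverable by the framework's test runner / GUI.
CPPUNIT_TEST_SUITE_REGISTRATION(MapLoadingTests);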
What
should be tested?
Pragmatism
is a virtue when it comes to deciding what to test. Usually, it
doesn't make sense to write unit tests for functionality with minimal
complexity, such as the getter and setter methods for individual
properties of a class. In order for automated tests to pay off,
the code to be tested should have a certain probability of producing
incorrect results - for instance, a method that casts a ray
through a game level and returns whether this ray intersects any
level geometry (line of sight test). Such a test would then compare
the returned result with the expected result provided by the test
author.
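As a rough sketch of such a test - assuming a hypothetical HasLineOfSight() engine function and a simple Vector3 type, neither of which comes from any real API - the test author's knowledge of the test map provides the expected result:

// Sketch of a unit test for the line-of-sight example above. Vector3 and
// HasLineOfSight() stand in for the real engine code under test.
#include <cppunit/extensions/HelperMacros.h>

struct Vector3 { float x, y, z; };

// Hypothetical engine function under test: returns true if no level
// geometry blocks the ray between the two points in the loaded map.
bool HasLineOfSight(const Vector3& from, const Vector3& to);

class LineOfSightTests : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(LineOfSightTests);
    CPPUNIT_TEST(testRayBlockedByWall);
    CPPUNIT_TEST_SUITE_END();

public:
    void testRayBlockedByWall()
    {
        // The test author knows the test map contains a wall between
        // these two points, so the line of sight must be reported as blocked.
        const Vector3 from = { 0.0f, 0.0f, 1.5f };
        const Vector3 to   = { 10.0f, 0.0f, 1.5f };
        CPPUNIT_ASSERT_EQUAL(false, HasLineOfSight(from, to));
    }
};

CPPUNIT_TEST_SUITE_REGISTRATION(LineOfSightTests);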
A
recurring question is whether tests should be written only against
the public interface of a class (so-called black-box tests), or
whether they should take the inner workings of a class into account
by also covering private members (white-box tests). While black-box
tests are usually somewhat coarser than white-box tests - after
all, they can only check the final results of an operation, not
internal intermediate states - they are significantly less sensitive
to modifications of the tested code. The line of sight function
mentioned earlier may undergo significant internal changes, up to
a complete rewrite (e.g. because the original version simply was
not fast enough), while the results it returns remain the same. In
such a case, a white-box test almost always has to be re-written
or modified along with the tested code, whereas a black-box test
could immediately be used to check whether the modified code still
produces the same results. Thus, we have found it beneficial to
only include the public members of a class in automated tests, since
in most cases, the inner workings of a class change more frequently
than its external interface.
Regression
Tests
In
many cases, especially in the field of game development, it is not
feasible to compare results against data manually provided by the
test author. For instance, if a collision detection routine computes
intersection points with complex geometry, manually providing reference
data for the tests is hardly an option. Instead, test results can
be compared against previously generated data from earlier versions
of your code, a practice that is also known as "regression
testing". This reference data has to be reviewed by the test
author - for instance, using a simple visualization of the colliding
objects - and once it has been approved, it can continuously be
used for testing. This way, automated tests help you ensure that
the new (e.g. optimized) version of your code still produces the
same results as previous implementations.
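A sketch of what such a regression test might look like, assuming hypothetical helpers for computing contacts with the current code and for loading the previously approved reference data (ContactPoint, ComputeContacts, LoadReferenceContacts and Distance are all illustrative names, not real APIs):

// Sketch of a data-driven regression test for the collision example above.
// The reference file is assumed to contain contacts generated by an earlier
// code version and reviewed by the test author.
#include <cppunit/extensions/HelperMacros.h>
#include <cstddef>
#include <vector>

struct ContactPoint { float x, y, z; };

std::vector<ContactPoint> ComputeContacts(const char* sceneFile);       // runs the current code
std::vector<ContactPoint> LoadReferenceContacts(const char* refFile);   // loads approved data
float Distance(const ContactPoint& a, const ContactPoint& b);

void TestComplexMeshContacts()
{
    std::vector<ContactPoint> current   = ComputeContacts("TestScenes/RockPile.scene");
    std::vector<ContactPoint> reference = LoadReferenceContacts("Reference/RockPile.contacts");

    // Same number of contacts, and each contact close to its reference value
    // (exact float equality would be too strict after a refactoring).
    CPPUNIT_ASSERT_EQUAL(reference.size(), current.size());
    for (std::size_t i = 0; i < reference.size(); ++i)
        CPPUNIT_ASSERT(Distance(reference[i], current[i]) < 0.001f);
}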
For
functional tests of code that generates highly complex output data,
such as a game's rendering engine, regression testing is often the
only feasible way of implementing automated tests. In the case of
the Vision rendering engine, we ended up generating platform-specific
reference images from all visual tests. Whenever the automated tests
are run, the rendered images are compared against the reference
images pixel by pixel, and if the images differ, the test fails.
In order to keep the memory impact of the reference images reasonably
low, you can bind comparison snapshots to certain events in the
tests.
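The comparison itself can be as simple as the following sketch, which assumes a minimal Image structure rather than any particular engine's image class:

// Sketch of the pixel-by-pixel comparison used by visual regression tests.
// The Image type is a stand-in for whatever image representation the
// rendering engine provides.
#include <cstdint>
#include <vector>

struct Image
{
    int width = 0;
    int height = 0;
    std::vector<uint32_t> pixels;   // packed RGBA, one entry per pixel
};

// Returns true if both images have identical dimensions and pixel values.
bool ImagesMatch(const Image& rendered, const Image& reference)
{
    if (rendered.width != reference.width || rendered.height != reference.height)
        return false;
    return rendered.pixels == reference.pixels;
}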
Such
visual regression tests have the advantage that even minor visual
errors, which are frequently overlooked in manual tests, do not go unnoticed.
Unless they know the scene extremely well, few people will realize
that a shadow or a single object is missing in a complex scene,
or that the red and blue values of a light source's color are swapped.
Regression tests, however, will almost certainly detect bugs like
these.
In
any case, it is important that generating the reference data for
a regression test is an automated process. The reference data may
be platform-specific, especially when it comes to rendering output,
so it might have to be generated multiple times, even more so when
there may still be changes in the rendering pipeline that cause
intended differences in the rendered images. To avoid discouraging
developers from writing regression tests, creating new reference data
should ideally be as simple as clicking a button in the test framework's
user interface.
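Building on the previous sketch, reference generation and comparison can share one code path: a "compare or record" helper that, when the framework is switched into a (hypothetical) reference-generation mode, saves the current output as the new reference instead of comparing against it. The mode flag and the SaveImage/LoadImage helpers below are illustrative assumptions, not part of any real framework.

// "Compare or record" helper, reusing the Image type and ImagesMatch()
// from the earlier sketch. In reference-generation mode the current
// output simply becomes the new reference and still needs a manual review.
#include <string>

extern bool g_GenerateReferenceData;                 // set via the test framework's UI
void  SaveImage(const Image& image, const std::string& file);
Image LoadImage(const std::string& file);

bool CheckOrRecordImage(const Image& rendered, const std::string& referenceFile)
{
    if (g_GenerateReferenceData)
    {
        SaveImage(rendered, referenceFile);          // new reference, to be reviewed by the test author
        return true;
    }
    return ImagesMatch(rendered, LoadImage(referenceFile));
}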
How
everything fits together
For
almost all applications - games included - a complete test suite
consists of both unit and regression tests. Unit tests are suitable
for low-level functionality and base libraries and ensure that you
have a solid foundation to build higher-level code on. Regression
tests can in turn be used to perform integrated functional checks
of higher-level features. As a result, you can refactor or optimize
complete functional groups of your game or engine code, and you
will immediately notice if something breaks during the process,
since the regression tests will fail. Furthermore, failing unit
tests will often give you a rather precise indication of what actually
goes wrong.
Since
it is always beneficial to know how much of your code is actually
covered by the automated tests you've written, you may want to use
a code coverage tool such as BullseyeCoverage or AQtime. A code
coverage analysis tells you which parts of the code have actually
been called, and thus also provides hints about "holes"
in the test suite. The question of how high test coverage should be
cannot be answered easily, though, since it largely depends on the
code to be tested. Trivial methods do not have to be covered by
automated tests, and the same naturally goes for pure debug functionality.
Also, almost all projects contain "dead" code that is
never called, and such code naturally also doesn't have to be tested.
Overall, the real-world game and middleware projects with automated
tests that we have seen had a test coverage of between 55 and 70
percent.
Writing
test-friendly code
Admittedly,
automated tests are not equally easy to implement for all types
of code. As far as unit tests are concerned, a strictly object-oriented,
modular design with separate functionality encapsulated in separate
classes significantly facilitates testing. The more information
a class needs from the outside world, the more work it is to write
unit tests for that class. Also, excessive use of the "friend"
keyword in C++ can make it difficult or even impossible to write
(black-box) unit tests for a class.
It
is always best to keep testability in mind from the start when writing
code. Making code testable later on in the development process
is still possible, but it can be a rather tedious task, as it sometimes
requires quite a bit of restructuring. When it comes to games, there
are a number of important aspects that should be considered when
developing testable code:
- Allow
regression tests to rely on deterministic behavior. For instance,
a pathfinding system that uses randomness to make the decisions
of characters less predictable could provide a public method for
initializing the seed. This method could then be used by the tests
to ensure that the characters always take the same path (see the sketch below).
-
Similarly, avoid frame-rate dependencies in regression tests;
otherwise, physics objects or rendering output may differ from
previously generated data, especially if that data was generated
on a different machine or with a different build configuration
(e.g. debug vs. release). One way to achieve this is to run rendering
and simulation loops with a constant virtual frame rate during
the automated tests.
- Software
that heavily relies on user input, such as an in-game editing
system or production tools, is usually rather difficult to test.
In such cases, a strict separation of UI and logic code can make
testing easier: In our production tools, for instance, every user
action in the UI executes one or more simple script commands.
A recorded set of script commands can then be replayed to reproduce
exactly what the user originally did: a test simply executes this
script and compares the result (e.g. exported file, scene geometry)
against existing reference data.
GUI capturing tools are another option for user interface testing,
but we generally don't recommend them. User interfaces tend to change
frequently, and since moving a button by a few pixels may already
invalidate previously captured user input, tests based on GUI capturing
may hinder your workflow rather than support it.
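As an illustration of the first point above, here is a minimal sketch of a randomness-based system made testable by exposing its seed; the Pathfinder class and its methods are hypothetical, not taken from any real engine.

// Sketch of making a randomness-based system deterministic for tests.
// With a fixed seed, the same query always yields the same path, so
// regression tests can rely on reproducible behavior.
class Pathfinder
{
public:
    // Public seed initialization, intended primarily for automated tests.
    void SetRandomSeed(unsigned int seed) { m_randomState = seed; }

    // Picks one of several equally good waypoints; uses the internal
    // random state so results are reproducible once the seed is fixed.
    int PickWaypoint(int numCandidates)
    {
        m_randomState = m_randomState * 1664525u + 1013904223u;   // simple LCG
        return static_cast<int>(m_randomState >> 16) % numCandidates;
    }

private:
    unsigned int m_randomState = 12345u;
};

// In a regression test:
//   Pathfinder finder;
//   finder.SetRandomSeed(42);               // same seed every run
//   int waypoint = finder.PickWaypoint(4);  // now deterministic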
Common
concerns regarding tests: Do we really save time?
In
most cases where a development team is about to introduce automated
tests into their development process, there are at least a few people
in the team who are skeptical about it. After all, implementing
automated tests takes time that could otherwise be spent working
on game or engine code. According to our experience with automated
testing inside and outside of the games industry, the additional
time that a team spends writing test code indeed amounts to around
30 percent of the total implementation effort. At first glance,
this might seem like a huge expense in time and money; however,
you have to count this against the time saved by not having to perform
the same manual tests again and again.
While
automated tests usually mean an investment in the beginning, they
pay off later in the development process. Most of the changes made
to existing code, including bug fixes, have a certain potential
for side effects that cause other functionality in a game to break.
Therefore, it would theoretically be necessary to thoroughly test
all potentially affected parts of the code whenever a change is
made. Automated tests can perform this validation as often as you
want without any user interaction, so they save time throughout
the whole development process. What is more, automated tests encourage
developers to optimize and improve existing code, since they have
a simple and quick way of finding out whether the modified code
still works correctly.
In
our experience, the introduction of automated tests helps developers
write more stable and reliable code. It provides early feedback,
which is usually highly appreciated even by team members who were
skeptical about automated tests at first, and it leads to bugs being
discovered earlier in the production of a game. Since the pressure
and workload on developers tend to increase as the project approaches
release, finding and removing bugs early avoids additional stress
in the most critical phase of development.
During
the development of the Vision engine, we collected some data to
monitor the effectiveness of our automated tests in the improvement
of code stability. When the first version of the engine was released
in early 2001, we relied entirely on manual tests, and even though
new versions were thoroughly checked, our customers reported more
than 100 issues every month in our online support database. In September
2001, we started implementing automated tests for the existing engine
functionality, and also added tests for all new features that were
implemented. As a result, the number of issues reported each month
dropped to a fraction of the initial value (now about five to ten),
even though there are now six times as many companies working with
the technology, and even though development has progressed at a
rather constant pace.
Admittedly,
these figures simply indicate a (negative) correlation between the
number of unit tests and the number of support issues per month, and
do not by themselves prove causation. Certainly, our experience
in developing robust code has grown from 2001 to 2004, and the size
of the development team also varied within this time frame. However,
the differences are big enough to support the notion that at least
part of the gain in stability may be attributed to the introduction
of automated tests.
Limitations
of automated tests in Games
As
beneficial as automated tests are, there are still aspects of game
development that don't lend themselves well to automated tests.
Naturally, it is difficult to test whether a game is well-balanced,
and it's probably impossible to write an automated test which analyzes
whether a game is fun to play or whether it looks good. In the course
of the last few years, we have set up some internal guidelines for
writing automated tests, the most important of which are:
-
Concentrate on the most important (i.e., the most complex and
most frequently used) modules when introducing automated tests.
Start implementing automated tests where they are most likely
to provide a benefit, for instance because they're likely to fail,
or because they help you perform a refactoring task without breaking
anything.
- Focus
on testing the different subsystems of your application whenever
a higher-level functional test doesn't seem to be possible. For
example, you might not be able to automatically verify that the
complete AI system works properly, but it is entirely possible to
test whether a monster reaches the state "surrendered"
when its damage exceeds a certain value.
- Use
stress tests to verify the robustness of your code. If your game
runs stably under extreme conditions - for example, when 2000
monsters are spawned and destroyed every second, when 500 physics
objects are simultaneously thrown into a scene, or when a map
is reloaded 200 times in a row - it is also less likely to break
when players try something unusual (a stress-test sketch follows this list).
- Implement
test cases for bugs before you start fixing them. Having such
test cases will make sure that bugs won't re-emerge in future
versions of the game.
- Regression
tests - for instance, image or state comparisons - are easiest
to maintain when they use special test scenes rather than production
maps. If you believe that the production data relevant for a test
may still be changed frequently, it is usually better to use a
small test scene instead. Otherwise, there is a certain risk of
reference data having to be generated and reviewed so frequently
that the development team loses motivation to execute the automated
tests.
- Keep
your tests as simple as possible instead of trying to achieve
extreme test coverage values. Setting up automated tests is a
long-term project where maintainability and extensibility are
crucial factors.
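As an illustration of the stress-test guideline, here is a sketch built around hypothetical entity-handling functions (SpawnMonster, DestroyEntity, GetEntityCount, AdvanceSimulation are illustrative names only); the figures are taken from the examples above.

// Sketch of a stress test: spawn and destroy a large number of entities
// and make sure the engine survives and ends up in a clean state.
#include <cppunit/extensions/HelperMacros.h>
#include <cstddef>
#include <vector>

typedef int EntityHandle;
EntityHandle SpawnMonster(const char* type);   // hypothetical engine calls
void DestroyEntity(EntityHandle handle);
int  GetEntityCount();
void AdvanceSimulation(float seconds);         // fixed virtual time step

void TestMassSpawnAndDestroy()
{
    for (int second = 0; second < 10; ++second)
    {
        std::vector<EntityHandle> monsters;
        for (int i = 0; i < 2000; ++i)
            monsters.push_back(SpawnMonster("Orc"));   // 2000 monsters per simulated second

        AdvanceSimulation(1.0f);                       // let the engine process them

        for (std::size_t i = 0; i < monsters.size(); ++i)
            DestroyEntity(monsters[i]);
    }

    // The scene should be empty again and the engine should still be responsive.
    CPPUNIT_ASSERT_EQUAL(0, GetEntityCount());
}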
In
general, "low-level" code such as math, collision detection,
and even rendering is easier to test in an automated fashion than
gameplay, and even a game with a comprehensive suite of automated
tests will still have to go through the hands of QA people. However,
the focus of the QA department will most likely shift from technology-
to gameplay-related defects and shortcomings. Instead of "I
have these distorted triangles on screen whenever my character turns",
the issue reports may contain statements like "It is possible
to get into room A, but you can't get out of it again since the
crates in front of the ventilation shaft are too high."
Continuous
Integration
When
employing automated tests in the development of a complex software
project, you will soon realize that the execution of automated tests
takes time - up to a couple of hours in some real-world projects
we've seen. If developers have to execute these tests on their development
machines, they will quickly become reluctant to actually run the
automated tests, since it might stall their work while the tests
are running. Of course, tests that aren't executed are absolutely
worthless.
The
solution to this problem is to set up one or (preferably) multiple
computers dedicated to executing the automated tests. Such a machine
regularly polls the version control system (e.g., Subversion, CVS,
Perforce) for changes in the respective repositories, and if newly
committed changes are found, the code is checked out and compiled,
and the tests are run. Finally, the system sends an email with a
report containing the test results to the developer who has performed
the last commit operation.
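In heavily simplified form, the job of such a machine boils down to a loop like the following sketch. The batch files and executables it invokes are placeholders with assumed behavior, and a real setup would use a dedicated tool (see below) rather than hand-rolled code.

// Simplified sketch of a continuous-integration polling loop.
// update_source.bat, build_all.bat, run_tests.exe and mail_report.exe
// are placeholders; update_source.bat is assumed to return 0 only when
// new commits were fetched.
#include <chrono>
#include <cstdlib>
#include <thread>

int main()
{
    for (;;)
    {
        // 1. Poll the version control system and update the working copy.
        const bool hasNewCommits = std::system("update_source.bat") == 0;
        if (hasNewCommits)
        {
            // 2. Build the project and run the automated test suite.
            const bool buildOk = std::system("build_all.bat") == 0;
            const bool testsOk = buildOk && std::system("run_tests.exe") == 0;

            // 3. Mail a report to the developer who made the last commit.
            std::system(testsOk ? "mail_report.exe --passed" : "mail_report.exe --failed");
        }
        std::this_thread::sleep_for(std::chrono::minutes(10));   // poll interval
    }
}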
This
concept of fully automated and reproducible build and testing processes
which typically run multiple times per day is called "continuous
integration" - once again a term that has its roots in eXtreme
Programming. In order to facilitate continuous integration, there
are open-source tools like CruiseControl or AntHill, which take
care of the interaction with the version control system and additionally
provide an interface for build tools such as Ant. Using these tools,
a custom continuous integration system can quite easily be set up.
We've
found that setting up dedicated CI servers smoothed the development
processes in our organization significantly, and indeed gave developers
more time to work productively. Also, since developers no longer had
to take care of running the automated tests themselves, we could be
sure that the tests were always executed, and any faulty code would
result in the CI systems complaining to the responsible developer
(and the project manager) by email.
More
Automation!
Our
positive experience with the introduction of automated tests and
continuous integration fuelled our search for further processes
in game and tools development that lend themselves to automation.
For instance, a CI server nowadays automatically generates our CHM
(Windows help file) documentation from a Wiki every time a modification
in the Wiki is detected. Furthermore, creating distributable copies
of an arbitrary software product can easily be automated using Ant
and CruiseControl. This way, creating a full distributable copy
of the most recent code (or the last stable tag of it) becomes a
matter of minutes.
The
most recent addition to our collection of automated processes is
automated performance tests. They are based on the same test framework
as the regular unit and regression tests, but instead of checking
for correctness, they measure the engine's performance and compare
it to the best previous runs of the same test on the same machine
(the results of an arbitrary number of system configurations can
be stored in a version-controlled XML file). If the current result
is significantly slower than the reference result, the test fails;
if the current result is better than the reference result, it becomes
the new reference result.
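A sketch of such a performance test follows, assuming hypothetical helpers for measuring the test scene and for loading and storing the best previous result; in the setup described above the per-machine results would live in the version-controlled XML file, and the 10% tolerance is an arbitrary choice for illustration.

// Sketch of an automated performance test.
#include <cppunit/extensions/HelperMacros.h>

double MeasureTestSceneMs(const char* testId);            // renders the test scene, returns avg. frame time
double LoadBestResultMs(const char* testId);              // best previous run on this machine
void   SaveBestResultMs(const char* testId, double ms);   // stores a new best result

void TestForestSceneRenderTime()
{
    const char* testId = "ForestScene";                    // illustrative test identifier
    const double currentMs = MeasureTestSceneMs(testId);
    const double bestMs    = LoadBestResultMs(testId);

    if (currentMs < bestMs)
    {
        // Faster than any previous run: this result becomes the new reference.
        SaveBestResultMs(testId, currentMs);
    }
    else
    {
        // Fail if the current run is significantly slower than the best previous run.
        CPPUNIT_ASSERT(currentMs <= bestMs * 1.10);
    }
}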
Performance
tests are actually a special type of regression test. They allow
us to make sure that, when engine code is modified during refactoring
processes, these changes never cause any part of the engine to become
less efficient. This creates a certain pressure to keep code fast,
and also makes sure that when optimizations are performed, you don't
run into a scenario where other parts of the code suddenly become
slower.
Conclusion
In
our experience, the introduction of automated tests and continuous
integration makes development teams more efficient and results in
more reliable - and often simply better - software. Additionally,
it reduces the pressure and workload on development teams by reducing
the effort for manual testing, and allows bugs to be found earlier
in the development process.
Certainly,
automated tests alone won't make your game a hit. But almost as
certainly, they will make life easier for developers, artists, project
leaders, producers and even players.