Introduction
In the games industry we are seeing a steady increase in team size. Today, even a team of some ten programmers and thirty or more artists (level designers included) is hardly considered large. The artists, and especially the level designers, often need to check how their art works in-game to make sure it looks and plays as expected. This means that a large number of people depend on having a stable version of the game in order to do their job efficiently.
With so many people involved, it seems obvious that lost time means lots of lost money. If broken code makes it into the hands of the art team, the man-hours can start ticking away rapidly.
This results not only in lost money but also in stress for the programmers, who have to try to fix the problem quickly; sometimes the entire programming team more or less stops making progress while everyone looks for the problem.
To compound the problem further, there are also more far-reaching effects on the morale of the respective teams. If, for example, the artists have low confidence in the stability of the code (and justly so), they will be more likely to blame problems on the programmers instead of checking whether their own art is at fault, say through misnamed files and the like (and of course, when that is the case, the programmers get huffy in turn).
On the other side of the wall, the programmers may become too paranoid to check code in because they don't want to break the game. They end up hogging checked-out files or, if multiple checkouts are allowed, running into nasty merge problems. In all these cases, productivity is lost.
Example of Disaster
The evil effects above are not dreams born of a paranoid brain, but examples from a real-world project. In that particular case the protocol was to put everything under source control: code, art and also executables. The programmers were to check in compiled code whenever they checked in source, and the artists then got these binaries whenever they updated from source control. On the surface this may sound like a reasonable scheme: everything is in source control; okay, fine, that ought to fix it, good.
Only it's not that simple. Since the programmers have to build the source before checking it in, they have to prevent anyone else from checking source in while they are building. This means locking out all other programmers, who then have to queue for their turn to lock everything. Building and checking that it all works can take some time; waits of an entire day just to check in some files have been reported. In the meantime, of course, the waiting programmer won't just sit idle; he'll work on something else. Thus when it is finally his turn to check in, he might be in the middle of something that isn't ready to check in… and so on.
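To make the queueing problem concrete, here is a minimal C++ sketch (not taken from the project described) that models the global lock as a first-come, first-served queue. Each entry is a hypothetical number of minutes one programmer holds the lock to build, verify and check in; `waitTimesMinutes` is an illustrative name, and everyone is assumed to join the queue at the same moment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the "lock everything, build, verify, check in" protocol.
// lockHoldMinutes[i] is how long the i-th programmer in the queue holds the
// global lock. Everyone is assumed to join the queue at time 0, so each person
// waits for the combined hold times of everyone ahead of them.
std::vector<int> waitTimesMinutes(const std::vector<int>& lockHoldMinutes) {
    std::vector<int> waits;
    waits.reserve(lockHoldMinutes.size());
    int clock = 0;  // minutes elapsed since the queue formed
    for (int hold : lockHoldMinutes) {
        assert(hold >= 0 && "hold times cannot be negative");
        waits.push_back(clock);  // waits until everyone ahead has finished
        clock += hold;           // then holds the lock for 'hold' minutes
    }
    return waits;
}
```

With, say, five programmers each holding the lock for 90 minutes (an invented figure), the last one waits 360 minutes before he can even start his own build, which is how most of a working day disappears.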
Meanwhile, the art department is subjected to whatever code was last checked in, and any mistake can spread in a matter of minutes.
Needless to say, these problems are usually worst at the most inconvenient time possible: near milestones.
Fortunately, this is not a necessary evil, and a little care and thought can prevent most of the bad effects.
Insulation
By inserting a layer of soft foam between the programmers and the artists, the likelihood of anyone getting hurt can be greatly reduced. One type of foam that I suggest using is also known as a QA department. Many people reading this will hopefully nod and think 'Aye, useful they are, them QA teams!'; however, even large studios developing well-known titles have missed this vital institution, or are not using it for this purpose (or in a structured way at all). The great thing is that if you get some people to test your products on a permanent basis, you'll have to figure out something for them to do, and here's one suggestion.
Carry out nightly builds and have QA check each build. If they do not declare it stable, don't let the artists get their hands on it. It is then up to the respective leads to decide whether to try to patch the build or simply wait for the next nightly build; obviously, the specific requirements of the day will decide.
An added benefit is that we now have an established process that can also produce release candidates and builds for other external purposes.
Figure: Integration of Quality Assurance into the production pipeline can lead to more stable builds
The Process, step by step
We are assuming that both the art and the source code are under version control. It is not strictly necessary for the art to be, but it simplifies things if it is.
In this protocol QA receives a build (labeled with a version number) that contains executables for the required platforms as well as a snapshot of the art in suitable formats and packaging. At this time the art and code under version control should be labeled with the build number as well. The build is preferably done automatically overnight, so that the QA team can get in early, check the automatic test results and perform manual testing.
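A build as described here bundles a build number, executables for each required platform and an art snapshot, with the same number applied as a label in version control. A minimal C++ sketch of such a manifest follows; the `BUILD_` label prefix, the `data_N.pak` archive name, the platform names and the struct layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical manifest for one build handed to QA.
struct BuildManifest {
    int number;                          // build number
    std::vector<std::string> platforms;  // one executable per required platform
    std::string artArchive;              // packaged snapshot of the art
    std::string label;                   // label applied to code and art in version control
};

// The "BUILD_" prefix is an arbitrary convention chosen for this sketch.
std::string labelFor(int buildNumber) {
    assert(buildNumber > 0 && "build numbers start at 1");
    return "BUILD_" + std::to_string(buildNumber);
}

// Every name is derived from the single build number.
BuildManifest makeManifest(int buildNumber, std::vector<std::string> platforms) {
    return BuildManifest{buildNumber, std::move(platforms),
                         "data_" + std::to_string(buildNumber) + ".pak",
                         labelFor(buildNumber)};
}
```

Deriving every name from the one build number is what later makes it trivial to answer "which code and art went into build 1024?".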
Next, QA decides whether the build is stable and, if so, puts a copy where the development team can get it. The release is then announced, and the art team should update as soon as possible (to avoid problems caused by people using old executables).
The art team should continuously check their art into the version control system; they just need to make sure their art works with the executables from the latest build.
What of the programmers, then? They should use source control and get updates of the latest source code and data. Under this scheme they get no insulation from the artists' check-ins. However, there are several reasons why a buffer in this direction is less important. First, the programming team is usually smaller, so word spreads faster if something is wrong and the other programmers can avoid the broken data. Second, programmers are more skilled with version control tools; the first programmer who gets broken data can usually see which file(s) are at fault and warn people, or pin the file to an earlier version. Finally, programmers are usually the ones who built the game's error reporting systems, and will actually understand the error messages. All in all, this means a much lower impact on the programming team.
Insulating the Engine Room
If there is a separate team developing an in-house engine (or, equivalently, an externally developed engine is used), the process can be extended to require that the engine is subjected to QA before being accepted by the game programming team.
To do this, each engine build must be numbered just like the game build, and the engine should define a compile-time constant (a preprocessor symbol) equal to that number. All code needed to use the new engine must then be prepared by the game programmers before the build is accepted, but conditionally compiled out until the engine build number increases. In other words, programmers who work in an area where the engine will change have to work ahead of the last stable engine, while ensuring that all new code is compiled out in the current version; see Example 1 below.
Example 1:
#if ENGINE_VERSION > 1023
    /* …new code depending on new engine… */
#else
    /* …old code using old engine… */
#endif
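Example 1 can be fleshed out into something that compiles on its own. In the C++ sketch below, the `#define` stands in for the engine's version header, and `shadowTechnique` with its string results is a hypothetical placeholder for real engine calls. Note that the test must use `#if`, not `#ifdef`, since `#ifdef` accepts only a single macro name. One more detail is worth guarding against: inside `#if`, an identifier that is not defined as a macro evaluates to 0, so a missing version header would silently select the old code path; the `#error` check turns that into a build failure.

```cpp
#include <cassert>
#include <string>

// Stand-in for the engine's version header (hypothetical value): in real code
// this #define comes from a header shipped with each engine build.
#define ENGINE_VERSION 1023

// Inside #if an undefined identifier silently evaluates to 0, which would
// quietly select the old code path; fail the build instead.
#ifndef ENGINE_VERSION
#error "ENGINE_VERSION is not defined; include the engine version header"
#endif

// Game code written ahead of the engine. The new branch is compiled in only
// once the game is built against an engine newer than build 1023.
std::string shadowTechnique() {
#if ENGINE_VERSION > 1023
    return "new";  // placeholder for a call into the new engine API
#else
    return "old";  // placeholder for the existing engine API
#endif
}
```

When the engine header is bumped to 1024 and that engine clears QA, the new branch is compiled in without any further edits to the game code.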
Then, when the whole lot is submitted to QA, they can test the game build first with the old engine and then perform the same tests with the new version of the engine. Correctly implemented, this procedure reveals bugs that were introduced in the new engine. Although there is always some uncertainty, at the very least it will be known that there are problems, and the programming team will avoid the new engine until it clears QA.
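The comparison QA performs can be sketched as a difference over test results: anything that passes with the old engine but fails with the new one is attributed to the engine change. The C++ sketch below assumes results are available as a map from test name to pass/fail; `TestResults` and `engineRegressions` are hypothetical names, not part of any particular test framework.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: pass/fail outcome per QA test case, keyed by test name.
using TestResults = std::map<std::string, bool>;

// Compares runs of the same game build against the old and the new engine.
// Returns tests that passed with the old engine but failed (or were missing)
// with the new one, i.e. the failures attributable to the engine change.
std::vector<std::string> engineRegressions(const TestResults& oldEngine,
                                           const TestResults& newEngine) {
    std::vector<std::string> regressions;
    for (const auto& [name, passedOld] : oldEngine) {
        auto it = newEngine.find(name);
        bool passedNew = (it != newEngine.end()) && it->second;
        if (passedOld && !passedNew) {
            regressions.push_back(name);
        }
    }
    return regressions;
}
```

Tests that already fail with the old engine are deliberately not reported, since those point at the game code rather than the engine.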
Clearly, this procedure can be extended to any library the game uses, although most libraries stay the same during the course of a project.
Results
This simple procedure creates a much more stable working environment for the art team. Now the artists can have a nice fixed morning routine: check for build update – get coffee – start working.
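The first step of that routine boils down to comparing two numbers. The tiny C++ sketch below assumes the workstation remembers which build it last installed and that QA publishes the number of the newest build it has cleared; where those numbers live (a file on a share, an intranet page, an e-mail) is left open, and `updateAvailable` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical morning check: an update is due whenever QA has cleared a
// build newer than the one installed on this workstation.
bool updateAvailable(int installedBuild, int latestClearedBuild) {
    assert(installedBuild >= 0 && latestClearedBuild >= 0);
    return latestClearedBuild > installedBuild;
}
```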
At the same time it also makes life simpler for the programmers, as the pressure of breaking the build is lessened. This means that programmers can check code in frequently (without wasting time on superfluous, paranoid testing) and spend their time on what they are paid for. One misconception here is that removing this pressure will result in sloppier code; however, the mistakes are still caught by the QA team, so the problem (if someone is frequently responsible) can be dealt with in a civilized way instead of by lynch mob. Peer pressure also keeps people wary of checking in broken code, since the rest of the programming team will usually let the person who broke the code know what they think.
Another benefit is that having a QA team makes the bug reporting procedure better defined. The artists assign bugs to the QA department, which in turn verifies each bug and assigns it to the relevant programmer. This becomes the natural procedure: since QA is responsible for clearing a build, they must naturally be the first (and last) stop for any bug found in that build (this allows them to improve testing procedures and also filter out duplicates). Furthermore, since we have introduced the concept of builds, it becomes easy to record the version (aka build number) in which a bug appeared and the one in which it will be fixed.
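The reporting flow and its build-number bookkeeping can be modeled as a small state machine. In this C++ sketch the states, the `Bug` fields and the rule that a fix must land in a build no older than the one where the bug appeared are assumptions for illustration; a real bug tracker would supply its own workflow configuration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical workflow: reporter -> QA (verify, filter duplicates)
// -> programmer (fix) -> QA (confirm in a later build, close).
enum class BugState { Reported, Assigned, Fixed, Closed };

struct Bug {
    std::string summary;
    int foundInBuild;                 // build in which the bug appeared
    std::optional<int> fixedInBuild;  // build that first contains the fix
    BugState state = BugState::Reported;
    std::string owner = "QA";         // QA is the first stop for every report
};

// QA has reproduced the bug and hands it to a programmer.
void verifyAndAssign(Bug& bug, const std::string& programmer) {
    assert(bug.state == BugState::Reported);
    bug.state = BugState::Assigned;
    bug.owner = programmer;
}

// The programmer's fix is in the given build; the bug goes back to QA.
void markFixed(Bug& bug, int build) {
    assert(bug.state == BugState::Assigned && build >= bug.foundInBuild);
    bug.state = BugState::Fixed;
    bug.fixedInBuild = build;
    bug.owner = "QA";
}

// QA confirms the fix in that build and closes the bug (QA is also the last stop).
void closeBug(Bug& bug) {
    assert(bug.state == BugState::Fixed);
    bug.state = BugState::Closed;
}
```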
Now, while we have introduced a stabilizing layer into the system, we have also added a delay that slows down the turnaround time (note that we have not increased the development time, only introduced some latency). However, the cases where an artist really needs the bleeding-edge code are relatively rare. In such cases they have to work closely with a programmer anyhow (who will make the specific changes needed), and that programmer can supply the artist with an intermediate build. For the most part, all content will be both forwards and backwards compatible with the code and no problems occur; all the artist needs to know is that when a fresh build is made available from QA, it is time to upgrade. Since we are dealing with large teams, it should be obvious that introducing some (potential) slowdown for a relatively unusual case while making everyone else's life simpler is an overall win.
In most production environments it is also common to build the data into a big archive file, and creating this can also take considerable time. This problem is addressed as well, since everyone gets a fresh archive with the build each morning (or however often builds are made), so they don't have to create one themselves. This archive can then usually be patched with updated content as the day goes on, and the next morning it is back to a fresh copy. Most people will never have to build an entire archive themselves, and again it's a win in production time.
Further Developments
The process can also be developed further, for instance by specifying more exactly how programmers and artists working ahead of the build should behave. Perhaps insulating the programmers from the artists is really worthwhile?
Since the programmers (usually) control the tools, they can actually create insulation for themselves by building a system that automatically checks the art assets for required properties. A detailed description of such a system is outside the scope of this article.
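Purely as an illustration, here is a C++ sketch of one check such a system might run, catching misnamed files of the kind mentioned in the introduction. The naming rule (lowercase letters, digits and underscores, plus a `.tex`, `.mesh` or `.anim` extension) and the name `checkAssetName` are invented for this example; real rules would come from the project's own conventions and would likely cover much more than names.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical naming rule for art assets. Returns a list of problems;
// an empty list means the name passes.
std::vector<std::string> checkAssetName(const std::string& name) {
    std::vector<std::string> problems;
    const std::vector<std::string> allowed = {".tex", ".mesh", ".anim"};

    auto dot = name.rfind('.');
    if (dot == std::string::npos || dot == 0) {
        problems.push_back("missing base name or extension");
        return problems;
    }

    std::string ext = name.substr(dot);
    bool known = false;
    for (const auto& a : allowed) known = known || ext == a;
    if (!known) problems.push_back("unknown extension " + ext);

    // Base name: lowercase letters, digits and underscores only.
    for (std::size_t i = 0; i < dot; ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        if (!(std::islower(c) || std::isdigit(c) || c == '_')) {
            problems.push_back("invalid character in base name");
            break;
        }
    }
    return problems;
}
```

Running a check like this at check-in time, or as part of the nightly build, lets a misnamed file be reported to the artist who submitted it before it reaches the programmers' working copies.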
Summary
In closing, I want to point out that this is by no means the be-all and end-all scheme of build iteration. It is a simple yet robust procedure, presented in the hope that it can be of use to people. The process has been successfully applied to large projects (in which the author has participated) in more or less exactly the shape presented here.
It should be noted that a QA team need not be large: there is an immeasurable difference between zero people and one. Between one and two, on the other hand, there is only a measly 100 percent. Properly equipped with automatic testing tools, a QA department can easily be a one-man institution.
For smaller projects the benefits are not as large, but the process still provides an increased level of control over the development environment, and having a process in place makes it easier to scale up the team later.