Conducting In-house Play Testing
What is "game play"? I've found that many people in the game development industry
cannot adequately define this term. When pressed, they usually describe
playability by listing some qualities of successful games, such as "addictive,"
"fun," "easy to learn but slow to master," and "customizable." Paul Coletta,
producer and principal software engineer at GTE Interactive Media, feels
that a good game needs well-paced timing, as well as a "sense of atmosphere
where your imagination takes on bigger qualities than the actual game
presents." Wayne Cline, production manager with LucasArts, says that game
play is "an unknown quantity that we're all trying to know." In short,
there is no definitive magic formula for good game play.
Just because this quality is elusive, however, doesn't mean that it should
be shoved aside as irrelevant. The ultimate responsibility for great game
play usually rests in the hands of the producer. Still, before the code
is written, the designer is often the person most concerned with "playability."
More often than not these days, the producer and the designer aren't the
same person. Games can be tweaked, improved, and enhanced during the testing
phase, but if the game's basic design is flawed, it's already too late.
A good design should directly address components that allow the flexibility
of altering game play. One of the best examples of such a component was
designed and built by programmer Andy Caldwell (now with Screaming Pink
Inc.). Caldwell's belief that good tweaking tools "relieve the programmer's
time to work on programming while other people tweak" paid off in Street
Hockey '95, a 16-bit multitap SNES game that had great game play but was
unfortunately under-marketed. Caldwell built the player attributes into a
table structure and gave the testers access to the table. Each tester
was assigned responsibility for the game play of certain characters. The
designer and producer determined what each character's strength should
be, and no other characters could be tweaked higher in that category.
The testers had the ability to open up the table and change each character's
attributes for shot accuracy, blocking ability, and so on. The testers
then fed their improved attributes to the producer, who made sure that
the testers weren't making "mega" players. The end result was that the
programmer was able to concentrate on bug chasing and code performance
improvements while the testers tweaked. Because of this tool, the game
came in on time, under budget, and passed through Nintendo's approval
system on the first pass - everyone on the team was empowered to do what
they do best.
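A table-driven setup like the one Caldwell built might look something like the following Python sketch. It is only an illustration, not the actual Street Hockey '95 code: the class name, attribute names, character names, and numeric values are all assumptions, and it assumes that each designer-assigned strength acts as a ceiling that no other character may exceed in that category.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeTable:
    """Hypothetical tweakable table of player attributes."""

    # character -> {attribute category -> value}
    values: dict = field(default_factory=dict)
    # attribute category -> (character who owns that strength, ceiling value)
    strengths: dict = field(default_factory=dict)

    def set_strength(self, character, category, value):
        """Designer/producer step: fix a character's signature strength."""
        self.strengths[category] = (character, value)
        self.values.setdefault(character, {})[category] = value

    def tweak(self, character, category, value):
        """Tester step: change an attribute, subject to the strength ceiling."""
        owner, ceiling = self.strengths.get(category, (None, None))
        if owner == character:
            raise ValueError(f"{category} is {character}'s designer-set strength")
        if owner is not None and value > ceiling:
            raise ValueError(
                f"{character} cannot exceed {owner}'s {category} of {ceiling}"
            )
        self.values.setdefault(character, {})[category] = value


table = AttributeTable()
table.set_strength("Ace", "shot_accuracy", 90)  # assumed character and value
table.tweak("Brick", "blocking", 70)            # allowed: no ceiling on blocking
table.tweak("Brick", "shot_accuracy", 85)       # allowed: below Ace's ceiling
try:
    table.tweak("Brick", "shot_accuracy", 95)   # rejected: would top Ace
except ValueError as err:
    print(err)
```

Because the limits live in the table rather than in the game code, testers can experiment freely while the producer's review of their changes stays a quick sanity check.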
Starting and Controlling the Test Process
Play testing is usually accomplished in one of two ways: bringing in consumers
(temporary play testers) and observing them while they use the product,
or sending out beta copies of the game and eliciting feedback via a questionnaire.
Because conducting a wide-ranging beta test over the Internet is an article
in itself, I'll only discuss in-house testing here. However, I do want
to note that last fall I successfully used Internet Relay Chat (IRC) to
conduct question and answer sessions with my external beta testers.
Conducting in-house play testing requires formal observation of temporary
play testers playing the game over the course of several days. This type
of testing shouldn't be confused with focus testing, which is conducted
by your marketing team. The main purpose of in-house play testing is to
put the game into the hands of each player and obtain individual feedback;
marketing focus tests usually consist of showing the game to a group and
obtaining group feedback. Sometimes people from an earlier marketing focus
test might be invited back as temporary play testers, but usually these
positions are filled through a variety of sources, such as recruiting
friends of full-time testers, distributing flyers on local college campuses
or at local arcades, posting notices on local Internet gaming bulletin
boards, or advertising in local computer publications, such as The ComputerEdge
in San Diego. Occasionally, good candidates can be found through temporary
agencies, but most people don't boast of their gaming skills on résumés
or job applications.
Wherever you decide to look for testers, make sure that you interview
everyone before you hire anyone. Question interviewees about what types
of games they most like to play. Don't hire somebody who only plays sports
games to play test an RPG unless you want this individual to be one of
those few purposefully hired to be unfamiliar with the genre.
The timing of play testing needs to be planned carefully. The game needs
to be stable enough that the play tester doesn't spend too much time noting
operational bugs, yet immature enough that effective changes can still
be made to it. A minimum of one week's employment should be promised with
the possibility of more. Since the hours that play testers are available
can vary, plan on double or late shifts for the regular testing staff
during the weeks of play testing to accommodate testers whose schedules
only permit evening participation.
The ratio of temporary play testers to full-time staff testers monitoring
them should be no less than 1:1. Each staff tester should always be observing,
answering questions, and noting the temporary play testers' questions.
Here are some key things for staff members to look out for:
- Where do play testers seem to get stuck and ask for help from the staff? The
staff testers working with the play testers need to rate each individual
based upon their game skills. These ratings are somewhat subjective, but if
one play tester can't even get the game installed and everyone else can, it
would appear that this particular play tester doesn't possess adequate skills
for the job. However, don't let this discourage you. Not everyone you
bring in is going to live up to expectations.
- What kinds of features do the play testers have the most questions about?
In the case of a sports game, set the game at the shortest playing time
possible so that an entire game can be played in an hour or so. In the
case of graphical adventure games that have a variety of different environments,
be sure to spread the play testing across those various environments.
Be sure coverage for the whole game - and not just the first part of
the game's experience - is included in play testing. If there is a bonus
environment that players can only get to after solving all the puzzles
in other environments, provide shortcuts, jump codes, or previously
saved games so that testers can jump to that bonus environment without
having to solve everything else. Otherwise, what should be the best
part of the game could turn out to be weak and bug laden.
- Do play testers get frustrated with the game easily? How closely does their
frustration level relate to their skill level? Benchmarks need to be
established prior to bringing in the play testers; additional benchmarks
can be added as testing proceeds to measure key aspects of play
testing. If the game uses puzzles, establish a minimum and a maximum
amount of time for the play testers to solve each puzzle. If nobody
can solve a certain puzzle in the expected minimum amount of time, don't
stop the clock - let play testers continue until the maximum amount
of time has expired. Find out if players really want to solve the puzzle
or are becoming angry at their inability to solve it.
- Do play testers like the game? If they like the game, they'll be able to
cite specific instances in the game that they liked or enjoyed. If they're
bluffing, most likely they'll be unable to say any more than, "I just
liked it." When you have a significant number of play testers begging
to have a copy to take home with them, you know you have a winner on
your hands. But what if everyone seems to dislike the game? At this
point in the schedule, too much money has been spent to throw it all
away. It's time for the quality assurance (QA) manager to call a strategy
meeting with testing, design, and production team members to review
the usability test results.
- Are they complaining about the same things that earlier testers had noted
in suggestion bug reports? If the play testers echo sentiments made
during the earlier staff testing phase, and the items criticized were
not fixed or changed, not enough attention has been paid to staff testers.
These bugs will haunt you in product reviews after the game's been released.
- How long before the play testers become as bored with the game as the staff
testers? A good testing schedule includes a lunch break after four hours
and at least one 15-minute break every two hours. If the testers want
to talk too much or need to take too many breaks, it could indicate
that they are hitting the boredom stage. After a week of play testing
(or at some other significant break during play testing), the play testers
and their staff leaders should hold a group session to discuss the game.
Prior to that, discussion between testers needs to be kept to a minimum
so as not to alter opinions. During testing, the testers should only be
observed: the producer and other "vested interests" shouldn't engage the
testers in conversation, other than to ask questions, lest the testers
be tainted by that interaction as well.
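The puzzle-timing benchmarks described in the list above could be recorded along these lines. This is a minimal Python sketch under stated assumptions: the names `PuzzleBenchmark`, `classify_attempt`, and `summarize`, the outcome labels, and the sample times are all hypothetical, and the "minimum" time is treated as the expected solving time while the "maximum" is the point at which the clock finally stops.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PuzzleBenchmark:
    name: str
    expected_minutes: float  # the "minimum" time set before testing
    cutoff_minutes: float    # the "maximum" time; the clock runs until here


def classify_attempt(benchmark: PuzzleBenchmark,
                     solved_after: Optional[float]) -> str:
    """Classify one play tester's attempt; None means never solved."""
    if solved_after is None or solved_after > benchmark.cutoff_minutes:
        return "unsolved at cutoff"
    if solved_after <= benchmark.expected_minutes:
        return "solved in expected time"
    return "solved late"


def summarize(benchmark, attempts):
    """Count outcomes across all play testers for one puzzle."""
    counts = {}
    for solved_after in attempts:
        outcome = classify_attempt(benchmark, solved_after)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


# Assumed sample data: minutes each of four play testers needed (None = gave up).
sample_puzzle = PuzzleBenchmark("sample puzzle", expected_minutes=10, cutoff_minutes=30)
print(summarize(sample_puzzle, [8, 22, None, 27]))
```

A puzzle where most attempts land in "unsolved at cutoff" is a natural candidate for the follow-up question of whether players still wanted to solve it or had simply become angry.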
Play Testing Goals
Play testing should provide the producer with as much information as possible
for making the necessary game play tweaks. Testing needs to provide more
information than just crash and lockup problems. The producer needs to
hear opinions such as, "I think the game is boring because ..." Bug
reports should include a category for subjective feedback, perhaps in
headings titled "Opinion" or "Comment." Remember, the testing department
usually contains the highest ratio of gamers in the company. They are
the ones who sit and test games all day - many go home and play games
all night.
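One way a bug-tracking record might carry subjective feedback alongside defects is sketched below in Python. "Crash" and "Lockup" come from the problems named above, and "Opinion" and "Comment" are the suggested headings; everything else (the `BugReport` fields, the helper name, and the sample reports) is an assumption made for illustration.

```python
from dataclasses import dataclass

# Headings suggested in the text for subjective feedback.
SUBJECTIVE_CATEGORIES = {"Opinion", "Comment"}


@dataclass
class BugReport:
    # Field names are hypothetical, not taken from any real tracking system.
    tester: str
    category: str  # e.g. "Crash", "Lockup", "Opinion", "Comment"
    summary: str
    details: str = ""


def subjective_feedback(reports):
    """Return only the reports that carry opinions rather than defects."""
    return [r for r in reports if r.category in SUBJECTIVE_CATEGORIES]


# Invented sample reports.
reports = [
    BugReport("tester_a", "Crash", "Game exits when loading level 2"),
    BugReport("tester_b", "Opinion", "Level 2 is boring",
              "The same enemy wave repeats four times with no new mechanics."),
]
for report in subjective_feedback(reports):
    print(f"[{report.category}] {report.summary}: {report.details}")
```

Keeping opinions in the same system as crashes means they get reviewed in the same meetings instead of being lost in hallway conversation.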
The QA manager's primary objective is staffing each project with the right
mix of play testing talent. Secondarily, the QA manager needs to ensure
that the information flow remains constant - and pertinent - to the goals
of the project. Often, QA managers' biggest obstacle is losing their best
play testers to the production department.
Since turnover in the testing department can be fairly high, being able
to identify and hire skilled testers is critical. The QA manager should
look for excellent written and oral communication skills - the foremost
prerequisite. I once made the mistake of hiring someone who couldn't write
understandable bug reports. Even though this individual was a dedicated
gamer with great ideas, it just didn't work out because this person couldn't
communicate well.
Beyond communication skills, it helps for the tester to have a variety
of experience in your game's genre. Also, throw in a few testers who know
little or nothing about the genre, as this will broaden the insight you'll
obtain about your title. Testers with less genre experience are often
the ones who question the interface and yield improvements in areas where
genre experts take things for granted.
The QA manager and the producer together need to choreograph a system
of information sharing that will best help the project succeed. If you
ask a tester and a producer why a game doesn't have good game play, you're
liable to get two totally different answers. According to Paul Coletta,
when testing says some aspect of the game is "wrong," the producer needs
to interpret and evaluate whether that which is "wrong" affects the game's
fun, pacing, or addictive qualities. Wayne Cline adds that the producer's
biggest task is looking at testing reports and figuring out what will
make the most impact on the game with the least disruption of the schedule.
The Dos and Don'ts of Managing Play Testing
DON'T BE DEFENSIVE ABOUT CRITICISM. Some producers get too defensive about
their game design and concept, and they miss out on the best evaluations
testing can give. Every effort should be made to make the testers feel
that their opinions are important. Otherwise, they might fail to convey
that one comment that could make or break the playability of a game, simply
because they feel that their opinions don't matter or that they'll offend
someone by giving honest feedback.
On the other hand, there will always be testers who can't say anything
nice and advocate an entire revamp of the game. (Hopefully, the game didn't
get that far in production if it really is that bad.) Don't put up your
defenses too quickly, and try not to take these comments as insults. Glean
as much information as you can from these testers.
QA managers should instruct testers to be specific when wording their
feedback about a game. For instance, my favorite bug report was one where
the tester stated, "The pencil sucks." This was in reference to a puzzle
in a graphical adventure game where the player needed to move a piece
of paper over a rock and rub a pencil on it to get the clue. The real
problem was that the pencil was not easily manipulated to do the rubbing.
Had the tester been more specific, time wouldn't have been spent trying
to decipher this cryptic comment and the problem would have been solved
more quickly.
STAND BEHIND OPINIONS. Testers should be taught to stick to their opinions,
even if the producer tries to dissuade them from logging bug reports containing
negative feedback. Some producers will go to great lengths to get their
game through testing, but it's vital that the testing group report all
issues they feel are important. Training testers to stick by their guns
in the face of a direct challenge doesn't mean allowing them to become
hostile. Testers who aren't perceived as thoughtful and helpful will get
little cooperation from developers, ruining their chances to provide enough
information or obtain enough support to do good work. According to James
Bach, chief engineer with ST Labs, "Testers should be taught to give information,
both positive and negative, without worrying about how developers will
react to it." Furthermore, James advocates teaching testers that "the
whole team owns quality, not just them. Testing is a process of revealing
information that helps to make good decisions."
ENCOURAGE ESPRIT DE TESTING CORPS. Naturally, the size of a testing group
should correlate to the number of games the group is expected to test
at once. Full-time testing teams generally consist of at least one lead,
one assistant lead, and three to six full-time testers, depending on the
type and complexity of the project.
Full-time testers need to have a sense of community as a testing group,
and should have a dedicated testing lab. Testers need to be located together
in an area that promotes communication and cross-training between testers,
particularly in the games industry, where few testers are actually trained
in software testing methodologies, and most of their training is obtained
on the job. Physically locating testers with the project developers they
are assigned to - and not with their fellow testers - could (and often
does) hinder their objectivity. This doesn't mean that testers shouldn't
have offices, just that their offices should be located near other testers.
To counteract this separatism, testers (and particularly lead testers)
need to be trained to work hard at developing strong communication with
the developers whose products they are testing. They need to understand
the basic architecture of the product they are testing to better find
the bugs.
Ideally this community room will have all the necessary testing hardware.
It can also double as a place to observe outside testers. A synergy of
learning, communication, and discussion takes place in this setup. It
promotes game-play-oriented comments and critique.
MIX UP THE HARDWARE. Each project needs to be experienced on the minimum
hardware configuration, as well as the closest thing possible to the maximum
configuration and everything in between. The majority of testing needs
to be conducted on the minimum configuration, because that is the promise
to the customer. It's somewhat ghastly to see both "minimum" and "recommended"
specifications on product boxes these days. What this dichotomy usually
means is that the game will run on the minimum configuration, but if you
want a decent experience, your machine had better have the recommended
configuration. The difference between the two is causing a lot of unhappiness
with customers.
Don't skimp on high-end testing either. Believe it or not, bugs can be
found on the hottest machines around. I worked on one game that tested
perfectly on the minimum specification, yet when customers attempted to
install it on a machine that had 64MB RAM, the installer indicated that
not enough memory was available to install the game. As it turned out,
the game was looking for 8MB RAM, and it only looked at the last digit.
So it only installed on 8MB machines.
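The exact installer code isn't given, so the following Python sketch is only an assumed reconstruction of how a last-digit check could produce this behavior, next to a whole-value comparison. The function names and the list of RAM sizes are hypothetical.

```python
MIN_RAM_MB = 8  # minimum specification from the anecdote


def buggy_has_enough_ram(installed_mb: int) -> bool:
    """Assumed reconstruction of the bug: compares only the last digit."""
    last_digit = installed_mb % 10
    return last_digit >= MIN_RAM_MB


def fixed_has_enough_ram(installed_mb: int) -> bool:
    """Compares the whole value against the minimum."""
    return installed_mb >= MIN_RAM_MB


# Sweep a range of configurations, not just the minimum one.
for ram_mb in (4, 8, 16, 32, 64):
    print(ram_mb, buggy_has_enough_ram(ram_mb), fixed_has_enough_ram(ram_mb))
```

In this sketch the two checks agree on the 4MB and 8MB machines and disagree on every larger size in the sweep, which is why a test pass confined to the minimum configuration would never expose the problem.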
KEEP THE EYES FRESH. When staff testers look at nothing but the project
to which they are assigned for weeks and weeks on end, they become blind
to problems that they might otherwise notice. Therefore, it's useful to
move testers around to other projects every now and then to gain a "fresh
set of eyes." Sometimes, staff testers for one project can be used as
temporary play testers for other projects. This is another reason for
locating testers in a community area, rather than spreading them out all
over a facility.
DISCOURAGE LEGACIES. How often have we caught ourselves passing down a
project legacy to new testers? By "project legacy," I mean the harmful
folklore used as justification for not solving an often-cited problem.
For instance, one project I worked on spanned four CD-ROMs. Each time
the tester started up the game, she needed to insert disk one into the
drive, then swap to a second disk to resume play where she had left off.
The reason she had to endure this disk swapping hassle (so the "pat" answer
goes) was that correcting it would require an engine fix, and the engine
"couldn't be changed." But making a change to the engine was possible;
it was just that the programmer didn't want to do it, the producer didn't
insist on it, and testers didn't make an issue out of the problem. We
all had passed down the legacy that the engine couldn't be changed. A
bug report for this problem was never even written, so when weekly meetings
were held to review the reports, it wasn't ever discussed formally. Simply
put, because of this "legacy," we had our blinders on when it came to
that problem. Of course, this product's number one complaint once it went
to market was the disk swapping issue.
OBSERVE YOUR TESTERS. The best producers spend time in the test lab -
listening, not talking. They listen to the testers and they strive to
derive and implement game play abstracts from the testers' concrete comments.
As Cline says, "We know we have a good game if the testers are enthusiastic
after weeks and weeks of play." However, I have seen producers who spend
too much time with the testers. Often in these situations, each time a
tester critiques an aspect of the game, the producer explains or defends
why it is the way it is. The tester doesn't write up the problem because
he believes it can't (or won't) be changed. Thus, new project legacies
are born. Producers need to interpret and consider - not rationalize -
any issues raised by the testers' comments.
REWARD YOUR TESTERS. Everyone works better and harder if they believe
their hard work will be rewarded. To some staff testers, that reward might
be recognition. To others, cold hard cash. Since preferred rewards are about
as varied as the people on staff, it is often difficult to
reward everyone adequately. Some of the best (and most difficult) rewards
include: recommending a tester for a promotion in recognition of a job
well done, supporting a deserving tester when he or she applies for another
job in the company (representing a step up the ladder), and recommending
a tester for monetary bonuses. One of the easiest rewards is to spring
for a pizza lunch and have a lunchtime game tournament playing the latest
hot title whenever specific weekly goals for testing teams are met. Over
the last couple of years, lunchtime tournament favorites in my shop have
included Descent, Duke Nukem, and Diablo competitions.
MAKE TESTERS AWARE OF THE COMPETITION. Make time for testers to review
and analyze competitive products that are similar in nature to the one
that they're expected to be testing. Make your testers the experts on
the genre! Not only will you get better information from the testers,
they'll appreciate the chance to play another game.
It All Boils Down To Teamwork
It's difficult to achieve that delicate balance between developers and
testers during play testing. The guidelines addressed here don't encompass
everything a game developer or publisher might want to do to test game
play, but they're a place to start. The most important aspect of successful
play testing is encouraging teamwork among the testers and developers.
Listen to the testers, create an environment that is pleasant to work
in, continually learn more about the craft, and stay fresh and honest.
Play testing can be an ordeal, but when testers and developers work together,
games ship on schedule, under budget, and with great game play.