Quality Quality Assurance: A Methodology for Wide-Spectrum Game Testing

By David Wilson

[In this Gamasutra article, Nintendo and Microsoft Game Studios veteran Wilson talks about the value of diverse video game testing, suggesting a formula to make sure that your game debuts with the fewest bugs possible.]

The need for wide-spectrum testing

In the process of software development, there are constant pitfalls and perils to avoid, for application developers and game developers alike. Software testing, one of the most resource-consuming stages of the development cycle, has more than its fair share of these problems, and it tends to carry an unfortunate stigma as well.

Some developers consider it a short pass to find glaring errors, a necessary evil, or a secondary concern. Consumers consider it the stage in which any problem they have with the product should have been found (often rightly so). Testers themselves often feel, once it ends, that the testing stage was rushed or insufficient.

The biggest misconception about software testing is that any one method of testing is inherently better than the others. There is both an art and a science to software testing, and neither should be ignored.

Neither testing a strict set of conditions nor performing seemingly random tests is enough by itself, no matter how extensive the process becomes; both the art and the science are needed to find as many bugs as possible and to leave the software as functional and polished as possible.

The differences between ad-hoc testing and test cases

Ad-Hoc Testing

Ad-hoc testing, also known as free testing, falls squarely on the artistic side of the spectrum. It is used most often in game testing, but it also appears in consumer-use panels and focus groups, where individuals are brought in to try new software and are given very broad goals to complete.

As a style, it is very fluid and often seems random. A game tester may play through half of a level as the developer intended and then try to jump through a crack in the environment, causing their character to leave the bounds of the level and become stuck, unable to return to the normal flow of play.

When testing a piece of presentation software, a tester may attempt to loop through the entire presentation rapidly multiple times, not allowing time for the images or videos to load, which may stress the available memory to the point that the software stops responding and locks up.

The Good

While these seem like random things to try during the testing process, they are things that may occur in the real world. This is where ad-hoc testing becomes an art: finding things that the end-user may attempt that the developers haven't planned for.

This may seem simple enough, but the amount of creativity necessary for this form of testing can sometimes seem staggering.

Entering a dungeon in a game, letting it populate halfway, then deciding to go back to town and save the game quickly may lock up the game -- but if a player forgets to save their game before entering a dungeon, it could definitely happen.

Plans can certainly be made to test these sorts of situations, but there's no effective way to plan for them all. This is where ad-hoc testing is the most useful: testing situations that may otherwise occur after release that weren't planned for during development. The number of unusual actions good testers will try when left to their own devices can be surprising, and they will often find a fair number of problems that can then be corrected before release.

The Bad

While this sounds great, and it often finds major issues that would otherwise go unnoticed, the method has a few problems. It's almost impossible to cover all of a piece of software's functionality this way; there is usually too much ground to cover for a testing team to test freely without any focus.

Testers are also likely to gravitate toward the areas they prefer, which leaves other areas of the software with less coverage. To reduce this problem, many advocates of this style temper their test plans by assigning ad-hoc testers to specific portions of the software, but even this isn't enough to make up for the lack of disciplined testing.
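One way to apply that mitigation is to rotate ad-hoc focus areas on a fixed schedule instead of letting testers pick. The sketch below assumes a simple round-robin rule, which is an illustrative choice rather than a rule from any particular test plan; the tester and area names are hypothetical.

```python
from collections import Counter

def rotate_assignments(testers, areas, sessions):
    """Build a round-robin schedule of ad-hoc focus areas.

    In session s, the i-th tester works on areas[(i + s) % len(areas)],
    so over len(areas) sessions each tester visits every area once.
    """
    return [
        {tester: areas[(i + s) % len(areas)] for i, tester in enumerate(testers)}
        for s in range(sessions)
    ]

def coverage(schedule):
    """Count how many tester-sessions each area received."""
    return Counter(area for session in schedule for area in session.values())

# Hypothetical testers and areas, for illustration only.
testers = ["ana", "ben", "cho", "dev"]
areas = ["town", "dungeon", "menus", "boss fight"]
schedule = rotate_assignments(testers, areas, sessions=4)
print(coverage(schedule))
```

With four testers and four areas, every area receives four tester-sessions and every tester visits every area once, regardless of personal preference. Testers still test freely within their area; only the focus is scheduled.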

Test Cases

This is the scientific side of testing. Developers and test leads produce a list of tests to perform based on the functions in the product, which functions interact with other functions, what parameters each function takes, and so on.

Test cases are the counterpoint to ad-hoc testing; where ad-hoc testing seems random, test cases are strict and disciplined. They are used to go over a function, which can be as simple as moving between cells in a spreadsheet or as complex as casting an intricate spell at a group of enemies, from every point of view and with each command style that the writer of the test cases can think of.

The Good

Test cases succeed where ad-hoc testing often falls short: they ensure that the most common actions are tested in a wide variety of ways in every area of the software.

This alone is a boon to the testing process, as the color palette that an ad-hoc tester may take for granted in the software they're testing may have an incorrect variable call in just one format style, resulting in the desired blue becoming green. A test case to check the color implementation for each color in each format would easily catch such problems.
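The color example translates directly into a small test-case matrix: enumerate every color in every format and compare each result with the expected value. In this sketch the palette, the format names, and `render_color` are all hypothetical; `render_color` stands in for the software under test and has the single-format bug described above planted in it.

```python
from itertools import product

COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
FORMATS = ["plain", "bold", "italic"]

def render_color(name, fmt):
    """Hypothetical stand-in for the software under test.

    Planted bug: in the 'italic' format it reads the wrong palette
    entry for blue, so blue comes out green.
    """
    if fmt == "italic" and name == "blue":
        return COLORS["green"]  # the incorrect variable call
    return COLORS[name]

def run_color_cases():
    """Run one test case per (color, format) pair and return the failures."""
    failures = []
    for name, fmt in product(COLORS, FORMATS):
        actual = render_color(name, fmt)
        if actual != COLORS[name]:
            failures.append((name, fmt, actual))
    return failures

print(run_color_cases())
```

All nine combinations are exercised, so the one broken pairing is reported even though the other eight look correct.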

The Bad

The amount of coverage a title receives through test cases is dependent upon the people writing the cases. This coverage can be very extensive, especially when the test cases are written by people with years of experience and an in-depth knowledge of the functions that need to be tested, but nobody can account for everything that an end-user may attempt.

There are just too many random variables to be considered for test cases to cover every possible occurrence. It's also important to note that some testers may be easily bored by such strict testing protocols, which in rare cases could result in the test cases not being completed properly.


Tools of the trade

White-Box Testing vs. Black-Box Testing

These two methods form the second axis of the testing grid. The amount of information available to testers about the underlying workings of the software is largely dependent upon which testing method they use.

In black-box testing, there is little or no information about the internal workings of the software; all a tester sees is what the end-user would see. Playing a beta build of a game on a test console is a good example of this.

White-box testing, on the other hand, uses internal functions or a second software suite (usually a debugger) to track what's going on as the software runs, relaying information back to the tester or to a test log as necessary.

Good examples of this method include testing a build inside a programming platform's development environment, or using automation software to repeat minor variations of an action with a debugger running in the background.
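As a rough illustration of the automation example, the sketch below repeats one action with small variations in its inputs, writes each attempt to a test log, and flags any result that leaves the level's bounds. The level limits, the crack's position, and `attempt_jump` are all hypothetical; `attempt_jump` stands in for game code and has an out-of-bounds bug planted in it. A real setup would drive the actual build through its own automation and debugging tools rather than a Python function.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("jump-sweep")

LEVEL_MIN, LEVEL_MAX = 0.0, 100.0
CRACK_START, CRACK_END = 49.5, 50.5  # hypothetical gap in the level geometry

def attempt_jump(start_x, speed):
    """Hypothetical game routine: return the character's landing x.

    Planted bug: a landing strictly inside the crack is not clamped,
    so the character falls out of the world (x = -1.0).
    """
    landing = start_x + speed
    if CRACK_START < landing < CRACK_END:
        return -1.0
    return min(max(landing, LEVEL_MIN), LEVEL_MAX)

def sweep(starts, speeds):
    """Repeat the jump with small variations and log every attempt."""
    failures = []
    for x in starts:
        for v in speeds:
            landed = attempt_jump(x, v)
            # Per-attempt detail goes to DEBUG; failures go to INFO.
            log.debug("start=%.1f speed=%.1f landed=%.1f", x, v, landed)
            if not LEVEL_MIN <= landed <= LEVEL_MAX:
                log.info("OUT OF BOUNDS: start=%.1f speed=%.1f", x, v)
                failures.append((x, v))
    return failures

starts = [40.0 + 0.5 * i for i in range(5)]  # 40.0 .. 42.0
speeds = [8.0 + 0.5 * i for i in range(5)]   # 8.0 .. 10.0
print(sweep(starts, speeds))
```

Only the five combinations that land exactly at x = 50.0 fall through, which is the kind of narrow case that a human tester would rarely hit by chance but a sweep finds every time.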

While it seems like everyone would want to use white-box testing, the fact that an extra piece of software is running in the background and intercepting data as it flows through the software being tested is an important consideration.

This interception can interfere with the normal working of the software, which sometimes causes problems that normally wouldn't occur, or may even prevent problems that normally would occur. As such, it's important to keep a balance between white-box and black-box testing to ensure that the software in question receives thorough testing.

Testers

These people specialize in breaking software. They each have their own preferred ways of doing this, but most of them are competent with a variety of testing methods. The amount of access they have to the resources necessary to perform adequate testing, however, is often determined by their level of association with a developer.

Third-Party Testers

The lowest rung, so to speak, but these testers are often the backbone of the testing process. Third-party testers are usually contracted on a project-by-project basis. They typically have limited access to special testing equipment, though testers who show particular talent with certain methods are frequently given access to extra tools. Third-party testers are most often used for black-box testing.

Hiring contract testers in large groups can produce a mixed bag, but with the right training, even testers who have little or no technical knowledge can usually perform both ad-hoc testing and test cases with good results.

Second-Party Testers

Testers who work in the testing group of a subsidiary or secondary company under a larger company, second-party testers can be either contract or fully employed. Due to their closer relationship to the developers, they often gain access to more advanced tools. This often results in a stronger focus on test cases and white-box testing. Most second-party testers are at least moderately experienced in the testing process.

First-Party Testers

Testers who often work or communicate directly with the developers, first-party testers are usually full-time employees of the company they work for, though skilled contractors may occasionally be used in this capacity. They have the most access to testing tools, and they will often manage groups of testers in their tasks. Most first-party testers are also very familiar with the testing process and various development cycles.

The Tester-Developer Dynamic

This dynamic is why the levels of association described above are important. All too often the developer and the testing group they work with will find themselves at odds with each other on various issues, which often creates friction between the teams.

The further apart the two groups are, the more likely this problem becomes.

Lack of tools or resources is often the greatest complaint with third-party testers, but disagreements on how to handle bugs are very common with all levels. Cost-effectiveness, time constraints, and feasibility are all objects of contention when bugs are accepted or written off. It should go without saying, but all parties involved should remember that they're all working toward the same goal: a polished, functional product.

To this end, developers should understand that many testers fall within the core demographic of their end-users, and that testers' opinions are sometimes worth heeding.

Likewise, testers should understand that developers have a greater understanding of where the project stands and will make decisions based on information that the testing team may not have access to. These situations can also be mitigated by sharing information between the groups, as well as sharing useful tools with each other.

Managing teams

The Need for the Separation of Tasks

Nobody likes to have their toes stepped on. When you've assigned a task to someone, be it a single person or a group, it's important to ensure that they have the chance to finish that task.

The reasons for this are twofold: first, it's good for the morale of the person or group that had the task assigned to them, as it shows a level of trust in them; second, it helps to reduce the chance of redundant effort.

Having more than one set of eyes on a test is always good, but it's usually best to wait for all the other tests to be completed first. The goal of separating tasks is to get a full battery of tests completed by the combined efforts of the entire team.

To that end, the tasks need to be separated appropriately and assigned to the teams that can complete them most effectively. For instance, if a game requires gameplay testing, a text check, and certification testing, there should be three teams, each assigned the tasks needed for one of those tests.

Once a team completes the tasks they have been assigned, then they should double-check the work already completed by other teams (with the exception of certification testing, which often has to be performed by people with special qualifications; those tests should be double-checked by another member of the certification team). This is one of the best ways to ensure that at least two people have seen each part of the software being tested.
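The cross-checking rule above can be expressed as a small reviewer-assignment routine. This is a sketch under stated assumptions: the team, tester, and task names are hypothetical, reviewers are chosen by simple rotation, and "certification" is treated as the one team whose work may only be rechecked by its own members.

```python
def assign_cross_checks(task_owner, team_members, restricted=("certification",)):
    """Choose a second tester to double-check each completed task.

    task_owner:   {task: (team, tester_who_did_it)}
    team_members: {team: [testers]}
    Work from a restricted team is rechecked by another member of that
    same team; all other work goes to a member of a different team.
    """
    checks = {}
    for turn, (task, (team, tester)) in enumerate(sorted(task_owner.items())):
        if team in restricted:
            pool = [m for m in team_members[team] if m != tester]
        else:
            pool = [m for other, members in sorted(team_members.items())
                    if other != team for m in members]
        if not pool:
            raise ValueError(f"no eligible reviewer for {task!r}")
        checks[task] = pool[turn % len(pool)]  # rotate to spread the load
    return checks

# Hypothetical teams and completed tasks.
teams = {"gameplay": ["ana", "ben"], "text": ["cho"], "certification": ["dev", "eli"]}
tasks = {
    "boss fight": ("gameplay", "ana"),
    "dialogue ch1": ("text", "cho"),
    "save-data rules": ("certification", "dev"),
}
print(assign_cross_checks(tasks, teams))
```

Every task ends up seen by two people, and the certification task stays within the certification team.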


Team Hierarchies and Communication

For every project, there should be one person in charge of coordinating the various teams. Similarly, there should be one person who coordinates each team and reports to the project leader. This forms a chain of command that can easily be followed when problems within the various teams arise.

This is important for the situations in which one team needs help from one of the other teams; with this sort of chain of command, the leader from one team can talk to both the project leader and the leader of the team they would like help from.

If the team in question isn't too busy, this isn't a problem, and the team lead can agree to help. However, in those stressful crunch times near the end of a project cycle, it's possible that time is too tight to be able to help freely.

In situations like these, it will be up to the project leader to determine if the need is great enough to require the help of the other team or not.

This is also why there needs to be good communication between the teams: when one team is ahead of schedule, teams that need extra help know where to request it first. Communication also shows everyone how the project is progressing as a whole, which lets the team leads and the project leader manage the effort better.

Weekly (or even daily) reports can be very effective in this effort, but even a simple verbal communication between leads on a regular basis can help immensely. A little bit of communication can go a long way to help keep things on track and running smoothly.
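To show how a regular report supports the chain of command, the sketch below compares each team's completed work with its plan and suggests which team a struggling team could approach first. The report fields, the figures, and the one-to-one pairing rule are all assumptions made for illustration; a `None` helper marks the case described earlier, where the project leader has to decide whether to pull help from a busy team.

```python
from dataclasses import dataclass

@dataclass
class TeamStatus:
    name: str
    done: int     # test cases completed so far
    planned: int  # test cases scheduled to be complete by now

    @property
    def slack(self) -> int:
        """Positive when ahead of schedule, negative when behind."""
        return self.done - self.planned

def help_requests(statuses):
    """Pair the teams furthest behind with the teams furthest ahead, one to one.

    Returns (team_needing_help, suggested_helper) pairs. The helper is None
    when no team with spare capacity is left.
    """
    ahead = sorted((s for s in statuses if s.slack > 0), key=lambda s: s.slack, reverse=True)
    behind = sorted((s for s in statuses if s.slack < 0), key=lambda s: s.slack)
    return [
        (team.name, ahead[i].name if i < len(ahead) else None)
        for i, team in enumerate(behind)
    ]

# Hypothetical figures from one weekly report.
report = [
    TeamStatus("gameplay", done=120, planned=150),
    TeamStatus("text", done=80, planned=60),
    TeamStatus("certification", done=40, planned=40),
    TeamStatus("graphics", done=30, planned=45),
]
print(help_requests(report))
```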

Matching the Right Testers to the Right Teams

Needless to say, some people are more competent at certain tasks than others. An effort should be made to match people to the skills they show the highest proficiency in. It can take time to figure out where a person's talents lie, but the effort is almost always worthwhile.

A person who is good at completing games should be assigned to gameplay, specifically game completion if possible. That person's talents will then help with tasks related to checking the overall playability of the game as well as its endings.

Likewise, if someone is good with language and grammar, they should be used for checking the text in the game to make sure it reads as it should. Someone who is talented at noticing problems with graphics should check graphics and animations, and so on. In this way, a team can play to their strengths, and the tasks assigned to them can be accomplished more efficiently.
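If leads record rough scores for each tester during their first tasks, the matching can be done systematically. The sketch below is a simple greedy placement: the strongest (tester, team) pairs are placed first, and a tester whose best team is full falls back to the next best team with space. The scores, team names, and slot counts are hypothetical. Greedy placement is not guaranteed to be optimal; an optimal assignment could use the Hungarian algorithm instead (for example SciPy's `linear_sum_assignment`).

```python
def match_testers(scores, capacity):
    """Greedily place each tester on the team where they scored highest.

    scores:   {tester: {team: score}} observed during early assignments
    capacity: {team: number_of_open_slots}
    """
    pairs = sorted(
        ((score, tester, team)
         for tester, by_team in scores.items()
         for team, score in by_team.items()),
        reverse=True,  # strongest pairings first
    )
    placed, slots = {}, dict(capacity)
    for score, tester, team in pairs:
        if tester not in placed and slots.get(team, 0) > 0:
            placed[tester] = team
            slots[team] -= 1
    return placed

# Hypothetical 1-10 scores from a tester's first week or two.
scores = {
    "ana": {"completion": 9, "text": 4, "graphics": 5},
    "ben": {"completion": 7, "text": 8, "graphics": 3},
    "cho": {"completion": 8, "text": 6, "graphics": 9},
    "dev": {"completion": 6, "text": 5, "graphics": 7},
}
capacity = {"completion": 1, "text": 1, "graphics": 2}
print(match_testers(scores, capacity))
```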

Weaving it all together

So, now that the general concepts have been put forward, how does it all come together? The first goal would be to set up an appropriate distribution of testing techniques.

A good starting point is to assign 40% of the testers to test cases and 30% to ad-hoc testing; the remaining 30% should alternate between the two until their strengths are determined.

The easiest way to handle this third group is to assign them any low-priority test cases, since those will do the least harm if few members of the group turn out to be suited to such tasks. Within each of these groups, half should be assigned to white-box testing and half to black-box testing.

Over the first week or two, it should be possible to determine which testers are best suited to which tasks. After this has been determined, the testers should be separated into the appropriate teams for the necessary testing tasks.

At this point, testing should be redistributed to about 60% test cases and 40% ad-hoc testing, because test cases tend to be more time-consuming and therefore usually need more manpower. Once the teams are formed, test plans should be written for each team explaining how it should go about its tasks and at what pace.
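The percentages above have to be turned into whole headcounts. The sketch below does that for both phases, assuming a hypothetical 25-person team and using largest-remainder rounding (an illustrative choice) so the groups always add up to the full team.

```python
def split(headcount, shares):
    """Split a headcount by percentage shares (summing to 100).

    Each group first gets the floor of its exact share; leftover people go
    to the groups with the largest remainders, so the total is preserved.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    base, rem = {}, {}
    for group, pct in shares.items():
        base[group], rem[group] = divmod(headcount * pct, 100)
    leftover = headcount - sum(base.values())
    for group in sorted(rem, key=rem.get, reverse=True)[:leftover]:
        base[group] += 1
    return base

TEAM_SIZE = 25  # hypothetical team size

# Phase 1: 40% test cases, 30% ad hoc, 30% alternating until strengths are known,
# with each group split evenly between white-box and black-box testing.
phase1 = split(TEAM_SIZE, {"test cases": 40, "ad hoc": 30, "undetermined": 30})
phase1_boxes = {g: split(n, {"white-box": 50, "black-box": 50}) for g, n in phase1.items()}

# Phase 2, after the first week or two: about 60% test cases, 40% ad hoc.
phase2 = split(TEAM_SIZE, {"test cases": 60, "ad hoc": 40})

print(phase1, phase1_boxes, phase2, sep="\n")
```

With 25 testers, 30% is 7.5 people, so one of the two 30% groups has to round up; the sketch gives the extra person to the first such group listed.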

It's important to note that the first week or two is recommended for determining the strengths of unknown testers. If a tester is already known to be competent at certain tasks, it's easy enough to start them in the appropriate team.

Otherwise, it's more valuable in the long run to find out where a tester is best placed before giving them their final assignment.

Finally, a few words of advice specifically for developers. Documentation for various mechanics and functions can always be helpful. The more information that a testing team receives, the better they'll be able to test the software.

Also, if there's supposed to be any spoken text in a game, try to get the text edited first. In this way, there will be fewer errors in the audio, and the text will not have to be changed to match those errors.

Lastly, ask for the opinions of your testers every once in a while; they may have some opinions that would enhance your software in a way you hadn't considered before. We all want our projects to succeed, and with a little teamwork, we can ship a title with as few errors and as much polish as possible.

---

Copyright © UBM Tech, All rights reserved