Imagine being able to hire someone to wrangle up a herd of users to provide you with critical information about your target demographic, or having enough funds to have your own, private department, doing the same. Imagine being able to track details as minute as individual bullets in-game, or having a lab where you could learn exactly how people are interacting with something you created, and how you could make it even better.
Odds are, if you’re an indie developer, you have exactly none of this.
There are a number of articles which go into detail about why all developers, not just large companies, should always perform user testing (see “Further Reading” below). In short: user testing is the investigation of user experiences, and user tests are experiments in which the scientific method is applied to subjective experiences. By treating design decisions as hypotheses, we can design experiments that give us actionable data for creating the best user experience possible.
Independent developers are limited by a distinct lack of several critical resources, namely: space (literal or virtual) in which to perform user testing, time (a limited resource when it seems as if the time being used could be “better spent” programming), and access to certain types of hardware, testing software, and test subjects. However, while these limitations may seem daunting, they are not impossible to overcome. It is possible to design and run efficient user tests so long as those performing them are capable of objectively answering three critical questions:
These questions should be asked each time one reaches a point where it’s possible to perform user testing, and the answers will be different at every stage of the development process. It is of critical importance that they are taken seriously, and answered as completely as possible each time the opportunity for testing arises.
Only then should you move on to choosing a testing method.
Before jumping into the testing options themselves, a note on finding participants: You do not need as many people as you think you do.
Possibly one of the most intimidating aspects of conducting user testing is finding test participants in the first place. Finding people who are both willing and able to provide your team with the information you need can be daunting, but it turns out you can begin testing, and gaining valid insights from the results, with as few as five people. According to research conducted by Robert Virzi and subsequent investigations by Jakob Nielsen and Thomas K. Landauer, the number of usability problems found in a test with n subjects can be modeled as N(1 - (1 - L)^n), where N is the total number of usability problems in the design and L is the proportion of those problems found by a single test subject.
Nielsen and Landauer found L to be, on average, ~31%, which means a test group of 5 should be able to identify ~85% of usability issues present at the time of testing. It is worth noting, however, that while Nielsen has claimed this means one should only test with 5 users (but do multiple tests), this statement is based on the assumption that those five users are both an accurate representation of your target population and actually capable of doing their “job”. Unfortunately, this assumption does not always hold true, and it is the opinion of this author that developers should always strive for as much input, and as many test participants, as possible. (See also: The Sample Size Calculator For Discovering Problems In a User Interface)
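To make the arithmetic concrete, here is a minimal sketch of the Nielsen and Landauer model, in which the share of problems found by n test subjects is 1 - (1 - L)^n. The function names and the default of L = 0.31 are choices made for this illustration; your own L may be quite different.

```python
import math


def proportion_found(n_users: int, l: float = 0.31) -> float:
    """Expected share of usability problems found by n_users test subjects.

    l is the share of problems a single subject finds (0.31 is the average
    reported by Nielsen and Landauer, not a guarantee for your project).
    """
    return 1.0 - (1.0 - l) ** n_users


def users_needed(target: float, l: float = 0.31) -> int:
    """Smallest number of subjects expected to find at least `target` of the problems."""
    if not 0.0 < target < 1.0 or not 0.0 < l < 1.0:
        raise ValueError("target and l must both be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - l))


if __name__ == "__main__":
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} users -> about {proportion_found(n):.0%} of problems")
```

Under the same model, a lower per-user discovery rate changes the picture considerably: with L = 0.15, five users would be expected to find only a little over half of the problems, which is one reason to recruit more participants when you can.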
To find them, start with the people you already have available: friends, family, associates, and even strangers to whom you may only be connected via friends-of-friends or social media. When it comes to things like surveys, a large social network of people who are invested in you, or your company, can bring in far more participants than advertising alone. As for in-person testing, consider encouraging participation through short-term incentives (e.g. pizza), the promise of future rewards (e.g. their name in game credits for return-participants), and the proper application of classical conditioning (e.g. “Participants will be entered to win [insert some prize here]”).
Designing User Tests
Now that that’s out of the way, let’s move on to the fun stuff. The following is a basic overview of five traditional testing options: Heuristic Evaluation, Paper Prototyping, Surveys, Card Sorting, and Direct Observation. Which is most appropriate for a limited budget? That depends entirely on your answers to those three questions from earlier.
Heuristic Evaluation
What It Is: An inspection method in which “experts” check UI design against a list of pre-established criteria*. Heuristic evaluation should be used in addition to actual playtesting-based experiments, but it is demonstrably useful for preventing common mistakes and identifying those which arise during development.
Useful for: Keeping in mind “best practices” while designing interfaces. Be sure, however, to bring in outside evaluators, preferably experts, at logical break points in order to avoid bias. Consider also developing your own lists of heuristics if you find yourself in a situation where they may be useful to your project or company as a whole.
Paper Prototyping
What It Is: Designing and testing user interfaces before you build them. Prototypes can range in complexity from literal paper mock-ups of UI elements to “Wizard of Oz” experiments, in which the interface being tested is actually being controlled by another, unseen, person.
Useful for: Testing new ideas to see whether your users interact with the interface the way you want or expect them to, before spending time and energy building anything new or unproven.
Card Sorting
What It Is: Test participants are given a set of cards with ideas, statements, or specific terms already written on them, then asked to sort them into categories. In “open” card sorts, participants name these categories, and/or explain why the cards were placed where they were. In “closed” card sorts (such as the Q-sort, below), subjects are asked to organize the cards into a pre-existing structure.
Useful for: Creating classification systems/organization (e.g. designing menus)
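One common way to read the results of an open card sort is to count how often each pair of cards ends up in the same group. The sketch below assumes a made-up data layout (one dictionary per participant, mapping that participant's category names to cards) and invented card names; it is an illustration, not a prescribed analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical open card sort results: one dict per participant, mapping
# the participant's own category name to the cards placed in that category.
sorts = [
    {"Gameplay": ["Difficulty", "Controls"], "Media": ["Audio", "Video"]},
    {"Settings": ["Controls", "Audio", "Video"], "Other": ["Difficulty"]},
    {"Input": ["Controls"], "Output": ["Audio", "Video"], "Game": ["Difficulty"]},
]


def co_occurrence(sorts):
    """Count how many participants placed each pair of cards in the same group."""
    counts = Counter()
    for participant in sorts:
        for cards in participant.values():
            # Sort the cards so each pair is always stored in the same order.
            for pair in combinations(sorted(cards), 2):
                counts[pair] += 1
    return counts


pairs = co_occurrence(sorts)
for (first, second), n in pairs.most_common():
    print(f"{first} + {second}: grouped together by {n} of {len(sorts)} participants")
```

Pairs that nearly everyone groups together are reasonable candidates for sharing a menu; pairs that split participants may deserve a follow-up question.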
Surveys
What They Are: A method of collecting quantifiable data about a wide variety of topics. Many online survey tools include data visualization features, which make it easier to identify trends or critical pieces of information. Remember to ALWAYS allow for additional input, questions, doodles, etc…
Useful for: Gathering large amounts of data, determining users’ “average” opinions, and establishing “baselines” against which game elements can be objectively compared. (An example of this final element would be a question which is asked in exactly the same manner at multiple stages of development.)
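As a rough sketch of the baseline idea, the snippet below compares the average answer to one question asked with identical wording at several stages of development. The stage names, the 1-5 rating scale, and the numbers themselves are all invented for illustration.

```python
from statistics import mean

# Hypothetical 1-5 ratings for a single question, asked in exactly the
# same manner at three stages of development.
responses = {
    "prototype": [2, 3, 2, 4, 3],
    "alpha": [3, 4, 3, 4, 4],
    "beta": [4, 5, 4, 4, 3],
}


def change_from_baseline(responses, baseline_stage):
    """Return each stage's mean rating minus the baseline stage's mean rating."""
    base = mean(responses[baseline_stage])
    return {
        stage: round(mean(ratings) - base, 2)
        for stage, ratings in responses.items()
    }


changes = change_from_baseline(responses, "prototype")
for stage, delta in changes.items():
    print(f"{stage}: {delta:+.2f} relative to the prototype baseline")
```

With samples this small, treat the differences as a prompt for further testing rather than as proof that a change worked.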
Direct Observation
What It Is: Watching users interact with your product. There are a number of ways to go about doing this, including:
Remember: The surest way to change a behavior is to announce that you're measuring it. Try to stay as hands-off as possible.
Useful for: Learning how users actually interact with the things you've created and how their behaviors differ from your expectations. It is the single most effective method of determining what works and what doesn't.
A Note on Group Feedback
Focus groups are not your friend. They rarely produce clear answers, force people to make snap judgments, and create a situation in which people are less likely to say what they really think than if they were speaking to you one-on-one (as the group might start trending towards “groupthink” or just saying what they think the developer wants to hear).
If you find yourself in a situation where you absolutely have to use a focus group for some reason or another, you may be able to mitigate this issue by having participants provide their answers “privately” by writing them down to turn in at the end of the session, or by playing a modified version of “Heads Up 7 Up” (to hide identities). For more information, see “Further Reading”, below.
Extracting and using the data acquired through user testing is a five-step process.
1. Collect and organize the data.
While many of the resources listed above come with bundled data-visualization tools, this is not always an option. Furthermore, third-party data analysis tools such as RStudio, Tableau, and Splunk are often much better-equipped to handle large amounts of data, and allow for far more flexibility in the data-analysis process. (See also: GameAnalytics)
2. Identify trends and critical information.
“Critical information” isn't simply a matter of what’s most “popular”; it can be something as simple as an issue identified by a single test-user. The value of the feedback received may be entirely subjective, and it’s your job to make that call.
3. Dig down to identify the “real” issue(s).
People talk about their experience, not what’s actually happening, and it may be necessary to make a conscious effort to distinguish between the two. What’s more, test participants will more often than not be unable to suggest solutions to the problems they are encountering. It’s up to the development team to find them.
4. Make an attempt to address the issue(s) you discover.
An example of the situation outlined above took place during the development of Gearbox’s “Borderlands”. After receiving feedback that one of the game’s areas was “boring” because users encountered “too many” enemies while travelling through the region, the team responded by tripling the number of enemies, thus changing the map from a “travel area that had too many enemies getting in the way” to a “combat area”. Not only did users find the area to be more fun after the change, but their expectations about the in-game situation in which they found themselves had been altered entirely.
5. Test again. Then repeat the process.
User testing is an iterative process, and the only way to determine whether or not your solution(s) have worked is by actually going through the process a second time.
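As a small illustration of steps 1 and 2, the sketch below groups raw observation notes by issue and ranks them by severity first and by number of affected participants second, so that a serious issue reported by a single test-user is not buried under more "popular" ones. The note format and the 1-3 severity scale are assumptions made for this example; the severity values themselves remain your judgment call.

```python
from collections import defaultdict

# Hypothetical notes: (participant, issue, severity), where severity is
# assigned by the development team on a 1-3 scale with 3 as the worst.
notes = [
    ("P1", "could not find the inventory", 2),
    ("P2", "could not find the inventory", 2),
    ("P3", "could not find the inventory", 2),
    ("P2", "save file lost after a crash", 3),
    ("P4", "font too small in menus", 1),
    ("P1", "font too small in menus", 1),
]


def rank_issues(notes):
    """Return (issue, severity, participant_count) rows, most severe first."""
    by_issue = defaultdict(lambda: {"participants": set(), "severity": 0})
    for participant, issue, severity in notes:
        entry = by_issue[issue]
        entry["participants"].add(participant)
        entry["severity"] = max(entry["severity"], severity)
    rows = [
        (issue, entry["severity"], len(entry["participants"]))
        for issue, entry in by_issue.items()
    ]
    # Sort by severity, then by how many participants hit the issue.
    return sorted(rows, key=lambda row: (-row[1], -row[2]))


ranked = rank_issues(notes)
for issue, severity, count in ranked:
    print(f"severity {severity}, {count} participant(s): {issue}")
```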
Even if your game is “done” once you've published it (i.e. no plans to release DLC, game-changing updates, etc.) it is still advisable to continue collecting as much data as you can from your users, at least passively. This can be accomplished by keeping in touch with previous test participants (who might have insight for patches or future games, based on their experiences) as well as by opening the lines of communication between your consumers and your team. This can be as simple as placing a feedback form on your company website, or including a similar system in the game itself.
User testing is critical, especially when you don’t have resources to spare. The long-term benefits far outweigh the short-term costs, and if you do it right, you'll make a better game without breaking your budget.
Further Reading
On User Testing
 Virzi, Robert. "Refining the Test phase of Usability Evaluation: How many subjects is enough?" Human Factors 34 (1992), pp. 457-468.
 Nielsen, Jakob, and Landauer, Thomas K. "A mathematical model of the finding of usability problems." Proceedings of ACM INTERCHI'93 Conference (Amsterdam, The Netherlands, 24-29 April 1993), pp. 206-213.
 Nielsen, Jakob. "Why You Only Need to Test with 5 Users." Nielsen Norman Group, 18 Mar. 2000. Web. 13 Apr. 2014.
 Faulkner, L. "Beyond the five-user assumption: Benefits of increased sample sizes in usability testing." Behavior Research Methods, Instruments & Computers 35(3) (2003), pp. 379-383.
* Federoff, M.A. (2002). Heuristics and Usability Guidelines for the Creation and Evaluation of Fun in Video Games. MS Thesis, Department of Telecommunications, Indiana University, Bloomington, Indiana, USA, 2002
* Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, Proc. ACM CHI'90 Conf. (Seattle, WA, 1–5 April), 249-256