Gamasutra is part of the Informa Tech Division of Informa PLC

# Better Game Design Through Data Mining

August 15, 2003 Page 2 of 3

Statistics

Basic statistics can extract information from this fresh, well-prepared data. Since there is too much raw data to draw conclusions from, categorize or aggregate this data. For a simple example, let's categorize the data by one of four fantasy player classes: fighter, priest, rogue, or wizard.

We will attempt to measure performance. Do not be misled by the popularity of each category. The number of characters that fit into a certain class or choose a strategy in the game depends on many variables irrelevant to optimum performance. Cultural preferences, aesthetics, fads, rumors, and other trends sway players' choices. Chasing popularity as a measure of performance leads to a vicious circle. Like a cat chasing its own tail, balance would never be achieved.

Measure rates instead of instantaneous values. High performance is not any particular value. It is a measure of change from a low value to a high value in a short period of time. The period of time to measure is the week. As noted earlier, the week is more stable than the day.

Let's take experience points per hour versus level for each class as an example. "Experience points per hour" is such a useful indicator that I will abbreviate it as EPH. Like a car's MPH (miles per hour), a player's EPH indicates his speed or rate of progress. Count the "experience points," which is a performance indicator, instead of the population of a class. Count the change in experience points from one week to the next week. Count the time that the character actually played, instead of the total amount of time that has passed. For example, if the character played twenty hours in a week then use this value, instead of the 168 hours in a week. This gives the following formula:

 Like a car's MPH indicates speed, a player's EPH indicates rate of advancement.

EPH = Δ exp / Δ time
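As a minimal sketch of this formula, assume each character is logged once a week with cumulative experience and cumulative hours actually played. The `WeeklySnapshot` record and its field names are hypothetical, not taken from any particular game's database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklySnapshot:
    """One character's cumulative totals at the end of a week (hypothetical schema)."""
    experience: int       # cumulative experience points
    hours_played: float   # cumulative hours actually played, not wall-clock time


def eph(previous: WeeklySnapshot, current: WeeklySnapshot) -> Optional[float]:
    """Experience points per hour played between two weekly snapshots."""
    delta_exp = current.experience - previous.experience
    delta_time = current.hours_played - previous.hours_played
    if delta_time <= 0:
        return None  # the character did not play this week, so there is no rate
    return delta_exp / delta_time


last_week = WeeklySnapshot(experience=120_000, hours_played=300.0)
this_week = WeeklySnapshot(experience=150_000, hours_played=320.0)
print(eph(last_week, this_week))  # 30,000 exp over 20 played hours
```

Returning `None` for a character who did not play keeps idle characters out of the averages instead of counting them as zero EPH.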

Let's graph the results. On the vertical axis is the EPH, and on the horizontal axis is the level range. If there are too few samples per level, then group nearby levels together.
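One way to prepare the points for such a graph is sketched below, assuming each sample is a `(player_class, level, eph)` tuple. The bucket width of ten levels and the sample values are illustrative assumptions. Widen the bucket when there are too few samples per level.

```python
from collections import defaultdict
from statistics import mean


def level_range(level, width=10):
    """Map a level to its bucket, e.g. level 37 falls in (31, 40) for width 10."""
    low = ((level - 1) // width) * width + 1
    return (low, low + width - 1)


def mean_eph_by_class_and_range(samples, width=10):
    """Average EPH for every (class, level range) pair found in the samples."""
    buckets = defaultdict(list)
    for player_class, level, eph_value in samples:
        buckets[(player_class, level_range(level, width))].append(eph_value)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up samples for illustration only.
samples = [
    ("fighter", 72, 900.0),
    ("fighter", 78, 1100.0),
    ("wizard", 75, 1600.0),
]
averages = mean_eph_by_class_and_range(samples)
for (player_class, levels), value in sorted(averages.items()):
    print(player_class, levels, value)
```

Each key of the result is one point on the graph: the level range gives the horizontal position and the class selects the data series.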

 Compare player performance between various strategies in the game.

Next, plot each category as a data series. In this example, each series is a player class: fighter, priest, rogue, or wizard. At each point along the horizontal axis we can compare the heights of the classes' performance. A small difference is statistically insignificant; a large difference is statistically significant. How large is large enough depends on the sample size and the variability of the data, from which statistics defines the minimum gap that indicates significantly low performance. In this example, the most significant gap is between the high-level fighter and the other three high-level classes. So statistics reveals that the high-level fighter segment of the player population suffered from low performance during that week.
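The article does not name a specific test for that minimum gap. As one possible choice, the sketch below uses Welch's t statistic, which scales the gap between two means by the sample sizes and variances. The EPH values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples of EPH values."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical high-level EPH samples for one week.
fighters = [900.0, 1000.0, 1100.0, 950.0, 1050.0]
wizards = [1500.0, 1600.0, 1700.0, 1550.0, 1650.0]

t = welch_t(fighters, wizards)
print(t)  # strongly negative: fighters advance far more slowly than wizards
```

As a rough rule of thumb, an absolute value well above 2 suggests the gap is unlikely to be noise. With samples as small as these, look up the exact threshold in a t table, or use a statistics package, before acting on the result.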

Analysis

The core of data mining begins where statistics ends. Here we can extract golden knowledge from the raw mineral that we began with. Several techniques can be applied, most of them particular to the data and the purpose. Here is a simple set of techniques.

Calculate the maximum and minimum performance values. Do this for performance rate and performance growth. In this example, EPH is calculated from the experience points, and the EPH itself can be viewed as a function of class and level:

EPH = f(level)

Calculus provides the derivative:

EPH' = f'(level)

Because of the finite sample size, the precise limit and derivative do not exist. However, the approximate derivative will provide insight into the game balance. At the maximum derivative players rapidly advance. At the minimum derivative players suffer stagnation. They play for hours with little advancement. Knowing this helps isolate low-performance segments of the player population.
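A simple way to approximate the derivative is a finite difference between adjacent level samples. The sketch below assumes EPH has already been averaged per level, and the numbers are invented for illustration.

```python
def approximate_derivative(levels, eph_values):
    """Slope of EPH between each pair of adjacent level samples."""
    return [
        (eph_values[i + 1] - eph_values[i]) / (levels[i + 1] - levels[i])
        for i in range(len(levels) - 1)
    ]


levels = [10, 20, 30, 40]
eph_by_level = [200.0, 600.0, 700.0, 700.0]

slopes = approximate_derivative(levels, eph_by_level)
fastest = slopes.index(max(slopes))
slowest = slopes.index(min(slopes))
print("fastest growth between levels", levels[fastest], "and", levels[fastest + 1])
print("stagnation between levels", levels[slowest], "and", levels[slowest + 1])
```

Here the slope is steepest between levels 10 and 20 and flat between levels 30 and 40, so the second interval is the one to investigate for stagnation.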

Comparing a previous and subsequent period can identify a trend. In this example, the EPH can be subtracted from its value last week, creating a new function:

Δ EPH = f1(level) - f0(level)

Where the change is significantly positive, that segment of players is performing better than the previous week. This helps isolate the effect of a modification to a game's design. Players' adjustment to the modification delays full impact. Usually only early adopters will use the new feature at first. If it outperforms an old substitute, then most players will migrate. After migration, the empirical comparison between the two features stabilizes.
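The week-over-week subtraction can be sketched as follows, assuming each week's results are stored as a mapping from a player segment to its average EPH. The segment labels and values are hypothetical.

```python
def delta_eph(previous_week, current_week):
    """Per-segment change in EPH, for segments measured in both weeks."""
    return {
        segment: current_week[segment] - previous_week[segment]
        for segment in current_week
        if segment in previous_week
    }


f0 = {"fighter 71-80": 1000.0, "wizard 71-80": 1600.0}  # last week
f1 = {"fighter 71-80": 1250.0, "wizard 71-80": 1580.0}  # this week

changes = delta_eph(f0, f1)
for segment, change in changes.items():
    print(segment, change)
```

Segments that appear in only one of the two weeks are skipped, since there is nothing to subtract.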

Both of the above techniques can be combined to isolate and track a specific low-performance segment. For example, tracking the change in high-level fighters' EPH from one week to the next indicates whether their performance is improving.

Δ EPH = Fighter1(80%) - Fighter0(80%)

Comparing this value to the other class values indicates the relative change. As the values converge, the classes are becoming balanced.
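One hedged way to watch for convergence is to track the gap between the fastest and slowest class in a level range, week by week. The article does not prescribe this spread measure. It is an assumption chosen for simplicity, and the weekly values are invented.

```python
def class_spread(eph_by_class):
    """Gap between the fastest and the slowest class in one level range."""
    values = list(eph_by_class.values())
    return max(values) - min(values)


# Hypothetical high-level EPH per class over three consecutive weeks.
weekly = [
    {"fighter": 1000.0, "priest": 1550.0, "rogue": 1600.0, "wizard": 1650.0},
    {"fighter": 1250.0, "priest": 1560.0, "rogue": 1590.0, "wizard": 1640.0},
    {"fighter": 1450.0, "priest": 1570.0, "rogue": 1580.0, "wizard": 1620.0},
]

spreads = [class_spread(week) for week in weekly]
converging = all(later < earlier for earlier, later in zip(spreads, spreads[1:]))
print(spreads, converging)
```

A spread that shrinks every week suggests the classes are becoming balanced. A spread that stalls or grows suggests the latest change missed its target.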

 Top-down meets bottom-up when you analyze strategies as clusters of game assets.

Data mining can combine top-down analysis techniques with bottom-up analysis techniques. From the bottom-up our game may appear to be a galaxy of game assets with no hierarchical organization. From the top-down the same game may appear to be rigid containers of game assets. Cluster analysis might improve class or strategy design, since it generates clusters from the bottom-up, by mapping differences of individual game assets. This can compare similar assets in different categories. As well, cluster analysis can identify assets that multiple strategies share. If you are interested, the books at the end of this article explain techniques for cluster analysis.

Hypothesis

As a game designer, it is dangerous to assume that you know your game. The analysis should inspire the hypothesis, since analyzing player behavior can prove or disprove a good hypothesis about game assets. The kind of hypothesis mentioned here meets two criteria:

1. Explain existing trends of game assets.
2. Predict the result of modifying, inserting, or removing a set of game assets.

Here are two examples of game asset hypotheses:

1. In EverQuest, players prefer pretty races.
2. In Dark Ages, a trap skill will increase mid-level rogue performance.

The domain delimits where the hypothesis applies. In this case the domain is a particular MMORPG, Sony Online Entertainment's EverQuest or Nexon's Dark Ages. Define the domain, or scope, to which the knowledge you believe you are discovering applies.

 Is player preference skin deep? (SOE's EverQuest)

Suppose that when you discuss the appearance of races in an MMORPG with artists, the team divides into two camps. One camp argues for an equal number of game assets for gruesome player races as well as beautiful races. The other camp argues that many more players will choose beautiful races, so almost all assets should be devoted to the more beautiful races. Nick Yee provides survey data in his EverQuest research paper "Norrathian Scrolls" (http://www.nickyee.com/eqt/metachar.html#4) that may inspire this hypothesis. EverQuest players prefer Elves, in general, about 10-to-1 compared to the two least popular and, arguably, ugliest races: Trolls and Ogres. To make the hypothesis rigorous, the player population and the race performances should be analyzed, because, as noted earlier, data mining more accurately depicts player behavior than a survey does.

 How can you balance group members but still keep the group together? (Nexon's Dark Ages)

In the second example, suppose you have analyzed player performance in Dark Ages. You note that mid-level, but not high-level, rogues have low performance in terms of measured EPH when compared to the other four classes. In 1999 this was one of the decisions that I faced. I hypothesized that inserting a set of mid-level trap skills would improve rogue performance by improving their damage ratio. Then I used the techniques in this article to test my hypothesis. During the transition, some players, especially non-rogues, argued about the performance of rogues. But the experiment succeeded: within a month, mid-level rogues had a balanced EPH.

Test

Testing your hypothesis is the most rigorous, sensitive, and critical step in the cycle. Although it feels good to hold a gem of wisdom, it feels bad to realize your treasured hypothesis is a false gem. So it is tempting, and sadly common, to halt the cycle before the testing stage. Test each hypothesis. If it is correct, it will survive with its value proven. If it is incorrect, then please conserve the team's resources by discarding it.

A good test has two and only two possible outcomes: the hypothesis is true, or the hypothesis is false. A good test rarely yields an inconclusive result. When it does, the test needs to be repeated or modified until it yields a definite true or false. This cycle is an elaboration of a basic idea: trial and error. Since testing detects error, it improves a game's design.

 Measure test results to validate or invalidate the hypothesis.

In the earlier example, high-level fighters suffered from low EPH. Suppose someone suggests a new game asset, a new skill to increase the fighter's combat effectiveness. You create a "Sword Mastery" skill to do this. After collecting data on the test server, you compare the old and new EPH for each class in order to conclude if the skill improved high-level fighter EPH and what other results it may have.

In the test, mirror actual conditions as much as possible. Like an ideal point or a limit, identical conditions do not exist, yet you can approximate them. Test with an identical configuration, build version, and feature set, on the same day of the week and at the same time of day. Even so, the test population will be smaller, which means results will be less precise. But the most uncontrollable factor of the test is the players. Your test player population is not going to be a random sample. It will be a self-selected sample whose average motivations and behavior will be biased. So the test contains error. Worse, discovering the direction of the bias may be an intractable problem.

Although a perfect test is impossible, a test that contains experimental error may still improve your game's balance tenfold, because this process is iterative. If a single iteration cuts game imbalance in half, two iterations will quarter game imbalance, and so on. This is far better than no improvement at all, and certainly better than designing based on disinformation such as feedback motivated by competing special interest groups.
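To make the arithmetic concrete, suppose, as in the hypothetical above, that every iteration halves the remaining imbalance. The symbols $I_0$ (initial imbalance) and $I_n$ (imbalance after $n$ iterations) are introduced here only for illustration:

$$
I_n = I_0 \left(\tfrac{1}{2}\right)^{n},
\qquad
I_n \le \frac{I_0}{10}
\;\Longleftrightarrow\;
n \ge \log_2 10 \approx 3.32
$$

Under that assumption, a tenfold improvement takes four iterations, which leaves one sixteenth of the original imbalance. Real iterations will rarely halve the imbalance exactly, so treat this as an order-of-magnitude guide rather than a schedule.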

After a new design passes this test, feed the design back into live service. The process is iterative, so for best results, it should be repeated monthly.
