|
Features

Better
Game Design through Data Mining
Statistics
Basic
statistics can extract information from this fresh, well-prepared
data. Since there is too much raw data to draw conclusions from,
categorize or aggregate this data. For a simple example, let's
categorize the data by one of four fantasy player classes: fighter,
priest, rogue, or wizard.
We
will attempt to measure performance. Do not be misled by the popularity
of each category. The number of characters that fit into a certain
class or choose a strategy in the game depends on many variables
irrelevant to optimum performance. Cultural preferences, aesthetics,
fads, rumors, and other trends sway players' choice. Chasing popularity
as a measure of performance, leads to a vicious circle. Like a
cat chasing its own tail, balance would never be achieved.
Measure
rates instead of instantaneous values. High performance is not
any particular value. It is measure of change from a low value
to a high value in a short period of time. The period of time
to measure is the week. As noted earlier, the week is more stable
than the day.
Let's
take experience points per hour versus level for each class as
an example. "Experience points per hour" is such a useful
indicator that I will abbreviate it as EPH. Like a car's MPH (miles
per hour), a player's EPH indicates his speed or rate of progress.
Count the "experience points," which is a performance
indicator, instead of the population of a class. Count the change
in experience points from one week to the next week. Count the
time that the character actually played, instead of the total
amount of time that has passed. For example, if the character
played twenty hours in a week then use this value, instead of
the 168 hours in a week. This gives the following formula:
EPH
= Δ exp / Δ time
Let's
graph the results. On the vertical axis is the EPH, and on the
horizontal axis is the level range. If there are too few samples
per level, then group nearby levels together.
Next,
plot each category as a data series. In this example, each series
is a player class: fighter, priest, rogue, or wizard. Along the
horizontal axis we can see the difference between the heights
of each class' performance. If the difference is small, then it
is statistically insignificant. If the difference is large, then
it is statistically significant. Based on the size of the sample
and other qualities of the data, statistics defines the minimum
gap that indicates significantly low performance. In this example,
the most significant gap is between the high-level fighter and
the other three high-level classes. So statistics discovered that
the high-level fighter segment of the player population suffered
from low-performance during that week.
Analysis
The
core of data mining begins where statistics ends. Here we can
extract golden knowledge from the raw mineral that we began with.
Several techniques can be applied, most of them particular to
the data and the purpose. Here is a simple set of techniques.
Calculate
the maximum and minimum performance values. Do this for performance
rate and performance growth. In this example, EPH is calculated
from the experience points, and the EPH itself can be viewed as
a function of class and level:
EPH
= f(level)
Calculus
provides the derivative:
EPH'
= f'(level)
Because
of the finite sample size, the precise limit and derivative does
not exist. However, the approximate derivative will provide insight
into the game balance. At the maximum derivative players rapidly
advance. At the minimum derivative players suffer stagnation.
They play for hours with little advancement. Knowing this helps
isolate low-performance segments of the player population.
Comparing
a previous and subsequent period can identify a trend. In this
example, the EPH can be subtracted from its value last week, creating
a new function:
Δ
EPH = f1(level) - f0(level)
Where
the change is significantly positive, that segment of players
is performing better than the previous week. This helps isolate
the effect of a modification to a game's design. Players' adjustment
to the modification delays full impact. Usually only early adopters
will use the new feature at first. If it outperforms an old substitute,
then most players will migrate. After migration, the empirical
comparison between the two features stabilizes.
Both
of the above techniques can be combined to isolate and track specific
low performance. For example, tracking the change in high-level
fighters from one week to the next indicates if their performance
is improving or not.
EPH
= Fighter1(80%) - Fighter0(80%)
Comparing
this value to the other class values indicates the relative change.
As the values converge, the classes are becoming balanced.
Data
mining can combine top-down analysis techniques with bottom-up
analysis techniques. From the bottom-up our game may appear to
be a galaxy of game assets with no hierarchical organization.
From the top-down the same game may appear to be rigid containers
of game assets. Cluster analysis might improve class or strategy
design, since it generates clusters from the bottom-up, by mapping
differences of individual game assets. This can compare similar
assets in different categories. As well, cluster analysis can
identify assets that multiple strategies share. If you are interested,
the books at the end of this article explain techniques for cluster
analysis.
Hypothesis
As
a game designer, it is dangerous to assume that you know your
game. The analysis should inspire the hypothesis, since analyzing
player behavior can prove or disprove a good hypothesis about
game assets. The kind of hypothesis mentioned here meets two criteria:
1.
Explain existing trends of game assets.
2. Predict the result of modifying, inserting, or removing a
set of game assets.
Here
are two examples of game asset hypotheses:
1.
In EverQuest, players prefer pretty races.
2. In Dark Ages, a trap skill will increase mid-level
rogue performance.
The
domain delimits where the hypothesis applies. In this case the
domain is a particular MMORPG, Sony Online Entertainment's EverQuest
or Nexon's Dark Ages. Define the domain, or scope, that
the knowledge that you believe you are discovering applies to.
Suppose
when you discuss the appearance of races in an MMORPG with artists,
the team divides into two camps. One camp argues for an equal number
of game assets for gruesome player races as well as beautiful races.
The other camp argues that many more players will choose beautiful
races, so almost all assets should be devoted to the more beautiful
races. Nick Yee provides survey data in his EverQuest research
paper "Norrathian Scrolls" (http://www.nickyee.com/eqt/metachar.html#4)
that may inspire this hypothesis. EverQuest players prefer
Elves, in general, about 10-to-1 compared to the two least popular
and, arguably, the most ugly races: Trolls and Ogres. To make the
hypothesis rigorous, the player population and the race performances
should be analyzed, because, as noted earlier, data mining more
accurately depicts player behavior than a survey does.
In
the second example, suppose you have analyzed player performance
in Dark Ages. You note that mid-level, but not high-level,
rogues have low-performance in terms of measured EPH when compared
to the other four classes. In 1999 this was one of the decisions
that I faced. I hypothesized that inserting a set of mid-level
trap skills will improve performance, by improving their damage
ratio. Then I used techniques in this article to test my hypothesis.
During the transition, some players, especially non-rogues, argued
about the performance of rogues. But the experiment succeeded:
within a month, mid-level rogues had a balanced EPH.
Test
Testing
your hypothesis is the most rigorous, sensitive, and critical
step in the cycle. Although it feels good to hold a gem of wisdom,
it feels bad to realize your treasured hypothesis is a false gem.
So it is tempting, and sadly common, to halt the cycle before
the testing stage. Test each hypothesis. If it is correct, it
will survive with its value proven. If it is incorrect, then please
conserve the team's resources by discarding it.
A
good test has two and only two possible outcomes: the hypothesis
is true, or the hypothesis is false. A good test rarely yields
an inconclusive result, which means the test needs to be repeated
or modified to yield a definite true or false. This cycle is an
elaboration of a basic idea: trial and error. Since testing detects
error, it improves a game's design.
In
the earlier example, high-level fighters suffered from low EPH.
Suppose someone suggests a new game asset, a new skill to increase
the fighter's combat effectiveness. You create a "Sword Mastery"
skill to do this. After collecting data on the test server, you
compare the old and new EPH for each class in order to conclude
if the skill improved high-level fighter EPH and what other results
it may have.
In
the test, mirror actual conditions as much as possible. Just like
an ideal point, or a limit, identical conditions do not exist,
yet you can approximate. Test an identical configuration, build
version, feature set, and at the same day of week and time of
day. Additionally, the population will be smaller, which means
results will be less precise. But the most uncontrollable factor
of the test is the players. Your test player population is not
going to be random sample. It will be a self-selected sample whose
average motivations and behavior will be biased. So the test contains
error. Worse than this, discovering the direction of bias may
be an intractable problem.
Although
a perfect test is impossible, a test that contains experimental
error may still improve your game's balance tenfold, because this
process is iterative. If a single iteration cuts game imbalance
in half, two iterations will quarter game imbalance, and so on.
This is far better than no improvement at all, and certainly better
than designing based on disinformation such as feedback motivated
by competing special interest groups.
After
a new design passes this test, feed the design back into live service.
The process is iterative, so for best results, it should be repeated
monthly.
______________________________________________________
|