Players spend millions of man-hours selecting optimum strategies in massive
multiplayer online games (MMOGs). They are getting the best return
on investment (ROI) from your MMOG. Are you? In this article, I
will show you how data mining can improve game design, and then
I will present four practical applications of that information:
1. To balance the economy
2. To catch cheaters
3. To cut production costs
4. To increase customer renewal
Although this article is written for MMOGs, you will find that most of these
techniques can be adapted to multiplayer and single-player games.
I will give several examples using fantasy MMORPG terms, since that
genre is probably the best understood. However, these techniques
apply to most MMO genres; I have even used these techniques to improve
an online trivia game show. But before we learn the techniques,
let's understand why data mining is a good tool for these jobs.
Players lie. Player feedback alone provides a poor diagnosis of
game design. The picture a player's verbal feedback paints is not
even an approximate guide. It is a distorted portrait of psychological
and social forces. Players do not accurately report their own behavior
in surveys or customer feedback. They may say one thing but do another
instead. For example, anthropologist Dr. William Rathje surveyed
the amount of beer people drank in a household and then went through
their garbage. The garbage revealed twice as much consumption as
the surveys had. This method was more insightful than surveys, which
had been the traditional method of data collection. As psychological
and social creatures, players and developers alike subconsciously revise
their reports. As political creatures, they revise them as well.
Players belong to special interest groups, which bias their
reports. Political ganging, a human trait, exists in online communities,
too. Wherever an MMOG has guilds, classes, or any other social organization,
it has special interest groups. The members of these groups put
their own group's interests before those of the entire community.
Each claims that it is the victim of poor game balance. But the
players that actually suffer the most from poor game balance are
the most silent. The greatest victims are ending their days in your
game in quiet desperation.
For many players, the time spent online in your game is an investment.
They expect their investment to perform well. They become upset
if, despite their skill and time commitment, someone who happened
to pick the better class, item, or other option in your game surpasses
them. Data mining begins with accurate, empirical data. With this data,
the game designer can make informed decisions. He can identify the
victims of poor game balance, and he can correct it so that all
players have an equal opportunity to achieve maximum performance.
Data mining also builds better theories. It gives the game designer insight
into how players use and abuse the game. It broadens perspective,
proves or disproves hypotheses, and substitutes facts for
opinions. With the increasing specialization of game development, a
game designer often no longer sees the big picture. It is all too common
for a game developer to acquire a skewed view of the nature of
his game. Disinformation, best-case scenarios, and a dose of self-hypnosis
distort our theories. But if we can see the big picture, we can
begin to challenge our own misinformed opinions. Let's learn how
to scan this big picture.
In the beginning, there may have been the Design, but let's start the
cycle where data mining begins so we can discover how to recycle
old data into new design:
1. Live: Scoop up lots of raw data in the live service.
2. Archive: From here, clean it up and store it for safe
keeping in an archive.
3. Statistics: Sift through the data to create statistics,
which are more informative than the raw data.
4. Analysis: Then apply the actual mining, which yields
knowledge about player performance.
5. Hypothesis: Propose hypotheses about how to tune the game balance.
6. Test: Test each hypothesis and then introduce the new
design into the live service.
The final step closes the loop. Each iteration of this cycle evolves
game balance. Let's dive into the details.
A massive multiplayer game has thousands of game assets, or more.
Every class, item, monster, quest, skill, zone, or any other game
object is a game asset. In the database, these game assets are dead;
in the live service these assets come to life. It is the players
that animate them. Player behavior generates rich information about
game balance, so scoop up as much data as possible. Collect a large
sample. Like any other statistical data collection, the sample should
be random or otherwise representative of the actual proportions
of the player population. The larger the sample, the clearer the picture
becomes. In a perfect game, an infinite number of players would
render a perfect portrait of player behavior. On the other extreme,
a small or biased sample generates no meaningful statistics. Because
an MMOG is server-based, collecting data is convenient:
the data is already on your server.
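To make the sampling advice concrete, here is a minimal Python sketch of stratified random sampling, one common way to keep a sample representative. It assumes each character record is a dict and that you stratify by class; the function name `stratified_sample` and the `"class"` field are hypothetical, and you could just as well stratify by level band, server, or another grouping.

```python
import random
from collections import defaultdict

def stratified_sample(characters, key, fraction, seed=None):
    """Draw roughly `fraction` of the characters from each group key(c),
    so every group keeps about its share of the whole population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for c in characters:
        groups[key(c)].append(c)
    sample = []
    for members in groups.values():
        # max(1, ...) keeps tiny groups from vanishing entirely,
        # at the cost of slightly over-representing them.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

if __name__ == "__main__":
    players = [{"id": i, "class": "warrior" if i < 600 else "mage"}
               for i in range(1000)]
    s = stratified_sample(players, key=lambda c: c["class"], fraction=0.1, seed=1)
    print(len(s), sum(1 for c in s if c["class"] == "warrior"))
```

Sampling each class separately preserves the class mix of the population in the sample, which a single uniform draw only achieves on average.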
When should data be collected? Temporal cycles, such as the season, day
of the week, and the time of the day, complicate data collection.
The most basic and instructive of these cycles is the weekly cycle.
Once you understand the week, you can grasp the effect of a month,
season, or holiday. Players cannot play as often as they wish on
all days of the week. They have real-world schedules. So their playing
volume varies depending on which day of the week it is. A graph of play
volume by day and hour shows when most players participate. For a given player demographic,
it might be higher on certain days of the week and certain times
of the day. For example, usage might peak on Saturdays, Sundays,
and Friday evenings.
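A sketch of how you might compute the counts behind such a graph, assuming you can pull session start times from your log database as Python `datetime` objects (the input format and the function names are hypothetical). Counting session starts is only a proxy for play volume; summing hours played per slot would be more precise.

```python
from collections import Counter
from datetime import datetime

# Index 0 matches datetime.weekday() == 0 (Monday).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def play_volume_by_slot(session_starts):
    """Count session starts per (weekday, hour-of-day) slot."""
    return Counter((WEEKDAYS[t.weekday()], t.hour) for t in session_starts)

def play_volume_by_day(session_starts):
    """Count session starts per weekday."""
    return Counter(WEEKDAYS[t.weekday()] for t in session_starts)

if __name__ == "__main__":
    starts = [datetime(2024, 6, 1, 20), datetime(2024, 6, 1, 20, 45),
              datetime(2024, 6, 3, 9)]
    for (day, hour), n in sorted(play_volume_by_slot(starts).items()):
        print(f"{day} {hour:02d}:00  {n}")
```

Using `weekday()` with a fixed name list, rather than `strftime("%A")`, keeps the day labels independent of the server's locale.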
In addition to quantity discrepancies, the quality of play differs
depending on the day of the week. Some players might go on an extended
adventure when they have more hours to spend, but only stop by
to keep in touch with friends when they have little time.
So to avoid daily variation, collect player performance data once
per week. This provides you with the average behavior for the whole
week. Be sure to measure at exactly the same day and time of the
week. Automate this process, for example with "crontab"
in the Unix environment or with whatever scheduling tools your database
management software supports. When you measure once per week instead
of once per day, you achieve three ends simultaneously: you eliminate
weekday variation, you reduce the data collection workload, and
you reduce the required archive storage space. If you are measuring
data other than average player performance, then you may need to
collect more often. But that is beyond the scope of this introduction.
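If cron drives the weekly collection, an entry of the form `0 0 * * 0 <your snapshot command>` fires at 00:00 every Sunday (the command is a placeholder). If you instead run a long-lived collector, or want to check that archived snapshot times really land on the same slot each week, a small helper like the hypothetical `next_weekly_snapshot` below computes the next slot. This sketch uses naive local datetimes and ignores daylight-saving shifts, which you would need to handle if your servers observe them.

```python
from datetime import datetime, timedelta

def next_weekly_snapshot(now, weekday=6, hour=0, minute=0):
    """Return the first time strictly after `now` that falls on `weekday`
    (Monday=0 ... Sunday=6, as in datetime.weekday()) at hour:minute.
    Defaults to Sunday 00:00."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:  # this week's slot has already passed (or is now)
        candidate += timedelta(days=7)
    return candidate

if __name__ == "__main__":
    print(next_weekly_snapshot(datetime.now()))
```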
After scooping up the raw data, let's make it easier to analyze. As with
a raw mineral, several processing steps will prepare
your data for mining. Many alternative methods can do this. Here is
a simple method that economizes storage space and reduces mining
computation. This preprocessing has five general steps:
1. Take a snapshot of the database.
2. Validate that the data is clean and appropriate for analysis.
3. Integrate the data into a central archive.
4. Reduce the data down to just the fields you need.
5. Transform the reduced data into a form that is easy
to analyze for player performance.
The details depend on the system's configuration. This example explains
each step in a simple system:
Suppose you are operating a fantasy MMORPG during its commercial service.
Start at the accounts database. This is the first step toward
economizing, since the accounts database has the ID of every record
that you want information about. Schedule an automated snapshot
of the user data at 00:00 on Sunday morning.
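As a stand-in for your production DBMS's own snapshot tooling, here is a minimal sketch using Python's built-in `sqlite3` module, whose `Connection.backup` method copies one database into another. The file-naming scheme and the function names are hypothetical; a dated file name makes each weekly snapshot easy to find later.

```python
import sqlite3
from datetime import date
from pathlib import Path

def snapshot_path(day, directory="snapshots"):
    """Hypothetical naming scheme: one file per snapshot date.
    The directory must already exist before sqlite3 can create the file."""
    return str(Path(directory) / f"accounts-{day.isoformat()}.db")

def take_snapshot(live, target_path):
    """Copy the whole live database into a new database at target_path
    and return a connection to the copy."""
    snap = sqlite3.connect(target_path)
    live.backup(snap)  # by default copies all pages in a single step
    return snap

if __name__ == "__main__":
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, level INTEGER)")
    live.executemany("INSERT INTO characters VALUES (?, ?)", [(1, 10), (2, 25)])
    live.commit()
    snap = take_snapshot(live, ":memory:")  # in practice: snapshot_path(date.today())
    print(snapshot_path(date(2024, 6, 2)))
    print(snap.execute("SELECT COUNT(*) FROM characters").fetchone()[0])
```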
Validate which data is relevant and clean. This eliminates garbage
as soon as possible, so that you are not storing or analyzing
unusable data. Starting at the accounts database, exclude unregistered
accounts and administration accounts. Likewise, exclude test
and admin characters that have artificial attributes. For each
valid character in an account, query for activity in the log database.
If the character has not been active during the previous week,
then its record contains no player performance information, so exclude it too.
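The validation rules above reduce to a simple filter. This sketch assumes each character record already carries its account type and the hours it logged last week (summed from the log database beforehand); the field names and account-type labels are hypothetical and will differ in your schema.

```python
from dataclasses import dataclass

# Hypothetical account categories to drop before analysis.
EXCLUDED_ACCOUNT_TYPES = {"unregistered", "admin", "test"}

@dataclass
class CharacterRecord:
    id: int
    account_type: str       # e.g. "player", "admin", "test", "unregistered"
    hours_last_week: float  # assumed to be summed from the log database

def valid_characters(records):
    """Keep only characters on real player accounts that were active last week."""
    return [r for r in records
            if r.account_type not in EXCLUDED_ACCOUNT_TYPES
            and r.hours_last_week > 0]
```

Filtering this early means that neither the archive nor the later mining steps ever see the garbage records.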
Back up valid user, log, and accounts records into an archive database.
This will be a useful warehouse that you may return to in the
future to mine for data you have not considered yet. Treat this
archive as precious; if you were an archaeologist, this would be
your find; if you were a detective, this would be your forensic evidence.
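One way to keep the archive both safe and open to future questions is to append each week's validated records, unreduced, tagged with the snapshot week. The sketch below assumes SQLite again and stores each full record as a JSON string so that fields you have not thought to mine yet are kept; the table and column names are hypothetical. The `(week, id)` primary key makes an accidental second load of the same week fail with an error instead of silently duplicating rows.

```python
import json
import sqlite3

def archive_week(archive, week, records):
    """Append one week's validated records (dicts with an "id" key) to the archive."""
    with archive:  # commits on success, rolls back the inserts if one fails
        archive.execute(
            "CREATE TABLE IF NOT EXISTS character_archive ("
            " week TEXT NOT NULL,"
            " id INTEGER NOT NULL,"
            " record TEXT NOT NULL,"
            " PRIMARY KEY (week, id))")
        archive.executemany(
            "INSERT INTO character_archive (week, id, record) VALUES (?, ?, ?)",
            [(week, r["id"], json.dumps(r, sort_keys=True)) for r in records])

def load_week(archive, week):
    """Return the full records archived for one week, ordered by character ID."""
    rows = archive.execute(
        "SELECT record FROM character_archive WHERE week = ? ORDER BY id", (week,))
    return [json.loads(record) for (record,) in rows]
```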
However, you are now overwhelmed with a deluge of data. There
is much more than you need to analyze a particular problem, such
as the number of experience points earned per hour of play. So
reduce the data down to the fields you need. In this example,
select the character ID, level, class, experience points, and
number of hours played. Create a table of these values.
ID, level, class, exp, time
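Reducing the data is a projection onto the five fields above. Here is a minimal Python sketch, assuming the archived records are dicts; the source field names (`"class"`, `"hours_played"`, and so on) are hypothetical and should be matched to your schema. A `NamedTuple` keeps the reduced rows light and readable; the class field is renamed `cls` because `class` is a Python keyword.

```python
from typing import NamedTuple

class PerfRow(NamedTuple):
    id: int
    level: int
    cls: str      # character class; "class" is reserved in Python
    exp: int      # cumulative experience points
    time: float   # cumulative hours played

def reduce_records(records):
    """Keep only the fields needed for the experience-per-hour analysis."""
    return [PerfRow(r["id"], r["level"], r["class"], r["exp"], r["hours_played"])
            for r in records]
```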
Transform this reduced data to make it easier to analyze. Since
this archive has weekly versions of the data, use last week's
data to create new information. Compute the change in experience
points and the change in time played, and append these columns
to the table. If this is the character's first week, then there
will be no information from the previous week. If the character
has not played for a while, then search backward through each prior
week's snapshot for the character's most recent record.
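$$\Delta\,\mathrm{exp} = \mathrm{exp}_1 - \mathrm{exp}_0$$
$$\Delta\,\mathrm{time} = \mathrm{time}_1 - \mathrm{time}_0$$
where subscript 1 denotes this week's snapshot and subscript 0 the most recent earlier snapshot that contains the character.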
ID, level, class, exp, time, Δ exp, Δ time
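The transform step, including the backward search, might look like this in Python. It is a sketch that assumes each archived week has already been reduced to a dict mapping character ID to `(level, class, exp, time)`, ordered oldest first; `add_deltas` and `exp_per_hour` are hypothetical names. Because experience and time played are cumulative, a difference taken across a gap of several weeks still yields a valid experience-per-hour rate.

```python
def add_deltas(weeks):
    """weeks: list of {character_id: (level, cls, exp, time)} dicts, oldest first.

    Returns the newest week's rows as (id, level, cls, exp, time, d_exp, d_time),
    where the deltas are measured against the most recent earlier week that
    contains the character. A character with no earlier record gets None.
    """
    *history, current = weeks
    rows = []
    for cid, (level, cls, exp1, time1) in sorted(current.items()):
        # Search backward through prior weeks for this character's last record.
        prior = next((week[cid] for week in reversed(history) if cid in week), None)
        if prior is None:  # first week seen: nothing to compare against
            d_exp = d_time = None
        else:
            _, _, exp0, time0 = prior
            d_exp, d_time = exp1 - exp0, time1 - time0
        rows.append((cid, level, cls, exp1, time1, d_exp, d_time))
    return rows

def exp_per_hour(row):
    """Experience earned per hour played since the prior record, or None."""
    d_exp, d_time = row[5], row[6]
    if not d_time:  # None (first week) or zero hours played
        return None
    return d_exp / d_time
```

For example, a warrior absent from last week's snapshot is compared with the week before, so its Δ time covers both weeks and the ratio Δ exp / Δ time stays meaningful.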