Gamasutra: The Art & Business of Making Games
Better Game Design Through Data Mining

August 15, 2003

Players spend millions of man-hours selecting optimum strategies in massively multiplayer online games (MMOGs). They are getting the best return on investment (ROI) from your MMOG. Are you? In this article, I will show you how data mining can improve game design, and then I will present four practical applications of that information:

1. To balance the economy
2. To catch cheaters
3. To cut production costs
4. To increase customer renewal

Although this article is written for MMOGs, you will find that most of these techniques can be adapted to multiplayer and single-player games. I will give several examples using fantasy MMORPG terms, since that genre is probably the best understood. However, these techniques apply to most MMO genres; I have even used these techniques to improve an online trivia game show. But before we learn the techniques, let's understand why data mining is a good tool for these jobs.

Why Mine Data?

Because players lie. Player feedback alone provides a poor diagnosis of game design. The picture that a player's verbal feedback paints is not even an approximate guide; it is a portrait distorted by psychological and social forces. Players do not accurately report their own behavior in surveys or customer feedback. They may say one thing but do another. For example, anthropologist Dr. William Rathje surveyed households about how much beer they drank and then went through their garbage. The garbage revealed twice as much consumption as the surveys had. Examining the garbage was more insightful than the surveys, which had been the traditional method of data collection. As psychological and social creatures, players and developers alike subconsciously revise their self-reports.

Which gives you the clearest picture of your game? Surveys or logs?

As political creatures, players and developers also revise their reports. Players belong to special interest groups, which bias their reports. Political ganging, a human trait, exists in online communities, too. Wherever an MMOG has guilds, classes, or other social organizations, it has special interest groups. The members of these groups put their own group's interests before those of the entire community. Each claims that it is the victim of poor game balance. But the players who actually suffer the most from poor game balance are the most silent. The greatest victims are ending their days in your game in quiet desperation.

To many players, the time spent online in your game is an investment. They expect their investment to perform well. They become upset if, despite their skill and time commitment, someone who happened to pick the better class, item, or other option in your game surpasses them. Data mining begins with accurate, empirical data. With this the game designer can make informed decisions. He can identify the victims of poor game balance, and he can correct it so that all players have an equal opportunity to achieve maximum performance.

Data mining also builds better theories. It gives the game designer insight into how players use and abuse the game. It broadens perspective, proves or disproves hypotheses, and substitutes facts for opinions. With the increasing specialization of game development, a game designer no longer sees the big picture. It is all too common for any game developer to acquire a skewed view of the nature of his game. Disinformation, best-case scenarios, and a dose of self-hypnosis distort our theories. But if we can see the big picture, we can begin to challenge our own misinformed opinions. Let's learn how to scan this big picture.

From Data To Design

In the beginning, there may have been the Design, but let's start the cycle where data mining begins so we can discover how to recycle old data into new design:

Recycle old data into new design.

1. Live: Scoop up lots of raw data in the live service.
2. Archive: Clean up the raw data and store it for safekeeping in an archive.
3. Statistics: Sift through the data to create statistics, which are more informative than the raw data.
4. Analysis: Then apply the actual mining, which yields knowledge about player performance.
5. Hypothesis: Propose hypotheses about how to tune the game.
6. Test: Test each hypothesis and then introduce the new design into the live service.

The final step closes the loop. Each iteration of this cycle evolves game balance. Let's dive into the details.

Live Service

A massively multiplayer game has thousands of game assets or more. Every class, item, monster, quest, skill, zone, or any other game object is a game asset. In the data these game assets are dead; in the live service these assets come to life. It is the players who animate them. Player behavior generates rich information about game balance, so scoop up as much data as possible. Collect a large sample. As in any other statistical data collection, the sample should be random or otherwise representative of the actual proportions of the player population. The larger the sample, the clearer the picture becomes. In a perfect game, an infinite number of players would render a perfect portrait of player behavior. At the other extreme, a small or biased sample generates no meaningful statistics. Since this is a server-based game, collecting data is convenient: the data is already on your server.
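One rough way to check whether a sample is representative is to compare each class's share of the sample against its share of the whole population. The sketch below assumes a simple random sample and a hypothetical population of 1,000 characters; the record layout, the class names, and the helper names `class_shares` and `simple_random_sample` are illustrations, not part of any real schema.

```python
import random
from collections import Counter


def class_shares(characters):
    """Return each class's share of the given characters as a fraction."""
    counts = Counter(c["class"] for c in characters)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def simple_random_sample(characters, size, seed=0):
    """Draw `size` characters uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(characters, size)


# Hypothetical population: 600 warriors, 300 mages, 100 thieves.
population = (
    [{"id": i, "class": "warrior"} for i in range(600)]
    + [{"id": 600 + i, "class": "mage"} for i in range(300)]
    + [{"id": 900 + i, "class": "thief"} for i in range(100)]
)
sample = simple_random_sample(population, 200)

# Largest gap between any class's population share and its sample share.
pop, smp = class_shares(population), class_shares(sample)
max_gap = max(abs(pop[cls] - smp.get(cls, 0.0)) for cls in pop)
```

A large `max_gap` suggests the sample over- or under-represents some class, in which case a larger sample, or one drawn in proportion to each class, may be the better choice.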

When should data be collected? Temporal cycles, such as the season, day of the week, and the time of the day, complicate data collection. The most basic and instructive of these cycles is the weekly cycle. Once you understand the week, you can grasp the effect of a month, season, or holiday. Players cannot play as often as they wish on all days of the week. They have real-world schedules. So their playing volume varies depending on which day of the week it is. A graph depicts when most players participate. For a given player demographic, it might be higher on certain days of the week and certain times of the day. For example, usage might peak on Saturdays, Sundays, and Friday evenings.

Player behavior is a function of the day of the week.
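To see the weekly cycle in your own data, you could total the hours played on each day of the week. The following is a minimal sketch, assuming the log database can be reduced to pairs of login and logout times; the four sessions are made-up sample records, and `hours_by_weekday` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def hours_by_weekday(sessions):
    """Sum hours played per weekday from (login, logout) datetime pairs.

    Each session is attributed to the weekday on which it started.
    """
    totals = defaultdict(float)
    for login, logout in sessions:
        hours = (logout - login).total_seconds() / 3600.0
        totals[WEEKDAYS[login.weekday()]] += hours
    # Report all seven days, including days with no play at all.
    return {day: totals[day] for day in WEEKDAYS}


# Hypothetical log records (login, logout).
sessions = [
    (datetime(2003, 8, 15, 19, 0), datetime(2003, 8, 15, 23, 0)),  # Friday
    (datetime(2003, 8, 16, 10, 0), datetime(2003, 8, 16, 16, 0)),  # Saturday
    (datetime(2003, 8, 17, 13, 0), datetime(2003, 8, 17, 18, 0)),  # Sunday
    (datetime(2003, 8, 19, 20, 0), datetime(2003, 8, 19, 21, 0)),  # Tuesday
]
usage = hours_by_weekday(sessions)
peak_day = max(usage, key=usage.get)
```

Attributing a whole session to the day it started is a simplification; sessions that run past midnight could instead be split across the two days.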

In addition to the quantity of play, the quality of play differs depending on the day of the week. Some players might go on an extended adventure when they have more hours to spend. They might just stop in to keep in touch with friends when they have little time to spend. So to avoid daily variation, collect player performance data once per week. This provides you with the average behavior for the whole week. Be sure to measure on exactly the same day and at exactly the same time each week. You should automate this process, such as with "crontab" in the Unix environment, or whatever scheduling tools your database management software supports. When you measure once per week instead of once per day, you achieve three ends simultaneously: you eliminate weekday variation, you reduce the data collection workload, and you reduce the required archive storage space. If you are measuring data other than average player performance, then you may need to collect more often. But that is beyond the scope of this introduction.
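If you schedule with cron, the expression "0 0 * * 0" runs a job at 00:00 every Sunday. Where no scheduler is available, a script can compute the next collection time itself. The sketch below assumes a Sunday 00:00 schedule purely as an example, and `next_weekly_snapshot` is a hypothetical helper rather than part of any scheduling tool.

```python
from datetime import datetime, timedelta


def next_weekly_snapshot(now, weekday=6, hour=0, minute=0):
    """Return the first datetime strictly after `now` that falls on the
    given weekday (Monday=0 ... Sunday=6) at hour:minute."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# Friday noon: the next snapshot is the coming Sunday at midnight.
friday_noon = datetime(2003, 8, 15, 12, 0)
first = next_weekly_snapshot(friday_noon)

# Exactly at a snapshot time: the next one is a full week later.
second = next_weekly_snapshot(first)
```

Because every snapshot lands on the same weekday and time, consecutive snapshots are always exactly one week apart, which is what keeps weekday variation out of the comparison.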

Preprocess Data

After scooping up the raw data, let's make it easier to analyze. As with processing a raw mineral, several steps prepare your data for mining. There are many ways to do this. Here is a simple method that economizes storage space and reduces mining computation. This preprocess has five general steps:

1. Take a snapshot of the database.
2. Validate that the data is clean and appropriate for analysis.
3. Integrate the data into a central archive.
4. Reduce the data down to just the fields you need.
5. Transform the reduced data into a form that is easy to analyze for player performance.

The details depend on the system's configuration. This example explains each step in a simple system:

Prepare the raw data for mining.

Suppose you are operating a fantasy MMORPG during its commercial service.

1. Start at the accounts database. This is the first economy, since the accounts database holds the ID of every record that you want information about. Schedule an automated snapshot of the user data at 00:00 on Sunday morning.

2. Validate which data is relevant and clean. This eliminates garbage as soon as possible, so that you are not storing or analyzing unusable data. Starting at the accounts database, exclude unregistered accounts and administration accounts. For example, exclude test and admin characters that have artificial attributes. For each valid character in an account, query for activity in the log database. If the character has not been active during the previous week, then its record contains no player performance information, so exclude it as well.

3. Back up valid user, log, and accounts records into an archive database. This will be a useful warehouse that you may return to in the future to mine for data you have not considered yet. Treat this backup preciously; if you were an archaeologist, this would be your find; if you were a detective, this would be your forensic sample.

4. However, you are now overwhelmed with a deluge of data. There is much more than you need to analyze a particular problem, such as the amount of experience points earned per hour of play. So reduce the data down to the fields you need. In this example, select the character ID, level, class, experience points, and number of hours played. Create a table of these values.

ID, level, class, exp, time

5. Transform this reduced data to make it easier to analyze. Since this archive has weekly versions of the data, use last week's data to create new information. Get the difference of the experience points and the difference of the time played. Append these columns to the table. If this is the character's first week, then there will be no information from the previous week. If the character has not played for a while, then search backward through each prior week's archive until you find its most recent record.

Δ exp = exp1 - exp0
Δ time = time1 - time0
ID, level, class, exp, time, Δ exp, Δ time

Archive a table of player performance data in terms of experience points per hour (EPH).
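Steps 2, 4, and 5 above can be sketched in a few small functions. This is only an illustration under stated assumptions: the `registered` and `admin` flags, the `gold` field, the in-memory dictionaries standing in for database tables, and all function names are hypothetical. EPH is taken here to mean Δ exp divided by Δ time, which is my reading of "experience points earned per hour of play."

```python
FIELDS = ("id", "level", "class", "exp", "time")


def is_valid(character, active_ids):
    """Step 2: keep registered, non-admin characters active this week."""
    return (character["registered"] and not character["admin"]
            and character["id"] in active_ids)


def reduce_record(character):
    """Step 4: keep only the fields needed for this analysis."""
    return {field: character[field] for field in FIELDS}


def find_previous(char_id, archives):
    """Step 5: search backward through prior weeks, newest first."""
    for week in archives:
        if char_id in week:
            return week[char_id]
    return None  # first week for this character


def transform(record, previous):
    """Step 5: append d_exp, d_time, and eph (None when unknown)."""
    row = dict(record, d_exp=None, d_time=None, eph=None)
    if previous is not None:
        row["d_exp"] = record["exp"] - previous["exp"]
        row["d_time"] = record["time"] - previous["time"]
        if row["d_time"] > 0:
            row["eph"] = row["d_exp"] / row["d_time"]
    return row


def preprocess(snapshot, active_ids, archives):
    """Validate, reduce, and transform one weekly snapshot."""
    return [transform(reduce_record(c), find_previous(c["id"], archives))
            for c in snapshot if is_valid(c, active_ids)]


def make(cid, cls, level, exp, time, admin=False):
    """Build a hypothetical character record with one extra field."""
    return {"id": cid, "class": cls, "level": level, "exp": exp,
            "time": time, "registered": True, "admin": admin, "gold": 0}


snapshot = [
    make(1, "mage", 12, 5400, 30.0),
    make(2, "warrior", 99, 999999, 1.0, admin=True),  # excluded: admin
    make(3, "thief", 3, 200, 2.0),                    # first week
    make(4, "warrior", 8, 2500, 20.0),                # excluded: inactive
    make(5, "mage", 5, 900, 10.0),                    # last seen 2 weeks ago
]
active_ids = {1, 2, 3, 5}
archives = [
    {1: {"exp": 4200, "time": 24.0}},  # last week
    {5: {"exp": 600, "time": 8.0}},    # two weeks ago
]
table = preprocess(snapshot, active_ids, archives)
```

Rows with no earlier record keep `None` in the new columns rather than zero, so a character's first week is not mistaken for a week with no progress.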
