Statistically Speaking, It's Probably a Good Game, Part 2: Statistics for Game Designers
Populations and Samples
The foundation of statistics is the analysis of data. When dealing with data, there are two main terms that you need to know:
- Population: the entirety of a field for which measurements are to be taken. The population is arbitrary, and is dependent only on what you wish to measure. For example, say you want to know what people think about a particular issue. Your chosen population could be all of the people on earth, all of the people in Iowa, or just all the people on your street.
- Sample: a portion of the population for which measurements are actually taken. For very obvious reasons, it’s often too hard to gather data for an entire population. Instead, you gather data for a portion of the population. This is your sample.
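To make the two terms concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the population is a list of random opinion scores from 1 to 10, and the names `make_population` and `take_sample` are hypothetical helpers, not part of any library.

```python
import random
import statistics


def make_population(size, seed=0):
    """Build a hypothetical population: one opinion score (1-10) per person."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) for _ in range(size)]


def take_sample(population, sample_size, seed=1):
    """Measure only a randomly chosen portion of the population."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Assumed sizes: 100,000 people in the population, 500 of them actually polled.
population = make_population(100_000)
sample = take_sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In real life you never get to print the first number; the sample mean is your stand-in for it.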
Accuracy and Sample Size
The strength of a statistical conclusion is extremely sensitive to the size of your sample.
In a perfect world, you’d always like your sample size to be equal to your population. That is, you want to collect data on the entirety of whatever matters to you, because anything less means you have to infer trends (a mathematical inference, but an inference nonetheless). Furthermore, the more data points, the better; you’d rather have a giant population than a tiny one.
Marketers and politicians would give their left brains to get a sample that is equal to their (large) population of interest. For example, instead of polling 10,000 junior high school kids to get an idea of how they feel about Fruit Roll-Ups®, imagine if they could poll *every junior high school kid*. Failing that, polling 1,000,000 would be super. Failing that, 100,000 would be dang nice. Failing that…okay, 10,000 will do.
It is for reasons of time and money that studies are performed on samples rather than entire populations.
- The Common Sense Rule of Statistics: mo is bettuh
You can’t predict a trend with one data point. If you know I like chocolate ice cream, you can’t draw any meaningful conclusions about what all Sigmans like. Now if you ask many members of my family, then you might be able to draw a reasonable conclusion about what the rest think...or at least know *whether* you can draw a reasonable conclusion. Ain’t stats fun?
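The "mo is bettuh" rule can be sketched in a few lines of Python. The setup is an assumption for illustration only: a made-up population of 1-to-10 scores, polled repeatedly at three sample sizes, with a hypothetical helper `spread_of_sample_means` that measures how much the sample averages bounce around from poll to poll.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, trials=200, seed=0):
    """Run many polls of the same size and return the standard deviation
    of their averages. A smaller spread means a more trustworthy poll."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


# Hypothetical population of 20,000 opinion scores from 1 to 10.
population_rng = random.Random(42)
population = [population_rng.randint(1, 10) for _ in range(20_000)]

spreads = {n: spread_of_sample_means(population, n) for n in (10, 100, 1000)}

for n, spread in spreads.items():
    print(f"sample size {n:>4}: sample averages vary by about {spread:.3f}")
```

The spread shrinks as the sample grows, which is the whole reason pollsters keep asking for bigger samples.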
Population Explosions and Wide Distributions (BEEP! BEEP!)
For reasons that only The Big Guy can explain, many things in life tend to follow similar patterns, or distributions.
One of the most common is the aptly-named “normal distribution.” That’s right, anything not matching this is abnormal, and therefore weird (and should be shunned appropriately).
The normal distribution is also known as a “Gaussian” distribution, primarily because “normal” doesn’t sound scientific enough.
The normal distribution is also commonly called a “bell curve” because, well, just look at the durned thing, will ya!?
Normal “Bell Curve”
Standard Form (variance of 1, mean of zero)
*Image Courtesy Wikipedia.org
The distinguishing characteristics of a bell curve distribution are that most of the population is clustered closely around the mean, or average, value, and comparatively few members are scattered at the extremes (high or low). This middle-clustering leads to the bell-curve appearance; the highs and lows are the flange of the bell.
We see the bell curve around us in a million different things. If you measured the heights of all the people in your city, they’d probably match this distribution. That is, a tiny few would be super-abnormally short, a tiny few would be super-Yao Ming tall, and a great many would be within a few inches of the average.
The bell curve typically holds true whenever you are looking at people’s skill levels, too. Take sports: a tiny few are good enough to play professionally, a great many are good enough to get by, and a tiny few are so bad that they don’t get picked to be on teams (like me).
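The height example can be simulated with a rough Python sketch. The numbers are assumptions, not real survey data: heights are drawn from a normal distribution with a made-up mean of 170 cm and standard deviation of 10 cm, and `simulate_heights` and `fraction_within` are hypothetical helper names.

```python
import random


def simulate_heights(count, mean_cm=170.0, std_cm=10.0, seed=0):
    """Draw heights from a normal distribution (assumed mean and spread)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_cm, std_cm) for _ in range(count)]


def fraction_within(values, center, distance):
    """Fraction of values no farther than `distance` from `center`."""
    inside = sum(1 for value in values if abs(value - center) <= distance)
    return inside / len(values)


MEAN_CM, STD_CM = 170.0, 10.0  # illustrative values only
heights = simulate_heights(100_000, MEAN_CM, STD_CM)

near_average = fraction_within(heights, MEAN_CM, STD_CM)
within_two = fraction_within(heights, MEAN_CM, 2 * STD_CM)
extremes = 1 - fraction_within(heights, MEAN_CM, 3 * STD_CM)

print(f"within 1 standard deviation of average:  {near_average:.1%}")
print(f"within 2 standard deviations of average: {within_two:.1%}")
print(f"beyond 3 standard deviations (extremes): {extremes:.2%}")
```

For a normal distribution, roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two, so the simulated city should land close to those figures, with only a sliver of people out at the Yao Ming end.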
Other Distributions
The normal distribution, despite being swell, isn’t the only distribution around. It’s just amazingly common.
For examples of some additional distributions that are directly related to gaming and game design, just take a look at the probability distributions of dice throws, in this case a d6 and then a 2d6 throw:

D6 Distribution

2d6 Distribution
In part 3 of this series, which should hit Gamasutra shelves around 2010, I’m going to spend a bunch more time talking about these dice distributions. For now, all I’m going to say is that the first one looks nothing like a bell curve, whereas the second throw is starting to resemble one (but still isn’t quite there yet).
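If you want to reproduce the two dice charts yourself, one way is to enumerate every possible roll. The following is a small Python sketch; `dice_sum_distribution` and `print_histogram` are hypothetical helper names, and fair six-sided dice are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution(num_dice, sides=6):
    """Exact probability of each total when rolling `num_dice` fair dice."""
    faces = range(1, sides + 1)
    counts = Counter(sum(roll) for roll in product(faces, repeat=num_dice))
    total_rolls = sides ** num_dice
    return {
        outcome: Fraction(count, total_rolls)
        for outcome, count in sorted(counts.items())
    }


def print_histogram(distribution, scale=36):
    """Print one row per outcome; bar length is proportional to probability."""
    for outcome, probability in distribution.items():
        bar = "#" * round(probability * scale)
        print(f"{outcome:>2} {bar} ({probability})")


d6 = dice_sum_distribution(1)
two_d6 = dice_sum_distribution(2)

print("d6:")
print_histogram(d6)
print("2d6:")
print_histogram(two_d6)
```

The d6 rows all come out the same length (every face has a 1/6 chance), while the 2d6 rows rise to a peak at 7 and fall away on both sides.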