
Analyzing a Dataset of Game Releases

by Bobby Lockhart on 07/27/15 03:18:00 pm   Expert Blogs   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra's community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

Hi, I'm Rob Lockhart, Creative Director of Important Little Games. I'd be grateful if you followed me on Twitter.

It all started when I stumbled across this misleadingly-titled Polygon article written last year and followed the link to the data source out of curiosity. Basically it's just a list of videogame titles, some of which have been annotated with a developer, a year, and/or a platform. Since I'm fond of semi-structured data sources, I downloaded the list, which had grown to nearly 150,000 titles since the Polygon article was published, and started to play around in Mathematica. As you read on, be advised that this is an extremely noisy dataset and does not necessarily reflect the videogames industry's history, or even the titles it lists.

The first thing I did was take a look at the top words that occur in videogame titles. There were 150,000 game titles and a vocabulary of around 45,000 unique words. About 21,000 of these were used only once in any game title. For scale, consider that apparently it is not uncommon for a native speaker to have 20,000-35,000 words in their whole vocabulary.
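For readers who want to reproduce this kind of count outside Mathematica, here is a minimal Python sketch. It is not the code from my notebook: the `tokenize` rule, the function names, and the three sample titles are all assumptions made for illustration, and a different tokenizer would give somewhat different totals.

```python
import re
from collections import Counter

def tokenize(title):
    """Lowercase a title and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", title.lower())

def word_counts(titles):
    """Count how often each word occurs across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts

# Hypothetical sample standing in for the real list of ~150,000 titles.
titles = [
    "War of the Worlds",
    "The Game of War II",
    "Magic Quest 2",
]

counts = word_counts(titles)
vocabulary_size = len(counts)                           # number of unique words
used_once = sum(1 for n in counts.values() if n == 1)   # words appearing exactly once
top_words = counts.most_common(50)                      # the 50 most frequent words
```

On the real list, `vocabulary_size` and `used_once` correspond to the 45,000 and 21,000 figures above, give or take the tokenizer differences.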

Let's take a look at the top 50 words I found:

Screen Shot 2015-07-24 at 11.29.10 PM

There are a lot of words that are completely unsurprising, as they are overwhelmingly frequent throughout English. Numerals, both Arabic and Roman, play a big role, meaning that there are a lot of sequels. Frustrating for those of us who value originality in interactive entertainment, but by no means surprising. Let's filter out these uninteresting results and look again:

I also recombined plurals into the root word.
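The filtering step can be sketched roughly as follows, again in Python rather than Mathematica. The stopword list, the set of Roman numerals, and the plural rule (fold a word ending in "s" into its singular only when the singular also appears) are assumptions chosen for illustration, not the exact rules I used.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be longer).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}

# Roman numerals 2 through 20, lowercase. "i", "v", and "x" are left out
# because they are too ambiguous on their own.
ROMAN_NUMERALS = {
    "ii", "iii", "iv", "vi", "vii", "viii", "ix", "xi", "xii", "xiii",
    "xiv", "xv", "xvi", "xvii", "xviii", "xix", "xx",
}

def is_numeral(word):
    """True for Arabic digits or one of the listed Roman numerals."""
    return word.isdigit() or word in ROMAN_NUMERALS

def merge_plurals(counts):
    """Fold 'wars' into 'war' when the singular form is also present."""
    merged = Counter()
    for word, n in counts.items():
        singular = word[:-1]
        if word.endswith("s") and singular in counts:
            merged[singular] += n
        else:
            merged[word] += n
    return merged

def interesting_counts(counts):
    """Drop stopwords and numerals, then recombine plurals."""
    kept = Counter({w: n for w, n in counts.items()
                    if w not in STOPWORDS and not is_numeral(w)})
    return merge_plurals(kept)

# Hypothetical raw counts for demonstration.
raw = Counter({"the": 9, "of": 7, "war": 4, "wars": 2,
               "2": 5, "ii": 3, "magic": 3, "chess": 1})
filtered = interesting_counts(raw)
```

Checking that the singular exists keeps words like "chess" from being mangled, at the cost of missing plurals whose singular never appears in any title.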

In my humble opinion, it really sucks that 'war' shows up second, after 'game.' There's nothing wrong with war as a theme for any particular game, but our industry's singular focus on war and violence becomes pretty tiresome, as this chart exemplifies. Which word would I prefer in second place? 'Magic,' of course!

~

I also noticed that there were quite a lot of games which use subtitles. Not the written dialogue at the bottom of the cutscenes, but the second part of a title, separated from the first by a colon. Things like the 'Modern Videogame' part of "Call of Warfare: Modern Videogame." Let's take a look at the most common subtitles:

Screen Shot 2015-07-25 at 12.22.45 AM

'The Game' and 'Gold Edition' seem to make sense, but for some reason 'The Movie' comes in third. Why are there so many games (56) with ': The Movie' in the title?!
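Pulling out subtitles is simple enough to sketch in a few lines of Python. This assumes a subtitle is whatever follows the first colon in a title; the function names and sample titles are hypothetical, and the original analysis was done in Mathematica.

```python
from collections import Counter

def subtitle(title):
    """Return the text after the first colon, or None if there is no subtitle."""
    _, sep, tail = title.partition(":")
    tail = tail.strip()
    return tail if sep and tail else None

def subtitle_counts(titles):
    """Count subtitles case-insensitively across all titles."""
    found = (subtitle(t) for t in titles)
    return Counter(s.lower() for s in found if s)

# Hypothetical titles for demonstration.
titles = [
    "Call of Warfare: Modern Videogame",
    "Alpha: The Movie",
    "Beta: The Movie",
    "Blocks",
]
common = subtitle_counts(titles).most_common(10)
```

Titles with more than one colon keep everything after the first one as a single subtitle, which may or may not be what you want.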

I'm not very fond of this naming pattern in the first place, but some of these should unquestionably be retired. Let's not name any more games "Something Something: Vengeance" shall we?

~

As I mentioned earlier, some of the entries in the data are tagged with a developer, year, and/or platform. I found the developers more or less impossible to extract systematically, but I had better luck with years and platforms.

About 1/5 of the games were tagged with a year, but they were represented unevenly. As you can see below, only the years from 2000 to 2015 had any kind of decent coverage. It's interesting to note that within that period, the number of games released per year did not increase or decrease significantly (if this dataset can be taken as a representative sample).

Screen Shot 2015-07-25 at 12.44.30 AM
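Here is a rough Python sketch of the year tally. The big assumption is the tag format: it treats a four-digit year in parentheses, such as "(2004)", as the year annotation. The real list is messier than that, and the entries below are made up.

```python
import re
from collections import Counter

# Assumed tag format: a year from 1900-2099 alone inside parentheses.
YEAR_TAG = re.compile(r"\(((?:19|20)\d{2})\)")

def tagged_year(entry):
    """Return the tagged year as an int, or None if the entry has no year tag."""
    match = YEAR_TAG.search(entry)
    return int(match.group(1)) if match else None

def games_per_year(entries):
    """Count tagged entries per year; untagged entries are skipped."""
    years = (tagged_year(e) for e in entries)
    return Counter(y for y in years if y is not None)

# Hypothetical entries for demonstration.
entries = [
    "Alpha Quest (2004)",
    "Beta Racer (2004)",
    "Gamma Wars (2011)",
    "Untagged Game",
]
per_year = games_per_year(entries)
coverage = sum(per_year.values()) / len(entries)  # fraction of entries with a year
```

On the real data, `coverage` is the "about 1/5" figure mentioned above.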

If we compile a list of the top ten words for each of these usable years, we might notice some trends.

Screen Shot 2015-07-25 at 12.51.06 AM

I think you can kind of see the zombie craze creeping up in the past few years, as the words 'dark,' 'night,' and 'dead' climb the charts. You can also see where we became obsessed with 3D for a little while.
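The per-year word lists can be built by grouping titles by their tagged year before counting. The sketch below is Python rather than my Mathematica code, and it leans on the same assumed "(2004)"-style year tag. It skips the stopword filtering for brevity, and the sample entries are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed tag format: a four-digit year alone inside parentheses.
YEAR_TAG = re.compile(r"\(((?:19|20)\d{2})\)")

def top_words_by_year(entries, n=10, first=2000, last=2015):
    """Map each year in [first, last] to its n most frequent title words."""
    by_year = defaultdict(Counter)
    for entry in entries:
        match = YEAR_TAG.search(entry)
        if not match:
            continue  # no year tag, nothing to group by
        year = int(match.group(1))
        if first <= year <= last:
            # Strip the tag so the year itself is not counted as a title word.
            title = YEAR_TAG.sub(" ", entry)
            by_year[year].update(re.findall(r"[a-z0-9']+", title.lower()))
    return {year: [word for word, _ in counter.most_common(n)]
            for year, counter in sorted(by_year.items())}

# Hypothetical entries for demonstration.
entries = [
    "Dead Night (2012)",
    "Dead City (2012)",
    "Racer 3D (2003)",
    "Old Game (1985)",
    "No Tag",
]
result = top_words_by_year(entries)
```

The `first` and `last` defaults match the 2000 to 2015 window that had decent coverage in this dataset.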

If we bring back the trivial words we decided to exclude early on, you'll see that some games' titles include the year they were released and many include the following year.

Screen Shot 2015-07-25 at 1.00.00 AM

~

In terms of platforms, the coverage was very spotty. Here you can see the number of games tagged with each platform. The fact that Linux has any significant presence should be a clue that some platforms are far overrepresented amongst tagged games.

Screen Shot 2015-07-25 at 1.22.45 AM

If you're interested, here is a list of the top ten words by platform. Many of these platforms only have one or two titles listed, so you'll see some oddly specific words.

PlatformWords

Thanks for reading! If you're interested in exploring the dataset yourself, feel free to download my Mathematica notebook. I'd love to hear your suggestions of further analyses to do and other data sets to explore.

