Analyzing a Dataset of Game Releases
by Rob Lockhart on 07/27/15 03:18:00 pm

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

Hi, I'm Rob Lockhart, Creative Director of Important Little Games.  I'd be grateful if you followed me on Twitter.

It all started when I stumbled across this misleadingly titled Polygon article written last year and followed the link to the data source out of curiosity.  Basically it's just a list of videogame titles, some of which have been annotated with a developer, a year, and/or a platform.  Since I'm fond of semi-structured data sources, I downloaded the list, which had grown to nearly 150,000 titles since the Polygon article was published, and started to play around in Mathematica.  As you read on, be advised that this is an extremely noisy dataset: it does not necessarily reflect the history of the videogame industry, or even accurately describe the titles it lists.

The first thing I did was look at the words that occur most often in videogame titles.  The roughly 150,000 titles contain a vocabulary of around 45,000 unique words, and about 21,000 of those appear only once in the entire list.  For scale, consider that it is apparently not uncommon for a native speaker to have 20,000-35,000 words in their whole vocabulary.
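For readers who want to try this kind of count outside Mathematica, here is a minimal Python sketch. It is not the code from the notebook: the tokenization rule (lowercase, then runs of letters, digits, and apostrophes) and the sample titles are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(title):
    """Lowercase a title and split it into runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", title.lower())

def word_stats(titles):
    """Return (word counts, vocabulary size, number of words that occur exactly once)."""
    counts = Counter(word for title in titles for word in tokenize(title))
    singletons = sum(1 for n in counts.values() if n == 1)
    return counts, len(counts), singletons

# Hypothetical sample; the real list has roughly 150,000 titles.
titles = ["Space War", "Space War II", "Magic Garden", "War of Magic"]
counts, vocab_size, singletons = word_stats(titles)
print(counts.most_common(3))
print(vocab_size, singletons)
```

A different tokenization rule (for example, keeping hyphenated words together) would shift both the vocabulary size and the number of one-off words, so the figures above should be read with that in mind.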

Let's take a look at the top 50 words I found:

[Figure: the top 50 words in videogame titles]

There are a lot of words that are completely unsurprising, as they are overwhelmingly frequent throughout English. Numerals, both Arabic and Roman, play a big role, meaning that there are a lot of sequels.  Frustrating for those of us who value originality in interactive entertainment, but by no means surprising.  Let's filter out these uninteresting results and look again:

I also recombined plurals into the root word.
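A filter of this kind might look like the Python sketch below. The post does not say which words were dropped or how plurals were merged, so the stop list, the Roman numeral test, and the "strip a trailing s" rule are all assumptions, and deliberately crude ones.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list; the post does not list the words it dropped.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}
# Crude Roman numeral test: it also matches real words such as "mix" and "civil".
ROMAN = re.compile(r"^[ivxlcdm]+$")

def is_trivial(word):
    """True for stop words, Arabic numerals, and anything that looks like a Roman numeral."""
    return word in STOPWORDS or word.isdigit() or bool(ROMAN.match(word))

def merge_plurals(counts):
    """Fold 'wars' into 'war', but only when the singular is also in the vocabulary."""
    merged = Counter()
    for word, n in counts.items():
        if word.endswith("s") and word[:-1] in counts:
            merged[word[:-1]] += n
        else:
            merged[word] += n
    return merged

def interesting_words(counts):
    """Drop trivial words, then recombine simple plurals with their root."""
    kept = Counter({w: n for w, n in counts.items() if not is_trivial(w)})
    return merge_plurals(kept)

# Hypothetical counts, for illustration only.
sample = Counter({"war": 5, "wars": 2, "the": 9, "ii": 4, "2": 3, "magic": 1})
print(interesting_words(sample).most_common())
```

Requiring the singular to exist keeps words like "chess" intact, at the cost of missing irregular plurals.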

In my humble opinion, it really sucks that 'war' shows up second, after 'game.'  There's nothing wrong with war as a theme for any particular game, but our industry's singular focus on war and violence becomes pretty tiresome, as this chart exemplifies.  Which word would I prefer in second place? 'Magic,' of course!

~

I also noticed that quite a lot of games use subtitles. Not the written dialogue at the bottom of the cutscenes, but the second part of a title, separated from the first by a colon.  Things like the "Modern Videogame" part of "Call of Warfare: Modern Videogame."  Let's take a look at the most common subtitles:

[Figure: the most common subtitles]

'The Game' and 'Gold Edition' seem to make sense, but for some reason 'The Movie' comes in third.  Why are there so many games (56) with ': The Movie' in the title?!
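Pulling out subtitles is simple enough to sketch in a few lines of Python. This assumes the subtitle is everything after the first colon, which is a guess at the rule rather than the notebook's actual logic, and the sample titles are made up.

```python
from collections import Counter

def subtitle(title):
    """Return the text after the first colon, or None if the title has no subtitle."""
    head, sep, tail = title.partition(":")
    tail = tail.strip()
    return tail if sep and tail else None

def top_subtitles(titles, n=10):
    """Count subtitles case-insensitively and return the n most common."""
    subs = (subtitle(t) for t in titles)
    return Counter(s.lower() for s in subs if s).most_common(n)

# Hypothetical titles, for illustration only.
titles = [
    "Call of Warfare: Modern Videogame",
    "Robot Pals: The Movie",
    "Swamp Ogre: The Movie",
    "Block Dropper",
]
print(top_subtitles(titles, 1))
```

Titles with more than one colon are ambiguous under this rule; splitting on the last colon instead would be an equally defensible choice.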

I'm not very fond of this naming pattern in the first place, but some of these should unquestionably be retired.  Let's not name any more games "Something Something: Vengeance," shall we?

~

As I mentioned earlier, some of the entries in the data are tagged with a developer, year, and/or platform.  I found the developers more or less impossible to extract systematically, but I had better luck with years and platforms.

About 1/5 of the games were tagged with a year, but they were represented unevenly.  As you can see below, only the years from 2000 to 2015 had any kind of decent coverage.  It's interesting to note that within that period, the number of games released per year did not increase or decrease significantly (if this dataset can be taken as a representative sample).

[Figure: number of tagged games per year]
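The post does not show what a year tag looks like in the raw list, so the Python sketch below simply assumes a parenthesized four-digit year such as "(2004)" somewhere in the entry. With that assumption, counting games per year and measuring how much of the list is tagged takes only a few lines.

```python
import re
from collections import Counter

# Assumption: a year tag looks like "(2004)". The real list may use another format.
YEAR_TAG = re.compile(r"\((19[7-9]\d|20[01]\d)\)")

def tagged_year(entry):
    """Return the tagged year as an int, or None if the entry carries no year tag."""
    match = YEAR_TAG.search(entry)
    return int(match.group(1)) if match else None

def games_per_year(entries):
    """Return (games per year, fraction of entries that carry a year tag)."""
    years = [tagged_year(e) for e in entries]
    tagged = [y for y in years if y is not None]
    coverage = len(tagged) / len(entries) if entries else 0.0
    return Counter(tagged), coverage

# Hypothetical entries, for illustration only.
entries = ["Example Quest (2004)", "Sample Racer (2004)", "Untagged Game",
           "Demo Puzzle (2008)", "Another Untagged Game"]
per_year, coverage = games_per_year(entries)
print(sorted(per_year.items()), coverage)
```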

If we compile a list of the top ten words for each of these usable years, we might notice some trends.

[Figure: top ten words for each year from 2000 to 2015]

I think you can kind of see the zombie craze creeping up in the past few years, as the words 'dark,' 'night,' and 'dead' climb the charts.  You can also see where we became obsessed with 3D for a little while.
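A per-year ranking like this one boils down to grouping titles by year and keeping one counter per group. Here is a minimal Python sketch, assuming the titles have already been paired with their tagged year; the records and the tokenization rule are illustrative, not taken from the notebook.

```python
import re
from collections import Counter, defaultdict

def top_words_by_year(records, n=10):
    """records: iterable of (title, year) pairs. Returns {year: [(word, count), ...]}."""
    per_year = defaultdict(Counter)
    for title, year in records:
        per_year[year].update(re.findall(r"[a-z0-9']+", title.lower()))
    return {year: c.most_common(n) for year, c in sorted(per_year.items())}

# Hypothetical records, for illustration only.
records = [("Dead Night", 2013), ("Dead City", 2013), ("Puzzle 3D", 2010)]
for year, words in top_words_by_year(records, 3).items():
    print(year, words)
```

To reproduce the filtered chart, the trivial words would need to be removed before ranking.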

If we bring back the trivial words we decided to exclude early on, you'll see that some games' titles include the year they were released and many include the following year.
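One way to check that pattern directly is to compare any year that appears in a title with the year the game is tagged with. The Python sketch below assumes (title, release year) pairs are already available and only treats 19xx and 20xx numbers as years; both choices are assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: only standalone numbers of the form 19xx or 20xx count as years.
YEAR_IN_TITLE = re.compile(r"\b(19\d\d|20\d\d)\b")

def title_year_offsets(records):
    """Count (year named in title - release year) over (title, release_year) pairs."""
    offsets = Counter()
    for title, released in records:
        for found in YEAR_IN_TITLE.findall(title):
            offsets[int(found) - released] += 1
    return offsets

# Hypothetical records, for illustration only.
records = [("Soccer Manager 2011", 2010), ("Rally 2005", 2005), ("Space Fleet 2142", 2006)]
print(title_year_offsets(records))
```

An offset of 0 means the title names its own release year, and an offset of 1 means it names the following year.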

[Figure: top words per year with the trivial words included]

~

In terms of platforms, the coverage was very spotty.  Here you can see the number of tagged games for each platform.  The fact that Linux has any significant presence should be a clue that some platforms are far overrepresented amongst tagged games.

[Figure: number of tagged games per platform]

If you're interested, here is a list of the top ten words by platform.  Many of these platforms only have one or two titles listed, so you'll see some oddly specific words.

[Table: top ten words by platform]

Thanks for reading!  If you're interested in exploring the dataset yourself, feel free to download my Mathematica notebook.  I'd love to hear your suggestions of further analyses to do and other data sets to explore.

