TO WHAT ISSUE WILL THIS COME? - Games, Metacritic, and why critics love putting numbers on things
by Andreas Ahlborn on 03/04/13 09:00:00 am   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

[Image: What the critics agreed upon]

Preface

When I was 10 years old I started to systematically catalogue and categorize all my media. I think it started as a valve to ease the pressure of everyday school life: being constantly judged by teachers and parents, I wanted something I could judge myself.

I started with comics: I had a small collection of favourite comic books that I graded A, a bigger one graded B, and so on. Over the years my methods got more refined and I dissected the medium into its different parts: I would judge the penciling and coloring apart from the writing, and developed crude formulas that weighted the "value" of a comic book, for example: 50% weight for the penciling, 30% for the writing, 10% for the coloring, and an additional 10% for the cover. When I reached my teens I developed this system further with friends and applied it to all kinds of things: music, books, movies. Our decision process was very heated, and we often adapted our value categories, because it couldn't be that a John Cougar Mellencamp record came out numerically superior to any Springsteen record. And yes, we rated girls too (we never got further in differentiation than face/body), years before social networks did it.
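For illustration, a minimal sketch of that kind of crude weighted formula; the weights are the ones from the text, while the component scores are hypothetical:

```python
# Weighted comic-book score, as described above. Weights from the text;
# the example component scores are hypothetical.

WEIGHTS = {"penciling": 0.50, "writing": 0.30, "coloring": 0.10, "cover": 0.10}

def comic_score(parts: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * score for name, score in parts.items())

# Strong art, weaker writing:
print(comic_score({"penciling": 90, "writing": 60, "coloring": 70, "cover": 80}))  # 78.0
```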

Without being aware of it, we (like millions of other number-obsessed teens worldwide) anticipated Metacritic and Facebook.

Were we naive enough to assume that the numbers we subjectively contributed to the discourse actually measured something objective? I hope not. The point was always to explain to ourselves how we arrived at verdicts about the good or bad quality we perceived in certain artworks. It was not scientific, and we could never have imagined at the time that our verdicts would get "more objective" the more we trained them. Our tools for evaluating artwork improved with our knowledge, sure. You can call our decision process more informed after we had heard 1,000 records, compared to the beginning, when we thought Mike Oldfield and Chris de Burgh were the most valuable assets of modern pop music. It was also very obvious that our numbers changed all the time.

If the 10,000-hours theory is even remotely true, I hate to break it to the professional critics: in this day and age, practically everyone has clocked that much time gaming, listening to records, and watching films by their 25th birthday. We are all involuntary experts when it comes to mass media.

Facts & Figures

I won't lie to you: it took me some years to discover the fundamental flaw that lies at the core of Metacritic. Up until last week I had never noticed that Metacritic is in fact a double-standard system. I would never have thought any institution could get away with fiddling its numbers so obviously.

What is wrong with this picture?

[Image: Metacritic's score table for different media]

And what explanation does Metacritic give to justify this treatment?

Why is the breakdown of green, yellow, and red scores different for games?

The reason for this special treatment for games has to do with the games publications themselves. Virtually all of the publications we use as sources for game reviews (a) assign scores on a 0-100 scale (or equivalent) to their reviews, and (b) are very explicit about what those scores mean. And these publications are almost unanimous in indicating that scores below 50 indicate a negative review, while it usually takes a score in the upper 70s or higher to indicate that the game is unequivocally good. This is markedly different from movies, TV or music, where a score of, say, 3 stars out of 5 (which translates to a 60 out of 100 on our site) can still indicate that a movie is worth seeing or an album is worth buying. Thus, we had to adjust our color-coding for games to account for the different meaning of games scores compared to scores for music, movies and TV


I don't get it.

A site dedicated to merging ratings from different systems is unable to do so properly precisely when the medium in question uses the same basic 100-point scale it does?
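To make the complaint concrete, here is the double standard as code. The cutoffs are my reading of Metacritic's published score table (an assumption on my part, not an official spec):

```python
# The same numeric score maps to a different color depending on the medium.
# Cutoffs below are assumed from Metacritic's published breakdown.

def color(score: int, medium: str) -> str:
    if medium == "games":
        return "green" if score >= 75 else "yellow" if score >= 50 else "red"
    return "green" if score >= 61 else "yellow" if score >= 40 else "red"  # movies/TV/music

print(color(61, "movies"))  # green: worth seeing
print(color(61, "games"))   # yellow: merely "mixed" for a game
```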

If the overwhelming majority of game critics agree that only the upper five values of their scoring system are actually used, meaning they are not as draconic as, say, movie critics in damning a bad movie, why wouldn't Metacritic adjust the ratings according to that information? They have no problem mapping all kinds of star and thumbs ratings to their needs, so why not do the same with game magazine grades? Further research uncovered this article ("Understanding review scores in the Metacritic age"). It basically states what another article on Gamasutra also hinted at recently: the games industry is far more fixated on the Metacritic score than any other.
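For comparison, the star-to-100 conversion Metacritic already performs is a simple linear map, consistent with the 3-out-of-5-stars-equals-60 example in their FAQ (the 4-star-scale call below is a hypothetical illustration):

```python
# Linear star-to-100 mapping of the kind Metacritic already applies.

def stars_to_100(stars: float, max_stars: float = 5.0) -> float:
    return stars / max_stars * 100.0

print(stars_to_100(3))       # 60.0, the FAQ's own example
print(stars_to_100(3.5, 4))  # 87.5, hypothetical review on a 4-star scale
```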

Developers at big studios may be punished by their employers for mediocre scores, and game magazines might run into trouble with advertisers if they underrate a blockbuster game. So far, so bad.

Imagine a music magazine that rates all its records on a four-point scale, 0/1/2/3 meaning bad/mediocre/good/excellent, and then states that all genres (classical, pop, jazz) are treated alike, except heavy metal, where 0/1/2/3 means inaudible/what?/I-can-hear-something/that's-better, arguing that since heavy metal fans are to an overwhelming degree near-deaf, they can't really appreciate anything other than degrees of volume.

Now imagine a college that categorically grades all its foreign students better than its native students, to counter the disadvantage that English is not their first language. The foreign students end up with nearly the same average degree as the native students, but every headhunter secretly knows that a Chinese B translates to an English C; the bias toward native students that might be inherent in the school system is only clouded, not abolished.

In the end, game critics will run into problems with this "hype creep". In fact, if you read most big magazines' reviews, you wonder whether the critic played the same game he slapped a grade on. Instead of speaking directly to the audience, critics tend more and more to encode their language in politically correct terms. It is a lot like the business world, where a boss writes in a reference on your work performance, "He did his best", when, to actually be considered for a new job, the reference should say, "He constantly excelled in his performance". "He did his best" is manager-speak for "tried hard but failed".

What outcome would we expect if we forecast how, over a large number of samples, the averaged critics would mirror this "correction"? Games should end up with a higher numerical Metacritic rating than movies, TV and music, but with a similar color-coded Metacritic rating: counting every red number as -1, every yellow as 0, and every green as +1, thus registering only the positive/negative outcome.
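A sketch of that forecast; the score lists are hypothetical, and the color cutoffs are again my assumed reading of Metacritic's table:

```python
# Compare a plain numeric average with the color-coded tally
# (red = -1, yellow = 0, green = +1) described above.

def color_value(score: int, medium: str) -> int:
    green, yellow = (75, 50) if medium == "games" else (61, 40)  # assumed cutoffs
    return 1 if score >= green else 0 if score >= yellow else -1

def summarize(scores: list[int], medium: str) -> tuple[float, float]:
    numeric = sum(scores) / len(scores)
    coded = sum(color_value(s, medium) for s in scores) / len(scores)
    return numeric, coded

games = [82, 77, 68, 55, 91]   # hypothetical sample
movies = [70, 65, 50, 40, 81]  # hypothetical sample
print(summarize(games, "games"))    # (74.6, 0.6): higher numeric average...
print(summarize(movies, "movies"))  # (61.2, 0.6): ...same color-coded tally
```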

Here is a sample of 11,800 games vs. 8,400 movies (all-time high scores):

[Image: Meta-Metacritic sample]

The problem with the games sample could be that it is "polluted" by including all platforms (mobile, consoles, PC), which could distort the result: successful (critically well-received) games and franchises tend to get wider distribution across more platforms than unsuccessful ones, thus unfairly multiplying their influence. (After all, we don't get to count the 3D, 48fps, and standard versions of The Hobbit as three films, which would be the comparable thing.)
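One way to de-pollute such a sample would be to collapse multi-platform releases into a single entry per title before averaging; a sketch with hypothetical data:

```python
# Collapse per-platform scores to one score per title, then average.
from statistics import mean

reviews = [
    {"title": "Big Franchise Hit", "platform": "PC",   "score": 88},
    {"title": "Big Franchise Hit", "platform": "PS3",  "score": 90},
    {"title": "Big Franchise Hit", "platform": "X360", "score": 89},
    {"title": "Obscure Indie",     "platform": "PC",   "score": 55},
]

naive = mean(r["score"] for r in reviews)  # 80.5: the hit is counted three times

by_title: dict[str, list[int]] = {}
for r in reviews:
    by_title.setdefault(r["title"], []).append(r["score"])
deduped = mean(mean(scores) for scores in by_title.values())  # 72.0

print(naive, deduped)
```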

Pokerface

So at this point I'm not sure what this all means.

Are games actually, on average, of higher quality than movies? (Meaning that while only every fourth movie ends up being "good", nearly every third game does.)

Are game critics more easily persuaded (bribed) into favorable scores, as some sources suggest?

Or do they simply have lower quality standards?

Are gamers per se expected to have a higher trash tolerance than moviegoers?

Despite the fact that the moviegoer crowd largely overlaps with the gamer crowd?

Am I being paranoid, or is something rotten in the meta of critic?

 

POSTSCRIPTUM

Since many comments on this article indicate that its intention did not come across (it was never meant to feed any kind of conspiracy theory about hidden reasons why Metacritic is "really" treating games differently), which may be entirely my fault since English is not my first language, I will add an example of how we could get rid of the double-standard system on the one hand, while taking into account on the other that game magazines have written themselves into the corner of an arms race (score inflation), thus killing two angry birds with one slingshot.

It is a simple arithmetical recalibration that can be done with third-grade school math.

Taken from a comment below:

I am well aware that maybe the core of the problem lies in the fact that, because game publications use the "same" 100-point measure as Metacritic, it is simply more convenient to adopt their scores than to recalibrate them. Recalibration could be done by cutting off the unused appendix (for example, if the worst game ever created still gets at least 20 points for effort, then the 100-point game score system is effectively an 80-point one) and then dividing the gap equally. In my opinion the best way would be this:

If we take 100 as the upper limit and 62.5 as the average, purely yellow, mediocre game, then the lowest score any game could get would be 25. We have 75 game-scale units that have to be mapped onto Metacritic's 100, so 1 "gamecritic" point corresponds to 1.33 Metacritic points.

Example:

The new Tomb Raider currently has a score of 85.
1. We subtract the 25 points that every game critic is used to counting in as an unspoken bonus.
-> TR now has an adjusted score of 60.
2. We multiply the 60 by 1.33:
60 x 1.33 = 79.8, which is roughly 80.

We now have effectively eliminated the need for a double-standard system.
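A minimal sketch of this recalibration, under the assumptions above (a floor of 25 and a stretch factor of 100/75):

```python
# Shift-and-stretch recalibration: map the effective 25-100 game scale
# onto the full 0-100 scale used for other media.

GAME_FLOOR = 25                    # the "unspoken bonus" every game gets
FACTOR = 100 / (100 - GAME_FLOOR)  # 1.333..., the 1.33 from the text

def recalibrate(score: float) -> float:
    return (score - GAME_FLOOR) * FACTOR

print(round(recalibrate(85)))    # 80, the Tomb Raider example
print(round(recalibrate(62.5)))  # 50, the "pure yellow" mediocre game
print(round(recalibrate(25)))    # 0, the worst possible game
```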


Comments


Steven Stadnicki
The thing that stands out to me is that for all the ranting in this article about how Metacritic has skewed its assignment of green/yellow/red scores for games, the last chart points out that *EVEN WITH THAT SKEW TAKEN INTO ACCOUNT* games still receive more favorable ratings overall than film does. That suggests to me that Metacritic is clearly doing the right thing and may in fact not even be doing enough of it - while it's certainly possible that there are just fewer bad games than bad movies being released, Metacritic has evidently identified an overall skew in the ratings system and has taken measures to correct for it, measures that have been mostly if not wholly successful. Metacritic isn't doing anything that wouldn't be done by any self-respecting statistical organization looking at the data they have.

Andreas Ahlborn
"That suggests to me that Metacritic is clearly doing the right thing and may in fact not even be doing enough of it"

Weirdly enough, in a certain sense I agree with you, and disagree at the same time. Metacritic is doing the right thing, but in the wrong way, and if they did it right (found a calibration method that would truly make games comparable to other media), my data "could" suggest they should even be a bit "stricter".

Just looking at how they partitioned their rating system for other mass media vs. games, (20/20/20/20/20) vs. (10/15/25/30/20), you should get suspicious.

Here we have a metric system that is creative but not very helpful when it comes to measuring things: a ruler with some of its lines at inch intervals, some at centimeter intervals, and some in pixels, all mixed together.
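Reading those two partitions as the widths of the five qualitative bands on the 100-point ruler (my interpretation, listed bottom band first), the uneven spacing becomes visible:

```python
# Cumulative band edges from the two partitions quoted above.

OTHER_MEDIA = [20, 20, 20, 20, 20]  # five equal bands
GAMES       = [20, 30, 25, 15, 10]  # same widths, read from the bottom band up

def band_edges(widths: list[int]) -> list[int]:
    edges, total = [], 0
    for width in widths:
        total += width
        edges.append(total)
    return edges  # upper edge of each band

print(band_edges(OTHER_MEDIA))  # [20, 40, 60, 80, 100]
print(band_edges(GAMES))        # [20, 50, 75, 90, 100]
```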

Jerry Curlan
If game critics pretty much universally dub a 40/100 game as "bad" or "poor" in their own published scoring explanations - and all seem to - and film critics generally consider a 40/100 movie "below average", how can you expect a site like this to blanket-label that score as "bad" across industries? Or as mixed/average? The analogies herein are crude. Critical groups, just like scoring bodies, have their own standards and customs. There is nothing wrong with being as accurate as you can be in representing those norms as they vary from industry to industry. Have to say that I love the M&M's image, though. Especially the blue. Makes me hungry. - "I would do any.. THING for ....."

Andreas Ahlborn
"how can you expect a site like this to blanket label that score [40] across industries as "bad"")

I am well aware that maybe the core of the problem lies in the fact that, because game publications use the "same" 100-point measure as Metacritic, it is simply more convenient to adopt their scores than to recalibrate them. Recalibration could be done by cutting off the unused appendix (for example, if the worst game ever created still gets at least 20 points for effort, then the 100-point game score system is effectively an 80-point one) and then dividing the gap equally. In my opinion the best way would be this:

If we take 100 as the upper limit and 62.5 as the average, purely yellow, mediocre game, then the lowest score any game could get would be 25. We have 75 game-scale units that have to be mapped onto Metacritic's 100, so 1 "gamecritic" point corresponds to 1.33 Metacritic points.

Example:

The new Tomb Raider currently has a score of 85.
1. We subtract the 25 points that every game critic is used to counting in as an unspoken bonus.
-> TR now has an adjusted score of 60.
2. We multiply the 60 by 1.33:
60 x 1.33 = 79.8, which is roughly 80.

We now have effectively eliminated the need for a double-standard system.

Ramin Shokrizade
I think Andreas is on to something. If possibly the worst game of all time, Battlecruiser 3000 A.D. (http://www.gamespot.com/battlecruiser-3000-a-d/), was given a 2.6 rating by GameSpot, then we can feel safe saying that the scale in games runs from 26 to 100, not 0 to 100. I can also confirm, having spent at least four years as a gaming journalist doing game reviews, that there is tremendous pressure to up-rate games from major studios. I know that back during my time (2001 to 2005), if you gave an SOE game even a mediocre rating, no matter how buggy at launch, you would be barred from further previews of upcoming products. This was a serious handicap back when people cared what SOE was putting out.

Thus I think the double standard might say more about problems in our industry than it does about problems with Metacritic.

Andreas Ahlborn
@Ramin: While I agree with what you say, and your story of how journalists are pressured by their superiors not to bite the hands that feed them is totally believable, this is surely a problem across all media. Whether it's a TV channel, a newspaper, or a game magazine, critical reports about big spenders will always be silenced if they hurt your revenue.
To say that losing advertisers is a bigger problem for a game magazine than for the New York Times is reasonable. But that is more a matter of size than of which sort of media they specialize in.

The problem in our industry is that while other media have built up authorities like the NY Times or Forbes, which may differ heavily in their critical reception of certain news, information, or political decisions, we don't seem to have an institution (or preferably several) that is considered an industry-wide authority, creating a balance of critical powers.

We have only Metacritic, which was somehow sucked into this power vacuum, so we should at least make sure that everything there is comprehensible, which, in my opinion, it is not.

Vin St John
This is only a problem if anyone is using Metacritic to compare a game to a movie, which I sincerely doubt is a common use case (or its intended one). But I'm not a big metacritic user to begin with so I may be wrong.

Andreas Ahlborn
It might not be the most common use case, but it is at least a side effect of numbering everything in a "calibrated" form.
Since their slogan is "Keeping score of entertainment", they are encouraging their readers to compare different mass media on one and the same "entertainment value".

Will Oberleitner
I take Metacritic's scores with a grain of salt, but I do look for key things I know I enjoy in the actual reviews. Some things are just not so easily quantifiable, and I get the feeling we should be trying most games we can. Metacritic is not good for criticism, especially if we just look at the numbers; it quantifies culture as product reviews, and it is not that easy.

Jeremy Reaban
Well, for one, I think a lot of movie reviewers use a 4-point scale: 1 terrible, 2 okay, 3 good, 4 great. Most game reviewers use either a 10-point scale, a 5-point scale, or a letter grade.

But let's be honest, it's because video game reviews are far more dependent on video game companies, and the scores usually reflect that. The more advertising, the better the review.

That Aliens game is a perfect example: one of the only sites to give it a glowing review also happened to run full-page advertising, with the site's theme being Aliens. Coincidence? Sure. Right.

Meanwhile, movie studios have to search for an obscure news outlet to come up with a good review for their stinkers, and in some cases have to manufacture quotes.

Tom McGarry
I have filtered the game review industry down to about four people whose tastes are similar to mine when it comes to games. Theirs are the only reviews I pay much attention to; the rest I judge myself.

But again, the review process is set up in a way that is generally editorial. As Zach said of Assassin's Creed 1: "It got 'highly acclaimed' when it was an _awful_ fucking game that ended up spawning a franchise." I loved the game. Thought it was awesome. There were some broken things in there, but I still really enjoyed my time.

Game reviews are personal opinion. Metacritic including the mommy bloggers of games in the review cycle was a mistake. Most of them are just users with a blog.

Robert Green
One component I think often goes overlooked is that it's a lot easier for a film critic to see every film than it is for a game critic to see every game. Especially now in the smartphone era, there are far more games than anyone has time for. For a game review website, then, you have to be selective about which games to cover, and the easiest way to do so is to focus on the ones gamers are more likely to care about. Chances are, these will also be better than average, because gamers care about sequels to good games, about games from respected developers, about the games with positive preview coverage, etc.

The end result of this is that potentially dozens of games every week that would fill that 0-50% range don't get reviewed, because no one thinks it's worth their time. And since you need multiple reviews to get a metacritic score, a terrible game has to be reasonably high profile before it will show up at all.

Andreas Ahlborn
Good point. The same, and arguably bigger, problem for developers is getting visibility on iTunes, Steam, or any other hegemonic organization with demigod-like power to pre-filter mass media products for its users, whose storefronts tend to be almost completely occupied by AAA sequels or AngryClonesFarmvilleRunners.

Ramin Shokrizade
I was thinking something similar. Back when I was doing game reviews from 2001 to 2005, I literally played everything, even some products that were not in English. That just is not remotely possible now, even on one platform. Many independent products, which I find superior to the current crop of AAA products, just don't get reviewed. Again, this ties into my previous comments about advertisers and studios influencing reviewer scores.

Robert Green
I decided I'd better check my hypothesis to be sure, and I think it checks out. I had a look through the PC games released in February that don't have a Metacritic score yet, and the list consists almost entirely of things you've never heard of, from devs you've never heard of, and the ones that do have critic or user reviews are generally low to middling. I'm sure this must affect the overall average of games that do have Metacritic scores.

Having said that, I'm not sure the numbers were low enough to really make up the difference being discussed here.

Paul Laroquod
The only thing that is more full of BS than Metacritic is almost every game review ever written. Seems like they deserve each other, and I have little sympathy when people who think it makes sense to judge art by numbers get gamed by inflationary tactics.

Jaime von Schwarzburg
The difference between films and games: even the worst-reviewed films are still technically watchable.

Jerry Curlan
Excellent point. I sat through David Lynch's "Inland Empire", one of the worst movies I've ever seen - even though I'm a huge Lynch fan - but it was ONLY 3 hours. And the projector worked - the film played through as it was intended. Having to grind your way through a broken video game is excruciating.

Mathieu MarquisBolduc
A problem in recent years is review trolling: a low-traffic website posts a terrible review of a good game to get a traffic boost, since the highest- and lowest-scoring reviews usually get the most clicks on Metacritic.

