Design Exercise: A Better Steam
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.
The following is a design exercise in identifying problems with Steam’s user reviews; the second part of the discussion focuses on potential solutions to those problems, with a view to improving overall discoverability.
Let’s start with some cognitive observations. We know that salience in UI design is as much about increased visibility as it is about selective hiding. However, where information is limited, whatever bit of information is displayed takes on far more significance than its intrinsic value warrants. Visibility itself is understood to be equivalent to importance, even when the visible data isn’t particularly salient to the observer, simply because that’s all the observer knows.
This problem is then compounded by our reliance on anchoring to produce assessments. From the Wikipedia article on anchoring:
During decision making, anchoring occurs when individuals use an initial piece of information to make subsequent judgments. Once an anchor is set, other judgments are made by adjusting away from that anchor, and there is a bias toward interpreting other information around the anchor.
That is, the more inaccurate the anchor is, the more inaccurate every subsequent assessment is likely to be. And the most prominent bit of information presented by the UI becomes the anchor through which we appraise comparative values.
It becomes exceedingly important, then, to provide users with anchors which are actually relevant to the user since they have a tendency to become, for lack of a better word, reified. The avenue through which data is filtered increasingly ends up determining which data enters that avenue in the first place.
Consider the Humble Store, an example of a storefront with a tangible dearth of useful anchors. The Humble Store front page features no fewer than three sorted lists which are essentially inscrutable: “What’s Hot”, “Most Popular”, and “Trending”. What do these labels even mean? How do they differ from each other? The only information this set of anchors really provides is that these games have been put on these lists through the actions of persons of unknown taste using unknown criteria. It’s asking the user to trust an entirely profile-less opinion.
And it’s not so much that popularity engenders more popularity as it is that popularity is the only filter available to sift all the data. The problem would be the same if popularity were switched out with any other filter that has a merely tangential connection to the user.
Ignoring these lists, the user can sort games by genre, but this is basically useless when the results can run to hundreds of pages. Within genres, the user has no choice but to fall back on “Bestselling” as the only other purportedly content-based method of refinement. But again, this accomplishes very little beyond letting the user know that other people had to resort to the same thing.
The point here is that when the anchors a user has access to are self-evidently without utility or relevance, the user will in fact shy away from engaging in any new assessment at all. A reliable reference point is the absolute bare minimum requirement for robust discovery. Otherwise, it simply takes too long to make individual assessments of every single game, particularly when so little can be known about these games from their store listings.
Keeping the above in mind, let’s move on to the actual topic of this discussion: Steam user reviews. The problem currently is that in practice there are no real means of assessing the reliability of those reviews. Sure, every review has a written component as well as a usefulness rating, but when these systems are open to abuse or hijacking, it becomes entirely too easy to dismiss a large portion of them as resulting from misguided personal grievances or idolizing. And the less users trust reviews, the less they participate in the algorithms powered by those reviews, or indeed, in the purchasing of content covered by those reviews.
Given that Steam reviews have (or are perceived to have) questionable reliability, the de facto information that holds place of prominence on Steam is not the quality of a game, but its popularity or infamy (or lack of both)—exactly the data that is counter to the purpose of discoverability—because this is the only other bit of “objective” information that the user can anchor to. It doesn’t even matter if popularity isn’t actually calculated into the discovery algorithm because the user herself will incorporate it post factum.
Unfortunately, we can see from the Humble Store that simply eliminating reviews entirely doesn’t help either—in fact, it is significantly worse for it. So we need another baseline reference that removes impertinent information altogether. We need information balance.
A simple realization, then: when it comes to content choices, it doesn’t really matter to me what other players think about a game so much as it matters to me what I would probably think about that game—or at least, someone like me. What is needed is an anchor or body of reference that can be directly constructed by the user herself that also conceals or de-emphasizes anchors constructed by others with opposing tastes. And the less social second guessing is involved, the more the user will be able to focus on genuinely salient information, which leads to a virtuous cycle of ever more accurate anchoring.
Returning to the perspective of balance, the situation is akin to the issues Elite Dangerous suffered due to the solo, private group, and open game modes all sharing the same persistent world.
If you're in private or solo, for example, and attack one of Elite Dangerous' factions, those [outside those modes] who wish to defend them are powerless to respond. You can even change the politics of a system of space, turning space stations hostile (this is called "flipping" a station). Those space stations may then try to kill those in open who dare to get too close.
Just as the quality of experience for open players declines due to the perceivable lack of self-determination in content engagement, so too does user experience with Steam user reviews decline whenever any game’s review page is “flipped”.
To the user experience, then, the fact that all voices are given equal weight actually disadvantages the individual user’s agency to a significant degree.
Digging even deeper, the crux of the issue is that Steam inadvertently privileges social positioning over any other utility that reviews may provide. Users don’t submit reviews in order to discover new games; they submit them in order to impress their views upon games they already have an opinion on, to influence that game in some way. Indeed, it probably doesn’t even occur to most users that their reviews have any impact on their ability to find interesting new games (and really, it’s hard to say if they do or not at the moment).
This is the opposite approach of the ratings system that can be found in Netflix (which to my mind is the gold standard of salient information production). With Netflix, the average or aggregate opinion is actively hidden. Instead, the rating that is shown is a predictive one that attempts to approximate an assessment based on the user’s previous personal ratings. Other users’ ratings matter, but, as far as I can tell, personal input and the input of users with a history of similar ratings carries by far the most weight.
And the feedback is immediately palpable: all one has to do is see the predictive ratings for content the user has already viewed but has yet to rate to see how accurate the predictions are. The more the ratings are used, the more they form a genuinely useful anchor which the user can continually contribute to and improve. The additional benefit for discovery is that the social currency (or lack thereof) of any specific content takes a far back seat to personal enjoyment as a criterion for content selection.
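The kind of personalized prediction described above can be sketched with a simple user-based collaborative filter. The data and names here are entirely hypothetical, and this is only a minimal illustration of the principle, not Netflix’s actual algorithm: each user’s rating history is compared against others’, and the predicted score leans on like-minded users while zeroing out the influence of users with opposing tastes.

```python
from math import sqrt

# Hypothetical ratings: user -> {game: score from 1 to 5}.
# "u1" has tastes similar to "me"; "u2" has opposite tastes.
ratings = {
    "me": {"A": 5, "B": 1, "C": 4},
    "u1": {"A": 5, "B": 2, "C": 5, "D": 4},
    "u2": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def similarity(a, b):
    """Mean-centered cosine similarity over co-rated games;
    negative values indicate opposing tastes."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if not common:
        return 0.0
    ma = sum(ratings[a][g] for g in common) / len(common)
    mb = sum(ratings[b][g] for g in common) / len(common)
    va = [ratings[a][g] - ma for g in common]
    vb = [ratings[b][g] - mb for g in common]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

def predict(user, game):
    """Predicted rating: a similarity-weighted average of others'
    ratings, with opposing tastes (negative similarity) zeroed out."""
    num = den = 0.0
    for other in ratings:
        if other == user or game not in ratings[other]:
            continue
        w = max(0.0, similarity(user, other))
        num += w * ratings[other][game]
        den += w
    return num / den if den else None

print(predict("me", "D"))  # → 4.0: only the like-minded "u1" counts
```

Note how the `max(0.0, ...)` clamp is doing the “information balance” work: a hostile or opposite-taste reviewer simply contributes nothing to the prediction, rather than dragging it down.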
Such responsiveness to user assessments—to user agency—generates greater and greater confidence in the entire system. Again, it’s a virtuous cycle that leads to more exposure for more content. Chances are, if you’ve engaged in Netflix’s ratings system, you’ve probably found yourself enjoying content you knew nothing about beforehand.
That is to say, the incentive for reviewing games on Steam needs to change from politicking to discovery. And, perhaps surprisingly, the feeling of individual ownership over content choices actually increases the less Steam reviews are couched as a social platform. By reducing social visibility, the focus shifts from influencing particular games through reviews to influencing the user’s own experience instead.
To put it another way, by redesigning the reward structure to provide an individualized, intrinsic dividend for participation, and by moving the spotlight away from scorekeeping, the entire system should have the aggregate result of better and better discovery for all users involved.
Of course, weighted, predictive ratings rely heavily on data produced by users; it’s obviously impossible to predict anything if the initial data set is empty. To produce more data for games lacking them, it becomes important to matchmake users with other users of similar taste. Steam already attempts this through Curators, but the reality is that Curators themselves suffer from information opacity. There’s no way to know how close a Curator’s tastes are to your own without already knowing or heavily researching the Curator.
Which is to say that there needs to be a discovery system for Curators as well, with similarly weighted predictive ratings for how relevant their tastes are likely to be. Or, more radically, the option should be available to find and follow other lay users who have a history of similar tastes (as graded on a scale), provided there are privacy opt-outs or anonymous-participation options (in other words, actual matchmaking). This offloads social positioning from individual store pages onto the users themselves, so that part of the experience can still be preserved for those who want it without having to go full Curator.
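The “graded scale” idea can be made concrete with a small sketch. Everything here is hypothetical (the users, the games, and the grade cutoffs are invented for illustration): taste overlap is measured as the Jaccard overlap of positively reviewed games, and only a coarse grade, never the raw score or the underlying libraries, is surfaced to the user, preserving the anonymity option mentioned above.

```python
# Hypothetical libraries of positively reviewed games per user.
liked = {
    "me": {"A", "B", "C", "D"},
    "u1": {"A", "B", "C", "E"},
    "u2": {"F", "G"},
}

def jaccard(a, b):
    """Taste overlap: |A ∩ B| / |A ∪ B| of liked games."""
    union = liked[a] | liked[b]
    return len(liked[a] & liked[b]) / len(union) if union else 0.0

def grade(score):
    """Collapse the raw score onto a coarse, user-facing scale."""
    for cutoff, label in [(0.6, "very similar"),
                          (0.3, "somewhat similar"),
                          (0.1, "slightly similar")]:
        if score >= cutoff:
            return label
    return "not similar"

def find_matches(user):
    """Rank other users by taste overlap, reporting only the grade."""
    scored = [(jaccard(user, o), o) for o in liked if o != user]
    return [(o, grade(s)) for s, o in sorted(scored, reverse=True)]

print(find_matches("me"))  # → [('u1', 'very similar'), ('u2', 'not similar')]
```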
Finally, and perhaps most importantly, Steam should allow devs to initiate matchmaking with Curators by constructing content profiles of their games, through which Curators with corresponding interests or tastes can be discovered (and vice versa). It behooves devs to be accurate in their profile construction, since more accurate profiles will likely lead to greater or more positive coverage. For Curators, such a system would ease the task of finding new games to cover, though again they should always be able to opt out. In any case, a potential mechanism would then exist for new games to generate the data they need for more universal matchmaking, with intrinsic rewards for all involved.
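One way this dev-to-Curator matchmaking could work is a weighted tag match. The tag vocabulary, weights, and Curator names below are invented for illustration, and this is only one possible scoring scheme: the dev self-assigns weighted tags, each Curator declares weighted interests over the same vocabulary, and candidates are ranked by the sum of weight products over shared tags.

```python
# Hypothetical dev-supplied content profile: tag -> self-assigned weight.
game_profile = {"roguelike": 0.9, "pixel-art": 0.6, "story-rich": 0.2}

# Hypothetical Curator interest profiles on the same tag vocabulary.
curators = {
    "cur_indie": {"roguelike": 0.8, "pixel-art": 0.7},
    "cur_drama": {"story-rich": 0.9, "visual-novel": 0.8},
}

def profile_score(game, curator):
    """Sum of weight products over shared tags; higher means the
    Curator's stated interests line up with the dev's profile."""
    return sum(w * curator.get(tag, 0.0) for tag, w in game.items())

def rank_curators(game):
    """Order Curators by how well their interests match the game."""
    return sorted(curators,
                  key=lambda c: profile_score(game, curators[c]),
                  reverse=True)

print(rank_curators(game_profile))  # → ['cur_indie', 'cur_drama']
```

Because the score rewards honest tag weights (overstating "story-rich" here would only surface Curators likely to pan the game), the incentive toward accurate profiles falls out of the mechanism itself.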
A summary to wrap up: obscure or hide ownership and review numbers as well as public reviews on the store page; use predictive ratings based on user history instead to provide a more relevant estimate; increase the accuracy of said ratings by greatly reducing the weight of ratings from users with opposing tastes; and introduce matchmaking systems for discovering other content profiles with similar tastes as measured on a graded scale. Lastly, implement a matchmaking system for devs and Curators, again based on a weighted profile, to better ensure that every game gets at least some coverage.