A Deep Dive Into Steam's Discovery Queue
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.
In my previous article, I discussed how Valve introduced changes to Steam's visibility algorithms on October 5th, 2018. It's not clear exactly what changed, but this seems to have resulted in less Discovery Queue traffic to smaller games. This is unfortunate, as Discovery Queue traffic is particularly important for these games. It provides visibility during sales. It provides an audience that is looking for new games. And it can also amplify the effect of whatever marketing efforts a developer can deliver.
Valve hasn't explained what changed back in October. But I wanted to get a better understanding of how the Discovery Queue works and what games it is showing. Between December 2nd and February 28th I viewed 672 games. For each game I was shown, I saved the store page HTML. I then spent far too many hours obsessing over the data. This is an in-depth breakdown of the statistics and an analysis of the results. I may have gotten a little carried away.
Limitations of This Approach
There are many caveats regarding the relevance of this data. For one, this only reflects my personal Steam account. It's a reasonable assumption that what I'm seeing will generalize to other accounts, but I have no way to know for certain.
Also, my data only goes back to December 2nd. It offers no insight into how the current algorithm compares to how things worked before October. I also believe that Valve made further tweaks to the algorithm in early March, which is why I am restricting my sample to the end of February. But I have no way to know if the algorithm was consistent during the time I was collecting data. Valve could have been making subtle tweaks to their algorithms throughout that period, and we would not know it.
My Steam account preferences are set to allow mature content to be shown, with the exception of Adult Only Sexual Content. I haven't set any tags in Steam's Tags to Exclude feature. In my Discovery Queue preferences, Early Access Products, Software, Videos, and Unreleased Products are all allowed. Prior to starting this experiment, I was a semi-regular user of the Discovery Queue feature. I had seen 1323 products, and I marked 339 of them as "Not Interested." The impact of marking a product as Not Interested is unclear. One page states that "It does not change what kind of games will be recommended to you." A different page claims "We will also exclude these products from being used to recommend you other, similar items." During the experiment, I did not mark any additional products as Not Interested. The Discovery Queue does not repeat itself, so each of the games I saw was being recommended for the first time.
There are a lot of considerations to make in the design of a recommendation system. One of the most common and straightforward tools to apply is popularity. Basing discovery recommendations on popularity has its downsides, as people are likely already aware of popular things. And popularity-based recommendations mean that hidden gems will inevitably be missed. But when a lot of people like something, it is a strong signal that it might be interesting.
Another factor in creating good recommendations is quality. This can be difficult to pin down, since quality can be extremely subjective. Critic and user scores attempt to put a numeric value on quality. Review scores can gloss over a lot of subtlety, but they give a number that is easy to incorporate into an algorithm. While popularity tends to be correlated with quality, something doesn't have to be good to be popular. And critically, low popularity does not imply low quality.
One final tool that I think is key to a good recommendation system is personal relevance. Determining what things might fit a user's particular tastes is challenging but potentially very helpful.
Valve's page explaining the Discovery Queue provides a good overview on how to use the system, and I recommend you give the page a read if you aren't familiar with the system. It notes that the Discovery Queue "…tries to strike a balance between prioritizing products known to be good, and new products that you may find interesting." This stated approach of balancing quality and relevance seems like a good start, with no mention of popularity. The description also notes the importance of new products, introducing the idea of recency as an additional priority in making recommendations.
This Product is in Your Discovery Queue Because it is Popular
Valve has said that they do not want to share too many details of how their algorithms work, because they don't want people to cheat the system. But Steam does provide a lot of information about why things are in your Discovery Queue. This information starts with what I call the top line, or qualifying reason. The top line reason banner shows a general reason for why you are seeing a game, such as "because it is popular" or "because it has positive user reviews." These reasons seem to explain how a game qualified to be shown to you, but do not necessarily explain why this particular game was recommended.
I shared a draft of this article with Alden Kroll of Valve. He explained that these top line reasons do not necessarily indicate a direct cause for the game appearing. He provided the following details on this system:
- When you see the “reasons” line (eg. “This product is in your discovery queue because it has positive user reviews”), that doesn’t actually mean that review score is the reason it is there (or at least not the only reason it is in your queue).
- Think of the “reasons” field being a suggestion to the user about why, in a vacuum, they might enjoy this particular title.
- We want to pick reasons that seem strong and easily understandable to a user. For example: if a title is very highly reviewed, that’s something that’s easy to explain, and so we’re more likely to pick that as thing to highlight for a customer. If a game is really popular, then we are more likely to pick that as the thing to explain why the game is shown in your queue.
- We want flexibility to adjust and experiment with the underlying algorithm and we can’t always explain all the reasons why a game is in your discovery queue. In some cases, the algorithm used to generate the recommendation may not come with easy to digest reasons for any given recommendation, but we still think it’s valuable to give some indication to a customer about why they might like that title.
The following table shows all of the types of top line reasons that I saw and how many times each of these reasons occurred.
|Top line reason|Games|Share|
|---|---|---|
|Because it is popular|281|41.8%|
|Because it has positive user reviews|255|37.9%|
|Just to see if you might be interested|46|6.8%|
|Because it is new on Steam|36|5.4%|
|Because it has a high Metacritic score|26|3.9%|
|Because it is on sale|21|3.1%|
|Because it is a top seller|7|1.0%|
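Each share in the table is simply that reason's count divided by the 672 games in the sample. As a minimal sketch of that arithmetic (the counts are copied from the table; the function name `reason_shares` is hypothetical, and this is not the script used to produce the article's numbers):

```python
from collections import Counter

# Counts copied from the table above. In practice they would come from
# tallying the reason text found on each saved store page.
reason_counts = Counter({
    "Because it is popular": 281,
    "Because it has positive user reviews": 255,
    "Just to see if you might be interested": 46,
    "Because it is new on Steam": 36,
    "Because it has a high Metacritic score": 26,
    "Because it is on sale": 21,
    "Because it is a top seller": 7,
})


def reason_shares(counts):
    """Return each reason's share of all views as a percentage (one decimal)."""
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1) for reason, n in counts.items()}


shares = reason_shares(reason_counts)
for reason, n in reason_counts.most_common():
    print(f"{reason}: {n} ({shares[reason]}%)")
```

Summing the counts is also a quick sanity check that every viewed game was assigned exactly one top line reason.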
281 games were shown to me with a top line reason indicating that they were popular. Representing 42% of the games I viewed, this reason is the most common one. It is not clear what criteria are being used to determine popularity. It is likely a mix of factors, but my best guess is that it is mostly based on recent revenue.
As I said, popularity is a useful tool for making recommendations. But I am surprised to see it so heavily emphasized in the Discovery Queue, where users are looking to find new games they may not already be aware of.
Because it has positive user reviews
The second most common top line reason for this data set is that the game has positive user reviews. The threshold for this reason appearing is 80% positive. The following chart shows a histogram of user scores for the games I saw. The games with this top line reason are highlighted separately, and the stacked values represent the full distribution of user scores.
Stacked histogram of User Scores for games in my Discovery Queue
With this reason appearing on 38% of my Discovery Queue views, it seems like this represents an important visibility threshold in the system. However, Alden tells me that "...review score (above ‘negative’) has very little impact on whether a game gets recommended" and that there is no code that is explicitly selecting for games over 80%. He suggests that the discontinuity in the chart is due to various criteria for getting recommended happening to be correlated with having over 80% positive reviews. He wanted to make it clear that developers do not need to target a particular review score to get recommended.
Just to see if you might be interested
Surprisingly, the third most common qualifying reason is "just to see if you might be interested." While this was only 46 games (6.8%), I did not expect it to be so frequent. The logic behind this category makes sense. Valve wants to collect data about the games on the platform, and randomly recommending them gives an opportunity to learn more. However, the systems seem to factor in only a very limited amount of data. If I buy a game, that will feed into Steam's popularity metric. If I play the game, then it will factor into future recommendations. But whether I Wishlist, Follow, mark as Not Interested, or simply click Next, my actions do not seem to factor into Valve's algorithms. I would think that these actions would reveal some information about the game and quite a lot about my personal preferences. But that data does not seem to be used currently, making these views much less effective than they could be.
Because it has a high Metacritic score
Metacritic score is another attempt to quantify quality. Again the threshold is 80, though this represents a much higher bar to overcome than an 80% User Score. For smaller games in particular it is difficult to even get a Metacritic score, let alone to stay above 80. And again, it seems that this is a hard yes/no threshold. There does not seem to be a trend towards higher Metacritic games being recommended more than lower-scored games. With only 26 games qualifying for this reason, the system is obviously placing a much lower emphasis on Metacritic compared to User Scores.
Stacked histogram of Metacritic Scores
Because it is new on Steam
The Discovery Queue provides some additional visibility to new games in the store. Of the 36 games that showed up under this reason, ten were products taking pre-orders, one was not available for purchase yet, and 26 were launched. The launched games were at most ten days old.
Because it is on sale
Currently being on sale also shows up as a top line reason for a game being recommended. It certainly makes sense that users would be interested in games that are discounted, and this is a feature that can help developers amplify their own marketing efforts.
Because it is a top seller
Finally, we have 7 games that were shown because they were on the top sellers list. Every game that showed up with this reason attached was taking pre-orders, whereas all of the games that were displayed because they were popular had, at a minimum, launched into early access. So this seems to be a system that is linked to popular pre-orders.
In these categories, we see popularity and quality factors playing the biggest role in what is qualified to be shown. Discounts, recency, and some random trials are also mixed in at a much lower rate. Something that does not show up in this at all though is personal relevance. This part of the algorithm doesn't seem to be qualifying any of the traffic according to what my personal tastes might be.
While browsing my Discovery Queue, it seemed like the games I was seeing tended to be recently released. The "new on Steam" qualifying reason only explains a small percentage of this, and the effect seems to go beyond that. This chart shows the distribution of games by age.
Histogram of days since release capped at 2000 days.
44 games (7%) had been released for a week or less, and 74 games (12%) had been available for a month or less. There does seem to be some recency effect, though it may be explained by recent games tending to be more popular than older games.
While browsing the queue, I felt like I was seeing a lot of unreleased games, but looking at the numbers, the reality is that most of the games were launched. Here is the breakdown of release status.
As part of Steam Direct, Valve introduced systems to prevent developers from creating games targeted at exploiting the achievements and trading cards systems. Games that have not reached some criteria for eligibility are marked as having restrictions on certain features. Initially, the indication would always use the text "Steam is learning about this game" but in early February, the text "Profile Features Limited" started to appear for some games. It is not clear what the difference in status indicates.
Valve has said that this "still learning" status only impacts achievements and trading cards and does not impact store visibility. I saw 71 games marked with the still learning status and 14 games marked with the features limited status. If we look at the breakdown of the top line reasons for how these games qualified for my Discovery Queue, we see a different distribution, indicating that these games tend to be either newer or less popular compared to the rest of the sample.
|Top line reason|Games|Share|
|---|---|---|
|Because it has positive user reviews|36|41.4%|
|Just to see if you might be interested|20|23.0%|
|Because it is new on Steam|14|16.1%|
|Because it is popular|14|16.1%|
|Because it has a high Metacritic score|2|2.3%|
|Because it is on sale|1|1.1%|
It has always seemed like the Discovery Queue tends to show me a lot of VR games, even though I don't own a headset, and I've never played a VR game on my account. In this sample, 46 games (6.8%) were VR. I estimate that 10% of the total Steam catalog has VR support, so despite the impression that I had, it does not seem like Valve is giving VR games an extra visibility boost in the Discovery Queue.
464 (60.9%) of the games in the sample had the Indie tag, whereas 72% of the total catalog has the Indie tag, indicating that indie games are relatively underperforming when competing for visibility in my Discovery Queue.
Another source of information about Steam's recommendations is the "Is this game relevant to you?" info. This feature of the Steam store is not specific to the Discovery Queue. Logged in users are shown a relevance section on every game page. As we will see, these relevance factors don't seem to be directly deciding the recommendations. It seems that the information shown in this section isn't showing why the Discovery Queue chose this particular game. Instead, it is a system that looks at a game and returns whatever information it has. This is the first area where we are seeing how the recommendation system is considering personal relevance as well as popularity and quality factors.
Is this relevant?
One of the most informative things about this section is that sometimes it provides no information at all. On occasion when exploring your Discovery Queue, this section will show text stating that "You've already looked at a lot of games that we have the best information on for you. Until new games release, you might see less relevant games as you explore more of your queue."
During my experiment, I was shown games with this text nine times. For eight of those games, the top line qualifying reason for being shown was that the game is popular. The ninth was shown because it was new on Steam. I found encountering these messages to be frustrating. The message seems to be stating that I have exhausted the algorithm's ability to provide personalized recommendations for me. Yet continuing to explore my queue revealed many games with reasonable personal relevance. This seems to indicate that the algorithm is showing me the game because popularity factors outweigh the lack of relevance. The text indicating that the algorithm has run out of personally relevant games is misleading. It is also interesting to see a reference to the importance of recency in this text.
I have been shown less than 10% of the Steam library by the Discovery Queue, but it seems like the systems are designed to emphasize recent and popular titles over exploring Steam's deep back catalog of games.
Similar by tags
The most common relevance explanation I was shown was the "Because you've played games tagged:" reason, where a game's tags match tags for other games I have played. This reason showed up on 532 (79%) of the games I viewed in the experiment.
Tags on Steam are primarily crowd sourced, though developers can set tags on their own games. Valve does do some moderation. Users can report misapplied tags for review, and only tags from a pre-approved list will show up. Tags are a mix of various types of metadata, covering genre information like puzzle-platformer, feature information like Online Multiplayer, and various other miscellaneous properties such as Indie, Crime, or Colorful. Tags can provide useful information about a game. But they tend to be a very noisy data source because of their crowd sourced nature and inconsistent application. While I have been told that Valve is making efforts to reduce the importance of tags in their algorithms, as of today tags are still a major factor in visibility on Steam.
When showing game relevance by tags, one to seven matching tags will be listed in this area. While looking at my queue, it often seemed like these tag matches weren't very relevant. A small number of tags would be listed, and those tags wouldn't be particularly related to my tastes. Here is the distribution of frequencies of tag counts that I saw.
As we can see, many of the games were matching on one or two tags, though the majority had three or four.
Beyond the number of tags, it often seemed like the particular tags that were showing up didn't capture a lot of useful information. To investigate this, I am borrowing a concept from Information Theory to calculate how much information a given tag conveys about a game. For example, the Strategy tag is present on 5894 of the 28268 games I was able to access using the SteamSpy API. Computing -log_2(5894/28268) gives 2.262 bits of entropy, which is the number of bits of information the presence of the tag conveys. If the relevance field shows multiple tags, we can sum the entropy to get the approximate amount of information the tag matches are conveying. This isn't a strictly correct measure of entropy, since tags aren't independent from each other. But it's still an interesting way to evaluate the results.
The 14 games that showed a single match on the Indie tag are recommended based on the least amount of information at 0.467 bits. The average match is based on 7.07 bits of entropy.
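The per-tag calculation described above is straightforward to reproduce. The following is a minimal sketch, not the script used for the article's figures: the function names `tag_bits` and `match_bits` are hypothetical, and the only catalog numbers used are the Strategy count and catalog size quoted in the text.

```python
import math


def tag_bits(games_with_tag, total_games):
    """Bits of information conveyed by a tag being present on a game."""
    return -math.log2(games_with_tag / total_games)


def match_bits(tags, tag_counts, total_games):
    """Approximate information in a set of matched tags.

    Summing per-tag bits treats tags as independent, which they are not,
    so this overstates the information when tags tend to co-occur.
    """
    return sum(tag_bits(tag_counts[tag], total_games) for tag in tags)


TOTAL_GAMES = 28268                # catalog size quoted in the text
tag_counts = {"Strategy": 5894}    # tag count quoted in the text

print(round(tag_bits(tag_counts["Strategy"], TOTAL_GAMES), 3))  # 2.262
```

Common tags carry few bits and rare tags carry many, which is why a single match on a tag as widespread as Indie comes out below half a bit.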
Histogram of bits of data conveyed by tag matches
33 different tags appeared in this section during my experiment. The following table shows how many times each tag appeared.
|Tag|Count|Tag|Count|Tag|Count|
|---|---|---|---|---|---|
|Horror|61|Platformer|19|Hack and Slash|3|
This gives a rough snapshot of what the algorithm thinks of my preferences. Unfortunately, looking at this list I feel like it is a bit mismatched. Indie, Action, Adventure, and Singleplayer are all extremely broad. I don't tend to play RPG or Horror games, and I mostly find it unhelpful when the Discovery Queue recommends them to me. But I've been playing BioShock recently, and because tags are so broadly applied, BioShock is tagged as both RPG and Horror.
This system also seems to be inconsistent in reporting tag matches. For example, let's look at what it showed me for Hades' Star. The similar by tags section shows that the game is relevant to me because of the Sci-Fi tag. But this game also has the Atmospheric, 2D, Co-op, Great Soundtrack, Multiplayer, and Indie tags, all of which came up as matching tags for some other games. Digging into the data, I found this pattern of under-reporting matching tags occurred on 503 of the 532 games that reported this information.
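One way to express this check is with set arithmetic: take the tags on the game, keep those that appeared as a match on any page in the sample, and subtract the tags actually shown. The following is a minimal sketch under those assumptions, not the script used for the 503 figure. The function name `unreported_matches` is hypothetical, and the tag sets contain only tags named in this article rather than the full data.

```python
def unreported_matches(game_tags, shown_tags, tags_seen_as_matches):
    """Tags on the game that matched on other pages but were not listed here."""
    return (set(game_tags) & set(tags_seen_as_matches)) - set(shown_tags)


# Hades' Star, using only the tags named in the text.
game_tags = {
    "Sci-Fi", "Atmospheric", "2D", "Co-op",
    "Great Soundtrack", "Multiplayer", "Indie",
}
shown_tags = {"Sci-Fi"}

# Tags observed as a match on at least one page (partial, illustrative list).
tags_seen_as_matches = game_tags | {"Horror"}

missing = unreported_matches(game_tags, shown_tags, tags_seen_as_matches)
print(sorted(missing))
```

A game counts as under-reported whenever this set is non-empty.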
Similar to games you've played
Another type of match explanation is "Similar to games you've played." This reason, and the previous similar by tags reason, will always appear at the top of the list when they are present, and they are never both present at the same time.
This explanation shows one or two games that you have played in the past. Steam considers these games a match for the game that is currently being shown by the Discovery Queue. 125 (18.6%) of my Discovery Queue views had this relevance explanation. 78 of those matched one game, and 48 matched two.
This chart shows the 15 games that appeared in this section, my play times in those games, and the frequency that they occurred at.
|Game|Appearances|Play time|
|---|---|---|
|Out There Somewhere|43|0.4 hours|
|SHENZHEN I/O|14|3.4 hours|
|BattleBlock Theater|12|0.5 hours|
|Into the Breach|11|46 hours|
|NEXT JUMP: Shmup Tactics|8|0.9 hours|
|Life Goes On: Done to Death|7|11.6 hours|
|Crypt of the NecroDancer|7|63 hours|
|Fidel Dungeon Rescue|5|10.4 hours|
|DOOM 3|5|6.7 hours|
|BioShock Remastered|4|4.7 hours|
The datapoint that immediately jumps out is Out There Somewhere. The algorithm seems to think this game is extremely relevant to me, even though I have only played it for 24 minutes.
It appears that this set of games is drawn from my past year of play history and does not weigh relevance by playtime. The decision to restrict this system to the past year strikes me as unfortunate. The hundreds of hours I put into FTL a few years ago seem a lot more relevant to my tastes than some of the games on this list.
As with many things on Steam, this system appears to be driven by tags. While tag information isn't visible on the store page, the HTML shows each game is matched with a list of four tags that are shared with the game the Discovery Queue is recommending. Using these tags, we can apply the same entropy analysis we applied to the similar by tags section.
On average, game relevance is being calculated on 15.4 bits of entropy from tags. This is significantly higher than what we saw looking at the similar by tags section. This is partially explained by matches always using four or eight tags, rather than a range of one to seven. But this system seems to match more relevant tags as well.
55 different tags appeared in this system. The following table shows how often each tag occurred. Note that because the system can show matches for two games, sometimes a tag match can happen twice on the same page.
|Tag|Count|Tag|Count|Tag|Count|
|---|---|---|---|---|---|
|Pixel Graphics|65|Atmospheric|8|Turn-Based Combat|3|
|Strategy|19|Funny|6|4 Player Local|1|
|Rogue-like|15|Local Co-Op|5|Family Friendly|1|
|Turn-Based|14|Turn-Based Strategy|5|Local Multiplayer|1|
This chart of tag matches is much more diverse and seems to match my personal preferences better than the previous system did. It's interesting to note tags like Puzzle and Strategy taking a more prominent position, while Horror and RPG are lower down the list. It's too bad that this relevance reason only factored into 125 of the recommendations vs the 532 games matched by the similar by tags system.
Because it's popular
68 of the store pages I saw noted the game may be relevant to me because it is currently popular.
In the top sellers
15 of the games I saw were noted to be in the top sellers. This is another instance of determining relevance based on popularity. And it seems to closely correlate with, though not exactly match, a game being in the top 25 on the Global Top Sellers list.
Recommended by curators
89 of the games I saw were recommended by curators I follow. Of those, 71 were recommended by one curator, 17 by two curators, and one game, Wizard of Legend, was recommended by three curators. I follow 14 curators on my account.
I think this is one of the more interesting pieces of information to be fed into the algorithm. If I make good choices in selecting curators, then their recommendations can feed the algorithm information about both the quality of a game and its personal relevance to me. A limitation to this system is that it requires users to find good curators, and it requires curators to consistently make good recommendations.
Recommended by friends
Six of the games I saw were recommended by my Steam friends—five of those recommendations all came from one person. With only 19 friends on my Steam account, it is no surprise this number is low. This does seem like an interesting source of data about game quality, but I expect for many users it will be a relatively sparse piece of information.
475 (68%) of the games noted one of Positive, Very Positive, or Overwhelmingly Positive user reviews. Interestingly, this particular relevance reason seems to be given a lower priority than the other reasons. If a game does not trigger any of the other relevance factors, then the "you might see less relevant games" text will show, rather than showing the Positive reviews reason on its own.
A single game, NBA 2K19, was shown to me with the warning that it may not be relevant due to mostly negative user reviews. This is a particularly interesting case, as it clearly shows the algorithm favouring popularity and recency over quality or personal relevance. This is not a good recommendation for me. And some parts of the system know that.
Theoretically the relevance system could also warn me when a game has a negative review from a friend or a curator that I follow, but that did not occur in this experiment.
It's not easy to objectively evaluate how good of a job the Discovery Queue is doing. I don't have the data to say how good the recommendations I saw were compared to all of the games that could have possibly been recommended to me. Understanding the performance of the system would also require looking at data for many users, not just me. That said, there are a few data points that I thought were interesting.
During the experiment, I added eight of the games the Discovery Queue showed me to my Wishlist, which now has 94 games on it. Five of the eight games had the "similar to games you've played" relevance explanation, whereas only three had the similar by tags explanation. During this timeframe I also wishlisted three games outside of seeing them in my queue. Though perhaps if I had held off on wishlisting them, these games would have eventually been recommended. While I have not gone looking for new games that the system missed, I am aware of Art Sqool, which I thought seemed interesting, but it has not appeared in my queue.
I feel Steam shows me a lot of adult-targeted anime games, a genre I have no interest in. Looking at the numbers, I was shown 59 games tagged with at least two of Sexual Content, Visual Novel, Anime, Nudity, or NSFW. Steam does offer a feature to exclude tags of your choice, but I am personally reluctant to use this approach. Blocking Anime and Visual Novel would exclude interesting games like VA-11 Hall-A. And blocking Nudity would exclude games like Mount Your Friends 3D and This Is the Police.
Amusingly, when I look at my Ignored Games list that I populated before starting this experiment, it shows me a list of tags explaining that "These are the top tags that are common among the products you have ignored. You can go to your Preferences page to add them to the 'Tags to exclude' list and we'll hide products with these tags." The list is: Indie, Action, Adventure, Singleplayer, Multiplayer, RPG, Great Soundtrack, Casual, Strategy, Atmospheric, Story Rich, Horror, Survival, Simulation, First-Person, Open World, FPS, 2D, Anime, and Shooter. These look remarkably similar to the list of tags that are used to show that a game is relevant to me.
My impression is the system could make better recommendations if it relied less on popularity and recency, and instead did a better job of surfacing titles based on quality and personal relevance factors. The challenge here is that popularity and recency are easy to quantify. Quality and relevance are more elusive.
The most surprising omission in all of these systems is the lack of collaborative filtering. This is a technique for determining relevance by looking at broad trends in user behaviour. If I play Portal, and if most other Portal players also play Portal 2, then Portal 2 is likely relevant to me. This isn't easy to do well, but it is the starting point for most successful recommendation engines offered by services like Netflix and Spotify. Valve has indicated that they are working with neural network based approaches to improve their recommendations, which may help fill in this gap. But I suspect that a straightforward collaborative filtering approach would be a better starting point.
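To make the Portal example concrete, here is a toy user-based collaborative filter. It is only a sketch of the general technique and says nothing about how Valve's systems work: the function name `recommend`, the user names, and the play histories are all made up for illustration.

```python
from collections import Counter


def recommend(target_user, libraries, top_n=3):
    """Rank unplayed games by how strongly similar players favour them.

    Every other user votes for the games the target has not played, and
    each vote is weighted by how many games that user shares with the
    target. Users with no games in common contribute nothing.
    """
    played = libraries[target_user]
    scores = Counter()
    for user, games in libraries.items():
        if user == target_user:
            continue
        overlap = len(played & games)
        if overlap == 0:
            continue
        for game in games - played:
            scores[game] += overlap
    return [game for game, _ in scores.most_common(top_n)]


# Hypothetical play histories, for illustration only.
libraries = {
    "me": {"Portal"},
    "player_a": {"Portal", "Portal 2"},
    "player_b": {"Portal", "Portal 2", "FTL"},
    "player_c": {"Portal", "Portal 2"},
    "player_d": {"DOOM 3"},
}

print(recommend("me", libraries))
```

With these made-up libraries, Portal 2 ranks first because three Portal players also play it, while DOOM 3 is never suggested because its only player shares nothing with the target. Production systems typically add normalization for very popular games and very large libraries, which this sketch omits.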
It's also important to keep in mind that defining what it even means to make a good recommendation depends on goals and values. A system designed to maximize revenue will have very different metrics for success than a system designed to maximize user delight, for example. And we have seen the problems that Facebook and YouTube created when their algorithms were tuned to value engagement above everything else.
Something that I personally value in a discovery system is finding the hidden gems amongst weird, diverse, and niche titles. I think this is important not only because of personal interest, but also because the industry will be worse off if only the most popular games are visible. The traffic from the Discovery Queue is critical to indie games, and it will be a loss if making niche titles becomes even less commercially viable than it currently is.
To some extent, the information in this article is already out-of-date, as Valve has continued to evolve and tune their algorithms. But the systems described here are all still active on Steam today, and they are playing a role in determining what players are seeing on Steam. I hope Valve's ongoing work will improve the Discovery Queue, and that they are mindful of the importance of these systems and the impact they have on players, developers, and the industry as a whole.
As always, if you want to talk about market analysis, development, or really anything indie games at all, reach out to me on Twitter (@erikejohnson). My DMs are open.