Cyberspace in the 21st Century: Part Six, Scalability with a Big 'S'


February 21, 2001

Interesting Times

Well, a fair bit's changed throughout the year 2000. We've seen the rise and plateau of Napster. We've seen the advent of viral marketing as a viable technique. People are beginning to recognize that being open isn't such an unsound business practice as it might at first have appeared - Open Source is now a creditable development strategy. Microsoft is getting worried, I mean interested in Linux. Intel is getting worried, I mean interested in peer-to-peer computing and distributed processing. Meanwhile Cisco is rubbing its hands together in glee - though I understand that we may yet see a revolution in the use of domestic wireless routing devices. Perhaps Cisco is interested in that area? Power seems to be returning to the people…

Interesting times, eh?

Why is all this happening? It's the Internet. The Internet, with the Web as its visible face, makes the world a small place. Traditional business models may have been the first to be applied online, but more suitable models are beginning to arise. In a small world, with the minuscule amounts of friction present, competitive strategies that rely on having enough time for an inertia-burdened business to adjust to a change simply can't cope against lightweight individuals and co-operatives. Cut out the middle man. Deal direct. Co-operate. Do your business in full view of your traditional competitors, because it doesn't matter what they see - they haven't a hope of catching up in any case.

This applies to the games industry too. The music industry is in crisis, and the movie industry may be next. The software industry as a whole is undergoing its own revolution. Digital content is simply too mobile to remain protected. Information really does want to be free. Games are no different. The players want them for nothing. That's not to say that players won't pay if they have to - in fact most players are in perfect accord with the idea of the games developer getting a fair reward for their labor - it's simply that games developers, just like musicians, should keep an eye on the writing on the wall. Things are going to change. If you want to be prepared: learn Linux, buy an Indrema, set up a website, join an Open Source team.

In the not too distant future, the player will be paying your wages direct.

Games and P2P

How does this industrial revolution relate to cyberspace?

Cyberspace is simply the logical evolution of peer-to-peer systems such as Napster, Gnutella, FreeNet, MojoNation, FreeHaven, etc. While these and others concern themselves with the distribution of static content (music, images, movies, software, etc.), the sharing of storage, and the sharing of processing resources, cyberspace will be the system that combines elements of all of these to distribute dynamic and interactive content. It's the low-friction way of delivering digital content: anyone can put something in one end and it's instantly available to everyone else. Instead of treating digital content like a tangible commodity that can't be readily duplicated, and one that requires a one-to-one exchange mechanism, we replace it with a system that treats all content as inherently manifest. It's a bit like the Borg in Star Trek - any thought that occurs in one mind is available to all.

Ready for Success

So in terms of games, cyberspace is a platform that supports many games, but for each game there is no concept that the number of players is limited. Each game is a persistent environment that expands in proportion to the number of players playing within it. If it's fun for one, it's fun for one thousand. If it's fun for a thousand, it's fun for a million. It's a shared environment, a virtual space, a sandpit in which any number of people can have fun. The games designer figures out the rules, creates the style, sets the theme, provides some suggestions for objectives, but largely the players make their own fun.

This is where the peer-to-peer paradigm makes its entrance: the distributed systems technology that comes into its own by ensuring that whatever one player does is visible to all the other players in the vicinity. Not only are we distributing content, we are also distributing change in that content. This is what all multiplayer games are doing. It's just that the current approaches, in the form of client/server and crude peer-to-peer systems, aren't sufficiently scalable for billion-player games. Truly scalable systems design is what I'm trying to get across to you in this series of articles. Whether your game attracts a million players or only a hundred, if your design is scalable at least you can cope with success. Imagine if you had a good idea for a multiplayer game and it was so popular it ground to a halt at a hundred thousand players. What a bummer, eh? Instead of driving away in a Ferrari you end up with a major administrative headache. No problem, you say, we'll just create parallel instances of the game in order to balance the load.

I'm wondering if this is really a matter of convenience rather than evidence of sound design. What would have happened if the web had ground to a halt at 10 million users? Oh, no problem, we'll just create multiple webs. We'll have MSN, AOL, CompuServe, Reuters, FreeServe, Yahoo, IBM, etc. A hundred different webs, each slightly different, but with a heck of a lot of overlap. It wouldn't just be "Oh, shall we get the .net as well as the .com domain name?", it would be "Should we have a presence on MSN as well as AOL?"

Here we have a potential success story: a game that's so good 40 million players want to join in. That's 40 million Pentiums beavering away. If we can produce a system that copes with 10,000 players, why not 10 million? Let's not be so lazy that we allow a design limit that creates a ceiling on our success. It is better to get a dollar from 10 million punters for a little extra design effort than it is to charge ten dollars to 100,000 players, with all the admin costs of spreading players across different shards. Why do we ever produce a piece of software in the knowledge that human intervention will be required if it's too successful?

I don't know about you, but I'm into the idea of 'fire and forget' software. I want to produce a single package of software capable of supporting a billion players that will never encounter problems with support costs, additional infrastructure, software updates, maintenance, telephone help lines, etc. One game - a billion players - zero admin. What could be simpler?

So Many Players - So Little Time

I know there are people out there who have incredible difficulty understanding why on earth a game would benefit from even a million players, when surely a thousand is plenty? Check out the Star Wars Galaxies discussion boards, where prospective players of the Star Wars online game are even now questioning the wisdom of having multiple Star Wars galaxies, i.e. several instances of the same game, each with tens of thousands of players. Instead of admitting this as a technical limitation, the excuse offered is that there's not enough content to support a greater number of players in a single game. Blimey, a whole galaxy, and they can't squeeze a few million players into it?

Space is mind-bogglingly big, as Douglas Adams once wrote - and that's Big with a big 'B'. What I'm going to spend the rest of this article talking about is how to engineer scalability into a distributed modeling system. And that means Scalable with a big 'S' - making sure that no part of the design grinds to a halt as the numbers start getting big with a big 'B'.

Threats to Scalability

How do we know a threat to scalability when we see it? It's any part of a design that embodies an assumption about 'reasonable use' as some kind of absolute limitation. Any time you see people storing years in a single byte, filenames in 16 bytes, or directories in 256 bytes - any kind of fixed limit like that is an obvious candidate.
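To make the point concrete, here's a minimal sketch (the field sizes and names are mine, purely for illustration - not taken from any particular system) of how a 'reasonable use' assumption hardens into an absolute limit the moment it's baked into a data layout:

    // A 'reasonable use' assumption frozen into a data layout becomes a
    // hard ceiling; the second layout leaves headroom instead.
    #include <cstdint>
    #include <string>

    struct PlayerRecordLimited {
        uint8_t  year;         // 0-255: fine until dates leave the assumed window
        char     name[16];     // fails on the first seventeen-character name
        uint16_t playerCount;  // a hard ceiling on success at 65,535 players
    };

    struct PlayerRecordScalable {
        uint32_t    year;         // room to spare for any plausible date
        std::string name;         // variable-length, no arbitrary cutoff
        uint64_t    playerCount;  // effectively unbounded for our purposes
    };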

However, scalability limits can manifest themselves in more subtle ways. For example, if you are imagining a system that is likely to be used by hundreds of simultaneous users, then an operation whose running time has a squared relationship with the number of users is a big problem. It might be 100ms in the case of a hundred users, and 400ms for two hundred - not too bad. However, if you go up to ten thousand users it takes a quarter of an hour, and for a hundred thousand users you have to wait longer than a day.
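If you want to check the arithmetic, a few lines of C++ reproduce the figures above (the constant is chosen purely so that a hundred users cost 100ms, as in the example):

    // Back-of-the-envelope check of the quadratic example: running time is
    // k * n^2, with k fixed so that 100 users take 100ms.
    #include <cstdio>

    int main() {
        const double k = 100.0 / (100.0 * 100.0);  // = 0.01 ms per user^2
        const int counts[] = { 100, 200, 10000, 100000 };
        for (int n : counts) {
            double ms = k * double(n) * double(n);
            printf("%7d users -> %12.0f ms (%9.2f minutes)\n",
                   n, ms, ms / 60000.0);
        }
        return 0;
    }
    // 100 -> 100ms; 200 -> 400ms; 10,000 -> ~17 minutes;
    // 100,000 -> ~1,667 minutes, i.e. longer than a day.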

Even a linear relationship can be a problem. Say a system, in some reconciliation process, consumes a certain amount of bandwidth from every connection of every connected player. It might be just one bit per second, but the system will grind to a halt at around 56,000 players (probably much sooner). This is the main reason why client/server is inherently non-scalable. If you require high bandwidth from each player to a single server, then at a certain number of players (about 5,000, say) you begin to warp the network infrastructure - even if you do have a server that's powerful enough to cope with the workload. Sure, ask for half a dozen T3 connections to be piped into your server room - you might end up waiting a few months while the network warps into shape to accommodate you, unless of course you just happen to site your server room near a decent backbone…
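As a sanity check (assuming, as the 56,000 figure suggests, that the constraining link is a 56kbit/s modem), the ceiling falls straight out of a one-line division:

    // The linear case: a reconciliation pass that costs every connected
    // player a fixed trickle of bandwidth. Against a 56kbit/s link, even
    // one bit per second per player caps the population.
    #include <cstdio>

    int main() {
        const double linkBitsPerSec = 56000.0;  // a 56k modem's ceiling
        const double costPerPlayer  = 1.0;      // bits/sec consumed per player
        printf("Hard ceiling: %.0f players - and in practice far fewer,\n"
               "since the game itself needs most of that link.\n",
               linkBitsPerSec / costPerPlayer);
        return 0;
    }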

The only relationship we can really countenance is a logarithmic one, i.e. linear growth in overhead in proportion to exponential growth in players. For example, if you need one more bit in a particular data word for every doubling in players, a 32-bit word allows you to cope with 4 billion players. But even then, nothing's that straightforward in the real world - you have to watch out for noise, spurious anomalies, and nefarious phenomena. Sod's law puts paid to back-of-the-envelope calculations that should work in theory. And where Sod's law finds the job too hard, there are plenty of hackers to fill the breach.
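The same back-of-the-envelope style shows why logarithmic growth is so benign - the bits needed barely creep up while the player count explodes:

    // The logarithmic case: one extra bit per doubling of the player count,
    // so ceil(log2(n)) bits suffice to index n players. A 32-bit word covers
    // 2^32, i.e. roughly 4.3 billion players.
    #include <cmath>
    #include <cstdio>

    int bitsNeeded(double players) {
        return static_cast<int>(std::ceil(std::log2(players)));
    }

    int main() {
        printf("a thousand players: %2d bits\n", bitsNeeded(1e3));  // 10
        printf("a million players:  %2d bits\n", bitsNeeded(1e6));  // 20
        printf("a billion players:  %2d bits\n", bitsNeeded(1e9));  // 30
        printf("four billion:       %2d bits\n", bitsNeeded(4e9));  // 32
        return 0;
    }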

So if you honestly think that "We'll never get more than about a hundred players congregating in the same place" - hah!

Of course, you can make assumptions about what should happen in practice, but you still need to cater for what shouldn't, because it will happen. The trick is in ensuring that the system degrades gracefully and appropriately to the situation. If there simply isn't enough bandwidth to support a riot, then only those participating in the riot should notice any reduction in fidelity. Ideally, players would still see the behavior of their closest opponents, but those further away in the crowd would simply be modeled from infrequent samples. This brings me back to the idea of prioritization as one of the solutions brought to us by the 'best effort' philosophy. When perfection is impossible, the best is good enough, and certainly better than failure.
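To sketch what that prioritization might look like in code (the names and the distance-only heuristic are my own simplification, not a prescription):

    // 'Best effort' prioritization: spend a fixed per-tick update budget on
    // the nearest players first, so a riot degrades fidelity for the distant
    // crowd rather than for the opponent in front of you.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct RemotePlayer {
        int    id;
        double distance;  // from the local player, in world units
    };

    // Choose which remote players receive a state update this tick, given
    // that bandwidth only allows 'budget' updates.
    std::vector<int> chooseUpdates(std::vector<RemotePlayer> players,
                                   std::size_t budget) {
        std::sort(players.begin(), players.end(),
                  [](const RemotePlayer& a, const RemotePlayer& b) {
                      return a.distance < b.distance;
                  });
        if (players.size() > budget)
            players.resize(budget);
        std::vector<int> chosen;
        for (const RemotePlayer& p : players)
            chosen.push_back(p.id);
        // Everyone beyond the budget keeps their last known state and is
        // extrapolated from infrequent samples until a later tick reaches them.
        return chosen;
    }

A fairer scheme would rotate some of the budget through the distant crowd so nobody freezes entirely, but the principle - nearest first, degrade at the edges - is the same.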

Things Change

Hand in hand with scalability goes adaptability. Players are pretty unpredictable blighters, and they can change in behavior from day to day or even second to second. Players are human beings (well, most of them) and, as we all know, human beings are pretty clever when it comes to understanding their environment - their lives have depended on it, which is probably why intelligence evolved in the first place. Any system has to be almost as intelligent in adapting to its users as its users are in adapting to it. One thing the architects didn't figure on when they designed the Millennium Bridge over the Thames in London was that its users aren't all independent entities. Even subtle behavioral cues can be enough to get people to act in concert (causing a bridge to oscillate wildly, in this case). With multiplayer games it's much worse: we have to presume we're dealing with players deliberately setting out to stress the game to breaking point.

But, there's more to change than just the players. We also have computers winking in and out of the network, coming back with bigger hard disks, CPUs, and ADSL connections. Considerate ISPs might realize they can attract more subscribers if they donate some spare capacity by making nodes out of some of their Linux boxes.

Even the network itself is in a continuous state of flux, in terms of effective topology as well as traffic, with consequent unpredictable fluctuations in bandwidth and latency. Sometimes a network might undergo a serious problem when an earthquake strikes a key set of points. A satellite might be zapped by aliens.

In general, anything could happen to pull the rug out from under the system. However, it must adapt. We can't allow a game to go on for two years, building up to 100 million players - many of whom may have made a considerable investment of effort in building towns, cities, empires, and relationships with other players, or spent a wodge of money on certain weapons or resources - only for it to fail when someone accidentally pulls a plug. "No worries everyone - we'll restart it on Monday!" The outcry might trigger a civil war!

Hopefully you'll notice where scalability and adaptability come into play in designing for a billion players.

A Self-Organizing System

Each player interacts with the system via a user interface running on a piece of software I call the front-end. This front-end interacts with a back-end service operating as a node in the distributed system. It is the back-end that performs the modeling of the virtual world, does its best to communicate its modeling to any other nodes that may be interested, and receives their communication of the modeling that they are doing in return. The process of managing those relationships and responsibilities also falls to the back-end.

Each node can be considered to correspond to a player's computer. However, this is not necessarily the case. It is possible for multiple front-ends to exist on the same computer - a split screen on a console, for example. Alternatively, multiple front-ends may be running on different computers (mobile, handheld devices) and all talking to the same back-end. Multiple back-ends may also exist on the same computer: one node acting in a fully interactive capacity and operating from a fast hard disk, another acting in a back-up capacity and operating from relatively slow near-line storage, and a third operating in a read-only capacity from a DVD-ROM jukebox - and there might well be plenty of CPU capacity for them all to operate on the same computer.

Anyway, for the time being we'll consider that we're operating on a basis of 'a computer is a node'.
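As a rough structural sketch of that split (all of the class and method names here are hypothetical - the article only names the front-end and back-end roles):

    // One computer, one node: the back-end models its slice of the world and
    // talks to peer nodes; the front-end is the player-facing visualization.
    #include <string>
    #include <vector>

    struct PeerAddress { std::string host; int port; };

    class BackEnd {                  // a node in the distributed system
    public:
        void modelWorldSlice();      // advance the objects this node arbitrates
        void exchangeState();        // publish our modeling, receive our peers'
        void manageRelationships();  // decide which peers to talk to, and about what
    private:
        std::vector<PeerAddress> peers;
    };

    class FrontEnd {                 // the player-facing piece
    public:
        explicit FrontEnd(BackEnd& node) : backEnd(node) {}
        void render();               // 3D visualization of locally modeled state
        void handleInput();          // forward player actions to the back-end
    private:
        BackEnd& backEnd;            // split-screen: several FrontEnds, one BackEnd
    };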

Let's assume we know all about how to uniquely identify computers/nodes, objects, players, and anything else. We don't have a problem utilizing locally available storage (hard disk) to store as many objects as may interest us now or in the near future. We don't have a problem utilizing locally available processing (CPU) to execute as many object methods and modeling services as we can. We don't have a problem sharing the CPU between the player's visualization application (3D renderer) and the modeling back-end. We don't have a problem exploiting all available communication channels to get in touch with other nodes, nor do we have much difficulty with the idea of how a new node goes about discovering a fairly good node to initiate contact with.

The key problems we're left with are these:

  • How do we keep track of which nodes are arbitrating over state?
  • How do we determine which nodes should arbitrate over state?
  • How does one node determine which node or nodes it should be communicating with?
  • How does the system achieve persistent storage of state?
  • How do we determine what happens when a node becomes unexpectedly disconnected?
  • How do we cope with the situation when a major division occurs in the network?

There is of course the issue of security, and although at first glance there may seem to be an insurmountable security flaw in any system that utilizes client-side resources, let's remember that we're dealing with two related and very difficult problems here: a scalable distributed modeling system, and a secure one. Let's not give up on one just because we can't see how to solve the other. Putting it metaphorically: if we're trying to build an airplane, let's not give up just because no one's invented parachutes yet. And you never know, once flight becomes popular, the unthinkable idea of flying without a parachute might just end up being quietly forgotten.

If we first understand how a system can be scalable, then we can qualify ourselves to be in the position of understanding how it can be secure.

