Gamasutra: The Art & Business of Making Games
Localizing MMOGs
September 12, 2003 (Page 2 of 3)
 

A Meta-Language To Improve Grammar Quality

In a text-heavy game where players are going to be reading your strings for hours upon hours, it pays to have good grammar. This is a polish item - people don't even notice when the grammar is good, but they certainly notice when the grammar is bad. We wanted to avoid the grammatical errors that are typical of table-based solutions. However, since keeping the number of strings low saves on bandwidth cost, we wanted to avoid using multiple versions of a string just to change parts of speech. For instance, if one of your strings is

"You give $PLAYERNAME$ an $ITEM$. He inspects it and hands it back."

You don't want to have a separate version of this string just to change the "He" to a "She" based on player gender. The common solution to this problem has been to reword the sentence to avoid using a pronoun. But in other languages, even items have genders; if this one string must fit all possible languages, it becomes extremely awkward.

And what about items with plural names? If you substitute "pants" for the $ITEM$ in the string above, you need to change the modifiers and pronouns, even in English. Similarly, words that start with vowels need to be prefaced with "an" whereas words that start with consonants use "a", but items that have proper names shouldn't use either "a" or "an." And there are several other common cases that require the sentence to change based on the variables. Since the game adds content every month, it isn't reasonable to just rework the strings to avoid the problem. We can't really tell our content designers, "Never add a new item whose name starts with a vowel," nor can we tell them to never have an NPC refer to an item by pronoun.

After a bit of thinking about the problem, we decided upon a simple scripting language to let content designers embed the different cases into the string itself. For instance, the first part of the above sentence would be something like this in our data:

"You give #1:{the [!n]} #1:$PLAYER$ #2:{a[!n] | an[v] | some[p] } #2:$ITEM$."

This "meta-language" lets the content creator embed a lot of meaning into one sentence, at the expense of legibility in the tables themselves. Some of our content designers found the language intimidating while others learned it very quickly. Fortunately, most people only needed to use the meta-language to name items, which is pretty easy.

Each item, monster, and place name must be annotated with the proper tags to indicate what rules to use for that thing. We do this right in the string itself - for instance, if an item name starts with a vowel, the string might be

ID_Name = "Apple[v]"

Here the [v] indicates that it starts with a vowel, so it should be prefaced with "an" instead of "a". Multiple letters could also be used; for instance,

ID_Name = "Eric[mnv]"

indicates that "Eric" is a proper noun ([n]), a male name ([m]), and begins with a vowel ([v]). The complete list of tags we used for AC2 is below.

[m] = male ("he" or "him")
[f] = female ("she" or "her")
[i] = inanimate or gender-neutral (an "it")
[p] = plural name (as in "those pants")
[v] = starts with a vowel (so use "an" instead of "a")
[n] = name (proper noun - don't use "an", "a", or "the")
[s] = ends in the letter 's' (so use "'" instead of "'s" to make it possessive)

If the name of an item doesn't need any tags at all, it should still be marked in some way (we used [E], which stands for "empty") so that translators can tell at a glance which names have been reviewed and which might still need meta-tags. Even when a name doesn't need any tags in English, it might still need tags in French. Each language can have different tags for each string according to the rules of that language.
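
To make the tagging convention concrete, here is a minimal Python sketch of how a tagged name might be split into its display text and a set of flags. The function name parse_name is hypothetical and this is not AC2's actual code; the only behavior taken from the description above is the trailing bracketed letter list and the [E] marker meaning "no tags".

```python
def parse_name(raw):
    """Split 'Eric[mnv]' into ('Eric', {'m', 'n', 'v'}).

    Hypothetical helper: assumes the tags, if any, are a single
    bracketed group at the very end of the string.
    """
    text, _, spec = raw.strip().rstrip("]").partition("[")
    tags = frozenset(spec)
    # [E] only records that someone reviewed the name; it carries no flag.
    if tags == frozenset("E"):
        tags = frozenset()
    return text.strip(), tags


print(parse_name("Apple[v]"))
print(parse_name("Helmet[E]"))
```

Note that the sketch treats every letter as an opaque flag, which matches the idea that the meaning of each letter belongs to the language, not to the code.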

The game engine could have programmatically figured out which letters to use - for instance, if a name starts with a vowel, it could automatically have appended [v] to the word without requiring us to manually assign it. However, there are too many special cases for that to be safe - some words that start with "h" should be prefaced with "a" while others should be prefaced with "an": compare "an hour" with "a house." Only a human being can tell at a glance what language rules should be followed for a particular word, so we assign that task to the human being naming the item. In the case of "an hour", the [v] tag should be applied, even though the word doesn't actually begin with a vowel, because the word should follow the same rules as a word that really does start with a vowel.

Thus in general, the game engine doesn't actually know what each letter means; the letters are just flags assigned to the word as far as the engine is concerned. The exception to this is player names. Since we obviously can't know the meta-letters to assign to a player name until the user types that name in, we have to programmatically generate the tags for that name. So the character-creation system automatically adds the appropriate letters to the players' names. There is a chance the game will get these letters wrong because it is using such simple rules of thumb to assign the letters, but fortunately the English rules for names are simpler than the rules for objects. For instance, it doesn't really matter whether a name starts with a vowel or not - you don't add "a" or "an" before it. I am not "an Eric," I'm just "Eric." The fact that my name is a proper noun trumps the vowel rule.
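
As an illustration of what such rules of thumb could look like for English, here is a short Python sketch. Everything in it is an assumption made for the example: the function name tag_player_name, the gender argument supplied by character creation, and the exact rules chosen. The article does not list the rules AC2 actually used.

```python
def tag_player_name(name, gender):
    """Guess meta-tags for a player-chosen name (hypothetical English rules)."""
    tags = {"n"}  # a player name is always treated as a proper noun
    # Assumed: character creation tells us the character's gender.
    tags.add({"male": "m", "female": "f"}.get(gender, "i"))
    # Crude spelling checks; harmless for names because [n] suppresses "a"/"an".
    if name and name[0].lower() in "aeiou":
        tags.add("v")
    if name.lower().endswith("s"):
        tags.add("s")
    return f"{name}[{''.join(sorted(tags))}]"


print(tag_player_name("Eric", "male"))     # Eric[mnv]
print(tag_player_name("Chris", "female"))  # Chris[fns]
```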


Figure: AC2 in German. German words tend to be longer than English words.

(We were lucky in that we didn't need to add any special tag-assignment rules for French, German, or Korean. Had we localized to other languages, or wanted to allow other grammar forms, we might not have been so lucky.)

Once all the items and creatures and whatnot are tagged, they can be used as variables in sentences. These sentences can choose different words based on what meta-letters are in a variable. The basic structure of an optional block of text is to enclose it in {} characters, with different options separated by "|" characters. Each option lists the meta-tags that it should be used with, or none if it is the default:

"I want {an[v] | a} $FOOD$."

This sentence can become "I want a banana," or "I want an apple," depending on whether the value of $FOOD$ has the [v] tag or not. You can also indicate that a case is supposed to be used specifically when a tag is NOT present, by prefacing the letter with an exclamation point:

"You kill {the[!n]} $NAME$."

This would only insert "the" into the sentence if the $NAME$ variable does not have the [n] tag. So this sentence becomes "You kill Timmy," or "You kill the Gurog Shaman," depending on whether $NAME$ has the [n] tag.

Cases can also indicate that they should only be used when two or more tags are present; this is most useful in French and other languages that have genders for inanimate objects:

"You {murder him[mn] | murder her[fn] | destroy the feminine-gender object[f] | destroy the masculine-gender object[m] | destroy the gender-neutral object}!"

English doesn't need this, but French does: you don't want to use "murder" as your verb for a barrel, and you need to use the proper French form of "the" based on the gender of the word "barrel."

In cases where there are lots of choices like this (which is pretty rare, especially in English), the game chooses the option that has the most letters in common with the variable; a negated letter counts as a match when that letter is absent from the variable. In the case of ties, the last tying match found is used. Here is a contrived example:

As you kill {the[!n]} $NAME$, {it[i] | they[p] | he[mn] | she[fn] | that named machine[n]} {screams | scream[p]} and {explodes | explode[p]}!

If $NAME$ is "Eric[mnv]", then the string comes out "As you kill Eric, he screams and explodes!" This happens because "he[mn]" has two letters in common with "Eric[mnv]", and that's more than any other choice. If $NAME$ were "Sir Roboto[ni]", then it would come out "As you kill Sir Roboto, that named machine screams and explodes!" This is because two different choices match here: "it[i]" matches with one letter, and so does "that named machine[n]". Since there is a tie, the last tying match is chosen.
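
The selection rule can be sketched in a few lines of Python. This is one reading of the rules described above, not the engine's real implementation: it assumes an option is only eligible when every letter it lists is satisfied (so "he[mn]" does not apply to "Sir Roboto[ni]"), that an untagged option is a default with a score of zero, and that a block with no eligible option produces nothing. The names parse_option and choose are made up for the example.

```python
import re


def parse_option(option):
    """'the[!n]' -> ('the', required=set(), forbidden={'n'})."""
    text, _, spec = option.strip().rstrip("]").partition("[")
    required = set(re.findall(r"(?<!!)\w", spec))   # letters without a "!"
    forbidden = set(re.findall(r"!(\w)", spec))     # letters after a "!"
    return text.strip(), required, forbidden


def choose(block, tags):
    """Pick the best option from 'a | b[x] | c[!y]' for a set of tags."""
    best, best_score = "", -1
    for option in block.split("|"):
        text, required, forbidden = parse_option(option)
        if required <= tags and not (forbidden & tags):
            score = len(required) + len(forbidden)
            if score >= best_score:  # ">=" lets the last tying option win
                best, best_score = text, score
    return best


pronouns = "it[i] | they[p] | he[mn] | she[fn] | that named machine[n]"
print(choose(pronouns, set("mnv")))  # he
print(choose(pronouns, set("ni")))   # that named machine
```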

Then we come to the case of sentences with multiple variables in them. When there are multiple variables, each {} must specify what variable it references. The variables are numbered (such as "#1:$NAME$"), and the same number is used for blocks that depend on that variable (such as "#1:{he[m] | she[f] | it}"). Here is a torturous example of using lots of pronouns in a complex sentence with three variables:

"#1:$PLAYER1$ gives #2:{the[!n]} #2:$ITEM$ to #3:$PLAYER2$. #3:{He[m] | She[f] | It} thanks #1:{him[m] | her[f] | it} and goes about #3:{his[m] | her[f] | its} business."

This works out to "Bob gives the sandwich to Sue. She thanks him and goes about her business." Or, if Sue gives the sandwich to Bob, the gender of all the pronouns is reversed.
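
Here is a small Python sketch of how numbered variables and their dependent blocks might be resolved. It is an illustration under assumptions, not AC2's code: the names render, choose, and split_tags are invented, values are passed in as already-tagged strings, and an option is assumed to apply only when all of its letters are satisfied. It makes two passes so that a block such as #2:{the[!n]} can appear before the variable it depends on.

```python
import re

VARIABLE = re.compile(r"#(\d+):\$(\w+)\$")  # e.g. #1:$PLAYER1$
BLOCK = re.compile(r"#(\d+):\{([^}]*)\}")   # e.g. #1:{him[m] | her[f] | it}


def split_tags(raw):
    """'Sue[fn]' -> ('Sue', 'fn'); 'it' -> ('it', '')."""
    text, _, spec = raw.strip().rstrip("]").partition("[")
    return text.strip(), spec


def choose(block, tags):
    best, best_score = "", -1
    for option in block.split("|"):
        text, spec = split_tags(option)
        required = set(re.findall(r"(?<!!)\w", spec))
        forbidden = set(re.findall(r"!(\w)", spec))
        if required <= tags and not (forbidden & tags):
            score = len(required) + len(forbidden)
            if score >= best_score:  # last tying option wins
                best, best_score = text, score
    return best


def render(template, values):
    tags_by_number = {}

    def put_variable(match):
        text, spec = split_tags(values[match.group(2)])
        tags_by_number[match.group(1)] = set(spec) - {"E"}
        return text

    def put_block(match):
        return choose(match.group(2), tags_by_number[match.group(1)])

    text = VARIABLE.sub(put_variable, template)  # pass 1: names and tags
    text = BLOCK.sub(put_block, text)            # pass 2: dependent blocks
    return re.sub(r" {2,}", " ", text).strip()   # tidy gaps left by empty blocks


TEMPLATE = ("#1:$PLAYER1$ gives #2:{the[!n]} #2:$ITEM$ to #3:$PLAYER2$. "
            "#3:{He[m] | She[f] | It} thanks #1:{him[m] | her[f] | it} "
            "and goes about #3:{his[m] | her[f] | its} business.")
print(render(TEMPLATE, {"PLAYER1": "Bob[mn]", "ITEM": "sandwich[E]",
                        "PLAYER2": "Sue[fn]"}))
```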

We can also do fancier things. Sometimes we want to refer to an object without even displaying the name of that object at all. If a variable is given a negative number, it does not appear in the string, but it can still be used by meta-cases:

"#-1:$FIRSTPLAYER$ The withered old man shakes his head sadly and says, 'I already gave it to #-1:{him[m] | her[f]| it}.'"

We also added a meta-tag that is automatically applied by the engine, instead of being applied by the string's authors. When a variable is a number, it is examined by the game, and if it is singular, it is given the [1] tag automatically. This lets us put both the singular and plural cases into the same sentence:

"You find $NUM$ {coin[1] | coins}."

This is a little like the [p] meta-tag, but [p] means the object is inherently plural, whereas [1] means that the variable is a singular number.

The [1] tag is an example of feature creep. In this case it's very useful feature creep, but nonetheless it's an example of improving a system to do things beyond what it was originally intended to do. The [1] tag has proven its worth many times over, but in general I tend to be wary of adding arbitrary new features into the engine on a whim. When it comes to human languages, it's never as simple as it looks. When you add a feature, you have to evaluate the underlying grammar assumptions you're making, and then have those assumptions validated by your translators, and then make any alterations necessary for foreign languages. Even the [1] tag has a special case: in English, only the number 1 is treated as a singular number, so only variables that are "1" get the [1] tag applied to them. But in French, the number 0 is also singular; when the client is running in French it has to tag both "0" and "1" with the [1] tag. There are other languages with even more complex rules about singular numbers; if we had been translating to other languages this feature might have proven impossible to add.
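
A per-language rule like this is easy to express as data. The Python sketch below is only an illustration: the table SINGULAR_VALUES and the functions number_tags and found_coins are hypothetical names, and the two entries simply encode the English and French rules described above.

```python
# Which numeric values count as singular in each language (from the text above).
SINGULAR_VALUES = {
    "en": {1},
    "fr": {0, 1},
}


def number_tags(value, language):
    """Return the tags the engine would attach to a numeric variable."""
    return {"1"} if value in SINGULAR_VALUES[language] else set()


def found_coins(count, language="en"):
    """Resolve 'You find $NUM$ {coin[1] | coins}.' for a given count."""
    noun = "coin" if "1" in number_tags(count, language) else "coins"
    return f"You find {count} {noun}."


print(found_coins(1))         # You find 1 coin.
print(found_coins(0))         # You find 0 coins.
print(number_tags(0, "fr"))   # {'1'}
```

A lookup table works for these two languages because the singular values are a short fixed set; a language whose plural form depends on a more complicated numeric rule would need something richer than a set of values.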


Figure: AC2 in Korean. Translation teams help quite a bit.

Some underlying wisdom can be gleaned here: first, don't assume you know how other languages work unless you speak them fluently. Consult your translation team a lot. They care about translation quality and will be happy to help you get it right. Bring your translation team on as early as possible so that you can consult them while you still have time to add or change features as necessary. It's also important to know what languages you should be targeting as soon as possible. Even a somewhat over-ambitious list is better than a list that omits languages you end up needing to target. Better to be safe than sorry.

Because meta-languages are dramatic simplifications of complex human languages, there are many times when they cannot cope. If you can foresee problems with your game's features, talk to your translators and come up with a plan before you start coding. For AC2, the biggest problem was our randomly generated treasure. After consulting our translation team, we decided to name our treasure using the form "$ADJECTIVE$ $NOUN$ $PHRASE$" ("Mighty Helmet of Extreme Comfort"). If we had chosen a different form, like "$ADJECTIVE1$ $ADJECTIVE2$ $NOUN$" ("Mighty Comfortable Helmet"), translation into German would have been much more difficult, because German adjectives modify each other's inflected forms.

But even though we chose the simplest form for translating, it was still not as straightforward as we'd hoped. The meta-language had a hard time coping with random treasure items that were used in other sentences. If we wanted to correctly use an "Accurate Sword of Malaise" in a string, we needed to somehow tag this item with the [v] tag. "Bob's Sword of Maiming" needs to have the [n] tag automatically applied to it. That way when we get to a string like

"You pick up {an[v] | the[!n] | a } $ITEM$."

the random item works correctly. So we had to add a feature to the meta-language so that a generated item automatically pulls certain tags from certain words. For instance, if the adjective contained [v] or [n], the final object would have those tags. If the noun component had [m], [n], or [f], the final object would get those tags. If the phrase on the end had the [s] tag, the final object got that tag. These rules were different for each language, but were easily specified in a per-language configuration file. This wasn't a hard feature to add, but because we realized the problem very late in the development cycle, it was a pain to schedule even the single day of work needed to add it. Then we had to spend time documenting the feature and explaining to the translators how to configure it. This should have been done earlier.
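
The inheritance rules can be pictured as a small per-language table. The Python sketch below is a guess at the shape of such a configuration, not the real file format: INHERITED_TAGS, split_tags, and name_treasure are hypothetical names, and only the English rules quoted above are filled in.

```python
# Hypothetical per-language config: which tags each component may pass on.
INHERITED_TAGS = {
    "en": {"adjective": "vn", "noun": "mfn", "phrase": "s"},
}


def split_tags(raw):
    """'Accurate[v]' -> ('Accurate', {'v'}); [E] means no tags."""
    text, _, spec = raw.rstrip("]").partition("[")
    return text, set(spec) - {"E"}


def name_treasure(adjective, noun, phrase, language="en"):
    """Build a tagged item name in the form $ADJECTIVE$ $NOUN$ $PHRASE$."""
    rules = INHERITED_TAGS[language]
    words, tags = [], set()
    parts = (("adjective", adjective), ("noun", noun), ("phrase", phrase))
    for part, raw in parts:
        text, part_tags = split_tags(raw)
        words.append(text)
        tags |= part_tags & set(rules[part])  # keep only inheritable tags
    return f"{' '.join(words)}[{''.join(sorted(tags)) or 'E'}]"


print(name_treasure("Accurate[v]", "Sword[E]", "of Malaise[E]"))
print(name_treasure("Bob's[n]", "Sword[E]", "of Maiming[E]"))
```

Keeping the rules in data means a translator can change which component contributes which tag for their language without touching the code that assembles the name.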

A couple of face-to-face meetings early on would have saved us some headaches, but in the end the meta-language proved worthwhile. The entire system took two weeks of programming time to implement. It took additional time to teach the designers and translators how to use it, of course, but the result is very high quality text in all of our target languages.

As more and more MMORPGs hit the market, they will begin to differentiate themselves by their level of polish as much as by which features they offer. MMORPGs are unique among games in that they are constantly improving and can add features over the lifetime of the game - typically at least three years. But after the game ships it is impractical to rework all the strings to incorporate a meta-language. If it is going to be done, it has to be done during development. As such, I'm very glad that we were able to pull it off. Our translators are very pleased with the results, too. As a simple solution to a complex problem, a meta-language is worth investigating for any text-heavy game, not just MMORPGs.

