Gamasutra: The Art & Business of Making Games

In-depth: Localization pipeline
June 29, 2012 | By Michael Carr-Robb-John




[In this reprinted #altdevblogaday in-depth piece, Monolith Games' Michael Carr-Robb-John discusses the technical aspects of localization and offers a detailed walkthrough of a pipeline.]

In my previous post on localization, I talked about some of my experiences localizing games for different languages / regions. This time I want to expand upon those notes a little, talk more about the technical aspects of localization, and walk through a pipeline.

The language and locale encoding

In the early days, I used to simply have an enumeration in a header file that was very similar to this:
enum ELanguage
{
    eLanguage_English,
    eLanguage_French,
    eLanguage_German,
    eLanguage_Spanish,
    eLanguage_Amount
};
Twenty years ago this was fine: I was developing on a cartridge that had all the languages essentially loaded at once, and there was really no need to support regions beyond the specific languages. These days, however, we need something a little more robust and, as you should have picked up from my last post, the locale is very important.

So let's start by looking at how we identify each translation. Thankfully, two very useful standards have been defined by people who know a lot more about languages and regions than I do (ISO 639-1 for languages and ISO 3166-1 alpha-2 for regions use exactly this form). These standards allow us to specify each language and each region as a two-letter code.
Using these, we can create a short code for every supported language and region we are likely to encounter, for example:
     en-US     English     America
     en-GB     English     Great Britain
     es-MX     Spanish     Mexico
     nl-NL     Dutch       Netherlands
     en-CA     English     Canada
     fr-CA     French      Canada
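
As a rough illustration of replacing the old enumeration with these codes, here is a minimal sketch that splits a code such as "en-US" into its two parts. The Locale struct and ParseLocale function are hypothetical names, and the sketch assumes only the simple two-letter "ll-RR" form shown in the table above.

```cpp
#include <cctype>
#include <string>

// Hypothetical type: a language / region pair such as "en-US".
struct Locale
{
    std::string language; // two lowercase letters, e.g. "en"
    std::string region;   // two uppercase letters, e.g. "US"
};

// Parses codes of the form "ll-RR". Returns false (leaving 'out' untouched)
// when the text does not have that shape.
bool ParseLocale(const std::string& code, Locale& out)
{
    if (code.size() != 5 || code[2] != '-')
        return false;

    for (int i = 0; i < 2; ++i)
    {
        if (!std::islower(static_cast<unsigned char>(code[i])))
            return false;
        if (!std::isupper(static_cast<unsigned char>(code[3 + i])))
            return false;
    }

    out.language = code.substr(0, 2);
    out.region = code.substr(3, 2);
    return true;
}
```

Keeping the language and the region as separate fields makes it easy to fall back from a region-specific translation to a language-wide one later on.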

A pipeline

This is by no means the only pipeline that can be used for localization. They all have different benefits and issues; this one just happens to be my preference, probably because I like offline tools.


I have seen the storage and manipulation of localized strings done in every way possible, from databases to proprietary editing tools. My personal choice is to use Excel for editing and manipulation, but this comes with two issues that you should be aware of:
  • Although version control software is generally fairly good at merging XML files, the XML generated by Excel always seems to make merging difficult (especially for designers), to the point that it is safest to simply lock the file while it is being edited.
  • Not all translators like to work in Excel, so you will probably need someone or a tool (probably both) to convert whatever format the translators are working in to the Excel format.
An example of the strings in Excel:


Column A contains the identifier string, and each column after it contains one translation. Notice the encoding id at the top of the sheet: it not only tells us the language / region but is also used by the exporter tool to know which files to generate. The export tool exports the data into whatever binary compressed format you prefer to use in-game.

Since I have not worked on a game with massive amounts of text, I have generally stuck to a text format with each language being written out as a separate text file, like so:

en-US.lang
     PRESS_START=Press [Start]
     OPTIONS=Options
     MUSIC=Music
fr-FR.lang
     PRESS_START=Appuie sur [START]
     OPTIONS=Options
     MUSIC=Musique
Depending on which language / locale is required at run-time, just that single translation file is loaded into memory.
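
Loading one of these files can be sketched in a few lines. This is only an illustration under assumptions: the StringTable alias and ParseLangFile function are made-up names, the file is assumed to have been read into memory already, and the rules for blank lines and indentation are guesses rather than part of the format described above.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical alias: identifier -> translated text for one language file.
typedef std::unordered_map<std::string, std::string> StringTable;

// Parses "IDENTIFIER=text" lines from the contents of a .lang file.
// Blank lines and lines without an identifier before '=' are skipped.
StringTable ParseLangFile(const std::string& fileContents)
{
    StringTable table;
    std::istringstream stream(fileContents);
    std::string line;
    while (std::getline(stream, line))
    {
        // Tolerate Windows line endings.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);

        // Ignore any indentation in front of the identifier.
        const std::string::size_type start = line.find_first_not_of(" \t");
        if (start == std::string::npos)
            continue;

        // Only the first '=' splits the line, so the text may contain '='.
        const std::string::size_type split = line.find('=', start);
        if (split == std::string::npos || split == start)
            continue;

        table[line.substr(start, split - start)] = line.substr(split + 1);
    }
    return table;
}
```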

The exporter tool can also be useful in other ways:
  • Automatically detect and report missing strings.
  • Build fonts based upon the characters that are actually used. This is very important if you are doing Chinese, which has thousands of characters; this method alone has been known to save megabytes of texture space.
  • Detect formatting mistakes and illegal / reserved characters.
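
Two of the checks above can be sketched as follows: reporting identifiers that a translation is missing, and collecting the set of characters that a font would need. The function names are hypothetical, the strings are assumed to be stored as UTF-8, and malformed byte sequences are simply skipped rather than reported.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> StringTable;

// Returns the identifiers that exist in 'reference' but are absent or empty
// in 'translation', in alphabetical order.
std::vector<std::string> FindMissingStrings(const StringTable& reference,
                                            const StringTable& translation)
{
    std::vector<std::string> missing;
    for (StringTable::const_iterator it = reference.begin(); it != reference.end(); ++it)
    {
        const StringTable::const_iterator found = translation.find(it->first);
        if (found == translation.end() || found->second.empty())
            missing.push_back(it->first);
    }
    return missing;
}

// Adds every code point used by a UTF-8 string to 'used', so that a font
// builder only has to include the glyphs that actually appear in the text.
void CollectCodePoints(const std::string& utf8, std::set<char32_t>& used)
{
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0; // number of continuation bytes that follow
        char32_t codePoint = 0;
        if (lead < 0x80)                { extra = 0; codePoint = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; codePoint = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; codePoint = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; codePoint = lead & 0x07; }
        else { ++i; continue; } // stray continuation byte: skip it

        if (i + extra >= utf8.size())
            break; // truncated sequence at the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
            codePoint = (codePoint << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);

        used.insert(codePoint);
        i += extra + 1;
    }
}
```

Running CollectCodePoints over every string of a language gives the glyph list for that language's font texture.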

Strings in code / scripts

In order to provide a framework for localization, the first thing that needs to be cracked down upon is the use of hard-coded strings in code and scripts.

Previously you might have written the code or script:
     DrawString("Hello World, My name is Mr Flibble.");
Instead it should now be written passing a String Identifier like so:
     DrawString( eStringId_HelloMessage );
The string enumerations can be auto-generated by the export tool; however, I did this for a couple of projects and decided that it was more hassle than it was worth. My recommendation is to avoid this if possible. A better way is to pass the string identifier as a string itself:
     DrawString( "Hello_Message" );
Either way, both methods end up looking into a table to find the specific string to be displayed.
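
A minimal sketch of such a table, keyed by the identifier string, might look like this. The LocalizedStrings class is a hypothetical name, and returning the identifier wrapped in '#' for a missing entry is an assumption made for this example (it makes untranslated text easy to spot on screen), not something the pipeline above prescribes.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup table holding the strings of the active language.
class LocalizedStrings
{
public:
    void Add(const std::string& id, const std::string& text)
    {
        m_strings[id] = text;
    }

    // Returns the text for 'id'. Unknown identifiers come back as "#id#"
    // (an assumption of this sketch) so they stand out during testing.
    std::string Lookup(const std::string& id) const
    {
        const std::unordered_map<std::string, std::string>::const_iterator it =
            m_strings.find(id);
        if (it != m_strings.end())
            return it->second;
        return "#" + id + "#";
    }

private:
    std::unordered_map<std::string, std::string> m_strings;
};
```

A DrawString("Hello_Message") call would then pass its argument through Lookup before rendering.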

Encoding

There are quite a few encoding systems for text out there. Since this ground has been walked quite a few times in a lot of other posts, I'll skip it here with only a note that for game development my take on the subject is if you are working with limited memory, use UTF-8; otherwise use UTF-16.

Icons

More often than not, it is far simpler to insert an icon into a string than it is to use a long drawn out explanation to describe something. In the text string, I indicate where an icon is to be displayed and which one by using the [] markers, for example:
     Press [START] to continue.
     Activate [GEM] by pulling string.
Part of my text rendering manager loads a setup file (text again) at startup that contains a list of all the codes and textures to use when that icon is encountered. Very similar to this:
     START, 0, X360_StartButton.tga
     MOVESTICK, 0, X360_LS.tga
I can add additional textures on the line if I want to animate the icon, for example:
     DODGE, 4, Wii_RemoteWave_1.tga, Wii_RemoteWave_2.tga, Wii_RemoteWave_3.tga
The number after the code is the animation speed (FPS).
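
Parsing one line of that setup file could be sketched like this. IconDef and ParseIconLine are hypothetical names, and the sketch assumes that fields are separated by commas, that surrounding spaces are not significant, and that every icon has at least one texture.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one icon: its code, animation speed in frames
// per second, and one texture per animation frame.
struct IconDef
{
    std::string code;
    int fps;
    std::vector<std::string> textures;
};

// Removes leading and trailing whitespace from a field.
static std::string Trim(const std::string& s)
{
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Parses "CODE, fps, texture[, texture...]". Returns false if the line does
// not contain a code, a numeric speed and at least one texture.
bool ParseIconLine(const std::string& line, IconDef& out)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(Trim(field));

    if (fields.size() < 3 || fields[0].empty())
        return false;

    int fps = 0;
    std::istringstream fpsStream(fields[1]);
    if (!(fpsStream >> fps) || fps < 0)
        return false;

    out.code = fields[0];
    out.fps = fps;
    out.textures.assign(fields.begin() + 2, fields.end());
    return true;
}
```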

On the subject of icons, consider this:
     "Use [RS] to aim and [RT] to mark enemy before pressing [A] to fire."
Imagine that your project is multi-platform: [A] should really be [X] on the PS3 and [B] on the Wii. An additional issue is that the Wii doesn't generally have an [RS]! You could create a string unique to each platform, but that would just double or triple the amount of data that needs to be maintained as and when things change.

My solution in the past to this little nightmare has been to ban platform-specific icon names, which includes identifiers like [D-PadLeft], [A], [LeftStick], [X], [Y], [Z], [RT], [L1], etc. Instead I encourage text that describes the game action:
     "Use [TARGETTING] to aim and [TARGETREGISTER] to mark enemy before
     pressing [FIRE] to fire."
Then I have a different icon setup file for each platform and everything works between platforms without any major headaches.
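
To make the idea concrete, here is a minimal sketch that splits a string into plain text and icon runs using whichever icon table was loaded for the current platform. TextRun, IconMap and SplitIcons are hypothetical names, and leaving an unknown code visible as text is an assumption of this example.

```cpp
#include <map>
#include <string>
#include <vector>

// One piece of a string: either plain text or the texture of an icon.
struct TextRun
{
    bool isIcon;
    std::string value; // the text itself, or the texture name for an icon
};

// Icon code -> texture name, loaded from the current platform's setup file.
typedef std::map<std::string, std::string> IconMap;

static TextRun MakeRun(bool isIcon, const std::string& value)
{
    TextRun run;
    run.isIcon = isIcon;
    run.value = value;
    return run;
}

// Splits 'text' at [CODE] markers. Codes found in 'icons' become icon runs;
// unknown codes are kept as visible text so that mistakes are easy to spot.
std::vector<TextRun> SplitIcons(const std::string& text, const IconMap& icons)
{
    std::vector<TextRun> runs;
    std::string::size_type pos = 0;
    while (pos < text.size())
    {
        const std::string::size_type open = text.find('[', pos);
        const std::string::size_type close =
            (open == std::string::npos) ? std::string::npos : text.find(']', open + 1);
        if (close == std::string::npos)
        {
            // No further complete marker: the rest is plain text.
            runs.push_back(MakeRun(false, text.substr(pos)));
            break;
        }

        if (open > pos)
            runs.push_back(MakeRun(false, text.substr(pos, open - pos)));

        const std::string code = text.substr(open + 1, close - open - 1);
        const IconMap::const_iterator it = icons.find(code);
        if (it != icons.end())
            runs.push_back(MakeRun(true, it->second));
        else
            runs.push_back(MakeRun(false, "[" + code + "]"));

        pos = close + 1;
    }
    return runs;
}
```

The same string then renders with a different texture on each platform simply because a different IconMap was loaded at startup.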

Parameters

It's quite common to construct a string for displaying on screen, but it can cause issues for the translators if they don't know the context. Consider this:
     DrawString("%s! Get rid of them!", m_PlayersName );
Now you can see straight away that the %s will be replaced with the player's name; however, what the translators see is:
     "%s! Get rid of them!"
Their best guess might be that it is going to be the name of a character, but it might also be something else, e.g.:
     "Chairs! Get rid of them!"
     "Michael! Get rid of them!"
In order to help the translators, I use {} to mark parameters:
     "{s-Name}! Get rid of them!"
     "{s-Object}! Get rid of them!"
The context after the '-' is ignored when rendering the text; it is purely descriptive text to help the translators.
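
One way the substitution might work is sketched below. FormatString is a hypothetical name, and the sketch makes two simplifying assumptions: arguments arrive already converted to text, and they are consumed strictly in the order the markers appear, so everything between the braces (type letter and description) is skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each {x-Description} marker with the next argument in order.
// The contents of the braces are ignored in this sketch.
std::string FormatString(const std::string& text, const std::vector<std::string>& args)
{
    std::string result;
    std::size_t nextArg = 0;
    std::string::size_type pos = 0;
    while (pos < text.size())
    {
        const std::string::size_type open = text.find('{', pos);
        const std::string::size_type close =
            (open == std::string::npos) ? std::string::npos : text.find('}', open + 1);
        if (close == std::string::npos)
        {
            // No further complete marker: copy the rest unchanged.
            result.append(text, pos, std::string::npos);
            break;
        }

        result.append(text, pos, open - pos);
        if (nextArg < args.size())
            result += args[nextArg++];
        pos = close + 1;
    }
    return result;
}
```

Because the arguments are matched purely by position, a translation that reorders the markers would silently put values in the wrong place.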

Formatting

As I'm sure we are all aware by now, not everyone writes the date in the same way.

Consider the date 3/4/2012; to me personally this is 3rd April 2012, but to some it is 4th March 2012. Obviously once you get past halfway through the month it becomes a lot easier to spot, but it does mean that your region needs to know which date format to use.
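
A minimal sketch of region-aware date formatting is shown below. The names are hypothetical, and the rule used here is deliberately crude: it assumes that only the "US" region writes the month first. A real table would need an entry for every supported region.

```cpp
#include <cstdio>
#include <string>

enum EDateOrder
{
    eDateOrder_DayMonthYear,
    eDateOrder_MonthDayYear
};

// Assumption for this sketch: only the US region is month-first.
EDateOrder DateOrderForRegion(const std::string& region)
{
    return (region == "US") ? eDateOrder_MonthDayYear : eDateOrder_DayMonthYear;
}

// Formats a date as numbers separated by '/', in the order the region expects.
std::string FormatDate(int day, int month, int year, EDateOrder order)
{
    const int first = (order == eDateOrder_MonthDayYear) ? month : day;
    const int second = (order == eDateOrder_MonthDayYear) ? day : month;

    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%d/%d/%d", first, second, year);
    return std::string(buffer);
}
```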

Translators

Good translators should produce strings in the new language that are roughly the same length as the original string. I usually estimate a rough 20 percent difference between the English and other languages. This is another useful feature I have built into my tool; it can detect excessive differences between the lengths of the various translations.
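
The length check can be sketched roughly as follows. The function names are hypothetical, the strings are assumed to be UTF-8, and the 20 percent tolerance is the rough estimate mentioned above rather than a hard rule; a flagged string is only a candidate for a closer look.

```cpp
#include <cmath>
#include <cstddef>
#include <string>

// Counts characters in a UTF-8 string by skipping continuation bytes, so an
// accented letter counts once even though it occupies two bytes.
std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Returns true when the translation's length differs from the source length
// by more than 'tolerance' (0.20 means 20 percent), in either direction.
bool IsLengthSuspicious(const std::string& source, const std::string& translation,
                        double tolerance = 0.20)
{
    const double sourceLength = static_cast<double>(CountCodePoints(source));
    const double translatedLength = static_cast<double>(CountCodePoints(translation));
    if (sourceLength == 0.0)
        return translatedLength != 0.0;
    return std::fabs(translatedLength - sourceLength) / sourceLength > tolerance;
}
```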

Translators MUST NOT change the order of parameters in a translation. This is obvious from a programming standpoint, but I have in the past had translations that not only re-ordered the parameters but added additional ones as well!
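
A check for this can be automated in the export tool. The sketch below, with hypothetical function names, pulls the {...} markers out of the source string and the translation and requires the two lists to be identical, which catches reordered, missing and added parameters. It assumes the {type-Description} marker style and that translators leave the marker text itself untouched.

```cpp
#include <string>
#include <vector>

// Returns the contents of every {...} marker in 'text', in order of appearance.
std::vector<std::string> ExtractParameters(const std::string& text)
{
    std::vector<std::string> params;
    std::string::size_type pos = 0;
    while (pos < text.size())
    {
        const std::string::size_type open = text.find('{', pos);
        if (open == std::string::npos)
            break;
        const std::string::size_type close = text.find('}', open + 1);
        if (close == std::string::npos)
            break;
        params.push_back(text.substr(open + 1, close - open - 1));
        pos = close + 1;
    }
    return params;
}

// True when the translation uses exactly the same markers in the same order.
bool ParametersMatch(const std::string& source, const std::string& translation)
{
    return ExtractParameters(source) == ExtractParameters(translation);
}
```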

Keep the layers of communication between you and the translators to a minimum. There have been times when a 5-10 minute email or phone call could have solved a problem, but because it had to be filtered through channels it ended up taking days or even weeks to sort out.

Assets

Asset management for localization has the potential to touch so many different moving parts of an engine that it very quickly stops being funny. The solution I describe is tailored to the way my engine works, and it may not be applicable to how your tech works; still, you might find this useful.

When an asset is requested, my manager has a list of directories that it scans for the requested asset. The first instance it finds is the file that gets loaded. By controlling which directories are in the list and their order, I can in effect override assets according to the language and region.

This is my directory structure for localization:


If the requested audio file exists in the specific language directory, that file will be loaded; if it doesn't exist, the manager will carry on searching the other directories until it finds the asset. Obviously, I don't allow the player to change languages halfway through a game.

It's simple, but it works.
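
The search can be sketched as an ordered list of directories that is tried from most specific to least specific. The directory names used here ("assets/fr-CA", "assets/fr", "assets/common") are purely hypothetical stand-ins, not the layout described above, and the file-existence test is passed in as a function so the sketch does not depend on any particular file system API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Builds the search order for a locale, most specific directory first.
// The directory names are hypothetical placeholders.
std::vector<std::string> BuildSearchPath(const std::string& language,
                                         const std::string& region)
{
    std::vector<std::string> dirs;
    dirs.push_back("assets/" + language + "-" + region); // e.g. assets/fr-CA
    dirs.push_back("assets/" + language);                // e.g. assets/fr
    dirs.push_back("assets/common");                     // shared fallback
    return dirs;
}

// Returns the first path at which the asset exists, or an empty string if it
// is not found in any directory.
std::string FindAsset(const std::vector<std::string>& dirs, const std::string& name,
                      const std::function<bool(const std::string&)>& exists)
{
    for (std::size_t i = 0; i < dirs.size(); ++i)
    {
        const std::string path = dirs[i] + "/" + name;
        if (exists(path))
            return path;
    }
    return std::string();
}
```

Because the list is built once when the language is chosen, overriding an asset for a region is just a matter of dropping a file into the more specific directory.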

Finally

That's everything I wanted to talk about in relation to localization. I hope you find it useful.

[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]


Comments


Curri Barcelo
Good briefing. I have two comments:

"Good translators should produce strings in the new language that are roughly the same length as the original string. I usually estimate a rough 20 percent difference between the English and other languages. This is another useful feature I have built into my tool; it can detect excessive differences between the lengths of the various translations"

Well, I don't think being a good translator is to be measured by how long your translations are. If you want an accurate translation, a creative translation, a translation that is going to sound natural, sometimes you need to write longer sentences. Other times, the target language might be lucky enough to have just a word for that long sentence in English. Good translators are those who are able to write a target text that doesn't seem a translation but written from scratch by a native. Yes, 20% might be a general idea, but sometimes I have found that, in order to keep this 20%, I had to remove information. Or, going even further, some of the standard terminology required by the format holders makes the task of keeping the texts under 20% completely impossible. For example: "stylus". In Spanish it is "lápiz táctil". That is 50% longer and I cannot do anything to make it shorter, or Nintendo will not approve your game. Are you saying that Nintendo's translators aren't good enough because they cannot keep their translations below the 20%? ;)
I think it is always safer to keep the text boxes with a flexible size and advise the translators to please try to keep the sentence shorter. That is what a good developer, who understands languages, does, and a good translator will try to keep to that too.

"Translators MUST not change the order of parameters in a translation. Obvious from a programming stand point, but I have in the past had translations that not only re-ordered the parameters but added additional ones as well!"
If by "parameters" you mean the different variables and bits of coding that are scattered around the text and that will be exchanged for words and elements belonging to the text, I can only say that that is, I'm afraid, impossible, unless we invent a new language. A very simple example explains this. In Spanish, for example, the word modifying a noun will always go after the noun (except for very specific stylistic reasons). In English it goes before the noun. So "red car" will become "car red" in Spanish (and probably in French, Italian and other Latin languages). So in this case, I need to be able to change the order. There are more complicated examples, but I just wanted to let you know that sometimes we are asked impossible tasks.
Of course, there are things that must stay the same (like those parameters that might enclose the whole sentence, so they are right at the beginning and right at the end of a sentence). Those must not be changed, only everything within them. If you have ever received a translation with added parameters or amended parameters that couldn't be changed, then either you haven't coded the game to be properly internationalised (and, therefore, translatable into several languages), or your translators have no idea about games localisation :) If I ever have a doubt about the possibility of moving or swapping a parameter, I will always ask the developer. As you said, communication is very important and it solves many problems very quickly. Maybe if a translator asks you if it's possible to swap a parameter and you say "no", then together you can find a solution (either changing the sentence or changing the way it is coded).

Bye! :)
Curri

Michael Carr-Robb-John
Hi Curri,

Thanks for the comment, good feedback.

Consider that a small to average game might have ~1,000 strings; translating that into the basic languages (EFIGS-N) would give us at least six languages, which means that potentially there are 6,000 strings. The idea behind the (very rough) 20% check in my tool is to flag up to me personally which strings are most likely to cause problems in the game's presentation, in terms of both screen space and timing, i.e. audio speech and time taken for the player to read a message. I have never given a translation back just because it failed the 20% check. I have, however, asked for a shorter translation when the translation is excessive (which sometimes happens), is too long to fit on screen or in the required space, or breaks the momentum of the game's pacing.

There is a mistake in the article: I don't make it obvious that the 20% rule really only applies to sentences and paragraphs. In the case of your single word "stylus", it wouldn't cause any problems.

"I think it is always safer to keep the text boxes with a flexible size and advise the translators to please try to keep the sentence shorter. That is what a good developer, who understand about languages, does and a good translator will try to keep that too."

Completely agree with that statement; in fact, I think you have just hit on a point that is worth noting. I am a gameplay / A.I. engineer; that is my primary focus and what I spend my time doing. When I implemented localisation for the first time (many moons ago) I knew nothing about languages, and even today I would never consider myself knowledgeable about another language (I have enough problems with just English). But I have experience (through lots of hair pulling) of how to implement localisation in games. This is not an uncommon scenario in the games industry: the people implementing localisation are not language experts; we are gameplay or GUI engineers, and our primary focus has always been gameplay, not necessarily translations.

In regards to the parameters, if I had the following string:

"Hello {s-Name}, Are you {n-Age}?"

and the translators returned me any of the following, it would cause problems with the text parsing:

"{n-Age} are you?"

"{n-Age} are you? Hello {s-Name}"

"Hello {s-Name}, Are you {n-Age}? {s-Name}"

It's not that we are trying to make your life a pain; it's just the way text is parsed and handled from a programming standpoint. That said, the original posting of this article on http://www.altdevblogaday.com/2012/06/27/localization-pipeline/ has some interesting comments regarding the text parsing that might potentially solve this problem.

"There are more complicated examples, but I just wanted to let you know that sometimes we are asked impossible tasks."

It might help you to know that game engineers always set impossible tasks for everyone, it's how we strive to build better games. :)

Take it easy,
Michael.

