In which I explain how the word choice of developers affects translation and localization

Over the years, I have had the pleasure of working with many developers, some of whom have zealously taken it upon themselves to reinvent English grammar, design new forms of syntax and lexicon, or otherwise abuse the English language into a petrified shell of its former self. We all do this to some extent, and, in many cases, these developers have good reason to do so. However, translation and localization of anything other than the Queen’s English has its difficulties for users and translators alike.

These are some common issues we face as translators that developers are advised to keep in mind.

* As a disclaimer, these issues stem from translating English into other languages; using another source language would present yet other issues.

Word choice, not word AnythingYouLikeItToBe-ify!

Not a word, not a sentence — I care a lot about accuracy in translation, so one of the biggest show-stoppers for me is what we call the “untranslatable” phrases. Usually, this is stuff creative programmers and marketing gurus have pulled from out of their proverbial “backend,” and inevitably involves multiple words pushed together into one. Here I’m talking about things like:

  • SuperSelectiveSync
  • Re-undelete
  • Mysticgel

Unless you are Goethe, it is not recommended to combine multiple words into SuperAwesomeWords, even if you capitalize each one. Translators won’t know whether they should repeat your Germanization of the user interface or why it was done in the first place, and in some languages (Chinese, Arabic, etc.) it is not possible to do anyway. One typical resolution is to translate literally (e.g. “Sync that is very selective; Undo the delete command again”), but this often leads to confusing or just plain incorrect translations (“jelly of magic”?). This literal decoupling of words also produces very long translations for strings that, in the English version, may have been purposefully kept short.

Not in the dictionary — A solid piece of advice is also to steer clear of one-word gems like these:

  • Capturize
  • Solarify

(These are real examples, by the way.) On the surface, these may seem like a non-issue to a developer, even reasonable or, dare I say, “necessary” word choices. Once we’ve returned to planet Earth, however, we all realize they don’t make sense to anyone, including English speakers. Combine that with a lack of context, and you may as well have punched the translator in the face and told them to “Suck it up, and deal with it.” The tempting solution, for developers and translators alike, is to assume users will just “figure it out.” In my opinion, this is not a solution; good communication with developers and some forethought can really help. You may just have to change the “English” (I use the term loosely here), or at least explain your particular addition to the good folks at Merriam-Webster.

Not in my recollection — A related category of phrases is those that just don’t make much sense at all, even to the developers who wrote them. It is not rare for me to have a discussion with a programmer who, months earlier, coded abominations like these:

  • The — By itself, “the” is meaningless, and it doesn’t even exist in many common languages like Japanese and Russian, but developers often have this string lying around to combine with other strings, as in “The 5th of December”.
  • ’s — The possessive apostrophe s (“Ben’s cup”) in most languages we deal with is translated very differently (e.g. la copa de Ben).
  • to the forum! — Half a phrase is the worst kind of phrase. Unless a translator knows the other half of the string, they have trouble translating it. Even after you reveal to us what it might be, the multiple possibilities might have to be translated in completely different ways and with different word order in other languages.

Developers sometimes cannot even recall why they wrote such things, where they appear in the user interface, or what reason they had for coding such still-born, half-mutant strings. The resolutions here are more difficult. It is always preferable to write several complete strings rather than dividing one up into fragments and assuming the pieces can be recombined just as in English.
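To make the difference concrete, here is a minimal sketch in Python. The `TEMPLATES` table, the `owner_cup` key, and both function names are my own inventions for illustration, not any particular framework's API. It contrasts gluing fragments together with giving each language one complete string and a named placeholder.

```python
# Hypothetical per-language string table: one complete phrase per key,
# with a named placeholder the translator is free to move around.
TEMPLATES = {
    "en": {"owner_cup": "{owner}'s cup"},
    "es": {"owner_cup": "la copa de {owner}"},
}


def render(locale: str, key: str, **values: str) -> str:
    """Look up the full phrase for this locale and fill in its placeholders."""
    return TEMPLATES[locale][key].format(**values)


def render_by_concatenation(owner: str) -> str:
    """Anti-pattern: the word order is baked into the code, so it only works for English."""
    return owner + "'s" + " " + "cup"


print(render("en", "owner_cup", owner="Ben"))
print(render("es", "owner_cup", owner="Ben"))
```

With whole phrases, the Spanish translator can put the owner at the end of the string without anyone touching the code. With concatenation, there is nothing sensible to hand them for a lone “’s”.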

Not my way of doing things — That said, there are other, often more difficult issues, regarding the syntax and word order of English versus other languages. English does not have word gender or cases, we write dates differently (wrong?) compared with others, we adore measuring things by the length of the king’s foot, and so on. When localizing, these issues inevitably arise. A common pitfall is to assume the English way of dealing with numbers:

1 page
{quantity} pages

Developers do this all the time because they know that, in English at least, if there is more than one “page” they have to add an ‘s’. But in a language like Russian, where nouns have both gender and case, the {quantity} will change the noun “page” in many more ways:

and so on…
(It’s the last character in each that changes.) These problems have both precedent and solutions, but the key is to become aware of them. After all, developers long before, in the decades of sin we call the 1980s and 1990s, already faced these same issues and found workarounds. Unless a developer proactively includes such internationalization solutions, we will have to discuss them after the fact and add them in like patchwork.
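As a rough illustration of one such workaround, here is a small Python sketch. The function names and the `PAGE_FORMS` table are hypothetical, and the Russian word forms are my own example rather than a copy of the list above. It picks the noun form from a plural category instead of assuming "one" versus "more than one". The rule follows the plural categories for Russian whole numbers as I understand them from the Unicode CLDR data. In a real project you would more likely lean on an existing plural-aware library (gettext is one) than hand-roll the rule.

```python
def russian_plural_category(n: int) -> str:
    """Return the plural category ("one", "few" or "many") for a whole number in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"    # 2-4, 22-24, ... but not 12-14
    return "many"       # 0, 5-20, 25-30, ...


# Hypothetical translated strings: one complete phrase per category.
PAGE_FORMS = {
    "one": "{quantity} страница",
    "few": "{quantity} страницы",
    "many": "{quantity} страниц",
}


def format_pages(quantity: int) -> str:
    """Choose the phrase by plural category, then fill in the number."""
    return PAGE_FORMS[russian_plural_category(quantity)].format(quantity=quantity)


for q in (1, 2, 5, 11, 21, 22):
    print(format_pages(q))
```

The English-style check `quantity == 1` would get 21 and 22 wrong here, which is exactly the kind of assumption that has to be patched in later if it is not designed in from the start.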

Not from this world — That brings us to by far the biggest “gotcha” in localization: a lack of context. Professional translators are, more often than not, given a long list of words and phrases to translate without knowing anything about them. Context is key. Just try to decipher things like:

  • Archive — Is it a verb (“Archive the file”) or a noun (“View the Archive”)? In most languages, these translations would be different.
  • New — A new what? Since gender is often critical in languages other than English, translators want to know if this is a “New file…” or a “New window” so that the gender of adjectives and nouns is correct. More interestingly, in some languages a literal translation of “New” might not be the best word choice. When a “New!” message arrives in your inbox, in some languages it might be preferable to translate it as “Arrived”.
  • Create Day — Only God can create a day, so if you meant to say “Creation Date” you probably should write it that way, or at least specify it in the comments. Otherwise, translators facing 10,000 lines of code will quickly translate “Create” as a verb here and move on.

Developers are not wrong in asking for these strings to be translated, and they are not to blame for failing to foresee the repercussions word choice might have for dozens of other languages. However, offering a little context can aid translators in resolving these issues. Ideally, developers can help out by writing text that is very clear, which is helpful for English-speaking users anyway. They should also be adding a comment string as they write code. However, if that is too much to ask, even making your keys and placeholders descriptive (e.g. ArchiveMenu.CreateNewArchive) can go a long way to describing the text “New”.
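Here is one way that advice might look in practice, sketched in Python. Apart from ArchiveMenu.CreateNewArchive, every key, the `SourceString` class, and both helper functions are assumptions of mine for illustration, and the export format is made up, not a real file format. Each string carries a descriptive key and a translator comment, and a small check flags any string that ships without context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceString:
    key: str      # descriptive key: where the string lives and what it does
    text: str     # the English text shown to users
    comment: str  # note for translators: part of speech, what it refers to


# Hypothetical catalog built from the examples in this article.
STRINGS = [
    SourceString("ArchiveMenu.CreateNewArchive", "New",
                 "Menu item; creates a new archive."),
    SourceString("Toolbar.ArchiveFile", "Archive",
                 "Button; verb, as in 'Archive the file'."),
    SourceString("Sidebar.ViewArchive", "Archive",
                 "Link; noun, as in 'View the Archive'."),
    SourceString("FileInfo.CreationDate", "Creation Date",
                 "Column header; the date the file was created."),
]


def export_for_translators(strings: list[SourceString]) -> str:
    """Produce a plain-text list in which every string is preceded by its comment."""
    lines = []
    for s in strings:
        lines.append(f"# {s.comment}")
        lines.append(f"{s.key} = {s.text}")
    return "\n".join(lines)


def missing_context(strings: list[SourceString]) -> list[str]:
    """Return the keys of strings that have no translator comment."""
    return [s.key for s in strings if not s.comment.strip()]


print(export_for_translators(STRINGS))
```

Note that “Archive” appears twice with different keys and comments, so the verb and the noun can be translated differently. A check like `missing_context` could run before strings are handed off, so that nothing reaches the translators as a bare word.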

The one true word

Although my job is generally focused on making sure translators deliver a localization project completely, accurately and on time, there are clearly many aspects out of my control. As the above attests, a few of these have to do with the original English we receive. By English, I mean both the kind we humanoids use when communicating with each other in the United States, Britain, Australia and the like, and what developers like to call the English they use when writing strings that will appear in software, on a website, or in an iPhone app.

In any localization project, the long list of random English phrases can present a real challenge. Working together, however, developers and translators can go far in making their word choices spread into more languages and regions than even the Queen’s English.