How far are we from an accurate machine language translation service?

Another question from Quora:

How far are we from an accurate machine language translation service?

As a professional translator, this is a topic that interests me greatly. I agree with Steven’s assessment of what a machine should be able to do, and I think all of us in the linguistic field believe that machines will eventually be able to do this. The question remains: when?

It is amusing to go back and review the science fiction of the past 60 years, which, since the advent of computers, has always assumed that we are on the cusp of this breakthrough. “Within a decade” seems to be a common prediction, but it has been wrong every time and will continue to be for the foreseeable future. Advances like IBM’s Watson are encouraging, and show that a computer that is well trained and given copious amounts of data can decipher “what” we mean in most cases. This is half of the battle for translation.

The other half is correctly translating into a given context, and as Steven shrewdly points out, even humans cannot do it properly every time. However, we are good at putting ourselves in the user’s context and deciding to change English units to metric, fixing mailing addresses, and even explaining terms that make no sense in another country. For instance, imagine you are a rental car company in the US: how useful would it be for me to tell you that I have 26 points on my driving record in Italy? Points are good in Italy, with a maximum of 30. You often need a translator or a native to point such things out.

One thing I often tell translators is that the key to translating is not just what you know, but being able to recognize what you don’t know. Idioms often make some sense when translated literally, but it takes someone well versed in a language, and aware of the limits of their own abilities, to identify a phrase that might have a second or third meaning. In these cases, translators consult specialized glossaries, look for examples in articles and search engines, and so on. Computer architects will need to teach machines this same skill of double-checking their work.

An interesting departure from typical thinking on translation is Google’s online translator. In large part, it works like any other machine translator. However, Google is also trying to gather proper translations (from humans) for everything in every language. For instance, it recently acquired rights to the European patent catalog. Using such information, Google continually improves its translator in the hope of one day offering translations based on what it “knows” is correct. Even this has its limitations and seems a ways off. Still, it does show that a lot of computing power and human intellect are being applied to the problem. Ask Google, and its engineers might tell you they’ll be there “within a decade,” but we all know this is unlikely.

When will machines be able to translate for us? For getting the gist of something, online translators are already there. They will be much better in 10 years’ time, and perhaps good enough for many more common uses. But a professional-quality translation, where we truly rely on the computer: that might take a lifetime.

How do two cultures with different languages learn to communicate at first contact?

Original question from Quora:

How do two cultures with extremely divergent languages learn to communicate with each other at first contact?

My thesis work was in Latin American studies, and as such I read a lot of the “first encounter” diaries and biographies that were written. The answer is actually surprisingly easy: CHILDREN.

Columbus, Cortez, and likely many of the other explorers kidnapped young children or took them on as servants. As children learn language incredibly quickly, it was not long before they could work as interpreters for the Spanish. By the time of the second voyage to the Americas, Columbus had fluent interpreters.

Other accounts of first meetings between the Spanish and the Native Americans show similar patterns, with communication spreading outward from those first interpreters. The Indian tongues could be divided into several groups, so even if you had an interpreter for only one Indian language, that person could communicate with other tribes not too far off, much the way Spaniards and Italians could (and still can) communicate even though they don’t speak exactly the same language.

When no interpreter was available, they were reduced to drawing pictures and using gestures, just as you would imagine. However, language is learned rather quickly, especially when you have a child’s brain to help you!


Google gets more translations to ponder from European Patents

Via Google in Translation Pact for European Patents – ABC News.

Google said Thursday it has reached an agreement with European patent authorities to use its online technology to translate some 50 million patents.

Mountain View, California-based Google will gain access to all the translated patents – more than 1.5 million documents and 50 000 new patents each year – which will help improve its machine translation technology. Moreover, it will also deal with the growing amount of technology-related information in Japanese, Chinese, Korean and Russian.

It’s no secret that Google’s ambition for cataloging the world’s information encompasses every language. Through many efforts, including its translator toolkit, Google has been gathering raw data on translations from professional translators. Since teaching a computer grammar and syntax logic has not brought new gains, the best approach seems to be to mimic. That is, if Google’s computers can see enough examples of proper translations done by professional translators, eventually the computer can simply cut and paste phrases and put it all together.
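To make the "cut and paste" idea concrete, here is a toy sketch in Python. It is only an illustration of the general approach, not how Google's translator actually works: the phrase table, its entries, and the `translate` function are all invented for this example. The idea is to reuse the longest phrase a human has already translated and to flag anything no human example covers.

```python
# Hypothetical phrase table: source phrases paired with translations that a
# human translator has already approved. The entries are invented examples.
PHRASE_TABLE = {
    ("good", "morning"): "buongiorno",
    ("thank", "you"): "grazie",
    ("good",): "buono",
    ("morning",): "mattina",
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match lookup: reuse the longest known phrase at each position."""
    words = sentence.lower().split()
    longest = max(len(key) for key in table)
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if chunk in table:
                output.append(table[chunk])
                i += size
                break
        else:
            # No human example covers this word, so flag it instead of guessing.
            output.append("[" + words[i] + "?]")
            i += 1
    return " ".join(output)


print(translate("Good morning"))
print(translate("thank you friend"))
```

Notice that "good morning" comes out right only because a human supplied the whole phrase; pasting together the word-by-word entries would give a much worse result. That is why the volume of human examples matters so much to this approach.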

As a patent translator, I wouldn’t be scared by this just yet. Google’s machine translation still has a long way to go before it can truly understand us mere mortals. Take a look at the following machine translation for a shipping product in Apple’s iPhone App Store.

Don't hire Google Translate to do your App Store marketing

For those who don’t read French, it says “Now available in unemployment insurance French!”

<joke>Insert French unemployment joke here</joke>

Did your mother name you “At the time”?


A birth certificate in Hindi
They don't call it a name certificate for a reason.

Most blogs begin with an explanation of why the blogger started blogging. This one will leave that subject to mystery (or privacy, perhaps). What I prefer to talk about is how difficult it is to come up with a name.

Don’t name your child

Just today I was helping someone translate a birth certificate from Hindi into English. Since Hindi isn’t a language I specialize in, or know even a single word of (is paneer Hindi?), I partner with a trusted translator in India I’ve worked with for the past several years. When he sent it back to me, the baby’s name was written as Tatsamay. That wasn’t the client’s name, and as it turns out, this is simply Hindi for “At the time” — meaning, the baby’s name had not yet been decided at the time of birth.

Imagine that: nine months to think up a name and… nothing. (It could be laziness, but more likely it is just the tradition in India to leave the naming of a child until later.)

Don’t name your blog

It seems fairly auspicious to begin a blog on the same day with the wise and ancient advice that a name is better left to the future, when you know something about the baby blog being born. After all, if Indian tradition doesn’t know about blogging, I don’t know who does.

So, let’s leave it at that. Language and names are very important, too important even for a blog about language.