Another question from Quora:
How far are we from an accurate machine language translation service?
As a professional translator, this is a topic that interests me greatly. I agree with Steven’s assessment of what a machine should be able to do, and I think all of us in the linguistic field believe that machines will eventually be able to do it. The question remains: when?
It is amusing to go back and review the science fiction of the past 60 years, which, since the advent of computers, has always assumed that we are on the cusp of this breakthrough. “Within a decade” seems to be a common response, but it has been wrong every time and will continue to be for the foreseeable future. Advances like IBM’s Watson are encouraging, and show that a computer that is well trained and given copious amounts of data can decipher “what” we mean in most cases. This is half of the battle for translation.
The other half is correctly translating into a given context, and as Steven shrewdly points out, even humans cannot do it properly every time. However, we are good at putting ourselves in the user’s context: converting English units to metric, fixing mailing addresses, and even explaining terms that make no sense in another country. For instance, imagine you are a rental car company in the US: how useful would it be for me to tell you that I have 26 points on my driving record in Italy? In Italy points are good, with a maximum of 30, whereas in the US points on a record usually mean violations. You often need a translator or a native speaker to point such things out.
One thing I often tell translators is that the key to translating is not just what you know, but being able to see that you don’t know something. Idioms often make some sense when translated literally, but it takes someone well versed in a language, and honest about their own abilities, to spot a phrase that might have a second or third meaning. In these cases, translators consult specialized glossaries and look for usage examples in articles and through search engines. Computer architects will need to teach machines this same skill of double-checking their work.
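To make the idea of double-checking concrete, here is a minimal sketch in Python of flagging phrases that deserve a second look before they are translated literally. The glossary entries and the function name `flag_for_review` are my own illustrations, not part of any real translation tool, and a real system would need far more than substring matching.

```python
# Hypothetical glossary of phrases whose literal translation may mislead.
IDIOM_GLOSSARY = {
    "kick the bucket": "to die",
    "break a leg": "good luck",
    "piece of cake": "something easy",
}


def flag_for_review(sentence, glossary=IDIOM_GLOSSARY):
    """Return (phrase, meaning) pairs for glossary phrases found in the sentence."""
    lowered = sentence.lower()
    return [
        (phrase, meaning)
        for phrase, meaning in glossary.items()
        if phrase in lowered
    ]


if __name__ == "__main__":
    # Each flagged phrase is a prompt to research, not a final translation.
    for phrase, meaning in flag_for_review("The exam was a piece of cake."):
        print(f"Check '{phrase}': may mean '{meaning}'")
```

The point is the workflow rather than the lookup itself: the program does not decide the translation, it only marks the places where a literal rendering should not be trusted without research.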
An interesting departure from typical thinking on translation is Google’s online translator. In large part, it works like any other translator. However, Google is also trying to gather proper translations (from humans) for everything in every language. For instance, it recently acquired rights to the European patent catalog. Using such information, Google continually improves its translator in the hope of one day offering translations based on what it “knows” is correct. Even this has its limitations and seems a long way off. Nonetheless, it shows that a great deal of computing power and human intellect is being applied to the problem. Ask Google, and its engineers might tell you they’ll be there “within a decade,” but we all know this is unlikely.
When will machines be able to translate for us? For getting the gist of something, online translators are already there. They will be much better in 10 years’ time, and perhaps good enough for many more common uses. But a professional-quality translation, where we truly rely on the computer: that might take a lifetime.
I tested the Google translation tool on a Japanese patent and the results were pretty dismal.
My article is here if you want to read it:
http://patenttranslator.wordpress.com/2010/08/29/a-short-test-of-the-google-translate-function-on-a-pct-patent-application-published-in-japanese-on-the-wipo-website/
I think that the machine translation tool available on the Japan Patent Office website for Japanese patent applications published after 1994 is better than Google Translate. It is not very good either, of course; otherwise I would be out of business by now.
Thank you for the link to your blog; it’s wonderful. I’ve done tests of my own with Google Translate and, like you, have found some very strange results that I can only believe are derived from the statistical translation model. I would bet that in the future spammers will purposefully find ways to turn common translations into nonsense and ads, which could derail this model. They already do it with search results, so there is little to stop them from infecting the translation system by gaming Google’s algorithm.