Lost in Translation
Wojciech Gryc writing in the official iSummit blog:
“The session itself was a series of presentations, with Chris Salzberg moderating and focusing on the difficulties of working in a multilingual web. Leonard Chien presented on Chinese translation on the Global Voices site, and Hanako Tokita did the same for Japanese. Kyo Kageura talked about his QRedit tool for translating English texts to Japanese, and Mohammad Daoud presented how machine translation is being used by the Digital Silk Road (DSR) project. Shun Fukazawa finished with a presentation of the Japanese Wikipedia and Wikia sites.
While statistics are difficult to get, it appears that less than a third of the web’s users use English as a first language, and only a third of all websites are in English. Unfortunately, building a multilingual web is more complex than simply using an automated translation service. Computers have yet to understand local contexts and cultural references, and they do not have a proper grasp of grammar.”
Translation is extremely difficult, especially in a distributed context. For example, when translating from English to Chinese, one has to decide whether Traditional or Simplified Chinese will be used. Furthermore, a volunteer from Taiwan may use different characters or metaphors to describe events than a volunteer from Beijing. As such, volunteer management is often more structured and complex than one would initially assume.
User interfaces are a key challenge for effective translation. While open source packages like MediaWiki and WordPress have support for multilingual interfaces or posts, the translation process itself requires more than just a dictionary and somewhere to type. Translators often use sites like Wikipedia or extensively search the web to understand the cultural contexts of certain sayings and metaphors. For example, how would you translate “It was raining cats and dogs” into Swahili, and more importantly, how would an African volunteer figure out that this is a metaphor for “heavy rain”?
This brings up a difficult problem in user interface design: How can software developers build user interfaces for translation that not only provide support for using dictionaries and typing text, but actually help search for the meanings of analogies, supported fonts, verb conjugations, and other language-specific features? For example, how would a translator converting an English text to Japanese know when to use a formal (polite) or informal verb conjugation, when the original writer never even had to consider such a choice? Luckily, tools like QRedit are already trying to solve the problem.
And now Google has stepped into the fray with the newest addition to its family: the Google Translation Center. Blogoscoped reports that “…this is meant to be a translation service which offers both volunteers and professional translators… and I suppose at least the professionals will want to get paid.”
Here’s what’s printed as a description on the service’s front page:
Request translations and find translators
Upload your document and request translations into over 40 languages.
Translate and review translated documents
Create and review content in your language through Google’s free, easy-to-use, online translation tools.
Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. Statistical MT works by comparing large numbers of parallel texts that have been translated between languages, and from these it learns which words and phrases usually map to others, similar to the way humans acquire language. The problem with statistical MT is that it requires a large number of directly translated sentences. These are hard to find, and because of this SMT systems use sources like the proceedings of the European Parliament and the United Nations, which are fine if you’re writing in bureaucrat-speak but aren’t so great for other texts. Google Translation Center is a straightforward and very clever way to gather a large corpus of parallel texts to train its machine translation systems.
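For readers curious about what "learning which words map to others" might look like in practice, here is a minimal sketch in Python. It uses IBM Model 1, a classic textbook word-alignment method, on a three-sentence toy corpus. This is an illustration only and not a description of Google's actual system; the function names (`train_ibm_model1`, `best_translation`) and the tiny English-German corpus are made up for this example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs with EM.

    pairs: list of (source_tokens, target_tokens) tuples.
    Returns a dict mapping (source_word, target_word) to a probability.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with no knowledge: every target word is equally likely.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (source, target) links
        total = defaultdict(float)   # expected count of links per source word
        for src, tgt in pairs:
            for w_t in tgt:
                # Share each target word among the source words in the
                # same sentence, in proportion to the current estimates.
                norm = sum(t[(w_s, w_t)] for w_s in src)
                for w_s in src:
                    frac = t[(w_s, w_t)] / norm
                    count[(w_s, w_t)] += frac
                    total[w_s] += frac
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(
            float,
            {(w_s, w_t): c / total[w_s] for (w_s, w_t), c in count.items()},
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tgt: p for (src, tgt), p in t.items() if src == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English -> German), already tokenized.
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

model = train_ibm_model1(corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

Nobody tells the program that "house" means "haus"; it works that out because "das" also shows up next to "the" in another sentence. With only three sentences this is a party trick, which is exactly why real systems need the very large parallel corpora described above.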
For freelancers, Google Translation Center (GTC) could be very good news: they could work directly with clients and have access to high-quality productivity tools. Overall this is a welcome move that will force service providers to focus on quality, while Google, which is competent at software, can focus on building tools.
I do hope that Google releases an API for this so that it can be implemented within other projects too. We at Pratham Books would certainly be interested: we’d like to publish in more languages than we currently do, and finding translators and reviewers and managing that process is very difficult.