04 Apr 2012
Our English-speaking ears pricked up earlier this morning, as we overheard a strange and troubling conversation taking place between two regulars in one of our favorite haunts:
A:"And the idea to pass through us state security and change its name to the national security state security roles ended most bureaucratic de gain important o need protection"
B:"De you keep optimistic industry"
A:"Roles of bureaucracy, Red Hat is an example of security approval bitalb place before appointment?"
B:"Less its Pilgrim ... Hat one bikodm mabikodmsh military college and its answer file security ... Le son aunt Uncle father entered before keda. section. Bitrvd"
Who is trying to pass US State Security? Who or what are Red Hat and Pilgrim? What kind of code are "bikodm mabikodmsh" and "bitalb"?
After a few hours of speculation over whether Michael Cera was involved in some sort of national conspiracy, we asked our Arabic-speaking cousin Abu Meedan to intervene. One glance at the source and he reliably reassured us that we had unwittingly witnessed a fascinating discussion on the changes to Egypt's State Security apparatus post-uprising:
The big news here is that, quietly and to a "handful of users", Twitter has rolled out an auto-translate feature for testing. The feature allows users to read a translation of any tweet in "your" language (presumably the one you choose for your Twitter interface), courtesy of Bing's translation API (also adopted recently by Facebook, a topic for future rumination).
At Meedan, we love Twitter. We love translation. And we love translating Twitter. On one level, it's exciting that Twitter are acknowledging the importance of translation in their architecture. This is not before time, as the share of tweets in English has declined from almost two thirds in 2009 to 39% today. But as the example above proves, the model of pure Machine Translation is a problematic one for the social web: MT renders vernacular almost completely meaningless. (N.B. This is certainly not a problem confined to Bing, and this post is not written to highlight Bing's failings here, but rather to emphasise the problem with Machine Translation itself and the richness of vernacular language; Google's translation renders arguably worse results: "less Haaajh .. Hat one Baiqdm in a war college Mabaiqdamish security and file his decision to answer ... If cousin with his father entered the section before this .. Batervd" and Meedan: less حاااجة ... Give me one, the military مابيقدمش Security file and bring his decision ... If the son of son's father entered the department ... بيترفض)
This is particularly the case in Arabic: MT is trained on a standard form almost never found on Twitter or Facebook, where users write in dialect and frequently in a Latin-alphabet rendering of that dialect. It is an issue for other languages too.
To translate Twitter in a meaningful way, then, we need to look beyond pure Machine Translation to a model that makes translating tweets fun, scalable and rewarding. Here's why we think it's important, and here's how we're thinking about how to do it.
Do you translate tweets? Have you ever been browsing Twitter and wanted a translation? We'd love to hear from you and love to talk about these important ideas. Do comment on this post, or on our Knight Foundation application for Translatedesk, and send us a tweet @meedan!