The Meedan Blog Archive

Red Hat calling @Pilgrim: Bing can't do it alone

Our English-speaking ears pricked up earlier this morning, as we overheard a strange and troubling conversation taking place between two regulars in one of our favorite haunts:

A:"And the idea to pass through us state security and change its name to the national security state security roles ended most bureaucratic de gain important o need protection"

B:"De you keep optimistic industry"

A:"Roles of bureaucracy, Red Hat is an example of security approval bitalb place before appointment?"

B:"Less its Pilgrim ... Hat one bikodm mabikodmsh military college and its answer file security ... Le son aunt Uncle father entered before keda. section. Bitrvd"

Who is trying to pass US State Security? Who or what are Red Hat and Pilgrim? What kind of code are "bikodm mabikodmsh" and "bitalb"?

After a few hours of speculation over whether Michael Cera was involved in some sort of national conspiracy, we asked our Arabic-speaking cousin Abu Meedan to intervene. One glance at the source and he reliably reassured us that we had unwittingly witnessed a fascinating discussion on the changes to Egypt's State Security apparatus post-uprising:

The big news here is that Twitter has quietly rolled out an auto-translate feature for testing to a "handful of users". The feature allows users to read a translation of any tweet in "your" language (presumably the one you choose for your Twitter interface) courtesy of Bing's translation API (also adopted recently by Facebook - a topic for future rumination).

At Meedan, we love Twitter. We love translation. And we love translating Twitter. On one level, it's exciting that Twitter are acknowledging the importance of translation in their architecture. This is not before time, as the number of Tweets in English has declined from almost two thirds in 2009 to 39% today. But as the example above proves, the model of pure Machine Translation is a problematic one for the social web: MT renders vernacular almost completely meaningless. (N.B. This is certainly not a problem confined to Bing, and this post is not written to highlight Bing's failings here, but rather to emphasise the problem with Machine Translation itself and the richness of vernacular language; Google's translation renders arguably worse results: "less Haaajh .. Hat one Baiqdm in a war college Mabaiqdamish security and file his decision to answer ... If cousin with his father entered the section before this .. Batervd" and Meedan: less حاااجة ... Give me one, the military مابيقدمش Security file and bring his decision ... If the son of son's father entered the department ... بيترفض)

This is particularly the case in Arabic: MT is trained on a standard form almost never found on Twitter or Facebook, where users write in dialect and frequently use a Latin-alphabet version of it. But it is an issue for other languages too.

To translate Twitter in a meaningful way, then, we need to look beyond pure Machine Translation to a model that makes translating tweets fun, scalable and rewarding. Here's why we think it's important, and here's how we're thinking about how to do it.

Do you translate tweets? Have you ever been browsing Twitter and wanted a translation? We'd love to hear from you and love to talk about these important ideas. Do comment on this post, or on our Knight Foundation application for Translatedesk, and send us a tweet @meedan!

Comments on this post

2012-04-04 16:05:46 -0700
I'm not sure why I am one of Twitter's selected few to have access to this feature, but I can say that for the tweets I'm interested in translating it's provided almost zero help in translating beyond my at times faltering Egyptian Arabic. Thanks for drawing attention to this ya Tom!
2012-04-04 16:40:40 -0700
Fascinating post. For context, the conversation was about how revolutionaries are ignoring the bureaucratic aspects of oppression and so failing to defend some gains won in 2011 due to focusing only on the most graphic violations. A complex topic in multiple modes of language with broad assumptions about shared context; I doubt MT will ever cope.
2012-04-04 17:01:27 -0700
@alaa - totally agree. The real interesting question is how HT (Human Translation) can help with computational approaches to finding, sorting, indexing content across languages. Right now our data model gets updated every six months or so - how could we fuel learning loops directly into the system?? Even so, this is useful for indexing and discovery, not for understanding. MT as a solution for deeply contextual, informal language will require ...well...revolutionary changes in automated language processing algorithms and approaches.
2012-04-04 17:05:19 -0700
Heck, we should translate the entire conversation, Alaa - it's fascinating stuff. (As would be possible with a more robust human translation platform working above Twitter and other social media.) I think what's so incredible is that Twitter thinks there is enough value in providing this MT tool - that there is enough gratification for the user to keep it there - when in fact it is spewing out total rubbish. This is clearly the case for Egyptian Arabic, and other regional forms of the language. But it is also probably true for many other languages, because as you say Twitter condenses the conversation and relies heavily on shared assumptions, elisions and informal language. Be great to have your views on our early thinking for a solution to this:
2012-04-04 20:51:57 -0700
Ed, I think you are hitting the right nail here. If I come from a different language community I may want to explore what topics are "interesting"; I don't have to follow the conversation itself. Trying to find interesting topics on the basis of popularity only is a major failure (Twitter's trending topics are hardly ever insightful). Instead of MT, use NLP to improve discovery of trends; use MT not to translate but to find context in other forms of media. For example, I'm sure there is enough in my tweets to link to news reports about Egyptian protesters storming into state security and some articles about the role of state security in academia (the specific example I used to describe how bureaucratic oppression works). Even if a human translator is asked to translate this conversation, they'll probably benefit from such automated contextualization (instead of embarking on lengthy research or guessing blindly at what is between and behind the lines).
2012-04-07 11:18:26 -0700
[...] First published on the Meedan blog, April 4, 2012. [...]
2012-04-19 00:30:00 -0700
What is the best online Arabic to English translator?... The engine uses for its sites is trained on more informal domain content - this said, the various Arabic dialects are quite distinct and right now there is no engine (including Meedan's) that yields satisfactory results across most user gen...