07 Mar 2012
Think about the Arab Spring and you probably think about citizen media. Syrians, Egyptians, Libyans, Bahrainis and Tunisians have not just been taking to the streets over the past year. They have also been documenting their experiences in text, image and video, and even building whole new social movements with a digital dimension. Has a historic moment of this scale ever unfolded before our eyes through new media publishing tools?
Surely this is inspiring to the rest of us who are not in the Middle East. More than ever before, an American from Atlanta can listen to, learn from and connect with an Egyptian in Alexandria who is on the frontline of the pursuit of empowerment and social justice. Imagine if that opportunity had existed in the early years of the new millennium, with the invasion of Iraq looming.
But there is still a big roadblock to the Atlantan and the Alexandrian actually communicating: language. Language remains a primary barrier to the flow of information, despite the incredible growth of social media and of automated translation techniques that use statistical models trained on vast corpora of translated text. This is particularly true of global society's ability to listen to actors from the new Middle East.
Why Arabic Conversation is Hard to Process using Automated Translation
Arabic spans a much wider range of vernaculars than European languages such as English and French are typically considered to have. This makes the linguistic environment more fluid and harder to translate using automated techniques, known as machine translation (MT).
The national standard vernaculars of English and French were disseminated through the printing press from the 16th century onwards, whereas in the Middle East the printing press took hold much later (as did nation states, following decolonization). Arabic is also closely associated with the Qur'an, whose classical language has long served as the standard for written Arabic.
As a result, colloquial vernaculars such as Egyptian, Syrian, Iraqi and Palestinian Arabic are considered 'dialects' of Arabic in the region rather than national languages. Children in Arab countries therefore learn Modern Standard Arabic, which is derived from classical Arabic, at school for writing purposes. MT is generally trained on the modern standard rather than on the local vernaculars.
But Arabic speakers are increasingly using their spoken variants on Twitter and Facebook, with a huge mix of language styles in between. So it is that Google Translate, clearly one of the most powerful translation engines around, produced the following English output from an Egyptian vernacular tweet by @Alaa:
In my opinion means that the de Assar is the Old Mstqsd Nawara Bebat and hit people, Bagmana Eil and rumors bouncing Barog asshole violin not know Ihpkha @ nawaranegm
Machine translation does not suffice. This is probably true for conversational language anywhere right now, but it is particularly true for Arabic.
The Arabic Web is Growing Fast, just as Advances in Translation are Slowing
Advances in automated translation are slowing. Google has admitted that the data-driven statistical approach to MT (pioneered in the early 1990s by the late IBM scientist and former Meedan advisor Fred Jelinek) seems to offer a diminishing return on investment. And yet, in the aftermath of the Arab Spring, Arabic is flourishing as a language on the world stage.
There are already 60 million Arabic speakers online in the Middle East, and internet penetration growth of over 2,000% is one of the highest rates in the world. Google predicts that the number of Arabic-speaking web users will grow by 50% by 2013. That means more content, more commerce and more knowledge exchange. For example, 30,000 tweets per day were published in Arabic in July 2010, compared with a staggering 2 million per day by October 2011.
All else being equal, this internet growth will fuel the growing market for devices in the region, with the PC market set to triple between 2010 and 2015, according to Intel. Mobile devices are particularly popular, suggesting a trend towards active conversation and reporting on the go. For news makers, media companies and the wider global public, engaging with this increasingly active linguistic community is going to be critical.
It also works the other way: the Arabic web accounts for under 2% of total web content, yet Arabic is the fifth most widely spoken language in the world. Arabic speakers are increasingly demanding new opportunities to Arabize knowledge online. For example, a host of Egyptian universities are encouraging their students to develop the Arabic Wikipedia from its current base of 150,000 articles. One popular method will be to translate from the English-language Wikipedia, which contains 3.8 million articles. For Arabic speakers who do not speak other languages, who may be the majority of those online, the experience of the web is very different from that of the average English speaker. Translation is a compelling way to address this.
What Role for Translation in Transforming the Global Web
Translation is a powerful vehicle for bridging these barriers. It enables citizens to access previously unreachable information whilst expressing themselves on a level with interlocutors in other languages. It helps to network communities of practice working on critical social causes around the world, such as environmentalists, teaching groups and human rights advocates, and it provides a space in which new, more equitable forms of knowledge can be forged.
In its 2009 Arab Knowledge Report, the Mohammed bin Rashid Al Maktoum (MBR) Foundation argued that translation “contributes to the development of indigenized intellectual production and opens it to the possibility of looking at phenomena and reality from new angles”. Translation should be promoted, it suggests, not just as a way to Arabize knowledge, but to encourage the creation of new knowledge that could ‘revitalize Arab thought’.
The Arab world publishes a small volume of educational content in translation relative to other regions. According to UNESCO, roughly five books are currently translated into Arabic for every million Arab people, compared with 920 books per million people translated into Spanish in Spain. However, this may partly reflect lower publishing output in general.
The web offers an unprecedented opportunity to break this cycle precisely because uptake is growing so fast. In the context of such profound social and political change, and with a large youth bulge of 'digital natives', the Arab world can pioneer uses of social media that are not central in other regions. Translation is one such opportunity. With strong links to the United States and Europe, the region has a growing body of graduates who are more linguistically versatile than their counterparts in some English-speaking countries. These communities could not only help to bring critical information into the Arabic public sphere, as the Cairo Wikipedia pilot is doing, but also help broadcast the social media pulse of the Arab Spring generation to the world, whether that be activists documenting human rights abuses or diverse citizen voices debating government legislation.
Translating Tweets: Meedan's Vision for a New Media Translation Workbench
Meedan's core mission is to enable speakers of different languages to share information and conversation online. Our first product release in 2008, news.meedan.net, was an attempt to enable that exchange between Arabic and English using a combination of machine and human translation, all surfaced through a nifty cross-language interface.
We are now developing an early prototype of a new media translation workbench, initially focused on Twitter. Our goal is to make translating tweets and sharing them back into Twitter fun: as fun as writing an original tweet or sharing a link.
Many services use machines to translate tweets. We want to enable Twitter users themselves to translate tweets. This is particularly timely because the Google Translate API is now behind a paywall and therefore unavailable to the majority of Twitter users.
Thanks to its open publishing model and its hashtag, list and follow features, Twitter is perhaps the best network in the world for exposing its users to content outside their language. An American can follow leading Arabic-speaking activists in the Middle East in minutes. The challenge now is to provide human-generated translations to mediate these feeds and conversations.
Twitter users are already engaging in ad hoc translation, particularly from the Middle East. At Meedan, we have also tested the waters with pilots of Twitter translation workflows using HootSuite and Curated.by, and through the brilliant Alive.in/Egypt translation project, where volunteers translated hundreds of pieces of testimony recorded by Egyptian citizens using Speak2Tweet during the revolution. Global Voices is also very active in this area. From these experiences we know that the translation process needs to be fun, interactive and cause-driven, and built upon Twitter as much as possible for the kudos that brings.
We envisage this work as an integral piece of CheckDesk, an open source fact-checking workbench for global new media which we are developing through a Sida-funded project. We are also enlisting a talented group of advisers, and the nonprofit IT support company TechSoup Global as a testing and distribution partner.
What Would You Need to Translate a Tweet?
My goal with this lengthy post was to explain why social media translation is so relevant, particularly for the Middle East, and to look briefly at some early plans at Meedan for tackling the problem.
In the spirit of open development, perhaps you could help? Here are some tricky questions that come to mind when translating from Twitter. Perhaps you have some ideas on how to solve them.
How would you represent a translated tweet? How would you represent the translator and the author in this tweet?
How would you inform the original author of the translation?
How would you help the author of the original tweet check the translation, and seek redress if it is not up to scratch?
How would you guard against abuse by unscrupulous translators with an agenda?
How would you translate hashtags?
Would you expect URLs in translation? How would you know that a URL had been translated too?
What incentives would you give translators for taking part?
How would you represent the direction of translation in a tweet (e.g. English to Arabic or Arabic to English)?
Who would you like to translate your tweets?
Thanks for reading. -George