16 Dec 2008
Below is a reproduction of the article that appeared in the December 2008 “Year in Ideas” issue of The New York Times Magazine. We're delighted to see Meedan in print! You can read a version of this article in Arabic using the IBM Transbrowser.
By Jim Giles
It’s a lovely idea: a social-networking site with automatic translation bolted on. That’s Meedan, a gathering place for English and Arabic speakers who want to exchange thoughts on Middle East issues. Comments are translated automatically and instantly; Nebraska can now chat with Nablus.
But consider a cautionary tale. A group of Israeli journalists prepared for a visit to the Netherlands last year by sending questions via e-mail to the Dutch Foreign Ministry. After online translation from Hebrew to English, the first question reportedly read: “The mother your visit in Israel is a sleep to the favor or to the bed your mind on the conflict are Israeli Palestinian and on relational Israel Holland.” That’s the problem with translation Web sites: they tend to be funnier than they are useful.
Perhaps that is an unfair comparison. Maybe there is something especially challenging about Hebrew-to-English translation. Well, there is, but it is a problem that will also affect Meedan. Computers are taught to translate using pairs of documents that have already been translated by humans. The more pairs a computer studies, the more it learns. Large collections of such parallel texts exist for some language pairs, so English-French and English-German software does a reasonably good job. For combinations where the pool of parallel texts is smaller, like English-Hebrew and English-Arabic, the challenge is greater.
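The idea of learning word translations from human-translated sentence pairs can be illustrated with IBM Model 1, a classic textbook method for statistical word alignment. The sketch below is a minimal Python version trained with expectation-maximization on a hypothetical three-sentence German-English toy corpus; it is an illustration of the general technique, not a description of what Meedan's software does. The names `train_model1`, `best_translation` and `PAIRS` are made up for this example.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(e, f)], the probability that source word f translates as English word e.

    A minimal IBM Model 1 sketch (no NULL word), trained with expectation-maximization.
    """
    english_vocab = {e for english, _ in pairs for e in english}
    # Start with uniform translation probabilities.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f being translated as e
        total = defaultdict(float)  # expected count of f being translated at all
        # E-step: share each English word among the source words in its
        # sentence, in proportion to the current probabilities.
        for english, source in pairs:
            for e in english:
                norm = sum(t[(e, f)] for f in source)
                for f in source:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate t from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Return the English word with the highest learned probability for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus of human-translated sentence pairs (English, German).
PAIRS = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

t = train_model1(PAIRS)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t, word))
```

No pair states outright that "haus" means "house"; the model infers it because "das" keeps co-occurring with "the" and so gradually absorbs that word. This is also why more parallel text helps: each extra pair adds evidence that sharpens the estimates.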
Translation between Arabic and English has its own quirks. Arabic is a “head-initial” language: verbs are often placed at the beginning of a sentence. Subject and object usually follow, although not always in that order; Arabic speakers use context and meaning to decide which is which, but for a computer it is not so easy. A literal translation from Arabic might read something like “chase dog ball.” It’s obvious to a human that the writer is talking about a dog chasing a ball, but a computer could just as well have the ball chase the dog.
Computers can also be stumped by the ambiguities in written Arabic, which usually omits short vowels. Take the word kataba, which means “he wrote.” This is normally written as ktb, as are the words kutiba (it was written) and kutub (books). It is usually straightforward for a reader to judge which meaning is intended, but humans do so by drawing on something that computers lack: an understanding of the meaning of the text.
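The ktb ambiguity can be reproduced in a few lines of Python. The sketch assumes the short vowels are typed as Unicode combining marks (the Arabic fatha, damma and kasra signs, all in category "Mn"), which is how fully vowelled Arabic is normally encoded; `strip_short_vowels` and `VOWELLED` are hypothetical names, and the three words were chosen because they contain no long vowels.

```python
import unicodedata
from collections import defaultdict


def strip_short_vowels(word):
    """Drop combining marks (Unicode category Mn), which include the Arabic short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Fully vowelled spellings: KAF, TEH, BEH with fatha (064E), damma (064F) or kasra (0650).
VOWELLED = {
    "\u0643\u064e\u062a\u064e\u0628\u064e": "kataba (he wrote)",
    "\u0643\u064f\u062a\u0650\u0628\u064e": "kutiba (it was written)",
    "\u0643\u064f\u062a\u064f\u0628": "kutub (books)",
}

# Group the readings by the form they take in everyday, unvowelled text.
readings = defaultdict(list)
for vowelled, gloss in VOWELLED.items():
    readings[strip_short_vowels(vowelled)].append(gloss)

print(len(readings), "unvowelled form(s)")
for glosses in readings.values():
    print(glosses)
```

All three words collapse to the same three letters, so the unvowelled string alone cannot tell a program which one the writer meant; that has to come from the surrounding text.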
Given those hurdles, the translations on Meedan are surprisingly good. One Arabic speaker asked a question that appeared as: “Has Pakistan has [sic] become a hotbed of terrorist groups?” Not bad, considering that the original question included the word martaan, an expression that literally means “fertile ground.” Context-dependent meanings are the bane of translation engines, but Meedan translated it appropriately as “hotbed.”
Meedan’s software, which was developed by I.B.M., can dodge many problems because it has had plenty of practice. The company hired some 20 professional translators to create a collection of English-Arabic parallel documents containing more than half a million words. The Meedan software would have seen martaan translated as “hotbed” in news articles about terrorism within the corpus. Since the word “terrorist” appeared in the user’s question, the software could guess the context and choose the appropriate, nonliteral translation.
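A drastically simplified sketch of that kind of context-based choice: count which words appeared around each human translation of martaan, then pick the rendering whose past contexts best match the new sentence. The training examples, the names `train` and `choose`, and the literal fallback are all invented for illustration; this is not a description of how I.B.M.'s system actually works.

```python
from collections import Counter, defaultdict

# Hypothetical examples: words found near "martaan" in sentences where a human
# translator chose one English rendering or the other.
EXAMPLES = [
    (["terrorist", "groups", "camps"], "hotbed"),
    (["terrorism", "militants", "border"], "hotbed"),
    (["soil", "crops", "farmers"], "fertile ground"),
    (["ideas", "innovation", "soil"], "fertile ground"),
]

LITERAL = "fertile ground"  # fallback when the context gives no clue


def train(examples):
    """Count how often each context word co-occurs with each translation."""
    cooc = defaultdict(Counter)
    for context, translation in examples:
        cooc[translation].update(context)
    return cooc


def choose(cooc, sentence_words):
    """Pick the translation whose training contexts share the most words with the sentence."""
    scores = {
        translation: sum(counts[w] for w in sentence_words)
        for translation, counts in cooc.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else LITERAL


cooc = train(EXAMPLES)
question = "has pakistan become a martaan of terrorist groups".split()
print(choose(cooc, question))
```

Because "terrorist" and "groups" appeared near the "hotbed" examples, the sketch chooses the nonliteral rendering; a sentence with no informative context words falls back to the literal one.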
The site’s translators will monitor activity, so that when the computer slips up, they can adjust the translation. So can users; each human-made change will also be noted by the I.B.M. software and, at least in theory, the error will become less likely to occur again.
Will Meedan live up to its Arabic meaning of “gathering place”? It depends on how you look at it. The translations are certainly not perfect. But translations need only be good enough to satisfy those using them, says Jennifer DeCamp, a machine-translation expert at the Mitre Corporation in McLean, Va. Meedan, she predicts, will attract users committed enough to live with a little clumsy language.
“Languages play a huge role in putting barriers between groups of people,” says Stuart Shieber, a computational linguist at Harvard University. “The question is not whether [Meedan] is ideal, it’s whether it’s better to have it than to not.”
But when the subject is Middle East politics, even a minor misunderstanding can tip polite debate into angry argument. As with any dispute, language matters. Terrorist or freedom fighter? Martyr or murderer? Human editors and translators often wrestle with such terminology, so it is not hard to imagine a clumsy computer translation sparking an ugly — and unnecessary — row. Meedan’s software will have to be good enough to avoid that, or users might decide they were better off living with the language barrier.
Jim Giles is a freelance writer based in San Francisco.