24 Nov 2009
I'm just back from the 31st Translating and the Computer conference in London and have lots of exciting tools and tricks to share with the Meedan translation community.
Safe to say, the conference brings together a fantastic array of people including translators, language service providers (LSPs), translation software developers, researchers and government agencies. There was a big showing too from the EU Commission and NATO.
Although I was speaking on a crowdsourcing panel that raised some fascinating points of debate (ironic, you say - a panel of experts on crowdsourcing!), my main purpose was to listen and learn. So here's what I found out.
MyMemory: Open Access Translation Memory
Who built it? It was built by a Language Service Provider called Translated.
How big is it? MyMemory reports that it is the largest memory in the world with over 230 million professionally translated segments and over 5 billion words.
How can I use it to translate between Arabic and English? MyMemory allows you to explore previous translations of a word, phrase or sentence. That means that, rather than relying on Machine Translation or dictionaries, you can see how a phrase has been translated in context by other translators.
The tool has a clean interface that resembles the Google search bar. Paste in the text you want to search for, choose the appropriate language pair and click search.
The results will show Machine Translation output at the top and Translation Memory matches below.
You can see that with each TM match, MyMemory gives you a source, the date it was uploaded to the memory and a possible rating out of five stars.
When logged in you can vote for these translation matches, delete wrong alignments and contribute new translations.
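For anyone curious about what a translation memory lookup involves under the hood, here is a minimal sketch in Python. It is not MyMemory's actual matching method, which I don't know. It assumes a toy in-memory list of segments and uses the standard library's difflib to score how close each stored source segment is to the query. The entries, the source labels and the 0.6 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher

# Toy translation memory: (source segment, target segment, origin label).
# All entries and origin labels are hypothetical examples.
TOY_MEMORY = [
    ("good morning", "صباح الخير", "example-source-a"),
    ("good evening", "مساء الخير", "example-source-b"),
    ("thank you", "شكرا", "example-source-a"),
]


def lookup(query, memory, threshold=0.6):
    """Return memory entries whose source text resembles the query.

    Each result is (score, source, target, origin), best match first.
    A score of 1.0 means the source segment matched the query exactly
    (ignoring case); lower scores are fuzzy matches.
    """
    results = []
    for source, target, origin in memory:
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((score, source, target, origin))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    for score, source, target, origin in lookup("Good morning", TOY_MEMORY):
        print(f"{score:.2f}  {source} -> {target}  ({origin})")
```

A real service would index millions of segments rather than scan a list, but the idea of ranking stored translations by similarity to the query is the same.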
The site plans to make money by giving its freelance translators greater exposure for translation work in the future.
You can also download your contributions as a TMX (Translation Memory eXchange) file. This is not, however, an open access data memory like the MeedanMemory, and I think clearer information about the property rights governing the memory would be helpful.
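If you do download a TMX file, it is plain XML and easy to read with a short script. The sketch below assumes a small, invented sample rather than a real MyMemory export. The element names (tu, tuv, seg) and the xml:lang attribute come from the TMX format itself; the function name and sample content are my own.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample for illustration only, not a real export.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
      <tuv xml:lang="ar"><seg>صباح الخير</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="ar"><seg>شكرا</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for unit in root.iter("tu"):
        by_lang = {}
        for variant in unit.findall("tuv"):
            lang = variant.get(XML_LANG, "").lower()
            seg = variant.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                by_lang[lang] = "".join(seg.itertext()).strip()
        if source_lang in by_lang and target_lang in by_lang:
            pairs.append((by_lang[source_lang], by_lang[target_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx_pairs(SAMPLE_TMX, "en", "ar"):
        print(source, "->", target)
```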
Linguee: The Web as a Dictionary
What is it? Linguee is also an open access translation memory tool on the web.
How does it work? Linguee uses the 'web as a dictionary'. It makes use of available resources on the bilingual web in addition to patent translations and EU documents to provide translation memory matches for phrases in context.
What I find so interesting about this is what it says about the future of translation. I already use Google searches as a way to check a translation or an unfamiliar phrase in context. Similarly, Google Wave is using common search pairs to provide a 'clever' spell checker that is aware of collocation. In other words, I like the idea that the web - not just in the archive of material it contains but also in the way it is used for search - could provide the ultimate translation memory or dictionary of the future.
How useful is Linguee in the Arabic-English space? Currently it's not. It is only available in German-English right now. But it's one to watch.
It shows again how a simple search interface with automated phrase suggestion, now so familiar to us from years of Google searching, can provide a compelling user experience for translators and language learners. Results are ordered by phrase frequency, and TM matches present sourcing and user rating information. This approach is simple but effective, and should help guide Meedan translation thinking for the future as we explore ways to integrate TM matches in comment and article translations.
Translation Memories: What Works and What Doesn't
An exciting second-day panel discussion led by Reinhard Schäler of the Centre for Next Generation Localisation at the University of Limerick, Ireland, had every delegate in the room giving their opinion on how best to use translation memories.
A few consistent themes came out which should help drive our understanding of the role translation memories should play.
From the point of view of translators using TMs to increase efficiency, there was a strong consensus that the cleanliness and accuracy of TM data matter more than its size. TMs have to be maintained and kept consistent, delegates said.
Some specialists even suggested that TMs have a shelf life of 'two years max' and must be geared to a specialist area.
Interestingly, this view was broadly replicated by Statistical Machine Translation expert Kirti Vashee. Speaking on a pre-recorded video, he said good data was more important than more data for training machine translation.
There was also a view that translators should continue to think critically and not depend too much on translation memories for their work.
Worldwide Lexicon: Crowdsourced Translations for Any Web Page
Behind the scenes there was quite a bit of discussion of Worldwide Lexicon - an open source translation memory which, through a Firefox plugin, lets you contribute translations to any web page.
I was particularly aware of this tool having spent some time talking with Brian McConnell at the Open Translation Tools summit in Amsterdam in June. Meedan is consistently supportive of the work WWL is doing, which is why we've uploaded our MeedanMemory to WWL.
What does it do? WWL is a translation memory that lets you contribute and share translations for any web page.
How does it work? You can download the WWL Firefox plugin and begin submitting translations of web pages. It renders any web page in the language of your choice, through a combination of Machine Translation output and Translation Memory matches. You can edit and rate translations on the page - and all from the root URL. So you don't have to share around multiple URLs for different translated pages.
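The combination of Translation Memory matches and Machine Translation output can be pictured as a simple fallback: use a human translation where the memory has one, and machine output where it doesn't. The sketch below is my own illustration of that idea, not WWL's code. The function names are hypothetical, the memory is a plain dictionary, and the "machine translation" is a stand-in that just tags the text.

```python
def render_page(segments, memory, machine_translate):
    """Translate a page segment by segment.

    Returns a list of (translated text, origin) pairs, where origin is
    "tm" for a human translation found in the memory and "mt" for
    machine output used as a fallback.
    """
    rendered = []
    for segment in segments:
        human = memory.get(segment)
        if human is not None:
            rendered.append((human, "tm"))
        else:
            rendered.append((machine_translate(segment), "mt"))
    return rendered


def fake_machine_translate(text):
    """Stand-in for a real MT engine: it only marks the text as MT."""
    return "[MT] " + text


if __name__ == "__main__":
    page = ["Hello", "World"]
    memory = {"Hello": "Bonjour"}  # hypothetical contributed translation
    for text, origin in render_page(page, memory, fake_machine_translate):
        print(origin, text)
```

Keeping track of where each segment came from is what would let a reader see which parts of a page still need a human translator.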
How could it be applied? During a discussion of crowdsourcing, it became apparent that some of the government bodies at the conference were unable to translate as much as they would like due to resource constraints. For example, the European Commission is unable to translate all its web content into all the languages of the EU. WWL could help resolve this, and create a more streamlined URL architecture to boot. WWL could also be useful for dialogue - helping Meedan's translation community discover texts that people want translated and share translations more widely around the web.
The only problem is that this all hinges on the Firefox plugin. Most Arabic users have Internet Explorer.