[Mailman-Developers] Translation in E-mail

Les Niles les@2pi.org
Wed Nov 6 14:07:13 2002


Last year I hacked a simple translating filter into Mailman's
delivery pipeline.  The underlying engine was AltaVista's
babelfish.  It didn't have all the user-definable bells and
whistles you mention, but did have an entertaining twist: Most
messages went through untranslated, but those from selected senders
were occasionally translated into a randomly selected language.
This was by design, intended for amusement purposes.  It got only a
very limited deployment. :)
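
A minimal sketch of that kind of filter, not the original code.  The
sender list, language codes, probability and the fake_translate()
stand-in are all hypothetical; in a Mailman 2 handler, logic along
these lines would presumably be called from the module's
process(mlist, msg, msgdata) function.

```python
import random

# Hypothetical settings; the original filter's actual values are not known.
SELECTED_SENDERS = {"alice@example.org"}
TARGET_LANGUAGES = ["de", "es", "fr", "it", "pt"]
TRANSLATE_PROBABILITY = 0.1


def fake_translate(text, target_language):
    """Stand-in for a call to a real translation engine (assumption)."""
    return "[%s] %s" % (target_language, text)


def maybe_translate(sender, body, translate=fake_translate, rng=random):
    """Return body unchanged, or translated into a randomly chosen language.

    Only messages from SELECTED_SENDERS are candidates, and even those
    are translated only with probability TRANSLATE_PROBABILITY.
    """
    if sender.lower() not in SELECTED_SENDERS:
        return body
    if rng.random() >= TRANSLATE_PROBABILITY:
        return body
    return translate(body, rng.choice(TARGET_LANGUAGES))
```

Passing the translation function and the random source as arguments
keeps the decision logic testable without a network call.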

I'd be happy to send the code, but am not sure how useful it would
be since most of what you need to implement has to do with adding
user- and list-specific options, which my implementation didn't
deal with at all.  Adding such options is pretty easy, though,
because of the nice design of Mailman.

One thing you don't mention, but which I think you'll need to deal
with, is identifying the source language of each message.  That'll
be needed unless senders will register the language in which
they'll be posting and can be counted on to always post in that
language.  I've played around a bit with text-based language
identification.  The algorithms do pretty well, even on messages
just a sentence or two in length, but the error rates are on the
order of several percent.  (I was looking at distinguishing between
a dozen or so common European languages.)  Occasionally
misidentifying the source language will add an element of randomness
to the whole process, which, from my experience last year, I can
tell you can be somewhat amusing.
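
To illustrate the idea of text-based identification, here is a
deliberately crude sketch that counts common function words per
language.  This is not necessarily the kind of algorithm referred to
above; the tiny word lists and the guess_language() name are
assumptions made for the example, and a usable identifier would need
far more data (or character n-gram statistics).

```python
import re

# Tiny illustrative stopword lists (assumption); real systems use much more.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ich", "zu"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es"},
    "fr": {"le", "la", "et", "les", "des", "est", "que", "un"},
}


def guess_language(text):
    """Return the code of the language whose stopwords occur most often.

    Returns None when no known stopword appears at all.
    """
    # Split into runs of letters, ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Short messages that share words across languages (for example "la" or
"que") are exactly where a scheme like this will guess wrong.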

Another complication will be dealing with HTML-encoded email.  If
the translating engine itself can handle HTML, great; otherwise
you'll probably need to strip or convert the HTML parts so that you
translate and produce only plain text.
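
One way the stripping step might look, sketched with the standard
library's email and html.parser modules from current Python 3 (not
what was available in 2002).  The helper names html_to_text() and
translatable_text() are made up for this example, and the tag handling
is intentionally simplistic.

```python
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            # Treat common block-level tags as line breaks.
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks).strip()


def html_to_text(html):
    """Reduce an HTML fragment to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def translatable_text(msg):
    """Prefer a text/plain part; otherwise fall back to stripped HTML."""
    html = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if ctype == "text/plain":
            return text
        if html is None:
            html = text
    return html_to_text(html) if html is not None else ""
```

For a message parsed with message_from_string(), translatable_text()
returns the string that would be handed to the translation engine.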

  -les

On Tue, 5 Nov 2002 16:22:14 +0100 (CET) "Rafael Cordones Marcos" <rcm@sasaska.net> wrote:
>Hi folks,
>
>I have been searching the Mailman website and have not found any reference
>to what I am looking for, i.e. translating e-mail. I am not talking about
>translating the Mailman software or documentation (i18n) but translating
>the e-mails that people send to the mailing list.
>
>I am looking for the following functionality in a mailing list software:
>
> - As an administrator:
>
>    - set up some default machine translation (MT) engine.
>      For example, I want to use Babelfish for translating between
>      English and Spanish but I want to use some other engine for
>      translating between German and Swedish.
>
>    - specify the subject of the mailing list and thus improve the
>      translation results. For instance, I can specify that the
>      mailing list is about "Technical -> Operating Systems -> Minix"
>      (just kidding).
>
> - As a user:
>
>    - I want to subscribe to the mailing list and specify my
>      (natural) language preferences. Say I want to specify
>      {Catalan, Spanish, German}.
>
>    - e-mail going to the list gets sent to a machine translation
>      engine before it gets sent to me. I can optionally tell the
>      mailing list software that I also want to receive the original
>      message.
>
>    - If I do not like the translation of an e-mail I can choose some
>      other machine translation engine. I may decide that some other
>      translation engine does a better job or even pay to get
>      better translations. Maybe the administrator forgot to
>      specify a machine translation engine for Euskera<->Farsi and I
>      know of one that just started operating.
>
>I would appreciate any information you may have on this topic. Maybe
>somebody has started implementing this in Mailman, or it is already
>implemented? Send me any comments, problems you see with the
>aforementioned working schema, flames, ...
>
>Thanks a lot for your time!
>
>Rafa


