[Mailman-Users] Dealing with multiple charsets (list messages and web archive)

Sat May 10 21:41:42 CEST 2008

* Mark Sapiro <mark at msapiro.net> wrote:
> Stefan Förster wrote:
>> I've read
>> 
>> http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.039.htp
>> 
>> I still would like to do charset conversion on MIME encapsulated
>> messages. We do our best not to let HTML messages through to the list
>> address, so we will only deal with text/plain and various binary
>> content types.
> 
> What Mailman version are you using?

2.1.9

> Do you have problems with messages which are not multipart?

Only if the message headers contain unencoded special characters. I have
just added rules to our content filter to iron that out; it's not
really Mailman's fault.

> Beginning in Mailman 2.1.6, a single part text/plain message should be
> coerced to Unicode, have headers and footers added as Unicode and then
> coerced back to the character set of the list (or of the incoming
> message if that fails).

This behaviour leaves me with UTF-8 as my only choice, because the
default encoding for "German" (iso-8859-1) can't even represent the
Euro sign.
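To make the constraint concrete, here is a minimal Python 3 sketch of the coercion Mark describes (decode to Unicode, decorate, encode to the list charset, else fall back to the message charset). The function `decorate` and its parameters are hypothetical and not Mailman's actual code; it only shows why a footer containing "€" cannot survive an iso-8859-1 list charset.

```python
from typing import Optional, Tuple


def decorate(body: bytes, msg_charset: str, list_charset: str,
             header: str, footer: str) -> Optional[Tuple[bytes, str]]:
    """Hypothetical sketch: add header/footer as Unicode, then encode.

    Tries the list charset first, then the incoming message charset.
    Returns (encoded_bytes, charset_used), or None if neither fits.
    """
    # Coerce the single-part text/plain body to Unicode.
    text = header + body.decode(msg_charset) + footer
    for charset in (list_charset, msg_charset):
        try:
            return text.encode(charset), charset
        except UnicodeEncodeError:
            continue  # this charset cannot represent some character
    return None
```

With a footer containing "€", an iso-8859-1 list only works when the incoming message is itself in a charset such as UTF-8; a Latin-1 message plus a "€" footer yields None.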

> Also, have you tried setting Non-digest options -> scrub_nondigest to
> Yes? This may not be satisfactory for you, but if it is, it may help
> with some of your other issues.

After converting the message catalogue (messages/de/mailman.po) and
the template files (templates/de/*.{html,txt}) to UTF-8 as described in
README-I18N.en, migrating the lists in question, rebuilding the web
archives etc., I switched that option on.
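For the template part of that conversion, a minimal Python 3 sketch follows. It is not the procedure from README-I18N.en; it assumes the original templates are iso-8859-1, and the function name `convert_templates` is hypothetical. The .po catalogue is deliberately left alone, since it also needs its charset header updated and a recompile.

```python
from pathlib import Path


def convert_templates(template_dir: str, src: str = "iso-8859-1") -> list[str]:
    """Hypothetical helper: re-encode *.html and *.txt files to UTF-8.

    Files that already decode as UTF-8 (including pure ASCII) are skipped.
    Charset declarations inside the files themselves are not touched.
    Returns the names of the files that were rewritten.
    """
    converted = []
    for path in sorted(Path(template_dir).iterdir()):
        if path.suffix not in (".html", ".txt"):
            continue
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8, leave it untouched
        except UnicodeDecodeError:
            pass
        # Assumes the file is in `src`; decode it and write back as UTF-8.
        path.write_bytes(raw.decode(src).encode("utf-8"))
        converted.append(path.name)
    return converted
```

Skipping files that already decode as UTF-8 makes the helper safe to run twice on the same directory.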

The result is somewhat unexpected: The message translation for

'-------------- next part --------------'

is

'-------------- n"achster Teil --------------'

where "a represents an umlaut, "ä". This "ä" is encoded wrongly in the
outgoing mail, the typical "two characters" error ("Â~" or something
like that). Interestingly, the next line, which talks about binary
data that was cut out and _also_ contains an "ä", is encoded
correctly. Can you give me any hint on this? The replacement is done in
Scrubber.py, line 392 here:

        # Now join the text and set the payload
        sep = _('-------------- next part --------------\n')
        replace_payload_by_text(msg, sep.join(text), charset)
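The "two characters" symptom is what UTF-8 bytes look like when they are read as Latin-1: "ä" is C3 A4 in UTF-8, which Latin-1 shows as "Ã¤". One plausible cause, not confirmed here, is that the separator from the UTF-8 catalogue and the text parts end up as byte strings in different charsets before the join. The Python 3 sketch below (hypothetical `join_parts`, not Mailman's Python 2 code) shows the general remedy: decode every piece with its own charset, join as Unicode, and encode exactly once.

```python
def join_parts(sep: bytes, sep_charset: str,
               parts: list[tuple[bytes, str]], out_charset: str) -> bytes:
    """Hypothetical sketch: join text parts that arrive in mixed charsets.

    Each (data, charset) part is decoded with its own charset, the
    separator with sep_charset; the joined Unicode text is then encoded
    once into out_charset.
    """
    decoded = [data.decode(cs, "replace") for data, cs in parts]
    text = sep.decode(sep_charset).join(decoded)
    return text.encode(out_charset, "replace")


# The failure mode: a UTF-8 "ä" mistakenly decoded as Latin-1.
mojibake = "nächster".encode("utf-8").decode("iso-8859-1")  # "nÃ¤chster"
```

Joining raw byte strings instead would silently mix charsets, and whichever single charset the result is labelled with will misrender some of them.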

Do you think there might be other occurrences of wrongly encoded
strings read from the message catalogue or the templates?
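To hunt for other occurrences, a rough heuristic can flag text that looks like UTF-8 decoded as Latin-1. The Python 3 sketch below is an assumption-laden check, not a Mailman tool: `looks_double_encoded` is hypothetical, and it misses Windows-1252 style damage such as "â€" because "€" is not in Latin-1.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text reads like UTF-8 bytes decoded as Latin-1.

    Re-encodes as Latin-1 and tries to decode as UTF-8; if that succeeds
    and changes the text, the original was probably double encoded.
    """
    try:
        repaired = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # not representable that way, so not this pattern
    return repaired != text
```

Running it over each line of the decoded templates, the catalogue strings, or a rendered archive page would point at the spots worth inspecting by hand.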

Ciao
Stefan

P.S.: I can upgrade from 2.1.9 if you think it will make debugging
easier.
-- 
Stefan Förster     http://www.incertum.net/     Public Key: 0xBBE2A9E9
At first she was my little mouse, but slowly the animals keep getting bigger.