[Mailman-Users] Dealing with multiple charsets (list messages and web archive)
Stefan Förster
cite at incertum.net
Sat May 10 21:41:42 CEST 2008
* Mark Sapiro <mark at msapiro.net> wrote:
> Stefan Förster wrote:
>> I've read
>>
>> http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.039.htp
>>
>> I would still like to do charset conversion on MIME-encapsulated
>> messages. We do our best not to let HTML messages through to the list
>> address, so we only need to deal with text/plain and various binary
>> content types.
>
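The conversion Stefan asks about can be pictured as a walk over the leaf parts of the message: text/plain parts are decoded from their declared charset and re-encoded, while binary parts are left alone. The following is a minimal Python 3 sketch using the standard `email` package. `convert_text_parts` is a hypothetical helper, not part of Mailman. It assumes that a part without a charset parameter is us-ascii and that undecodable bytes may be replaced rather than rejected.

```python
from email import message_from_bytes
from email.message import Message


def convert_text_parts(msg: Message, target: str = "utf-8") -> Message:
    """Re-encode every text/plain leaf part to `target`; other parts are untouched.

    Hypothetical helper for illustration; not part of Mailman.
    """
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue  # containers and binary parts are left alone
        # Assumption: no charset parameter means plain ASCII.
        # An unknown charset name would raise LookupError here.
        source = part.get_content_charset() or "us-ascii"
        raw = part.get_payload(decode=True) or b""
        text = raw.decode(source, errors="replace")
        # Drop the old transfer encoding so set_payload() chooses one that
        # suits the new charset (base64 for utf-8).
        del part["Content-Transfer-Encoding"]
        part.set_payload(text, charset=target)
    return msg


if __name__ == "__main__":
    raw = (b"MIME-Version: 1.0\n"
           b"Content-Type: multipart/mixed; boundary=XX\n\n"
           b"--XX\n"
           b"Content-Type: text/plain; charset=iso-8859-1\n"
           b"Content-Transfer-Encoding: 8bit\n\n"
           b"n\xe4chster Teil\n"
           b"--XX--\n")
    print(convert_text_parts(message_from_bytes(raw)).as_string())
```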
> What Mailman version are you using?
2.1.9
> Do you have problems with messages which are not multipart?
Only when message headers contain unencoded special characters. I have
just added rules to our content filter to iron that out; it's not
really Mailman's fault.
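Stefan does not say which content filter or rules he uses, so the following is only a hypothetical illustration of what such a rule might do: take a raw 8-bit header value, decode it (trying UTF-8 first, then an assumed iso-8859-1 fallback) and re-emit it as an RFC 2047 encoded-word via the standard `email.header.Header` class. `encode_raw_header` and the fallback charset are assumptions.

```python
from email.header import Header


def encode_raw_header(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Return a 7-bit-safe header value for raw (possibly 8-bit) header bytes.

    Hypothetical helper; the fallback charset is a guess, not a detection.
    """
    try:
        return raw.decode("ascii")  # already clean: leave it alone
    except UnicodeDecodeError:
        pass
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is Latin-1 (this never fails).
        text = raw.decode(fallback)
    # Produces an RFC 2047 encoded-word such as "=?utf-8?b?...?="
    return Header(text, "utf-8").encode()
```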
> Beginning in Mailman 2.1.6, a single part text/plain message should be
> coerced to Unicode, have headers and footers added as Unicode and then
> coerced back to the character set of the list (or of the incoming
> message if that fails).
This behaviour leaves me with UTF-8 as my only choice, because the
default encoding for "German" can't even represent a currency sign
properly.
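Mark's description of the 2.1.6+ behaviour can be modelled roughly as follows. This is a simplified Python 3 sketch, not Mailman's actual decoration code: `decorate` and its "return None if nothing fits" convention are hypothetical, and what Mailman really does when neither charset works is not modelled. It assumes the German default charset is iso-8859-1, which has no Euro sign.

```python
from typing import Optional, Tuple


def decorate(body: bytes, msg_charset: str, list_charset: str,
             header: str, footer: str) -> Optional[Tuple[bytes, str]]:
    """Hypothetical model of the coercion described above."""
    # 1. Coerce the single-part body to Unicode using its own charset.
    text = body.decode(msg_charset)
    # 2. Add header and footer as Unicode.
    text = header + text + footer
    # 3. Coerce back: the list's charset first, then the incoming message's.
    for charset in (list_charset, msg_charset):
        try:
            return text.encode(charset), charset
        except UnicodeEncodeError:
            continue
    return None  # neither charset can represent the decorated text
```

With the list on iso-8859-1, a footer containing "€" cannot be encoded for an iso-8859-1 message either, so the sketch gives up; with the list on utf-8 the same message goes through, which is the situation Stefan describes.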
> Also, have you tried setting Non-digest options -> scrub_nondigest to
> Yes? This may not be satisfactory for you, but if it is, it may help
> with some of your other issues.
After converting the message catalogues (messages/de/mailman.po) and
the template files (templates/de/*.{html,txt}) to UTF-8 as described in
README-I18N.en, migrating the lists in question, rebuilding the web
archives etc., I switched that option on.
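README-I18N.en is not quoted here, so the following only illustrates the general idea of such a conversion: re-encode a .po file from an assumed iso-8859-1 to UTF-8 and update the charset declared in its header entry. `reencode_po_text` and `reencode_po_file` are hypothetical helpers, and the regular expression assumes the header is spelled exactly "Content-Type: text/plain; charset=...". Presumably the compiled .mo catalogue also has to be regenerated (for example with GNU gettext's `msgfmt`) before Mailman sees the change; template files would need only the re-encoding step, plus any charset declared inside the HTML itself.

```python
import re
from pathlib import Path

# Matches the charset declared in the PO header entry, e.g.
# "Content-Type: text/plain; charset=iso-8859-1\n"
_PO_CHARSET = re.compile(r"(Content-Type: text/plain; charset=)[-\w]+")


def reencode_po_text(data: bytes, source: str = "iso-8859-1") -> bytes:
    """Return PO file contents re-encoded as UTF-8 with the header updated."""
    text = data.decode(source)
    text = _PO_CHARSET.sub(r"\1utf-8", text, count=1)
    return text.encode("utf-8")


def reencode_po_file(path: Path, source: str = "iso-8859-1") -> None:
    """Rewrite a .po file in place (keep a backup of the original first)."""
    path.write_bytes(reencode_po_text(path.read_bytes(), source))
```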
The result is somewhat unexpected: The message translation for
'-------------- next part --------------'
is
'-------------- n"achster Teil --------------'
where '"a' stands for the umlaut "ä". This "ä" is encoded incorrectly
in the outgoing mail: the typical "two characters instead of one" error
("Â~" or something like that). Interestingly, the next line, which
reports the binary data that was scrubbed and _also_ contains an "ä", is
encoded correctly. Can you give me any hint on this? The replacement is
done in Scrubber.py, line 392 in my copy:
# Now join the text and set the payload
sep = _('-------------- next part --------------\n')
replace_payload_by_text(msg, sep.join(text), charset)
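One plausible cause, not established in this message: after the catalogue was converted to UTF-8, `_()` presumably returns the separator as UTF-8 bytes, while `charset` in the call above still names the charset of the scrubbed text (say iso-8859-1). The two UTF-8 bytes of "ä" (C3 A4) are then displayed as two Latin-1 characters. The Python 3 sketch below shows that effect and one way to avoid it: decode everything to Unicode, join, then encode once. `join_with_separator` is a hypothetical helper, not Mailman's API, and `replace_payload_by_text` is not modelled.

```python
from typing import List, Tuple

# "ä" is two bytes in UTF-8 (C3 A4); labelled as iso-8859-1 they become "Ã¤".
SEP_UTF8 = "-------------- nächster Teil --------------\n".encode("utf-8")
MISREAD = SEP_UTF8.decode("iso-8859-1")


def join_with_separator(parts: List[bytes], part_charset: str,
                        sep: bytes, sep_charset: str) -> Tuple[bytes, str]:
    """Join text parts after bringing the separator into the same charset.

    Hypothetical helper: decode all pieces to Unicode, join them, then encode
    in the parts' charset, falling back to UTF-8 if that charset cannot
    represent the result.
    """
    text = sep.decode(sep_charset).join(p.decode(part_charset) for p in parts)
    try:
        return text.encode(part_charset), part_charset
    except UnicodeEncodeError:
        return text.encode("utf-8"), "utf-8"
```

Whichever charset is returned must also be the one declared for the payload; otherwise the mismatch simply moves elsewhere.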
Do you think there might be other occurrences of incorrectly encoded
strings read from the message catalogue or the templates?
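A quick way to look for leftovers, assuming the converted catalogue and templates are all supposed to be UTF-8 now, is to scan them and report any file that does not decode as UTF-8. This is a Python 3 sketch; `find_non_utf8` and the default file patterns are assumptions. It only checks how files are stored, so it cannot catch strings that are stored correctly but mixed with a different charset at runtime, such as the "next part" separator described above.

```python
from pathlib import Path
from typing import Iterable, List


def find_non_utf8(root: Path,
                  patterns: Iterable[str] = ("*.po", "*.html", "*.txt")) -> List[Path]:
    """List files under `root` whose bytes are not valid UTF-8.

    Passing does not prove a file was converted (pure ASCII passes too),
    but a failure means the file is not UTF-8.
    """
    bad = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad
```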
Ciao
Stefan
P.S.: I can upgrade from 2.1.9 if you think it would make debugging
easier.
--
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9
At first she was my little mouse, but the animals are slowly getting bigger and bigger.