[Mailman-i18n] Re: [Mailman-Developers] [ mailman-Patches-646884 ] HyperArch.py multibyte charset
Martin v. Löwis
loewis at informatik.hu-berlin.de
Sun Dec 8 15:47:48 2002
>Here is an example...
>and after the patch was applied:
>You need japanese font installed to examine the difference.
>Utils.uquote() splits a multibyte character into two or more fragments
>of Latin-1 characters.
I see. That shows that the bug is actually elsewhere: Utils.uquote is
being passed a byte string. This is not supposed to happen, as
Utils.uquote only works correctly on Unicode strings.
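A minimal sketch of the failure mode, in Python 3. `uquote_sketch` is a hypothetical stand-in for Utils.uquote, assumed here to turn every character above 127 into an HTML numeric character reference; it is not Mailman's actual code. Given Unicode it emits one reference per character, but given the EUC-JP bytes of the same text it emits one reference per byte, which are the Latin-1 fragments from the example.

```python
def uquote_sketch(s):
    """Hypothetical stand-in for Utils.uquote: replace every character
    above 127 with an HTML numeric character reference."""
    out = []
    for c in s:
        # Iterating a str yields characters; iterating bytes yields ints.
        o = c if isinstance(c, int) else ord(c)
        out.append('&#%d;' % o if o > 127 else chr(o))
    return ''.join(out)


text = '日本'                # two Japanese characters (Unicode)
raw = text.encode('euc-jp')  # the same text as four EUC-JP bytes

print(uquote_sketch(text))  # &#26085;&#26412;  (one reference per character)
print(uquote_sketch(raw))   # four references, one per byte: Latin-1 fragments
```

Decoding first, as in `uquote_sketch(raw.decode('euc-jp'))`, gives the per-character output again, which is why uquote should only ever see Unicode.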
While the patch is still correct, it only papers over the problem: the
output will now be correct only if the message encoding is equal to the
list's preferred encoding, since Utils.uquote will still receive a byte
string. The patched code checks whether the byte string happens to decode
correctly in the list's preferred encoding (which it may or may not do,
purely by coincidence). If decoding succeeds, it will insert the byte string
unmodified into the page; if it fails, it will fall back to uquote.
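To make the coincidence concrete, here is a hypothetical model of the patched logic in Python 3; `render_patched` and `uquote_sketch` are illustrative names, not Mailman functions, and the page is modelled as bytes served in the list's preferred charset. The same EUC-JP message comes out intact on an EUC-JP list, as per-byte fragments on a us-ascii list, and "successfully" but as mojibake on an ISO-8859-1 list, because Latin-1 decodes any byte string.

```python
def uquote_sketch(s):
    """Hypothetical Utils.uquote stand-in: non-ASCII -> &#N; references."""
    out = []
    for c in s:
        o = c if isinstance(c, int) else ord(c)  # bytes yield ints, str yields chars
        out.append('&#%d;' % o if o > 127 else chr(o))
    return ''.join(out)


def render_patched(raw, list_charset):
    """Hypothetical model of the patched archiver: raw is the message body as
    bytes, list_charset the list's preferred encoding; returns page bytes."""
    try:
        raw.decode(list_charset)
    except UnicodeDecodeError:
        # Decoding failed: fall back to uquote on the byte string (the old bug).
        return uquote_sketch(raw).encode('ascii')
    # Decoding succeeded: insert the bytes unmodified. This is only right if the
    # message really was in list_charset; success alone does not prove that.
    return raw


raw = '日本'.encode('euc-jp')
print(render_patched(raw, 'euc-jp') == raw)      # True: encodings match
print(render_patched(raw, 'us-ascii'))           # per-byte &#N; fragments
print(render_patched(raw, 'iso-8859-1') == raw)  # True, yet displayed as mojibake
```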
I think the real problem is that text in some encodings bypasses the
Unicode facilities in Mailman and is carried through the processing
chain as raw byte strings. This should be done either correctly (by always
accompanying the byte string with its encoding), or not at all (by
converting everything to Unicode). This is perhaps a little too much to ask for Mailman 2.1,
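The two consistent designs can be sketched together, again as a hypothetical Python 3 illustration rather than anything in Mailman 2.1. `EncodedText` keeps each byte string paired with its charset; `to_unicode` then decodes it once, at the boundary, so everything downstream handles Unicode. In place of uquote, the sketch swaps in Python's built-in `'xmlcharrefreplace'` error handler when encoding the final page, so characters the list's charset cannot represent become numeric character references. The same text arriving in EUC-JP, Shift_JIS or UTF-8 then renders identically.

```python
from dataclasses import dataclass


@dataclass
class EncodedText:
    """Hypothetical carrier: a byte string never travels without its charset."""
    data: bytes
    charset: str

    def to_unicode(self):
        # Decode exactly once, with the charset the message itself declared.
        return self.data.decode(self.charset, errors='replace')


def render_page(text, page_charset):
    """Encode Unicode text for a page served in page_charset; characters the
    charset lacks become &#N; references (Python's 'xmlcharrefreplace')."""
    return text.encode(page_charset, errors='xmlcharrefreplace')


msgs = [EncodedText('日本'.encode(cs), cs)
        for cs in ('euc-jp', 'shift_jis', 'utf-8')]

for m in msgs:
    print(render_page(m.to_unicode(), 'us-ascii'))  # b'&#26085;&#26412;' each time
```

Because the page is encoded only at the very end, the list's preferred charset affects presentation but can no longer corrupt the text.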