[Mailman-i18n] Re: [Mailman-Developers] [ mailman-Patches-646884 ] HyperArch.py multibyte charset

Martin v. Löwis loewis at informatik.hu-berlin.de
Sun Dec 8 15:47:48 2002

>Here is an example...
>and after the patch was applied:
>You need a Japanese font installed to examine the difference.
>Utils.uquote() makes a multibyte character into two or more fragments
>of Latin-1 characters.

I see. That shows that the bug is actually elsewhere: Utils.uquote is 
being passed a byte string. This is not supposed to happen, as 
Utils.uquote only works correctly on Unicode strings.
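
To illustrate the difference, here is a minimal sketch in present-day 
Python. uquote_sketch is a hypothetical stand-in that I am assuming 
behaves roughly like Utils.uquote (one numeric character reference per 
non-ASCII item); it is not the Mailman code itself.

```python
def uquote_sketch(s):
    """Replace each non-ASCII item of s with an HTML numeric character reference."""
    out = []
    for item in s:
        # Iterating over bytes yields ints; iterating over str yields characters.
        code = item if isinstance(item, int) else ord(item)
        out.append("&#%d;" % code if code > 127 else chr(code))
    return "".join(out)


text = "\u65e5"              # one Japanese character as a Unicode string
raw = text.encode("euc-jp")  # the same character as a two-byte EUC-JP string

print(uquote_sketch(text))  # &#26085; (one reference for one character)
print(uquote_sketch(raw))   # two references, one per byte, read as Latin-1
```

Given a Unicode string, the sketch emits a single reference to the 
intended character; given the byte string, each byte is quoted on its 
own, which is the fragmentation described above.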

While the patch is still correct, it only papers over the problem: the 
output will now be correct only if the message encoding is equal to the 
list's preferred encoding, since Util.unquote will still receive a byte 
string. In turn, it will check whether the byte string happens to decode 
correctly in the list's preferred encoding (which it may or may not do, 
by coincidence). If decoding succeeds, it will insert the byte string 
unmodified into the page; if it fails, it will fall back to uquote.
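
The decision described above can be sketched as follows. The name 
insert_or_quote and the bytewise fallback are assumptions made for 
illustration, not the actual Mailman functions.

```python
def insert_or_quote(raw, list_charset):
    """Return raw unchanged if it decodes in list_charset, else quote it bytewise."""
    try:
        raw.decode(list_charset)  # only a validity check; the result is discarded
    except UnicodeDecodeError:
        # Fallback: one numeric character reference per non-ASCII byte.
        return b"".join(b"&#%d;" % b if b > 127 else bytes([b]) for b in raw)
    return raw


raw = "\u65e5".encode("euc-jp")  # a message body in EUC-JP

same = insert_or_quote(raw, "euc-jp")             # charsets match: bytes kept, page is right
coincidence = insert_or_quote(raw, "iso-8859-1")  # decodes anyway: bytes kept, page is wrong
quoted = insert_or_quote(raw, "us-ascii")         # decoding fails: bytewise fallback
```

The second call is the coincidence case: every byte sequence is valid 
ISO-8859-1, so the EUC-JP bytes pass the check and end up unmodified in 
a page declared as ISO-8859-1.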

I think the problem really comes from some encoded strings bypassing the 
Unicode facilities in Mailman and being carried through the processing 
chain as byte strings. This should be done either correctly (by always 
accompanying the byte string with its encoding), or not at all (by 
converting everything to Unicode). This is perhaps a little too much to 
ask for Mailman 2.1,


