[Bug 1202395] Re: sync_members crashes for UTF-8 real name

Cedders 1202395 at bugs.launchpad.net
Sat Jul 20 02:32:22 CEST 2013


Hi Mark

Thanks for the reply.  By the way, it was you who suggested this
approach, and I still think you were right back then!

Firstly, according to http://wiki.python.org/moin/DefaultEncoding,
sys.getdefaultencoding() is pretty much deprecated and will be removed
in Python 3.0 (as you say "Python's default encoding is ascii regardless
of locale").  Secondly, I don't think the input to sync_members should
be interpreted as a 7-bit message header with possibly RFC 2047
encoding.  Thirdly, add_members does not have this problem.  Fourthly,
if you did escape the non-ASCII characters with base64 or quoted-
printable at some point, then these would presumably show up in the
command output (and possibly the web interface).
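
To illustrate the fourth point, here is a rough sketch of what RFC 2047
escaping does to a real name.  It uses the standard email.header module
and is written for Python 3 so that it runs as-is; the member name is a
made-up example, and I am not claiming this is what sync_members does.

```python
from email.header import Header, decode_header

# Hypothetical member name containing non-ASCII characters.
real_name = "Jos\u00e9 Garc\u00eda"

# Escape the name as an RFC 2047 encoded-word, as one would for a header.
encoded = Header(real_name, "utf-8").encode()
print(encoded)  # an "=?utf-8?...?=" token rather than the readable name

# Getting the name back needs an explicit decode step.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
print(decoded)
```

Unless every consumer of the value performs that last decode step, the
"=?utf-8?...?=" form is what would appear in the command output.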

Finally, yes, modifying site.py as you describe does fix both problems
(with or without the patch), but in practice are most sysadmins likely
to do that?  If they fail to modify it, should sync_members crash?  And
what if for some reason the system locale changes to, e.g., ISO-8859-1?
On a site with a UTF-8 encoding, as I understand it, all this
functionality does is convert from UTF-8 to UTF-8.  There is a per-list
encoding, as might be useful on a non-Unicode system hosting lists in
both ISO-8859-5 and ISO-8859-1, but as far as I can see, the list
encoding is not taken into account in the command-line scripts.

I did wonder whether assigning
   enc = locale.getdefaultlocale()[1] or locale.getpreferredencoding() or "UTF8"
within the script would help (by writing output in the correct encoding
for the console), but it doesn't; as you say, the problem is the implied
decode of the output of formataddr and join, which is not seen as a
Unicode string.  Logically, perhaps the input should first be decoded
from the input encoding and re-encoded as enc, the encoding expected by
the system locale; but that's equivalent to doing nothing.
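
A minimal sketch of the two behaviours described above.  Python 2 does
the ASCII decode implicitly when a byte string meets a Unicode string;
the sketch is written for Python 3, so the decode is made explicit.
RAW_NAME and both helper functions are hypothetical names for
illustration only, not anything in Mailman.

```python
# Hypothetical real name as UTF-8 bytes, as read from the input file.
RAW_NAME = "Zo\u00eb M\u00fcller".encode("utf-8")


def implicit_ascii_decode(raw):
    """Mimic the decode Python 2 applies when bytes are joined with unicode."""
    return raw.decode("ascii")


def transcode(raw, input_enc, output_enc):
    """Decode from the input encoding, then re-encode for the console."""
    return raw.decode(input_enc).encode(output_enc)


# The ASCII decode fails on the first non-ASCII byte.
try:
    implicit_ascii_decode(RAW_NAME)
    crashed = False
except UnicodeDecodeError:
    crashed = True

# With UTF-8 input on a UTF-8 system, the round trip changes nothing.
same = transcode(RAW_NAME, "utf-8", "utf-8") == RAW_NAME

print(crashed, same)
```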

If the defaultencoding approach were to be implemented in Python in
future in a way that doesn't cause this problem (beyond being applied in
concatenation and join), then re-encoding the strings from (for example)
ISO-8859-5 to give legible output on a UTF-8 console would be the way to
go.  But it doesn't look to me like that is the way the wind is blowing.
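
For what it's worth, the re-encoding I have in mind would look roughly
like this.  It is a sketch under assumptions: to_console is a
hypothetical helper, the list encoding is passed in by hand (the
command-line scripts do not supply it today), and it is written for
Python 3.

```python
def to_console(raw, list_enc, console_enc="utf-8"):
    """Re-encode bytes stored in the list's encoding for the console."""
    # "replace" avoids a crash if the console cannot represent a character.
    return raw.decode(list_enc).encode(console_enc, "replace")


# Hypothetical Cyrillic name as it would be stored by an ISO-8859-5 list.
cyrillic = "\u0418\u0432\u0430\u043d"
stored = cyrillic.encode("iso-8859-5")

# Legible on a UTF-8 console once transcoded.
out = to_console(stored, "iso-8859-5")

# Decoding with the wrong list encoding silently yields different text.
wrong = stored.decode("iso-8859-1")
```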

Hope this makes sense.

-- 
You received this bug notification because you are a member of Mailman
Coders, which is subscribed to GNU Mailman.
https://bugs.launchpad.net/bugs/1202395

Title:
  sync_members crashes for UTF-8 real name

To manage notifications about this bug go to:
https://bugs.launchpad.net/mailman/+bug/1202395/+subscriptions

