[Mailman-Users] Trying to understand charset encoding in mailman

Stephen J. Turnbull stephen at xemacs.org
Thu Apr 17 14:28:42 CEST 2014


Hi, Laura!

Laura Creighton writes:

 > But the Europython mailing list is configured so that its messages
 > come out
 > 
 > Content-Type: text/plain; charset="us-ascii"

This isn't from the list or site configuration; it comes from the
poster's mail user agent (MUA).  The mailing list does not choose the
charset for the message; the MUA does.  For example, grepping my
archive of python-dev messages I see 3 different variants of UTF-8
(capitalization and quoting), us-ascii, iso-8859-1, and windows-1252
(each in several variants).
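
To illustrate how those spelling variants collapse to a handful of
real encodings, here is a minimal sketch using only the Python 3
standard library.  The sample header values are made up for the
example, and the helper names (declared_charset, canonical_codec) are
hypothetical; this is not how Mailman itself processes headers.

```python
import codecs
from email import message_from_string

# Made-up Content-Type values in the spirit of what MUAs send.
SAMPLE_HEADERS = [
    'text/plain; charset="us-ascii"',
    'text/plain; charset=UTF-8',
    'text/plain; charset="utf-8"',
    'text/plain; charset=ISO-8859-1',
    'text/plain; charset="Windows-1252"',
]


def declared_charset(content_type):
    """Return the charset parameter as the email package reports it
    (lowercased, quotes stripped), or None if there is none."""
    msg = message_from_string("Content-Type: %s\n\nbody\n" % content_type)
    return msg.get_content_charset()


def canonical_codec(charset):
    """Map a charset label to the name of the Python codec behind it."""
    return codecs.lookup(charset).name


if __name__ == "__main__":
    for header in SAMPLE_HEADERS:
        label = declared_charset(header)
        print(header, "->", label, "->", canonical_codec(label))
```

The point is only that the label on the wire is whatever the MUA
wrote; any normalization happens on the receiving side.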

Mailman already has about 200 lines of logic to handle cases where the
footer charset is incompatible with the message's charset.  Have you
tried simply changing the Python escape to a literal EN DASH in the
web interface?  I hope Mailman is smart enough to convert that to
Unicode internally, and all should Just Work[tm].

If that doesn't work, change the EN DASH to "--", and report it as a
bug.  We'll see what we can do in 2.1.19, before EuroPython is held in
Göteborg or Łódź. :-/
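
To see why the ASCII "--" is the safe fallback, here is a rough sketch
of the kind of check involved: can the footer text be encoded in the
charset the message declares?  This is not Mailman's actual logic
(which is considerably longer); the function name and the sample
footer text are hypothetical, and the sketch only handles the EN DASH
case.

```python
EN_DASH = "\u2013"


def footer_for_charset(footer, charset):
    """Return the footer unchanged if it can be encoded in `charset`;
    otherwise return a copy with EN DASH replaced by ASCII "--".

    Only the EN DASH is handled here; other non-ASCII characters in
    the footer would need their own treatment.
    """
    try:
        footer.encode(charset)
        return footer
    except UnicodeEncodeError:
        return footer.replace(EN_DASH, "--")


if __name__ == "__main__":
    footer = "EuroPython " + EN_DASH + " the list"
    for charset in ("us-ascii", "iso-8859-1", "windows-1252", "utf-8"):
        print(charset, "->", ascii(footer_for_charset(footer, charset)))
```

Of the four charsets tried, only windows-1252 and utf-8 can represent
the EN DASH, so the other two get the "--" version.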

 > Since \x96 is an unrecognised character in us-ascii,

It's not even a character here, it's a raw byte, which may or may not
get recognized correctly by Mailman depending on the list's preferred
charset.  Somebody was way too tricky for their own good.
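
A small Python 3 sketch of what "raw byte" means in practice: the same
byte 0x96 decodes to different things, or fails to decode at all,
depending on which charset is assumed.  The helper name try_decode is
mine, purely for illustration.

```python
RAW = b"\x96"


def try_decode(data, charset):
    """Decode `data` as `charset`; return None if the bytes are not
    valid in that charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for charset in ("us-ascii", "utf-8", "iso-8859-1", "windows-1252"):
        # ascii() keeps control characters and non-ASCII visible.
        print(charset, "->", ascii(try_decode(RAW, charset)))
```

Only windows-1252 yields U+2013 EN DASH.  In iso-8859-1 the byte is
the C1 control character U+0096, and in us-ascii and utf-8 it is not
valid at all.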

 > But unless I have overlooked something, there is no way to make a charset
 > change on a per-list basis through the mailman administrative interface.

There's no way to make a charset change in posts at all; it's not
Mailman's job to do that, really.  I suppose we could convert all
posts to UTF-8, which would make the logic mentioned above a lot
simpler, but that would probably annoy a few people and might not work
for some variant charsets.
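
For what it's worth, a hypothetical sketch of that "convert everything
to UTF-8" idea, using the Python 3 standard library email package.
Mailman does not do this; the function name is made up, and the sketch
only handles single-part messages.  It also shows where the approach
breaks down: an unknown charset label, or bytes that are not valid in
the declared charset.

```python
from email import message_from_bytes


def body_as_utf8(raw_message):
    """Return the body of a single-part message re-encoded as UTF-8
    bytes, or None if the conversion cannot be done safely."""
    msg = message_from_bytes(raw_message)
    if msg.is_multipart():
        return None  # out of scope for this sketch
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # undoes base64 / QP, gives bytes
    try:
        text = payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset label, or the body lies about its charset.
        return None
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = (b'Content-Type: text/plain; charset="windows-1252"\n'
           b"Content-Transfer-Encoding: 8bit\n"
           b"\n"
           b"EuroPython \x96 the list\n")
    print(body_as_utf8(raw))
```

The None cases are exactly the posts where somebody has to decide what
to do, which is why this would not make the problem go away entirely.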

Steve

