barry@zope.com (Barry A. Warsaw) writes:
I must be dense because I'm not quite seeing how this will work. [...] This doesn't tell me enough either does it?
You are running into one of the most awful oddities of HTTP and i18n. In short, the encoding of the page that contained the form is used to encode the form contents :-(

The RFC says the browser SHOULD declare the encoding of each field in the per-field MIME header of the multipart/form-data message. None of the browsers does that. I filed bug reports for all of them, and the Mozilla people responded that they can't do that because many CGI scripts break when they get a charset= parameter (it won't fit their regexp).

The RFC says that, as a fall-back, the browser should use the encoding of the HTML page which contained the form. Mailman doesn't declare a charset in the administrative pages, but it should.

It may happen that the user enters a character which cannot be represented in the charset of the page. In this case, Mozilla sends a '?' (question mark), so you can only tell that there was a character, but not which one. Internet Exploder sends an HTML entity, which gives you more information, but is indistinguishable from the case where the user literally typed an ampersand-digits sequence.

For Mailman, this gives two options:

1. Each administrative page should be encoded in the list's "native" charset. This allows adding names in that charset.

2. Each page should be encoded in UTF-8. This allows entering arbitrary names, but requires recoding to the list's charset later (or using UTF-8 in the To: fields as well).

Actually, it appears that Mailman already does 1, in the HTTP header. Barry, what is the charset of your admin pages?

Regards,
Martin
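
P.S. The two browser behaviours for unrepresentable characters can be imitated with Python's codec error handlers. This is only a rough sketch, not what either browser actually runs: the function names mozilla_style and ie_style are made up for illustration, and ISO-8859-1 is just an assumed page charset.

```python
def mozilla_style(text: str, page_charset: str) -> bytes:
    """Encode form text; unrepresentable characters become '?'."""
    return text.encode(page_charset, errors="replace")


def ie_style(text: str, page_charset: str) -> bytes:
    """Encode form text; unrepresentable characters become &#NNN; references."""
    return text.encode(page_charset, errors="xmlcharrefreplace")


# U+0159 (r with caron) is not in ISO-8859-1; U+00E1 (a with acute) is.
name = "Dvo\u0159\u00e1k"

print(mozilla_style(name, "iso-8859-1"))  # b'Dvo?\xe1k'
print(ie_style(name, "iso-8859-1"))       # b'Dvo&#345;\xe1k'

# The entity form is ambiguous: a user who literally typed "&#345;"
# produces exactly the same bytes on the wire.
typed_literally = "Dvo&#345;\u00e1k"
print(ie_style(typed_literally, "iso-8859-1") == ie_style(name, "iso-8859-1"))  # True

# With a UTF-8 page (option 2) nothing is lost in transit, but recoding
# to a narrower list charset afterwards can still fail.
assert name.encode("utf-8").decode("utf-8") == name
try:
    name.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot recode to list charset:", exc.reason)
```

Under these assumptions, the first case loses the character entirely, and the second cannot be reversed safely on the server, which is why the page charset has to be chosen so that the problem does not arise in the first place.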