barry@python.org (Barry A. Warsaw) writes:
This seems to work fairly well (with some ugly changes also necessary to the logging system), with one minor kludge. I want to allow non-ASCII characters in real names for English lists. I'm nervous about changing the default charset for English from us-ascii because I'm superstitious about unintended side-effects. So I'm making a couple of special cases for us-ascii. When decoding a string from a web form, if the default charset would be us-ascii, I'll use iso-8859-1 instead. Then when encoding a name in an email header, if the charset is us-ascii, again, I'll use iso-8859-1. This seems like a practical compromise, if a bit ugly. Feedback is welcome.
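The special case described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual Mailman code: the function names `effective_charset` and `decode_form_value` are hypothetical, and the sketch assumes the form data arrives as raw bytes.

```python
def effective_charset(list_charset):
    """Return the charset to actually use for a list.

    Hypothetical helper: us-ascii is replaced by iso-8859-1 so that
    non-ASCII real names survive; every other charset is left alone.
    """
    if list_charset.lower() in ("us-ascii", "ascii"):
        return "iso-8859-1"
    return list_charset


def decode_form_value(raw_bytes, list_charset):
    """Decode raw form bytes using the list's effective charset."""
    return raw_bytes.decode(effective_charset(list_charset))
```

Because iso-8859-1 is a superset of us-ascii, pure ASCII input decodes to the same string either way; only bytes above 0x7F are treated differently.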
Do you already send the page that contains the form in iso-8859-1, or do you use latin-1 only when interpreting the form data? If the latter, I think you gain nothing: the web browser will not transmit latin-1 data if the form was served as us-ascii, so decoding the data as latin-1 will work, but it will not allow latin-1 data to be transmitted.

On encoding Unicode names in email messages: I hope you have a general fallback to UTF-8. If all else fails, UTF-8 will still work, and DTRT.

Regards,
Martin
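A minimal sketch of the UTF-8 fallback suggested above, using the standard library's `email.header.Header`. The helper names `pick_charset` and `encode_name` and the order of charsets tried are assumptions made for illustration, not anything taken from Mailman itself.

```python
from email.header import Header


def pick_charset(name, preferred="iso-8859-1"):
    """Pick the first charset that can represent `name`.

    Hypothetical helper: tries us-ascii, then the preferred charset,
    and falls back to UTF-8 if neither can encode the name.
    """
    for charset in ("us-ascii", preferred):
        try:
            name.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"


def encode_name(name, preferred="iso-8859-1"):
    """Return `name` encoded for use in an email header (RFC 2047)."""
    return Header(name, pick_charset(name, preferred)).encode()
```

With this ordering, plain ASCII names are left unencoded, names that fit the preferred charset use it, and anything else is still transmitted correctly as UTF-8.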