[Python-Dev] Encoding detection in the standard library?
"Martin v. Löwis"
martin at v.loewis.de
Tue Apr 22 20:06:16 CEST 2008
> When a web browser POSTs data, there is no standard way of communicating
> which encoding it's using.
That's just not true. Web browsers should and do use the encoding of the
web page that originally contained the form.
> There are some hints which make it easier
> (accept-charset attributes, the encoding used to send the page to the
> browser), but no guarantees.
Not true. The latter is guaranteed (unless you assume bugs - but if
you do, can you present a specific browser that has that bug?)
> Email is a smaller problem, because it usually has a helpful
> content-type header, but that's no guarantee.
Then assume windows-1252. Mailers who don't use MIME for non-ASCII
characters mostly died 10 years ago; those people who continue to
use them likely can accept occasional mojibake (or else they would
have switched long ago).
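
A minimal sketch of that fallback in Python, assuming the message arrives
as raw bytes. The names decode_body and FALLBACK_CHARSET are hypothetical
and only serve the illustration; the declared charset is used when there
is one, and windows-1252 is assumed otherwise.

```python
from email import message_from_bytes

# Assumption: fall back to windows-1252 when no charset is declared.
FALLBACK_CHARSET = "windows-1252"


def decode_body(raw_message: bytes) -> str:
    """Decode the body of a single-part message to text."""
    msg = message_from_bytes(raw_message)
    # decode=True undoes the Content-Transfer-Encoding and returns bytes
    # (or None for multipart messages, which this sketch does not handle).
    payload = msg.get_payload(decode=True)
    if payload is None:
        return ""
    # get_content_charset() returns None when no charset is declared.
    charset = msg.get_content_charset() or FALLBACK_CHARSET
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: accept occasional mojibake.
        return payload.decode(FALLBACK_CHARSET, errors="replace")
```

A message without MIME headers is decoded as windows-1252, while a message
that declares charset=utf-8 is decoded as UTF-8.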
> Now, at the moment, the only data I have to support this claim is my
> experience with DrProject in non-English locations.
> If I'm the only one who has had these sorts of problems, I'll go back to
> "Unicode for Dummies".
For web forms, I always encode the pages in UTF-8, and that always works.
For email, I once added encoding processing to the pipermail (the
mailman archiver), and that also always works.
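
The web-form case can be sketched the same way, assuming the page that
contained the form was served as UTF-8 and the body is a standard
application/x-www-form-urlencoded POST. PAGE_ENCODING and parse_form are
hypothetical names used only for this illustration.

```python
from urllib.parse import parse_qs

# Assumption: the page containing the form was served as UTF-8, so the
# browser submits the form data in that encoding.
PAGE_ENCODING = "utf-8"


def parse_form(body: bytes) -> dict:
    """Parse a urlencoded POST body into a dict of lists of strings."""
    # A urlencoded body is pure ASCII; non-ASCII characters arrive as
    # percent-escaped bytes in the page's encoding.
    text = body.decode("ascii")
    # errors="strict" raises UnicodeDecodeError instead of guessing when
    # the data is not valid in the expected encoding.
    return parse_qs(
        text,
        keep_blank_values=True,
        encoding=PAGE_ENCODING,
        errors="strict",
    )
```

With strict error handling, a submission that is not valid UTF-8 fails
loudly rather than being stored as garbage.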
> I'll go back and take another look at the problem, then come back if new
> revelations appear.