[Web-SIG] HTTP headers encoding
and-py at doxdesk.com
Thu Dec 3 20:11:54 CET 2009
Manlio Perillo wrote:
> I have written a simple WSGI application that asks authentication
Ho ho! This is another area that is Completely Broken Everywhere. It's
actually a similar situation to the cookies:
- Opera and Chrome send non-ASCII cookie characters in UTF-8.
- IE encodes using the system codepage (which can never be UTF-8),
mangling any characters that don't fit in the codepage through the
traditional Windows 'similar replacement character' scheme.
- Mozilla uses the low byte of each UTF-16 code point (so ISO-8859-1
gets through but everything else is mangled).
- Safari uses ISO-8859-1, and refuses to send any cookie containing
characters outside the 8859-1 repertoire.
- Konqueror uses ISO-8859-1, and replaces any non-8859-1 character
with a question mark.
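To make the list concrete, here is a rough Python sketch (my own illustration, not code from any browser) of the bytes each behaviour produces for the cookie value "é€". The function names are hypothetical labels. IE is approximated as a Western European install using cp1252 with plain '?' replacement, because Python has no equivalent of the Windows best-fit mapping.

```python
from typing import Optional

VALUE = "é€"  # e-acute (U+00E9) followed by the euro sign (U+20AC)

def opera_chrome(s: str) -> bytes:
    # Plain UTF-8.
    return s.encode("utf-8")

def ie_western(s: str) -> bytes:
    # Approximation only: assumes a cp1252 system codepage. Real IE uses a
    # "best fit" substitution for unencodable characters; '?' stands in here.
    return s.encode("cp1252", errors="replace")

def mozilla(s: str) -> bytes:
    # Keep only the low byte of each UTF-16 code unit.
    return s.encode("utf-16-le")[0::2]

def safari(s: str) -> Optional[bytes]:
    # ISO-8859-1, or no cookie at all if any character falls outside it.
    try:
        return s.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None

def konqueror(s: str) -> bytes:
    # ISO-8859-1 with '?' for anything outside it.
    return s.encode("iso-8859-1", errors="replace")

for sender in (opera_chrome, ie_western, mozilla, safari, konqueror):
    print(f"{sender.__name__:13} {sender(VALUE)!r}")
```

Five senders give four different byte strings and one missing cookie, all for the same value.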
The HTTP standard has nothing to say about the encoding in use *inside*
the base64-encoded Authorization byte-string token. It's anyone's guess,
and every browser has guessed differently. (Safari here is at least
slightly better than its behaviour with the cookies.)
> (and I suspect that [IE] always use this encoding, instead of
It will certainly never send ISO-8859-1, but what it does send is locale-dependent.
Type an e-acute in your username on a Western machine and
it'll send one byte sequence; type the same thing on an Eastern European
Windows install and you'll get something quite different.
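The flip side for the server is that a byte only means something once you know the sender's codepage. A minimal sketch decoding one byte under three Windows codepages (cp1252 Western, cp1250 Central European, cp1251 Cyrillic); the byte 0xF5 and the helper name `readings` are arbitrary choices for illustration:

```python
def readings(raw: bytes, codepages=("cp1252", "cp1250", "cp1251")) -> dict:
    # What the same bytes would have meant to senders using each codepage.
    return {cp: raw.decode(cp) for cp in codepages}

print(readings(b"\xf5"))  # {'cp1252': 'õ', 'cp1250': 'ő', 'cp1251': 'х'}
```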
> Firefox (Iceweasel 3.0.14, Linux Debian Squeeze) sends me a '\xac'
> I don't know where \xac come from
It's the low byte of UCS-2 code point U+20AC (EURO SIGN). Firefox simply
discards the top 8 bits of each code point.
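Because the top byte is thrown away, the server can't reverse the mapping: U+20AC and U+00AC (NOT SIGN) arrive as the same byte. A sketch of the truncation as described, assuming BMP-only input as with UCS-2 (`low_byte` is a hypothetical name):

```python
def low_byte(s: str) -> bytes:
    # Keep the bottom 8 bits of each code point and drop the rest.
    # Assumes every character is in the BMP, matching the UCS-2 description.
    return bytes(ord(ch) & 0xFF for ch in s)

print(low_byte("€"))                    # b'\xac'
print(low_byte("¬"))                    # b'\xac'
print(low_byte("€") == low_byte("¬"))  # True
```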
> Unfortunately I can not test with IE 7 and 8.
The behaviour has not changed.
> This is really a mess.
> How is authorization username handled in common WSGI frameworks?
No-one supports non-ASCII characters in HTTP authentication. Most web authors
simply move to cookies instead.
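For what it's worth, that conservative stance can be sketched as a WSGI app that accepts ASCII-only Basic credentials and re-challenges otherwise. This is my own illustration, not taken from any framework; `parse_basic_auth` and the realm "example" are hypothetical, and it assumes PEP 3333-style native-string environ values.

```python
import base64
from typing import Optional, Tuple

def parse_basic_auth(environ: dict) -> Optional[Tuple[str, str]]:
    # Return (username, password) for ASCII-only Basic credentials, else None.
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        raw = base64.b64decode(token.strip(), validate=True)
        text = raw.decode("ascii")  # refuse to guess at any other charset
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return None
    username, sep, password = text.partition(":")
    if not sep:
        return None
    return username, password

def app(environ, start_response):
    creds = parse_basic_auth(environ)
    if creds is None:
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="example"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required (ASCII credentials only).\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Hello, {creds[0]}\n".encode("ascii")]
```

A UTF-8 "josé" from Opera and a cp1252 "josé" from IE both get a 401 here, which is at least consistent.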
mailto:and at doxdesk.com