[Web-SIG] HTTP headers encoding
Manlio Perillo
manlio_perillo at libero.it
Thu Dec 3 15:49:08 CET 2009
Hi.
I'm doing some tests to try to understand how HTTP headers are encoded
by browsers.
I have written a simple WSGI application that asks for authentication
credentials, prints them on the terminal, and returns the data as the
response body, as raw bytes:
http://paste.pocoo.org/show/154633/
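The code at the paste link is not reproduced here. Below is a minimal sketch of that kind of test application. It is not the original code. It assumes Python 3 and PEP 3333, where header values arrive in `environ` as native strings decoded as ISO-8859-1. The realm name `"test"` is a hypothetical choice.

```python
import base64

# Hypothetical realm name; any string would do.
REALM = "test"

def application(environ, start_response):
    """Ask for Basic credentials, then echo the decoded raw bytes back."""
    auth = environ.get("HTTP_AUTHORIZATION")
    if not auth or not auth.startswith("Basic "):
        # No credentials yet: trigger the browser's login dialog.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="%s"' % REALM),
        ])
        return [b"authentication required\n"]

    # PEP 3333 hands header values over as latin-1 native strings,
    # so encoding back to latin-1 recovers the original header bytes.
    token = auth[len("Basic "):].strip().encode("latin-1")
    # b"user:password", in whatever charset the client chose.
    raw = base64.b64decode(token)
    print(repr(raw))  # show the raw bytes on the terminal
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [raw]
```

Serving it with `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` should be enough to point a browser at it.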
Then I used several browsers to try to send a username containing
non-ASCII characters.
When I try with simple characters from the iso-8859-1 charset, things
work well; the data is encoded using this charset.
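With Basic authentication, the client joins `user:password`, turns that string into bytes with some charset, and base64-encodes the result. The server can only undo the base64 step. After that it must assume a charset. The sketch below assumes ISO-8859-1 on both sides. The helper names `make_basic_header` and `parse_basic_header` are hypothetical, and this is not a browser's real code path.

```python
import base64

def make_basic_header(user, password, charset="iso-8859-1"):
    """Build an Authorization header value the way a client might."""
    raw = ("%s:%s" % (user, password)).encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_header(value, charset="iso-8859-1"):
    """Server side: undo base64, then decode with an assumed charset."""
    raw = base64.b64decode(value[len("Basic "):])
    user, _, password = raw.decode(charset).partition(":")
    return user, password

header = make_basic_header("àè", "secret")
print(header)
print(parse_basic_header(header))  # ('àè', 'secret')
```

The round trip works only because both sides picked the same charset. Nothing in the header itself says which one was used.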
However, when I try a character outside that charset, like the Euro
sign, there are problems.
Firefox (Iceweasel 3.0.14, Linux Debian Squeeze) sends me a
'\xac'
I don't know where \xac comes from, but it is the last byte of the
UTF-8 encoded Euro sign: '\xe2\x82\xac'
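The byte arithmetic can be checked directly. The Euro sign (U+20AC) has no ISO-8859-1 encoding at all. Its UTF-8 form does end in 0xAC. The low byte of the code point 0x20AC is also 0xAC, so a single received byte cannot by itself show which of the two (if either) produced it. That second reading is only an observation, not a confirmed explanation of Firefox's behaviour.

```python
euro = "\u20ac"  # EURO SIGN

# The Euro sign is not representable in ISO-8859-1.
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in iso-8859-1")

utf8 = euro.encode("utf-8")
print(utf8)        # b'\xe2\x82\xac'
print(utf8[-1:])   # b'\xac', the byte Firefox sent

# The low 8 bits of the code point are 0xAC as well.
print(hex(ord(euro) & 0xFF))  # 0xac
```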
Internet Explorer 6.0 sends me a
'\x80'
and this is the Euro sign encoded using cp1252 (I suspect that it
always uses this encoding instead of iso-8859-1).
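The cp1252 observation can be checked with Python's codecs. cp1252 and ISO-8859-1 differ only in the 0x80-0x9F range. In cp1252, byte 0x80 is the Euro sign. In ISO-8859-1 it is the C1 control character U+0080. So a server that assumes ISO-8859-1 silently gets the wrong character, while names made only of Latin-1 letters decode the same under either assumption.

```python
euro = "\u20ac"

cp = euro.encode("cp1252")
print(cp)  # b'\x80'

# Decoded as ISO-8859-1, 0x80 becomes a control character, not the Euro.
print(repr(cp.decode("iso-8859-1")))  # '\x80'
print(cp.decode("cp1252") == euro)    # True

# In 0xA0-0xFF the two charsets agree, so "àè" gives the same bytes.
print("àè".encode("cp1252") == "àè".encode("iso-8859-1"))  # True
```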
Unfortunately I cannot test with IE 7 and 8.
With a browser running in a terminal, like lynx, things get worse.
If I enter the string "àè" as the user name, lynx sends me
'\xc3\xa0\xc3\xa8'
This happens in a GNOME terminal, with an it_IT.utf8 locale.
wget and curl do the same.
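lynx, wget and curl appear to pass through the bytes from the terminal's UTF-8 locale, so the same name reaches the server as different bytes depending on the client. A server has to guess. The sketch below is a hypothetical heuristic (`guess_decode` is an illustrative name, not an existing API). It tries strict UTF-8 first, then cp1252, and finally ISO-8859-1, which accepts every byte. This is only a workaround and not a fix.

```python
def guess_decode(raw):
    """Hypothetical heuristic: UTF-8, then cp1252, then ISO-8859-1."""
    for charset in ("utf-8", "cp1252"):
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 maps every byte to a character, so this cannot fail.
    return raw.decode("iso-8859-1"), "iso-8859-1"

lynx = "àè".encode("utf-8")
print(lynx)                        # b'\xc3\xa0\xc3\xa8'
print(lynx.decode("iso-8859-1"))   # mojibake, not 'àè'
print(guess_decode(lynx))          # ('àè', 'utf-8')
print(guess_decode(b"\xe0\xe8"))   # ('àè', 'cp1252')
print(guess_decode(b"\x80"))       # ('€', 'cp1252')
```

The guess can be wrong. Some Latin-1 byte strings happen to be valid UTF-8. The single byte b'\xac' from Firefox decodes to '¬' rather than the Euro sign. The underlying ambiguity remains.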
Can someone else reproduce this?
Thanks Manlio