[Web-SIG] HTTP headers encoding

Manlio Perillo manlio_perillo at libero.it
Thu Dec 3 15:49:08 CET 2009


Hi.

I'm doing some tests to try to understand how HTTP headers are encoded
by browsers.

I have written a simple WSGI application that asks for authentication
credentials, prints them on the terminal, and returns the data in the
response as raw bytes:
http://paste.pocoo.org/show/154633/
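
For readers who cannot open the paste, here is a minimal sketch of such an
application. It is not the pasted code: it assumes HTTP Basic authentication,
Python 3, and a made-up realm name ("test"), and the function name
`application` is only the usual WSGI convention.

```python
import base64
import binascii


def application(environ, start_response):
    """Ask for Basic credentials and echo back the raw bytes the client sent."""
    auth = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, payload = auth.partition(" ")

    raw = None
    if scheme.lower() == "basic" and payload:
        try:
            # The decoded value is b"user:password", in whatever encoding
            # the client chose for the non-ASCII characters.
            raw = base64.b64decode(payload)
        except (binascii.Error, ValueError):
            raw = None

    if raw is None:
        # No usable credentials yet: challenge the client.
        # The realm "test" is an assumption for this sketch.
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="test"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"authentication required\n"]

    print(repr(raw))  # show the undecoded bytes on the terminal
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(raw))),
    ])
    return [raw]
```

It can be served for a quick test with `wsgiref.simple_server.make_server`
from the standard library.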

Then I used some browsers to try to send a username with non-ASCII
characters.


When I try with simple characters in the iso-8859-1 charset, things
work well; the data is encoded using this charset.

However, when I try to use a character outside that charset, like the
Euro sign, there are problems.

Firefox (Iceweasel 3.0.14, Linux Debian Squeeze) sends me a
'\xac'

I don't know where \xac comes from, but it is the last byte of the utf-8
encoded Euro: '\xe2\x82\xac'
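
The byte values can be checked with a short sketch (Python 3, variable names
chosen only for illustration). It also shows that 0xAC happens to be the low
byte of the code point U+20AC; that is only an observation, not a confirmed
explanation of what the browser does.

```python
euro = "\u20ac"  # the Euro sign

utf8 = euro.encode("utf-8")
print(utf8)  # b'\xe2\x82\xac'

# Last byte of the utf-8 sequence.
last_utf8_byte = utf8[-1:]

# Low 8 bits of the Unicode code point 0x20AC.
low_codepoint_byte = bytes([ord(euro) & 0xFF])

print(last_utf8_byte, low_codepoint_byte)
```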


Internet Explorer 6.0 sends me a
'\x80'
which is the Euro character encoded using cp1252 (I suspect that it
always uses this encoding instead of iso-8859-1).
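
A small sketch (Python 3, illustrative names) of why '\x80' points to cp1252
rather than iso-8859-1: the Euro sign has no representation at all in
iso-8859-1.

```python
euro = "\u20ac"  # the Euro sign

cp1252_bytes = euro.encode("cp1252")
print(cp1252_bytes)  # b'\x80'

# iso-8859-1 cannot encode the Euro sign, so encoding raises an error.
try:
    euro.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False

# For comparison, iso-8859-15 does include the Euro sign, at a different byte.
latin9_bytes = euro.encode("iso-8859-15")
print(latin9_bytes)  # b'\xa4'
```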

Unfortunately I cannot test with IE 7 and 8.



With a browser working on a terminal, like lynx, things get worse.
If I enter the string "àè" as the user name, lynx sends me
'\xc3\xa0\xc3\xa8'

This happens in a GNOME terminal, with an it_IT.utf8 locale.

wget and curl do the same.
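
The bytes above match the utf-8 encoding of the string, which is consistent
with the terminal locale. A sketch (Python 3, illustrative names) of what a
server would see if it assumed iso-8859-1 for these bytes:

```python
name = "\u00e0\u00e8"  # the string "àè"

sent = name.encode("utf-8")
print(sent)  # b'\xc3\xa0\xc3\xa8'

# What the same name looks like when encoded as iso-8859-1.
expected_latin1 = name.encode("iso-8859-1")
print(expected_latin1)  # b'\xe0\xe8'

# Decoding the utf-8 bytes as iso-8859-1 never fails, but it produces
# four wrong characters instead of the original two.
misread = sent.decode("iso-8859-1")
print(len(misread))  # 4
```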


Can someone else reproduce this?



Thanks,
Manlio

