[Web-SIG] WSGI 2
henry at precheur.org
Fri Aug 14 07:36:28 CEST 2009
On Wed, Aug 12, 2009 at 12:05:40AM -0500, Ian Bicking wrote:
> Correct -- you can write any set of % encodings, and I don't think it even
> has to be able to validly url-decode (e.g., /foo%zzz will work). It
> definitely doesn't have to be a valid encoding. However, if you actually
> include unicode characters, they will always be encoded as UTF-8 (as goes
> with the IRI standard). That is, in a case like <a href="/some page">, the
> browser will request /some%20page, because it escapes unsafe characters.
> Similarly if you request <a href="/français"> it will encode that ç in
> UTF-8, then url-encode it, even if the page itself is ISO-8859-1. Well, at
> least on Firefox. I used this to test:
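The browser behavior Ian describes can be reproduced with Python's standard `urllib.parse` module, which follows the same UTF-8 percent-encoding convention (this is an illustration with stdlib functions, not the test Ian used):

```python
from urllib.parse import quote, unquote

# A browser given <a href="/some page"> escapes the unsafe space:
assert quote("/some page") == "/some%20page"

# Non-ASCII characters are first encoded as UTF-8 bytes, then each byte
# is percent-encoded, regardless of the page's own charset:
assert quote("/français") == "/fran%C3%A7ais"

# An invalid escape like /foo%zzz is simply passed through on decoding,
# matching the observation that it "will work":
assert unquote("/foo%zzz") == "/foo%zzz"
```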
I have run some tests regarding the encoding issue:

curl doesn't 'url-encode' its URLs: the raw Latin-1 byte (0xE7, 'ç') is
sent to the server as-is. Lighttpd accepts the URL and even returns a
file if one exists. Of course, if I try the same character encoded in
UTF-8, it doesn't work.
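The asymmetry is easy to see at the byte level. A minimal sketch of why the raw Latin-1 byte is unambiguous to a Latin-1 reader but unreadable as UTF-8:

```python
# curl sends the raw Latin-1 byte 0xE7 ('ç') on the wire, unescaped.
raw_path = b"/fran\xe7ais"

# Decoding those bytes as UTF-8 fails outright, because 0xE7 starts a
# multi-byte sequence that the following ASCII bytes do not complete:
try:
    raw_path.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
assert not utf8_ok

# ...while Latin-1 decoding always succeeds, since every byte value
# maps directly to a code point:
assert raw_path.decode("latin-1") == "/français"
```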
AFAIK RFC 2396 forbids non-ASCII characters in URLs. The problem is that
libcurl is quite popular (it used to be the transport library of
WebKit/GTK+, for example). It's hard to dismiss it as an utterly broken &
obscure tool. Many 'simplistic' HTTP clients may have the same problem.
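One way a server could cope with both kinds of client is to prefer UTF-8 and fall back to Latin-1, which never fails. This `decode_path` helper is hypothetical, a sketch of the idea rather than anything WSGI specifies:

```python
def decode_path(raw: bytes) -> str:
    """Hypothetical helper: prefer UTF-8 (what browsers send), but fall
    back to Latin-1 so raw bytes from naive clients still parse."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

# A browser percent-decodes to the UTF-8 bytes of 'ç' (0xC3 0xA7):
assert decode_path(b"/fran\xc3\xa7ais") == "/français"

# curl sends the single Latin-1 byte 0xE7, which the fallback handles:
assert decode_path(b"/fran\xe7ais") == "/français"
```

The catch, of course, is that some valid Latin-1 byte sequences are also valid UTF-8, so the fallback can silently guess wrong; it only reduces the ambiguity, it doesn't remove it.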
Now let's talk a little bit about cookies...
Cookies can contain whatever 'binary junk' the server sends. RFC 2965 says:
> The VALUE is opaque to the user agent and may be anything the origin
> server chooses to send, possibly in a server-selected printable ASCII
> encoding.
Also, cookies can contain 'comments', which contain UTF-8 strings:
> Characters in value MUST be in UTF-8 encoding.
Firefox has no problem with cookies containing non-ASCII characters. It
looks like it assumes cookies are encoded using Latin-1, since Latin-1
characters are displayed correctly in Firebug, but UTF-8 ones are not.
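Treating opaque cookie values as Latin-1, as Firefox appears to, has one practical virtue: it can never raise a decoding error, because Latin-1 round-trips every possible byte. A small demonstration of that property:

```python
# Every byte value 0x00-0xFF maps to exactly one Latin-1 code point,
# so 'binary junk' survives a decode/encode round trip unchanged:
junk = bytes(range(256))
assert junk.decode("latin-1").encode("latin-1") == junk

# The same bytes interpreted as UTF-8 fail on the first byte >= 0x80
# that isn't part of a valid multi-byte sequence:
try:
    junk.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
assert not utf8_ok
```

The trade-off is the one seen in Firebug: bytes that were really UTF-8 get displayed as the wrong characters, even though no data is lost.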