urlencode with high characters
"Martin v. Löwis"
martin at v.loewis.de
Wed Nov 2 23:23:42 CET 2005
> My understanding is that I am supposed to be able to urlencode anything
> up to the top half of latin-1 -- decimal 128-255.
I believe your understanding is incorrect. Without being able to quote
RFCs precisely, I think your understanding should be this:
- the URL literal syntax only allows for ASCII characters
- bytes with no meaning in ASCII can be quoted through %hh in URLs
- the precise meaning of such bytes in the URL is defined in the
  URL scheme, and may vary from URL scheme to URL scheme
- the http scheme does not specify any interpretation of the bytes,
  but apparently assumes that they denote characters, and follow
  some encoding - which encoding is something that the web server
  defines, when mapping URLs to resources.
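
To illustrate the points above, here is a minimal sketch using urllib.parse from current Python 3 (not the urllib module as it existed when this message was written). The sample string and the variable names are only illustrative. It shows that the same character yields different %hh sequences depending on which encoding is assumed, and that the URL itself does not record which one was used.

```python
from urllib.parse import quote, unquote

# "L\u00f6wis" is "Löwis"; the o-umlaut is code point 246, inside 128-255.
text = "L\u00f6wis"

# The same character becomes different bytes, and so different %hh
# sequences, depending on the encoding chosen before quoting.
as_latin1 = quote(text, encoding="latin-1")   # one byte for the umlaut
as_utf8 = quote(text, encoding="utf-8")       # two bytes for the umlaut

print(as_latin1)
print(as_utf8)

# Nothing in the URL says which encoding was used, so the receiver has to
# pick one. Picking the right one round-trips; picking the wrong one does not.
decoded_right = unquote(as_latin1, encoding="latin-1")
decoded_wrong = unquote(as_latin1, encoding="utf-8")  # errors="replace" by default

print(decoded_right == text)
print(decoded_wrong == text)
```

Both quoted forms are pure ASCII and equally valid as URL syntax; only the server's convention decides which of them names the intended resource.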
If you get the impression that this is underspecified: your impression
is correct; it is underspecified indeed.
There is a recent attempt to tighten the specification through IRIs.
The IRI RFC defines a mapping between IRIs and URIs, and it uses
UTF-8 as the encoding, not latin-1.
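
A rough sketch of that IRI-to-URI mapping, as I understand it: every non-ASCII character is encoded as UTF-8 and each resulting byte is written as %HH. The function name iri_to_uri and the example.org address are made up for illustration, and the sketch ignores the finer points of the RFC (normalization, special treatment of host names), so take it as an approximation rather than a conforming implementation.

```python
def iri_to_uri(iri):
    """Percent-encode every non-ASCII character of `iri` as UTF-8 bytes.

    Hypothetical helper for illustration only; ASCII characters,
    including existing %hh escapes, are passed through untouched.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            # One character may expand to several UTF-8 bytes.
            pieces.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
    return "".join(pieces)


# example.org is a placeholder host; "\u00f6" is o-umlaut.
uri = iri_to_uri("http://example.org/L\u00f6wis")
print(uri)
```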