[Python-Dev] urllib.quote and unquote - Unicode issues

André Malo nd at perlig.de
Sun Jul 13 01:55:48 CEST 2008

* Matt Giuca wrote:

> Well from what I've seen, the only time Latin-1 naturally appears on the
> net is when you have a web page in Latin-1 (either explicit or inferred;
> and note that a browser like Firefox will infer Latin-1 if it sees only
> ASCII characters) with a form in it. Submitting the form, the browser
> will use Latin-1 to percent-encode the query string.

This POV is way too browser-centric...

> So if you write a web app and you don't have any non-ASCII characters or
> mention the charset, chances are you'll get Latin-1. But I would argue
> you're leaving things to chance and you deserve to get funny behaviour.
> If you do any of the following:
>    - Use a non-ASCII character, encoded as UTF-8 on the page.
>    - Send a Content-Type: xxxx; charset=utf-8.
>    - In HTML, set a <meta http-equiv="Content-Type" content="xxxx; charset=utf-8" />.
>    - In the form itself, set <form accept-charset="utf-8">.
> then the browser will encode the form data as UTF-8. And most "proper"
> web pages should get themselves explicitly served as UTF-8.

... because

1) URL encoding is not limited to web forms at all

2) The encoding used for web forms also depends on the browser settings (for 
example, try playing around with the Internet Explorer settings regarding 
query encoding)

3) The process submitting the form may not be a browser at all

4) The web form may not be under your own control (search engine forms are a 
common example here, e.g. "put this Google form snippet onto your webpage")

5) Different cultures do not necessarily choose between Latin-1 and UTF-8. 
They are more likely to deal with encodings such as KOI8-R or Big5.

and so on.
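To make point 5 concrete, here is a minimal sketch using Python 3's `urllib.parse.quote` and `unquote`, both of which accept an explicit `encoding` argument. The sample word and the `encode_as` helper are illustrative assumptions, not part of any proposal in this thread. It shows that one string yields different percent-encoded bytes depending on the charset, that Latin-1 cannot represent it at all, and that a receiver assuming UTF-8 gets replacement characters instead of the original text.

```python
from urllib.parse import quote, unquote

# Hypothetical sample value: the Russian word for "hello".
text = "привет"

def encode_as(value, charset):
    """Percent-encode value using the given charset, or return None if the
    charset cannot represent it (e.g. Latin-1 cannot encode Cyrillic)."""
    try:
        return quote(value, encoding=charset)
    except UnicodeEncodeError:
        return None

# The same text produces a different (or no) percent-encoding per charset.
encodings = {cs: encode_as(text, cs) for cs in ("utf-8", "koi8-r", "latin-1")}
for cs, encoded in encodings.items():
    print(f"{cs:8} {encoded}")

# A receiver that assumes UTF-8 cannot decode KOI8-R input correctly;
# unquote() defaults to encoding="utf-8", errors="replace", so invalid
# byte sequences silently become U+FFFD.
koi8 = encodings["koi8-r"]
wrong = unquote(koi8)                     # decoded as UTF-8 (the default)
right = unquote(koi8, encoding="koi8-r")  # decoded with the charset actually used
print("assumed utf-8 recovers text:", wrong == text)
print("known koi8-r recovers text: ", right == text)
```

Nothing in the percent-encoded string says which charset produced it, so only out-of-band knowledge on the receiving side (here, `encoding="koi8-r"`) recovers the original text.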

Besides all that, and without meaning any offense: "most proper", "should do", 
and the implication that all web browsers behave the same way are not a good 
position to argue from when talking about implementing a standard ;)

If only engineers with degrees wrote programs, we would
probably have less bad software.
We would, however, also have less good software.
                                   -- Felix von Leitner in dasr
