[Python-Dev] urllib.quote and unquote - Unicode issues

Antoine Pitrou solipsis at pitrou.net
Wed Aug 6 18:55:51 CEST 2008


Martin v. Löwis <martin <at> v.loewis.de> writes:
> URLs are just not made for non-ASCII characters.

Perhaps they are not, but every non-English wiki (just to take a simple, generic
example) potentially contains non-ASCII URLs, e.g.:
http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant
http://wiki.python.org/moin/J%C3%BCrgenHermann
(notice the UTF-8 encoding in both)
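
As a rough illustration of the two examples above, here is a minimal sketch
that decodes their percent-escaped path segments. It assumes the
urllib.parse module of present-day Python 3, where quote() and unquote()
default to UTF-8; that module layout and default are assumptions on my part,
not something settled in this thread. The variable names are made up for the
example.

```python
from urllib.parse import quote, unquote

# Path segments copied from the two example URLs above.
segments = ["%C3%89l%C3%A9phant", "J%C3%BCrgenHermann"]

# unquote() interprets the percent-escaped bytes as UTF-8 by default.
decoded = [unquote(s) for s in segments]
print(decoded)

# quote() encodes str input as UTF-8 by default, so re-quoting the
# decoded text gives back the original escaped segments.
roundtrip = [quote(s) for s in decoded]
print(roundtrip == segments)
```

Decoding the same bytes with another charset (say, encoding="latin-1") would
give different text, which is why the encoding actually used on the Web
matters here.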

> Implement IRIs if you want non-ASCII characters; the rules are much clearer
> for these.

I think most people would expect something which works with the current World
Wide Web rather than a rigorous implementation of a specific RFC. Implementing
RFCs is fine but it does not magically eliminate all problems, especially when
the RFCs themselves are not in sync with real-world usage.

Regards

Antoine.



