[Python-Dev] [Python-ideas] Proposed addition to urllib.parse in 3.1 (and urlparse in 2.7)

Sun Apr 19 23:45:04 CEST 2009

Bill Janssen <janssen <at> parc.com> writes:
> 
> ``The content type "application/x-www-form-urlencoded" is inefficient
> for sending large quantities of binary data or text containing non-ASCII
> characters.

The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding
scheme would) doesn't mean that it doesn't work.

There are millions of Web sites out there which allow you to submit non-ASCII
data without resorting to "multipart/form-data" encoding. The situations where
the submitted text is huge enough that encoding efficiency matters are probably
insanely rare.
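To make this concrete, here is a minimal sketch using only the standard library. It assumes a current Python 3 in which urlencode() takes an encoding argument, and the field names and values are made up. Non-ASCII text survives a round trip through application/x-www-form-urlencoded, and the only price is the %XX expansion:

```python
from urllib.parse import urlencode, parse_qs, quote_plus

# Made-up non-ASCII form data, encoded as a form on a UTF-8 page would submit it.
fields = {"name": "Zoë", "city": "Kraków"}
body = urlencode(fields, encoding="utf-8")

# The body is plain ASCII: every non-ASCII UTF-8 byte becomes a %XX escape.
print(body)  # name=Zo%C3%AB&city=Krak%C3%B3w

# Decoding with the same charset recovers the original text exactly.
decoded = {k: v[0] for k, v in parse_qs(body, encoding="utf-8").items()}
assert decoded == fields

# The "inefficiency": each escaped byte costs three bytes on the wire.
print(len("Zoë".encode("utf-8")), len(quote_plus("Zoë")))  # 4 8
```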

> But this is complicated by the fact that most browsers try to use the
> character set the server will understand, and the widely used technique
> to accomplish this is to use the same charset the page the FORM occurs
> in uses.  Unless this is set explicitly, it defaults to Latin-1.

Look around: many Web pages specify a character set other than Latin-1.
UTF-8 is quite a common choice in the modern world.
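Because the form charset follows the page charset, the same value produces different bytes depending on the page it was submitted from. The server has to decode with the right charset. A small sketch, where the value "café" is just an example:

```python
from urllib.parse import urlencode, parse_qs

value = {"q": "café"}

# The same text, submitted from a Latin-1 page and from a UTF-8 page.
latin1_body = urlencode(value, encoding="latin-1")  # q=caf%E9
utf8_body = urlencode(value, encoding="utf-8")      # q=caf%C3%A9

# Decoding with the charset the page actually used gives the right text...
assert parse_qs(latin1_body, encoding="latin-1")["q"] == ["café"]
assert parse_qs(utf8_body, encoding="utf-8")["q"] == ["café"]

# ...while guessing the wrong one produces mojibake.
print(parse_qs(utf8_body, encoding="latin-1")["q"])  # ['cafÃ©']
```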

Also, browsers will encode characters that cannot be represented in the
page's character set as HTML numeric character references ("&#1234;"). This
means you can enter any Unicode text into any form, regardless of the encoding
of the source page. It's up to the Web application to decode the text, sure,
but any decent Web framework or toolkit should do it for you.
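Here is a rough sketch of both halves. The browser side is simulated with the hypothetical helper browser_encode(), which uses Python's "xmlcharrefreplace" error handler because it produces the same "&#NNNN;" form. The server decodes with the page charset, then expands the references with html.unescape():

```python
import html
from urllib.parse import quote_plus, parse_qs

def browser_encode(field, text, charset="latin-1"):
    # Hypothetical helper, not a real browser API: characters the page
    # charset cannot represent become numeric character references.
    raw = text.encode(charset, errors="xmlcharrefreplace")
    return field + "=" + quote_plus(raw)

body = browser_encode("msg", "Ж and é")
print(body)  # msg=%26%231046%3B+and+%E9

# Server side: decode using the page charset, then expand the references.
value = parse_qs(body, encoding="latin-1")["msg"][0]
print(value)  # &#1046; and é
assert html.unescape(value) == "Ж and é"
```

One caveat of this scheme: if a user literally types "&#1046;", the server cannot tell it apart from a substituted character.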

Regards

Antoine.