[Python-Dev] [Python-ideas] Proposed addition to urllib.parse in 3.1 (and urlparse in 2.7)

Mon Apr 20 05:41:23 CEST 2009

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > ``The content type "application/x-www-form-urlencoded" is inefficient
> > for sending large quantities of binary data or text containing non-ASCII
> > characters.
> 
> The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding
> scheme would) doesn't mean that it doesn't work.

Absolutely.  I'm just quoting the spec to you.  In any case, being able to send
multipart/form-data would be a nice thing to have, if only for file uploads.
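To illustrate the difference in plain terms, here is a minimal sketch of building a multipart/form-data body by hand. The helper name `encode_multipart` and its argument layout are hypothetical; nothing like it is in urllib.parse. It does not escape quotes in field names, and it relies on a random boundary being unlikely to appear in the content rather than checking for it. The usage lines compare it with percent-encoding the same bytes, where every byte outside the unreserved set becomes three characters.

```python
import os
from urllib.parse import quote_from_bytes


def encode_multipart(fields, files, boundary=None):
    """Hypothetical helper: build a multipart/form-data request body.

    fields: dict of name -> str value
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (content_type_header_value, body_bytes).
    """
    if boundary is None:
        # Random boundary; collision with the content is unlikely but not checked.
        boundary = os.urandom(16).hex()
    delim = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        parts.append(delim)
        parts.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        parts.append(b"")  # blank line separates part headers from the value
        parts.append(value.encode("utf-8"))
    for name, (filename, content, ctype) in files.items():
        parts.append(delim)
        parts.append(('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (name, filename)).encode("utf-8"))
        parts.append(("Content-Type: %s" % ctype).encode("ascii"))
        parts.append(b"")
        parts.append(content)  # raw bytes, no escaping needed
    parts.append(delim + b"--")  # closing delimiter
    parts.append(b"")            # trailing CRLF
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(parts)


# Usage: 256 arbitrary bytes go into the body unchanged...
payload = bytes(range(256))
ctype, body = encode_multipart({"title": "caf\u00e9"},
                               {"upload": ("blob.bin", payload, "application/octet-stream")})
print(payload in body)
# ...whereas percent-encoding the same bytes more than doubles their size.
print(len(payload), len(quote_from_bytes(payload, safe="")))
```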

> Look out there, many Web pages specify a different character set than
> Latin-1... UTF-8 is quite a common choice in the modern world.

Sure.  But nowhere does a spec say that the page's charset should be used
when sending the values of a FORM as application/x-www-form-urlencoded
in a new HTTP request.  It's just a convention some browsers follow.
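A small sketch of why the charset matters: an application/x-www-form-urlencoded body carries no charset label, so the same form value produces different bytes depending on which encoding the sender picked, and the receiver has to guess. This uses the `encoding` parameters of `urllib.parse.urlencode` and `parse_qs` as they exist in current Python 3; the field name and value are made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

form = {"name": "Fran\u00e7ois"}  # 'c' with cedilla, U+00E7

# The same value encoded under two different charset conventions.
utf8_body = urlencode(form, encoding="utf-8")      # name=Fran%C3%A7ois
latin1_body = urlencode(form, encoding="latin-1")  # name=Fran%E7ois

# Decoding with the charset the sender actually used recovers the text...
print(parse_qs(latin1_body, encoding="latin-1"))
# ...but assuming UTF-8 (parse_qs's default) silently yields U+FFFD instead.
print(parse_qs(latin1_body))
```

Nothing in `latin1_body` itself tells the server which of the two decodings is correct.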

> Also, browsers will encode those characters that cannot be encoded in the
> character set using HTML escapes ("&#1234;").

Sure, some browsers will.  Others will apparently replace them with
question marks.  It's undefined.

> This means you can enter any Unicode text into any form, regardless of the
> encoding of the source page. It's up to the Web application to decode the
> text, sure, but any decent Web framework or toolkit should do it for you.
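Python's codec error handlers happen to produce both behaviours described above, so they make a convenient illustration; this is not a claim about how any particular browser implements it. The sketch shows that the "?" form loses information outright, and that the character-reference form is ambiguous, because a user who literally types the reference sends exactly the same bytes.

```python
import html

text = "price: 5\u20ac"  # U+20AC EURO SIGN has no Latin-1 code point

# Character-reference style: unencodable characters become &#NNNN;
as_refs = text.encode("latin-1", errors="xmlcharrefreplace")
# Question-mark style: unencodable characters become '?'
as_qmarks = text.encode("latin-1", errors="replace")

# The server can try to undo the references...
recovered = html.unescape(as_refs.decode("latin-1"))

# ...but a user who literally typed "&#8364;" sends identical bytes.
typed_literally = "price: 5&#8364;".encode("latin-1", errors="xmlcharrefreplace")

print(as_refs)                     # b'price: 5&#8364;'
print(as_qmarks)                   # b'price: 5?'
print(recovered == text)           # True
print(typed_literally == as_refs)  # True
```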

Bill