[Python-Dev] [Python-ideas] Proposed addition to urllib.parse in 3.1 (and urlparse in 2.7)

Bill Janssen janssen at parc.com
Sun Apr 19 22:59:44 CEST 2009


Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > This whole discussion seems a bit "rare and obscure" to me.  I've built
> > URLs for years without this method, and never felt the lack.  What bugs me
> > is the lack of a way to build multipart-formdata payloads, the only standard
> > way to send non-Latin1 strings as part of a request.
> 
> ?? What's the problem with sending non-Latin1 data without multipart-formdata?

I should have said: as values for a FORM submission.  There are two ways
to encode form values for a FORM submission:
application/x-www-form-urlencoded and multipart/form-data.  As per
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4:

``The content type "application/x-www-form-urlencoded" is inefficient
for sending large quantities of binary data or text containing non-ASCII
characters. The content type "multipart/form-data" should be used for
submitting forms that contain files, non-ASCII data, and binary data.''

And we don't support this in the client-side HTTP code of the standard
library.  (Do we?  Haven't looked lately.)

The same section also says:

``Space characters are replaced by `+', and then reserved characters are
escaped as described in [RFC1738], section 2.2: Non-alphanumeric
characters are replaced by `%HH', a percent sign and two hexadecimal
digits representing the ASCII code of the character. Line breaks are
represented as "CR LF" pairs (i.e., `%0D%0A').''
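Here is a rough sketch of those rules using the Python 3 standard library. It assumes a current version, newer than the 3.1 under discussion here. `encode_form_urlencoded` is a hypothetical helper name. `urlencode()` does the `+` and `%HH` escaping via `quote_plus`. The helper adds the CR LF line-break normalization the spec asks for, which `urlencode()` does not do on its own.

```python
from urllib.parse import urlencode


def encode_form_urlencoded(fields, charset="utf-8"):
    """Encode (name, value) pairs as application/x-www-form-urlencoded.

    Hypothetical helper: normalizes line breaks to CR LF as HTML 4.01
    requires, then lets urlencode() (which uses quote_plus) turn spaces
    into '+' and other reserved bytes into %HH escapes.
    """
    normalized = []
    for name, value in fields:
        # Collapse CRLF and bare CR to LF first, then expand every LF to CRLF,
        # so existing CRLF pairs are not doubled.
        value = value.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
        normalized.append((name, value))
    return urlencode(normalized, encoding=charset)


print(encode_form_urlencoded([("q", "hello world"), ("note", "line1\nline2")]))
# q=hello+world&note=line1%0D%0Aline2
```

`quote_plus` leaves `-`, `.`, `_` and `~` unescaped. This is slightly more lenient than the spec's "non-alphanumeric characters" wording.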

That "the ASCII code of the character" seemingly restricts it to ASCII...

But this is complicated by the fact that most browsers try to use a
character set the server will understand, and the widely used technique
for this is to submit in the same charset as the page containing the
FORM.  Unless that charset is set explicitly, it defaults to Latin-1.
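To make the ambiguity concrete, here is a minimal sketch that again assumes a current Python 3 `urllib.parse`. It encodes the same value under two page charsets a browser might pick. The encoded body carries no charset label, so a server that guesses wrong gets garbage.

```python
from urllib.parse import parse_qs, urlencode

# The same form value, encoded as a browser might for a page served as
# Latin-1 and for one served as UTF-8.
fields = {"name": "caf\u00e9"}  # "café"
latin1_body = urlencode(fields, encoding="latin-1")  # 'name=caf%E9'
utf8_body = urlencode(fields, encoding="utf-8")      # 'name=caf%C3%A9'

# Neither body says which charset produced it, so the server must guess.
# Decoding with the wrong guess garbles the value:
misread_as_latin1 = parse_qs(utf8_body, encoding="latin-1")["name"][0]
# -> 'caf\u00c3\u00a9', i.e. "cafÃ©"
misread_as_utf8 = parse_qs(latin1_body, encoding="utf-8")["name"][0]
# -> 'caf\ufffd' (the lone 0xE9 byte becomes U+FFFD, since errors="replace")
```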

I prefer to avoid all this uncertainty, and use a well-defined format
when submitting a form, so I tend to use multipart/form-data, which
allows explicit control over this.
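Below is a minimal hand-rolled sketch of a multipart/form-data encoder for text fields. Several details are assumptions for illustration, not an existing stdlib API: the name `encode_multipart_formdata`, the random `uuid4` boundary, and the per-part `Content-Type: text/plain; charset=...` header. It also assumes ASCII field names and does not handle file uploads.

```python
import uuid


def encode_multipart_formdata(fields, charset="utf-8"):
    """Build a multipart/form-data body for (name, value) text fields.

    Hypothetical helper. Returns (content_type, body_bytes). Each part
    names its charset explicitly, so the receiver does not have to guess.
    Field names must be ASCII (they are encoded with "ascii" below).
    """
    # Random boundary; this sketch does not check that it is absent from the values.
    boundary = uuid.uuid4().hex.encode("ascii")
    lines = []
    for name, value in fields:
        lines.append(b"--" + boundary)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("ascii"))
        lines.append(("Content-Type: text/plain; charset=%s" % charset).encode("ascii"))
        lines.append(b"")                  # blank line ends the part headers
        lines.append(value.encode(charset))
    lines.append(b"--" + boundary + b"--")  # closing delimiter
    lines.append(b"")                      # so the body ends with CR LF
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=" + boundary.decode("ascii")
    return content_type, body


ctype, body = encode_multipart_formdata([("comment", "na\u00efve caf\u00e9")])
print(ctype.startswith("multipart/form-data; boundary="))  # True
```

The returned pair can be handed to `urllib.request.Request` as `data=body` with `headers={"Content-Type": ctype}`. The target URL is up to you.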

Bill
