[Web-SIG] parsing of urlencoded data and Unicode
manlio_perillo at libero.it
Tue Jul 29 18:39:10 CEST 2008
Bill Janssen wrote:
>>> That's probably wrong. We went through this recently on the
>>> python-dev list. While it's possible to tell the encoding of
>> With multipart/form-data the problem should be the same.
>> The content type is defined only for file fields.
> Actually, it's defined for all fields, isn't it? From RFC 2388:
> ``As with all multipart MIME types, each part has an optional
> "Content-Type", which defaults to text/plain.''
> So the type is "text/plain" unless it says something else. And,
> according to RFC 2046, the default charset for "text/plain" is
I agree with the theory.
But in practice:
<form action="" method="post" accept-charset="utf-8"
Content-Type: multipart/form-data; boundary=abcde
Content-Disposition: form-data; name="Title"
Content-Disposition: form-data; name="body"
In theory I should assume ASCII-encoded data for the body field, and
since this data cannot be decoded as ASCII, I should treat it as a byte string.
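The default rule quoted above (a part without a Content-Type is text/plain, and text/plain without a charset parameter is US-ASCII), together with the fallback to a byte string, might be sketched in Python like this. It is a minimal illustration, not wsgix code: `part_charset` and `decode_part` are hypothetical helper names, and the sketch assumes the part's Content-Type header and raw payload have already been split out of the multipart body.

```python
from email.message import Message


def part_charset(content_type=None):
    """Hypothetical helper: charset to use for one multipart/form-data part.

    A part with no Content-Type header is treated as text/plain, and
    text/plain with no charset parameter defaults to US-ASCII.
    """
    msg = Message()
    if content_type is not None:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or
    # the fallback given here when no charset parameter is present.
    return msg.get_content_charset("us-ascii")


def decode_part(payload, content_type=None):
    """Hypothetical helper: decode a part, keeping raw bytes on failure."""
    try:
        return payload.decode(part_charset(content_type))
    except (UnicodeDecodeError, LookupError):
        # Undecodable data, or an unknown charset name, stays a byte string.
        return payload
```

With no Content-Type, a UTF-8 encoded "café" comes back unchanged as bytes, which is exactly the situation described for the body field.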
However, the body field is actually encoded in UTF-8, and if I add a hidden
_charset_ field, Firefox and Internet Explorer include this field in the
request, with the charset used for the encoding as its value.
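One way to use the browser-supplied _charset_ field is sketched below. For brevity it handles application/x-www-form-urlencoded data given as a str (an ASCII-decoded body or query string) rather than multipart data; `parse_with_charset_field` is a hypothetical name, and defaulting to UTF-8 when the field is absent is an assumption of the sketch.

```python
from urllib.parse import parse_qsl


def parse_with_charset_field(body, default="utf-8"):
    """Hypothetical helper: decode urlencoded form data using _charset_.

    Fields are first unquoted as Latin-1, which maps every escaped byte to
    exactly one code point without loss. Each name and value is then turned
    back into its original bytes and decoded with the charset reported by
    the browser in the _charset_ field (or `default` if it is absent).
    """
    pairs = parse_qsl(body, keep_blank_values=True, encoding="latin-1")
    charset = default
    for name, value in pairs:
        if name == "_charset_" and value:
            charset = value
            break

    def redecode(text):
        # Raises LookupError for an unknown charset name and
        # UnicodeDecodeError for bytes that do not match it.
        return text.encode("latin-1").decode(charset)

    return [(redecode(name), redecode(value)) for name, value in pairs]
```

For example, `parse_with_charset_field("_charset_=ISO-8859-1&body=caf%E9")` yields the body value "café", decoded with the charset the browser reported.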
I think it is safe to decode the QUERY_STRING and POST data to Unicode,
and to return 400 Bad Request if decoding fails.
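A minimal WSGI sketch of this policy, assuming Python 3 and PEP 3333 semantics: the query string and an application/x-www-form-urlencoded body are decoded strictly as UTF-8, and any decoding error becomes a 400 response. `application` and `decode_form` are hypothetical names; this is not the wsgix implementation, and multipart bodies are not handled.

```python
from urllib.parse import parse_qsl


def decode_form(raw):
    """Hypothetical helper: strictly decode urlencoded text as UTF-8.

    Raises UnicodeDecodeError if a percent-escaped byte sequence is not
    valid UTF-8, instead of silently inserting replacement characters.
    """
    return parse_qsl(raw, keep_blank_values=True, encoding="utf-8", errors="strict")


def application(environ, start_response):
    """Hypothetical WSGI app that rejects form data it cannot decode."""
    try:
        # Under PEP 3333, QUERY_STRING is a native str; assume it contains
        # only percent-encoded ASCII, as browsers normally send it.
        fields = decode_form(environ.get("QUERY_STRING", ""))
        content_type = environ.get("CONTENT_TYPE", "")
        if environ.get("REQUEST_METHOD") == "POST" and content_type.startswith(
            "application/x-www-form-urlencoded"
        ):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length)
            # A urlencoded body must itself be ASCII; decoding it as such
            # rejects raw non-ASCII bytes as well.
            fields += decode_form(body.decode("ascii"))
    except ValueError:
        # UnicodeDecodeError is a subclass of ValueError, so this also
        # catches a malformed CONTENT_LENGTH header.
        start_response("400 Bad Request", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Bad Request: malformed form data\n"]

    text = "\n".join(f"{name}={value}" for name, value in fields)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [text.encode("utf-8")]
```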
Users with specialized needs can use the low-level parsing functions.
In wsgix, the high-level functions are parse_query_string and
parse_simple_post_data; the low-level function is parse_qs.
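The split between low-level and high-level parsing could look roughly like the sketch below. `parse_qs_bytes` and `parse_qs_text` are hypothetical names chosen for illustration; they are not the actual wsgix parse_qs, parse_query_string or parse_simple_post_data, whose signatures are not given here. The low-level function returns byte strings and never decodes; the high-level one decodes strictly, leaving the caller to turn a UnicodeDecodeError into 400 Bad Request.

```python
from urllib.parse import unquote_to_bytes


def parse_qs_bytes(raw):
    """Hypothetical low-level parser: (name, value) pairs as byte strings.

    Nothing is decoded, so callers with specialized needs (a legacy
    charset, binary values) can decode the bytes however they want.
    """
    pairs = []
    for field in raw.split(b"&"):
        if not field:
            continue
        name, _, value = field.partition(b"=")
        # '+' means a space in urlencoded data; replace it before unquoting
        # so that an escaped '%2B' still comes out as a literal '+'.
        pairs.append((
            unquote_to_bytes(name.replace(b"+", b" ")),
            unquote_to_bytes(value.replace(b"+", b" ")),
        ))
    return pairs


def parse_qs_text(raw, charset="utf-8"):
    """Hypothetical high-level parser: decode every pair strictly.

    Raises UnicodeDecodeError when the data is not valid in `charset`.
    """
    return [(n.decode(charset), v.decode(charset)) for n, v in parse_qs_bytes(raw)]
```

Keeping the byte-level layer public is what lets a caller with, say, Latin-1 data call `parse_qs_text(raw, charset="latin-1")` or work directly with the bytes.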
Thanks,
Manlio Perillo