[Web-SIG] parsing of urlencoded data and Unicode

Manlio Perillo manlio_perillo at libero.it
Mon Jul 28 23:42:26 CEST 2008


Ian Bicking wrote:
> Manlio Perillo wrote:
>> Hi.
>>
>> In my WSGI framework:
>> http://hg.mperillo.ath.cx/wsgix
>>
>> I have, in the `http` module, the functions `parse_query_string` and
>> `parse_simple_post_data`.
>>
>> The first parses the query string and returns a dictionary of strings;
>> the latter parses the application/x-www-form-urlencoded client body and
>> returns a dictionary of strings plus the charset used by the client for
>> the unicode encoding.
>>
>>
>> Now I'm wondering whether these two functions should return Unicode
>> strings instead of plain strings.
>>
>> I think that Unicode strings should be returned, but I would like to
>> know what other web frameworks do.
>>
>> Django seems to convert to Unicode, but the Python standard library 
>> does not (and I would like to know if changes are planned for Python 
>> 3.x).
> 
> WebOb decodes the request data to str, then lazily decodes to unicode 
> based on the request encoding.  The request encoding is a bit fuzzy to 
> calculate, which is part of why the decoding is lazy, so that the 
> request encoding can be set or changed at any time.
> 

Ok, thanks.
In wsgix I decode the QUERY_STRING as utf-8, and the POSTed data with the
charset specified in the data itself: the one found in the special
_charset_ field, or utf-8 when that field is absent.
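
A minimal sketch of that decoding strategy, written for Python 3 with the
standard library's urllib.parse. This is not the actual wsgix code: the
function names are borrowed from the description above, while the bodies,
the `default_charset` parameter, and the fallback to utf-8 on an unknown
charset name are assumptions made for illustration.

```python
import codecs
from urllib.parse import parse_qsl


def parse_query_string(query_string, charset="utf-8"):
    """Decode a raw QUERY_STRING (str) into a dict of Unicode strings."""
    pairs = parse_qsl(query_string, keep_blank_values=True,
                      encoding=charset, errors="replace")
    # Repeated keys collapse to the last value; fine for a sketch.
    return dict(pairs)


def parse_simple_post_data(body, default_charset="utf-8"):
    """Decode an application/x-www-form-urlencoded body (bytes).

    Returns (fields, charset), where charset comes from the special
    _charset_ field if the client sent a usable one.
    """
    # Percent-encoded data is plain ASCII; latin-1 never fails on bytes.
    text = body.decode("latin-1")

    # First pass: only used to look up _charset_, whose value is ASCII.
    probe = dict(parse_qsl(text, keep_blank_values=True, encoding="latin-1"))
    charset = probe.get("_charset_") or default_charset
    try:
        codecs.lookup(charset)
    except LookupError:
        # Assumption: silently fall back when the name is unknown.
        charset = default_charset

    # Second pass: decode every field with the chosen charset.
    fields = dict(parse_qsl(text, keep_blank_values=True,
                            encoding=charset, errors="replace"))
    return fields, charset
```

For example, `parse_simple_post_data(b"_charset_=iso-8859-1&city=Citt%E0")`
decodes the city field as iso-8859-1, while a body without _charset_ is
decoded as utf-8.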



Manlio Perillo


