[Python-Dev] Can the cgi module be made Unicode-aware?

Skip Montanaro skip@pobox.com
Thu, 11 Apr 2002 00:02:40 -0500


I keep trying to handle various places in my code where I can get input in
non-ASCII encodings.  Today I realized the cgi module does nothing to
translate encoded form data into unicode objects.  I see in one instance that I
am getting data that is clearly utf-8 encoded, but I see nothing in the CGI
script's environment variables to suggest the client web browser told the
server how the data was encoded other than the obvious "Content-Type:
application/x-www-form-urlencoded".  Is utf-8 implied for the data once the
url encoding has been reversed?
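
To make the question concrete, here is a minimal sketch of reversing the url
encoding and then guessing the character encoding.  Nothing in the url
encoding itself names a charset, so the guess is an assumption: try utf-8
first and fall back to latin-1.  The helper name decode_form_value and the
fallback choice are hypothetical, and the code uses the modern
urllib.parse module rather than anything in the cgi module discussed here.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw, fallback="latin-1"):
    """Reverse URL encoding, then guess the character encoding.

    Hypothetical helper: the utf-8-then-fallback policy is an assumption,
    not something the form submission guarantees.
    """
    # In application/x-www-form-urlencoded data, '+' stands for a space.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        # Strict decoding fails loudly on bytes that are not valid utf-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail,
        # but it may be the wrong guess.
        return data.decode(fallback)
```

A value such as "caf%C3%A9" decodes as utf-8, while "caf%E9" is not valid
utf-8 and takes the fallback path.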

Should the cgi module be made Unicode-aware?  If so, how?  I can never
remember the incantation to convert non-ASCII string objects to Unicode
objects and nothing I've tried by trial-and-error so far works.  I *don't*
want to adopt the workaround outlined in FAQ question 4.102 (change the
default site-wide encoding).  Perhaps that question should be extended with
more appropriate information about converting raw strings with non-ASCII
content to unicode.  
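
For reference, the conversion being asked about comes down to naming the
codec explicitly instead of relying on the site-wide default.  In Python of
this era that would be spelled unicode(raw, "utf-8"); the sketch below uses
the equivalent decode call on a modern bytes object, which stands in for a
raw string here.  The function name to_unicode is hypothetical.

```python
def to_unicode(raw, encoding="utf-8", errors="strict"):
    """Convert an encoded byte string to a text (Unicode) object.

    Hypothetical helper: the caller must know, or guess, the encoding.
    """
    # Naming the codec avoids any dependence on the default encoding.
    return raw.decode(encoding, errors)


utf8_bytes = b"caf\xc3\xa9"      # "cafe" with an acute accent, utf-8 encoded
latin1_bytes = b"caf\xe9"        # the same text, latin-1 encoded

text_from_utf8 = to_unicode(utf8_bytes)
text_from_latin1 = to_unicode(latin1_bytes, "latin-1")
```

Both calls yield the same text object; passing the wrong codec either
raises UnicodeDecodeError or silently produces the wrong characters.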

Skip