[Baypiggies] urllib.urlencode and encoding

Tung Wai Yip tungwaiyip at yahoo.com
Thu Apr 19 01:51:54 CEST 2007


I may not have the complete context of your question. So I might be  
suggesting something different.

I think you want to encode unicode characters into a query string or the
URI. What you are doing is right. Not only do you have to encode the string
in UTF-8 first, you also need a complementary UTF-8 decoding step on the CGI
side.
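
A rough sketch of that round trip, in case it helps. This assumes Python 3,
where urlencode lives in urllib.parse rather than urllib; the parameter
name "q" and the value are made up for illustration, and parse_qs stands in
for whatever the CGI side actually uses to read the query string.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameter; any non-ASCII text would do.
params = {"q": "caf\u00e9"}

# Sending side: encode to UTF-8 bytes first, then URL-encode.
utf8_params = {key: value.encode("utf-8") for key, value in params.items()}
query = urlencode(utf8_params)

# Receiving side: undo the percent-escapes, then decode the bytes as UTF-8.
# The encoding is passed explicitly to make the matching step visible.
decoded = parse_qs(query, encoding="utf-8")

print(query)
print(decoded)
```

The two sides only agree because both use UTF-8; nothing in the query
string itself records which encoding was chosen.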

urllib.urlencode() cannot encode a unicode string by itself. RFC 2396 does
not take unicode into consideration, so there is no rule on what to do with
unicode in a URI. It is up to the application to decide on the encoding,
e.g. UTF-8 first, URL encoding next. Others might very well choose to use
UTF-16 instead.
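
To illustrate why the choice matters, here is a small sketch (again
assuming Python 3's urllib.parse, with a made-up one-character example)
showing that the same character gives different escapes under UTF-8 and
UTF-16, and that decoding with the wrong encoding gives garbage.

```python
from urllib.parse import quote, unquote

text = "\u00e9"  # the single character e-acute, chosen for illustration

# Same character, two application-level encoding choices.
utf8_form = quote(text.encode("utf-8"))
utf16_form = quote(text.encode("utf-16-le"))

# Decoding with an encoding the sender did not use yields the wrong text.
mismatched = unquote(utf8_form, encoding="latin-1")

print(utf8_form)
print(utf16_form)
print(repr(mismatched))
```

So whichever encoding the application picks, the sender and the receiver
have to agree on it out of band.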

Wai Yip

> I noticed that urllib.urlencode does the right thing (i.e. it uses
> %xx) if you .encode('utf-8') the parameters first.  I'm wondering if
> it makes sense for urllib.urlencode to automatically encode Unicode
> objects in this case.  I haven't had much luck getting changes into
> Python, so I was going to solicit comments here first.
>
> Thanks,
> -jj
>
> --
> "'Software Engineering' is something of an oxymoron.  It's very
> difficult to have real engineering before you have physics, and there
> isn't anything even close to a physics for software." -- L. Peter
> Deutsch
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies



