[Python-Dev] urllib.quote and unquote - Unicode issues

Bill Janssen janssen at parc.com
Thu Jul 31 09:39:29 CEST 2008


> Guido says:
> 
> > Actually, we'd need to look at the various other APIs in Py3k before we can
> > decide whether these should be considered taking or returning bytes or text.
> > It looks like all other APIs in the Py3k version of urllib treat URLs as
> > text.
> 
> 
> Yes, as I said in the bug tracker, I've groveled over the entire stdlib to
> see how my patch affects the behaviour of dependent code. Aside from a few
> minor bits which assumed octets (and did their own encoding/decoding, which
> I fixed), all the code assumes strings and is very happy to go on assuming
> this, as long as the URIs are encoded with UTF-8, which they almost
> certainly are.

I'm not sure that's sufficient review, though I agree it's necessary.
The major consumers of quote/unquote are not in the Python standard
library.

> (quote will accept either type, unquote will output a str, and there will
> be a new function unquote_to_bytes which outputs a bytes. Is everyone happy
> with that?)

No, so don't ask.

Bill
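Here is the split described in the quoted proposal: quote taking str or bytes, unquote returning str, and a separate unquote_to_bytes returning bytes. It matches the behaviour of `urllib.parse` in released Python 3 versions. The sketch below assumes a current Python 3 interpreter. It also shows the case behind the UTF-8 caveat. When a URI's escapes were produced from Latin-1 rather than UTF-8, unquote() does not recover the original text unless the caller supplies the encoding.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# quote() accepts either text or bytes; str input is encoded as UTF-8 first.
assert quote("caf\u00e9") == "caf%C3%A9"
assert quote(b"caf\xc3\xa9") == "caf%C3%A9"

# unquote() returns str, decoding the percent-escapes as UTF-8 by default.
assert unquote("caf%C3%A9") == "caf\u00e9"

# unquote_to_bytes() returns the raw octets with no decoding step.
assert unquote_to_bytes("caf%C3%A9") == b"caf\xc3\xa9"

# A URI whose escapes were produced from Latin-1 text instead of UTF-8.
latin1_uri = "caf%E9"
# unquote() substitutes U+FFFD, because its default is errors="replace" ...
assert unquote(latin1_uri) == "caf\ufffd"
# ... unless the caller knows the encoding and passes it explicitly,
assert unquote(latin1_uri, encoding="latin-1") == "caf\u00e9"
# while unquote_to_bytes() preserves the original octet unchanged.
assert unquote_to_bytes(latin1_uri) == b"caf\xe9"
```

The lossy `\ufffd` result is the kind of silent substitution that callers outside the standard library would need to consider. Code that cannot assume UTF-8 has to either pass `encoding=` or work with the bytes directly.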

