[Python-3000] urllib and bytes

Sat Mar 22 19:07:46 CET 2008

In porting Django, I ran into this problem:

Python 3.0a3+ (py3k:61727, Mar 22 2008, 01:44:52)
[GCC 4.2.3 (Debian 4.2.3-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> import urllib
py> urllib.quote(b"/path")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/tmp/lib/python3.0/urllib.py", line 1161, in quote
     return ''.join(res)
   File "/tmp/lib/python3.0/urllib.py", line 1126, in __call__
     if ord(c) < 256:
TypeError: ord() expected string of length 1, but int found

The problem here is that the elements of bytes are integers,
so the quoting algorithm fails.
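
The failure can be reproduced without urllib. A minimal illustration
(not the urllib code itself; the variable names are only for this
example) of what indexing and iterating a bytes object yield:

```python
data = b"/path"

# Indexing or iterating a bytes object yields integers, not length-1 strings.
first = data[0]
elements = list(data)

# ord() therefore rejects each element; the integer already is the byte value.
try:
    ord(first)
    ord_failed = False
except TypeError:
    ord_failed = True

# A slice, by contrast, is a bytes object of length 1, which ord() accepts.
sliced = data[0:1]
sliced_value = ord(sliced)
```

So a loop of the form "for c in s: ord(c)" works for a string but
raises TypeError for bytes.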

Is this supposed to work, i.e. should urllib operate on bytes?

I think it should: a URL *is* a sequence of bytes, not
characters, and to support characters, Python would have
to support IRIs (which it currently doesn't).

It might be helpful for quote to still accept strings as
input but, until IRIs are implemented, to restrict them
to ASCII strings.
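
A rough sketch of what such a function could look like, purely as an
illustration: quote_bytes and SAFE are hypothetical names, the set of
safe characters is an assumption, and returning a str (rather than
bytes) is also an assumption, not something urllib defines.

```python
# Hypothetical helper, not the urllib implementation.
# Iterating a bytes literal yields ints, so SAFE is a set of byte values.
SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 b"abcdefghijklmnopqrstuvwxyz"
                 b"0123456789_.-/")

def quote_bytes(url, safe=SAFE):
    """Percent-encode bytes; accept str only if it is pure ASCII."""
    if isinstance(url, str):
        # Assumption: non-ASCII strings are rejected outright.
        # str.encode("ascii") raises UnicodeEncodeError for them.
        url = url.encode("ascii")
    # Each element b is an int in range(256), so no ord() is needed.
    return "".join(chr(b) if b in safe else "%%%02X" % b for b in url)
```

With this sketch, quote_bytes(b"/path") and quote_bytes("/path") both
give "/path", arbitrary bytes such as b"\xe9" are quoted as "%E9", and
a string containing a non-ASCII character is refused instead of being
encoded with a guessed charset.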

I'm skeptical about the entire non-ASCII quoting algorithm:
why does it check for code points below 256? It seems it
attempts something similar to IRIs for code points of 256 and
above, encoding them as UTF-8, but encodes code points below
256 as if they were latin-1 ...
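
To make the inconsistency concrete, here is a small illustration. It
does not call urllib; percent_encode is a hypothetical helper that
simply escapes every byte it is given.

```python
def percent_encode(raw):
    # Hypothetical helper: escape every byte, regardless of value.
    return "".join("%%%02X" % b for b in raw)

e_acute = "\u00e9"  # code point 233, below 256
euro = "\u20ac"     # code point 8364, above 255

# The same character quotes differently depending on the charset chosen.
latin1_form = percent_encode(e_acute.encode("latin-1"))
utf8_form = percent_encode(e_acute.encode("utf-8"))
euro_form = percent_encode(euro.encode("utf-8"))

# A consumer that assumes UTF-8 cannot decode the latin-1 byte.
try:
    b"\xe9".decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

Here latin1_form is "%E9" while utf8_form is "%C3%A9", so a quoting
scheme that switches charsets at code point 256 produces output that
no single decoder can interpret consistently.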

Regards,
Martin