[Python-Dev] urllib.quote and unicode bug resuscitation attempt

Stefan Rank stefan.rank at ofai.at
Thu Jul 13 16:29:15 CEST 2006


on 13.07.2006 10:26 Mike Brown said the following:
> Stefan Rank wrote:
>> on 12.07.2006 07:53 Martin v. Löwis said the following:
>>> Anthony Baxter wrote:
>>>>> The right thing to do is IRIs. 
>>>> For 2.5, should we at least detect that it's unicode and raise a 
>>>> useful error?
>>> That can certainly be done, sure.

<snip>

> Put me down as +1 on raising a useful error instead of a KeyError or whatever,
> and +1 on having an irilib, but -1 on working toward accepting unicode in the
> URI-oriented urllib.quote(), because

<snip convincing explanation>

> See, right now, quote('abc 123%') returns 'abc%20123%25', as you would expect. 
> Similarly, everyone would probably expect u'abc 123%' to return
> u'abc%20123%25', and if we were to implement that, there'd probably be no harm 
> done.

Well, originally I would have expected it to return a byte str(ing),
but I am now converted: I think it is best to raise a TypeError for
unicode and leave the encoding decisions to higher-level code.

So I'll repeat the "patch" #1, slightly modified::

  if isinstance(s, unicode):
      raise TypeError("quote expects an encoded byte string as argument")

Is it safe to assume that code that breaks because of this change was 
already broken?
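
To make the intended division of labour concrete, here is a rough sketch of
a quote-like function carrying that guard. It is not the real urllib code:
the names strict_quote, text_type and _SAFE are made up for illustration,
the safe set is only my assumption of the usual defaults (letters, digits,
"_.-" and "/"), and it is written so that it also runs on interpreters
where the text type is called str rather than unicode.

```python
# Hypothetical sketch, NOT the actual urllib.quote implementation.
try:
    text_type = unicode      # interpreters that have a separate unicode type
except NameError:
    text_type = str          # interpreters where str is the text type

# Assumed safe set: ASCII letters, digits, "_.-" and "/".
# bytearray() yields integers everywhere, so membership tests are portable.
_SAFE = frozenset(bytearray(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-/"
))


def strict_quote(s):
    """Percent-encode an already-encoded byte string; refuse text input."""
    if isinstance(s, text_type):
        # The guard from the "patch": no guessing at an encoding here.
        raise TypeError("quote expects an encoded byte string as argument")
    out = []
    for b in bytearray(s):
        if b in _SAFE:
            out.append(chr(b))          # safe ASCII byte, copied through
        else:
            out.append("%%%02X" % b)    # everything else becomes %XX
    return "".join(out)
```

With this, strict_quote(b'abc 123%') gives 'abc%20123%25', while
strict_quote(u'abc 123%') raises the TypeError. Higher-level code that
knows which encoding it wants does the encoding itself, for example
strict_quote(u'caf\xe9'.encode('utf-8')), which gives 'caf%C3%A9'.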

stefan


