Bug in Python 2.6 urlencode

Ned Deily nad at acm.org
Wed Sep 8 00:56:47 EDT 2010


In article <4c87013f$0$1625$742ec2ed at news.sonic.net>,
 John Nagle <nagle at animats.com> wrote:
> On 9/7/2010 5:43 PM, Terry Reedy wrote:
> > On 9/7/2010 3:02 PM, John Nagle wrote:
> >> There's a bug in Python 2.6's "urllib.urlencode". If you pass
> >> in a Unicode character outside the ASCII range, instead of it
> >> being encoded properly, an exception is raised.
> >>
> >> File "C:\python26\lib\urllib.py", line 1267, in urlencode
> >> v = quote_plus(str(v))
> >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in
> >> position 0: ordinal not in range(128)
> >>
> >> This will probably work in 3.x, because there, "str" converts
> >> to Unicode, and quote_plus can handle Unicode. This is one of
> >> those legacy bugs left from the pre-Unicode era.
> >>
> >> There's a workaround. Call urllib.urlencode with a second
> >> parameter of 1. This turns on the optional feature of
> >> accepting tuples in the argument to be encoded, and the
> >> code goes through a newer code path that works.
> >>
> >> Is it worth reporting 2.x bugs any more? Or are we in the
> >> version suckage period, where version N is abandonware and
> >> version N+1 isn't deployable yet?
> >
> > You may report 2.7 bugs, but please verify that the behavior is a bug in
> > 2.7. However, bugs that have been fixed by the switch to
> > unicode for text are unlikely to be fixed a second time in 2.7. You
> > might suggest an enhancement to the doc for urlencode if that workaround
> > is not clear. Or perhaps that workaround suggests that in this case, a
> > fix would not be too difficult, and you can supply a patch.
> >
> > The basic deployment problem is that people who want to use unicode text
> > also want to use libraries that have not been ported to use unicode
> > text. That is the major issue for many porting projects.
> 
>      In other words, we're in the version suckage period.

It took me all of one minute to find where a similar issue was reported 
previously (http://bugs.python.org/issue1349732).  One of the comments 
on the issue explains how to use the "doseq" form and an explicit encode 
to handle Unicode items.  I don't see where that part of the suggestion 
made it into the documentation.  I'm sure if you make a specific doc 
change suggestion, it will be incorporated into the 2.7 docs.  If you 
think a code change is needed, suggest a specific patch.
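
For what it's worth, here is a minimal sketch of the "doseq plus explicit 
encode" approach as I understand it from that issue.  The helper name 
urlencode_utf8 is made up for illustration, UTF-8 is an assumed choice of 
encoding (use whatever the receiving server expects), and the try/except 
import is only there so the same snippet also runs under 3.x.  Keys are 
assumed to be plain ASCII.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import urlencode          # Python 2.x
except ImportError:
    from urllib.parse import urlencode    # Python 3.x


def urlencode_utf8(params):
    """Hypothetical helper, not part of urllib.

    Encode every Unicode text value to UTF-8 bytes first, then let
    urlencode percent-quote the bytes using the doseq code path.
    """
    text_type = type(u"")  # unicode on 2.x, str on 3.x
    encoded = []
    for key, value in params:
        if isinstance(value, text_type):
            # Assumption: the server expects UTF-8 in the query string.
            value = value.encode("utf-8")
        encoded.append((key, value))
    return urlencode(encoded, doseq=True)


# u'\xa9' is the copyright sign from the traceback in the original report.
query = urlencode_utf8([("symbol", u"\xa9"), ("q", u"a b")])
print(query)
```

Doing the encode yourself makes the choice of encoding explicit rather 
than leaving it to an implicit ASCII conversion inside urlencode.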

-- 
 Ned Deily,
 nad at acm.org



