Bug in Python 2.6 urlencode
nad at acm.org
Wed Sep 8 06:56:47 CEST 2010
In article <4c87013f$0$1625$742ec2ed at news.sonic.net>,
John Nagle <nagle at animats.com> wrote:
> On 9/7/2010 5:43 PM, Terry Reedy wrote:
> > On 9/7/2010 3:02 PM, John Nagle wrote:
> >> There's a bug in Python 2.6's "urllib.urlencode". If you pass
> >> in a Unicode character outside the ASCII range, instead of it
> >> being encoded properly, an exception is raised.
> >> File "C:\python26\lib\urllib.py", line 1267, in urlencode
> >> v = quote_plus(str(v))
> >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in
> >> position 0: ordinal not in range(128)
> >> This will probably work in 3.x, because there, "str" converts
> >> to Unicode, and quote_plus can handle Unicode. This is one of
> >> those legacy bugs left from the pre-Unicode era.
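For what it's worth, the 3.x guess is easy to check. A minimal sketch, assuming a current Python 3 release, where urlencode lives in urllib.parse and takes an optional encoding argument (older 3.x releases may lack that argument):

```python
# Python 3: urlencode lives in urllib.parse and accepts str (text) values.
from urllib.parse import urlencode

# Default: text is encoded as UTF-8 before percent-quoting.
print(urlencode({"q": "\xa9"}))                      # q=%C2%A9
# The encoding argument selects a different codec for text values.
print(urlencode({"q": "\xa9"}, encoding="latin-1"))  # q=%A9
```

So under 3.x the copyright sign goes out as UTF-8 by default rather than raising.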
> >> There's a workaround. Call urllib.urlencode with a second
> >> parameter of 1. This turns on the optional feature of
> >> accepting tuples in the argument to be encoded, and the
> >> code goes through a newer code path that works.
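One caveat on that workaround, hedged because it comes from reading the 2.6 urllib.py source rather than from a test run: in the doseq branch, unicode values appear to go through v.encode("ASCII", "replace") before quote_plus, so the exception goes away but u'\xa9' would be sent as %3F (a literal "?"). The snippet below reproduces only that conversion step under Python 3; it illustrates the effect and is not the library code itself:

```python
# Reproduces (under Python 3) only the conversion step that Python 2.6's
# urlencode appears to apply to unicode values when doseq is true:
#     v = quote_plus(v.encode("ASCII", "replace"))
from urllib.parse import quote_plus

value = "\xa9 2010"                       # u'\xa9 2010' in 2.x terms
lossy = value.encode("ascii", "replace")  # non-ASCII characters become b'?'
print(lossy)                              # b'? 2010'
print(quote_plus(lossy))                  # %3F+2010
```

The request succeeds, but the server never sees the original character.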
> >> Is it worth reporting 2.x bugs any more? Or are we in the
> >> version suckage period, where version N is abandonware and
> >> version N+1 isn't deployable yet?
> > You may report 2.7 bugs, but please verify that the behavior is a bug in
> > 2.7. However, bugs that have been fixed by the switch to unicode for
> > text are unlikely to be fixed a second time in 2.7. You
> > might suggest an enhancement to the doc for urlencode if that workaround
> > is not clear. Or perhaps that workaround suggests that in this case, a
> > fix would not be too difficult, and you can supply a patch.
> > The basic deployment problem is that people who want to use unicode text
> > also want to use libraries that have not been ported to use unicode
> > text. That is the major issue for many porting projects.
> In other words, we're in the version suckage period.
It took me all of one minute to find where a similar issue was reported
previously (http://bugs.python.org/issue1349732). One of the comments
on the issue explains how to use the "doseq" form and an explicit encode
to handle Unicode items. I don't see where that part of the suggestion
made it into the documentation. I'm sure if you make a specific doc
change suggestion, it will be incorporated into the 2.7 docs. If you
think a code change is needed, suggest a specific patch.
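A minimal sketch of the explicit-encode approach, assuming UTF-8 is what the receiving server expects. urlencode_text and _to_bytes are hypothetical helper names, not part of urllib, and this is not the exact code from the issue 1349732 discussion. Encoding keys and values to byte strings first means the str() call in 2.x's urlencode is a no-op on them, and 3.x's urlencode quotes bytes as they are. It is written to run on 2.6/2.7 and on 3.3+ (the u"" literal rules out 3.0 to 3.2):

```python
# Import urlencode from wherever this Python version keeps it.
try:
    from urllib import urlencode  # Python 2.x
except ImportError:
    from urllib.parse import urlencode  # Python 3.x

TEXT_TYPE = type(u"")  # unicode on 2.x, str on 3.x


def _to_bytes(value, encoding):
    """Hypothetical helper: encode text to bytes, leave other values alone."""
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding)
    return value


def urlencode_text(query, encoding="utf-8"):
    """Hypothetical helper: explicitly encode text keys/values, then urlencode.

    query may be a mapping or a sequence of (key, value) pairs.
    """
    if hasattr(query, "items"):
        query = list(query.items())
    pairs = [(_to_bytes(k, encoding), _to_bytes(v, encoding)) for k, v in query]
    return urlencode(pairs)
```

For example, urlencode_text({u"q": u"\xa9 2010"}) should give q=%C2%A9+2010 instead of raising UnicodeEncodeError.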
nad at acm.org