[Python-Dev] More on Py3K urllib -- urlencode()

Bill Janssen janssen at parc.com
Sun Mar 1 00:33:17 CET 2009


Bill Janssen <janssen at parc.com> wrote:

> Dan Mahn <dan.mahn at digidescorp.com> wrote:
> 
> > 3) Regarding the following code fragment in urlencode():
> > 
> >            k = quote_plus(str(k))
> >            if isinstance(v, str):
> >                v = quote_plus(v)
> >                l.append(k + '=' + v)
> >            elif isinstance(v, str):
> >                # is there a reasonable way to convert to ASCII?
> >                # encode generates a string, but "replace" or "ignore"
> >                # lose information and "strict" can raise UnicodeError
> >                v = quote_plus(v.encode("ASCII","replace"))
> >                l.append(k + '=' + v)
> > 
> > I don't understand how the "elif" section is invoked, as it uses the
> > same condition as the "if" section.
> 
> This looks like a 2->3 bug; clearly only the second branch should be
> used in Py3K.  And that "replace" is also a bug; it should signal an
> error on encoding failures.  It should probably catch UnicodeError and
> explain the problem, which is that only Latin-1 values can be passed in
> the query string.  So the encode() to "ASCII" is also a mistake; it
> should be "ISO-8859-1", and the "replace" should be a "strict", I think.
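
For illustration only, the strict Latin-1 handling suggested in the quoted
text could be sketched roughly as below. The helper name quote_latin1_value
is hypothetical and not part of urllib; it assumes v is already a str.

```python
from urllib.parse import quote_plus


def quote_latin1_value(v):
    """Hypothetical helper: quote a str value, insisting it fits in Latin-1."""
    try:
        # "strict" raises instead of silently replacing characters
        raw = v.encode("ISO-8859-1", "strict")
    except UnicodeError as exc:
        # Explain the problem rather than losing information
        raise ValueError(
            "query value %r cannot be encoded as ISO-8859-1" % (v,)
        ) from exc
    # quote_plus() accepts bytes as well as str in Python 3
    return quote_plus(raw)
```

With this sketch a value that has no Latin-1 representation is reported
to the caller instead of being turned into '?'.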

Sorry!  In 3.0.1, this whole thing boils down to

   l.append(quote_plus(k) + '=' + quote_plus(v))
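
As a minimal sketch of what that one-liner amounts to when wrapped in a
loop (simple_urlencode is a made-up name, and the str() calls are an
assumption so that non-string keys and values are accepted; this is not
the actual urlencode() source):

```python
from urllib.parse import quote_plus


def simple_urlencode(pairs):
    """Hypothetical, simplified stand-in for urlencode() on (key, value) pairs."""
    l = []
    for k, v in pairs:
        # Assumption: coerce to str first; quote_plus() then percent-encodes
        # non-ASCII characters as UTF-8 by default in Python 3.
        l.append(quote_plus(str(k)) + '=' + quote_plus(str(v)))
    return '&'.join(l)
```

Since str is already Unicode in Py3K, a single branch covers the case
that needed two branches in 2.x.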

Bill

