Re: [Python-Dev] More on Py3K urllib -- urlencode()

March 7, 2009

After a harder look, I concluded there was a bit more work to be done,
but the modifications are still very basic.

Attached is a version of urlencode() which seems to make the most sense 
to me.

I wonder how I could officially propose at least some of these 
modifications.

- Dan

Bill Janssen wrote:
...
Bill Janssen <janssen@parc.com> wrote:
...
Dan Mahn <dan.mahn@digidescorp.com> wrote:
...
3) Regarding the following code fragment in urlencode():

        k = quote_plus(str(k))
        if isinstance(v, str):
            v = quote_plus(v)
            l.append(k + '=' + v)
        elif isinstance(v, str):
            # is there a reasonable way to convert to ASCII?
            # encode generates a string, but "replace" or "ignore"
            # lose information and "strict" can raise UnicodeError
            v = quote_plus(v.encode("ASCII","replace"))
            l.append(k + '=' + v)

I don't see how the "elif" branch can ever be reached, since it tests
the same condition as the "if" branch.

This looks like a 2->3 bug; clearly only the second branch should be
used in Py3K.  And that "replace" is also a bug; it should signal an
error on encoding failures.  It should probably catch UnicodeError and
explain the problem, which is that only Latin-1 values can be passed in
the query string.  So the encode() to "ASCII" is also a mistake; it
should be "ISO-8859-1", and the "replace" should be a "strict", I think.
Sorry!  In 3.0.1, this whole thing boils down to

    l.append(quote_plus(k) + '=' + quote_plus(v))

Bill

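Bill's two points can be compared in a short sketch, assuming Python 3's urllib.parse.quote_plus(), which accepts str directly along with encoding and errors arguments. The one-liner relies on quote_plus()'s UTF-8 default, while strict ISO-8859-1 encoding fails loudly on anything outside Latin-1. The helper name quote_pair_latin1() is hypothetical, and wrapping the UnicodeEncodeError in a ValueError is only one possible way to "explain the problem".

```python
from urllib.parse import quote_plus


def quote_pair_latin1(key, value):
    """Hypothetical helper: quote one key=value pair as strict ISO-8859-1.

    Characters outside Latin-1 raise a descriptive error instead of being
    silently replaced, as "replace" would do.
    """
    try:
        return (quote_plus(key, encoding='iso-8859-1', errors='strict') + '=' +
                quote_plus(value, encoding='iso-8859-1', errors='strict'))
    except UnicodeEncodeError as exc:
        # exc.object is the str being encoded; start:end marks the bad characters
        bad = exc.object[exc.start:exc.end]
        raise ValueError('only Latin-1 text can be sent in this query string; '
                         'cannot encode %r' % bad) from exc


# The plain one-liner: quote_plus() encodes str as UTF-8 by default.
print(quote_plus('name') + '=' + quote_plus('caf\u00e9 au lait'))  # name=caf%C3%A9+au+lait

# Strict Latin-1: e-acute fits in one byte, the euro sign does not.
print(quote_pair_latin1('name', 'caf\u00e9'))                       # name=caf%E9
try:
    quote_pair_latin1('price', '10 \u20ac')
except ValueError as err:
    print(err)
```

The difference matters for the server side: the same text produces different percent-escapes under UTF-8 and Latin-1, so whichever encoding is chosen has to match what the receiving application expects.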
import sys
from urllib.parse import quote_plus

def urlencode(query, doseq=0, safe='', encoding=None, errors=None):
    """Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.
    """

    if hasattr(query,"items"):
        # mapping objects
        query = query.items()
    else:
        # it's a bother at times that strings and string-like objects are
        # sequences...
        try:
            # non-sequence items should not work with len()
            # non-empty strings will fail this
            if len(query) and not isinstance(query[0], tuple):
                raise TypeError
            # zero-length sequences of all types will get here and succeed,
            # but that's a minor nit - since the original implementation
            # allowed empty dicts that type of behavior probably should be
            # preserved for consistency
        except TypeError:
            ty,va,tb = sys.exc_info()
            raise TypeError("not a valid non-string sequence or mapping object").with_traceback(tb)

    def quote_item(x):
        # quote_plus() raises TypeError when encoding or errors is given
        # for a bytes argument, so bytes are quoted as-is; everything else
        # is converted to str and encoded with the caller's encoding/errors.
        if isinstance(x, bytes):
            return quote_plus(x, safe)
        return quote_plus(x if isinstance(x, str) else str(x), safe, encoding, errors)

    l = []
    if not doseq:
        # preserve old behavior
        for k, v in query:
            l.append(quote_item(k) + '=' + quote_item(v))
    else:
        for k, v in query:
            k = quote_item(k)
            if isinstance(v, (str, bytes)):
                # str and bytes are single values, not sequences to expand
                l.append(k + '=' + quote_item(v))
            else:
                try:
                    # is this a sufficient test for sequence-ness?
                    len(v)
                except TypeError:
                    # not a sequence
                    l.append(k + '=' + quote_item(v))
                else:
                    # loop over the sequence
                    for elt in v:
                        l.append(k + '=' + quote_item(elt))
    return '&'.join(l)
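For comparison, the urllib.parse.urlencode() shipped in later Python 3 releases accepts the same safe, encoding, and errors parameters and also handles bytes, so a few calls against it give a rough picture of the behavior the attached version aims for. This sketch exercises the standard library, not the attached code, and the expected outputs assume a current CPython.

```python
from urllib.parse import urlencode

# str values are UTF-8 encoded by default; encoding= selects another charset
print(urlencode({'q': 'caf\u00e9'}))                       # q=caf%C3%A9
print(urlencode({'q': 'caf\u00e9'}, encoding='latin-1'))   # q=caf%E9

# bytes are quoted as-is, with or without doseq
print(urlencode([('data', b'\xff\x00')]))                  # data=%FF%00
print(urlencode([('data', b'\xff\x00')], doseq=True))      # data=%FF%00

# with doseq true, each sequence element becomes its own parameter,
# and non-string scalars are converted with str()
print(urlencode([('tag', ['a', 'b c']), ('n', 1)], doseq=True))  # tag=a&tag=b+c&n=1

# characters listed in safe are left unescaped
print(urlencode({'path': '/a/b'}, safe='/'))               # path=/a/b
```

Note the doseq=True call with a bytes value: a loop that only treats str as a scalar would instead iterate over the bytes and emit one parameter per integer (data=255&data=0).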