
After a harder look, I concluded that a bit more work was needed, though the modifications are still very basic. Attached is a version of urlencode() which seems to make the most sense to me. I wonder how I could officially propose at least some of these modifications.

- Dan

Bill Janssen wrote:
Bill Janssen <janssen@parc.com> wrote:
Dan Mahn <dan.mahn@digidescorp.com> wrote:
3) Regarding the following code fragment in urlencode():
    k = quote_plus(str(k))
    if isinstance(v, str):
        v = quote_plus(v)
        l.append(k + '=' + v)
    elif isinstance(v, str):
        # is there a reasonable way to convert to ASCII?
        # encode generates a string, but "replace" or "ignore"
        # lose information and "strict" can raise UnicodeError
        v = quote_plus(v.encode("ASCII","replace"))
        l.append(k + '=' + v)
I don't understand how the "elif" section is invoked, as it uses the same condition as the "if" section.
This looks like a 2->3 bug; clearly only the second branch should be used in Py3K. And that "replace" is also a bug; it should signal an error on encoding failures. It should probably catch UnicodeError and explain the problem, which is that only Latin-1 values can be passed in the query string. So the encode() to "ASCII" is also a mistake; it should be "ISO-8859-1", and the "replace" should be a "strict", I think.
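To make the difference between the error handlers concrete, here is a minimal sketch, not code from the thread. It assumes Python 3 and the standard urllib.parse.quote_plus; the helper name quote_latin1_strict and the sample value are made up for illustration.

```python
from urllib.parse import quote_plus

# Sample value: 'é' exists in Latin-1, the euro sign does not.
value = "caf\u00e9 \u20ac5"

# "replace" never fails, but each unencodable character becomes '?'.
lossy = quote_plus(value.encode("ascii", "replace"))


def quote_latin1_strict(text):
    """Hypothetical helper: quote text as Latin-1, failing loudly."""
    try:
        raw = text.encode("iso-8859-1", "strict")
    except UnicodeError as exc:
        # Explain the problem instead of silently losing information.
        raise ValueError(
            "query values must be representable in Latin-1: %r" % text
        ) from exc
    return quote_plus(raw)


# Encodable in Latin-1, so this succeeds.
ok = quote_latin1_strict("caf\u00e9")

# The euro sign is not in Latin-1, so "strict" raises.
try:
    quote_latin1_strict(value)
    failed = False
except ValueError:
    failed = True
```

With "replace" the information is lost before quoting ever happens, while "strict" surfaces the problem to the caller, which is the behaviour argued for above.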
Sorry! In 3.0.1, this whole thing boils down to
    l.append(quote_plus(k) + '=' + quote_plus(v))
Bill
def urlencode(query, doseq=0, safe='', encoding=None, errors=None):
    """Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.
    """

    if hasattr(query,"items"):
        # mapping objects
        query = query.items()
    else:
        # it's a bother at times that strings and string-like objects are
        # sequences...
        try:
            # non-sequence items should not work with len()
            # non-empty strings will fail this
            if len(query) and not isinstance(query[0], tuple):
                raise TypeError
            # zero-length sequences of all types will get here and succeed,
            # but that's a minor nit - since the original implementation
            # allowed empty dicts that type of behavior probably should be
            # preserved for consistency
        except TypeError:
            ty,va,tb = sys.exc_info()
            raise TypeError("not a valid non-string sequence or mapping object").with_traceback(tb)

    l = []
    if not doseq:
        # preserve old behavior
        for k, v in query:
            k = quote_plus(k if isinstance(k, (str,bytes)) else str(k), safe, encoding, errors)
            v = quote_plus(v if isinstance(v, (str,bytes)) else str(v), safe, encoding, errors)
            l.append(k + '=' + v)
    else:
        for k, v in query:
            k = quote_plus(k if isinstance(k, (str,bytes)) else str(k), safe, encoding, errors)
            if isinstance(v, str):
                v = quote_plus(v if isinstance(v, (str,bytes)) else str(v), safe, encoding, errors)
                l.append(k + '=' + v)
            else:
                try:
                    # is this a sufficient test for sequence-ness?
                    x = len(v)
                except TypeError:
                    # not a sequence
                    v = quote_plus(str(v))
                    l.append(k + '=' + v)
                else:
                    # loop over the sequence
                    for elt in v:
                        elt = quote_plus(elt if isinstance(elt, (str,bytes)) else str(elt), safe, encoding, errors)
                        l.append(k + '=' + elt)
    return '&'.join(l)
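
For comparison, the following is a rough usage sketch of the behaviour described in the docstring. It is an assumption-laden illustration: it calls urllib.parse.urlencode as shipped in current Python 3 releases (which accepts safe, encoding and errors arguments), not the attached version, and the sample keys and values are invented.

```python
from urllib.parse import urlencode

# A sequence of two-element tuples keeps its input order.
pairs = urlencode([("a", 1), ("b", "x y")])

# With doseq true, each element of a sequence value becomes its own parameter.
expanded = urlencode({"tag": ["x", "y"]}, doseq=True)

# Without doseq, the list is passed through str() and quoted as a whole.
collapsed = urlencode({"tag": ["x", "y"]})

# The encoding argument controls how non-ASCII text is turned into bytes.
latin1 = urlencode({"name": "caf\u00e9"}, encoding="iso-8859-1")
utf8 = urlencode({"name": "caf\u00e9"})

# A plain non-empty string is rejected rather than treated as a sequence.
try:
    urlencode("abc")
    rejected = False
except TypeError:
    rejected = True
```

The collapsed form shows why doseq matters: without it the repr of the list ends up in the query string, which is rarely what the caller intended.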