[issue3300] urllib.quote and unquote - Unicode issues

Matt Giuca report at bugs.python.org
Thu Jul 10 03:55:01 CEST 2008


Matt Giuca <matt.giuca at gmail.com> added the comment:

OK well here are the necessary changes to the documentation (RST docs
and docstrings in the code).

As I said above, I plan to to extensive testing and add new cases, and I
don't recommend this patch is accepted until that's done.

Patch (parse.py.patch3) is for branch /branches/py3k, revision 64834.

Commit log:

urllib.parse.unquote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the decoding of percent-encoded octets
(previously implicitly decoded as ISO-8859-1). As per RFC 3986, default
is "utf-8".

urllib.parse.quote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the encoding of non-ASCII characters
before being percent-encoded (previously characters in range(128, 256)
were encoded as ISO-8859-1, and characters above that as UTF-8). Also
fixed characters greater than 256 not responding to "safe", and also not
being cached.

Doc/library/urllib.parse.rst: Updated docs on quote and unquote to
reflect new interface.

Lib/test/test_urllib.py, Lib/test/test_http_cookiejar.py: Updated test
cases which expected output in ISO-8859-1, now expects UTF-8.

Lib/email/utils.py: Calls urllib.parse.quote and urllib.parse.unquote
with encoding="latin-1", to preserve existing behaviour (which the whole
email module is dependent upon).

Added file: http://bugs.python.org/file10873/parse.py.patch3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________


More information about the Python-bugs-list mailing list