[issue3300] urllib.quote and unquote - Unicode issues
Matt Giuca
report at bugs.python.org
Thu Jul 10 03:55:01 CEST 2008
Matt Giuca <matt.giuca at gmail.com> added the comment:
OK, well, here are the necessary changes to the documentation (RST docs
and docstrings in the code).
As I said above, I plan to do extensive testing and add new cases, and I
don't recommend that this patch be accepted until that's done.
Patch (parse.py.patch3) is for branch /branches/py3k, revision 64834.
Commit log:
urllib.parse.unquote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the decoding of percent-encoded octets
(previously implicitly decoded as ISO-8859-1). As per RFC 3986, default
is "utf-8".
urllib.parse.quote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the encoding of non-ASCII characters
before being percent-encoded (previously characters in range(128, 256)
were encoded as ISO-8859-1, and characters above that as UTF-8). Also
fixed characters with code points of 256 and above not responding to
"safe", and also not being cached.
Doc/library/urllib.parse.rst: Updated docs on quote and unquote to
reflect new interface.
Lib/test/test_urllib.py, Lib/test/test_http_cookiejar.py: Updated test
cases which expected output in ISO-8859-1; they now expect UTF-8.
Lib/email/utils.py: Calls urllib.parse.quote and urllib.parse.unquote
with encoding="latin-1", to preserve existing behaviour (which the whole
email module is dependent upon).
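For illustration, here is a minimal sketch of the interface the commit log
describes. It assumes the "encoding" and "errors" keyword arguments behave as
they do in present-day Python 3's urllib.parse, which may differ in detail
from parse.py.patch3; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# unquote: percent-encoded octets are decoded as UTF-8 by default.
decoded_utf8 = unquote("caf%C3%A9")

# The caller can ask for a different decoding, e.g. ISO-8859-1.
decoded_latin1 = unquote("caf%E9", encoding="latin-1")

# quote: non-ASCII characters are encoded as UTF-8 by default
# before being percent-encoded.
quoted_utf8 = quote("caf\u00e9")

# With encoding="latin-1" the same character becomes a single octet,
# which is what the email module relies on.
quoted_latin1 = quote("caf\u00e9", encoding="latin-1")

# "errors" controls what happens when the octets are not valid in the
# chosen encoding; "strict" raises instead of substituting U+FFFD.
try:
    unquote("caf%E9", encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(decoded_utf8, decoded_latin1, quoted_utf8, quoted_latin1, strict_failed)
```

Both decodings yield the same string here because each call is given the
octets that match its encoding; mixing them up is exactly the failure mode
the explicit argument is meant to make visible.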
Added file: http://bugs.python.org/file10873/parse.py.patch3
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________