[issue5468] urlencode does not handle "bytes", and could easily handle alternate encodings
Dan Mahn
report at bugs.python.org
Tue Mar 10 15:45:17 CET 2009
New submission from Dan Mahn <dan.mahn at digidescorp.com>:
urllib.parse.urlencode() relies on quote_plus() to build a complete
query string, but it does not take full advantage of the flexibility
built into quote_plus(). Namely:
1) Instances of type "bytes" are not encoded properly, because str() is
applied to them before they are passed to quote_plus(). This produces a
nonsensical string such as b'1234', whereas quote_plus() handles bytes
correctly if they are passed intact. Being able to encode this type is
particularly useful for putting binary data into the query string, or
for text that has already been encoded in a non-standard character
encoding.
2) Sometimes it would be desirable to encode query strings entirely in
"latin-1" or possibly "ascii" instead of "utf-8". Exposing the extra
parameters now present on quote_plus() (encoding and errors) would
easily provide that functionality.
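Both points can be seen by calling quote_plus() directly. This is a small illustration, with outputs as produced by a current Python 3; the exact behavior of the 3.0/3.1 releases named in this report may differ slightly. Point 1 contrasts quoting str(value) (what urlencode() effectively does) with quoting the bytes intact; point 2 quotes the same text with the default UTF-8 and with latin-1.

```python
from urllib.parse import quote_plus

# Point 1: str() on a bytes object yields its repr, e.g. "b'1234'",
# so the quotes and the "b" prefix end up in the query string.
value = b"1234"
via_str = quote_plus(str(value))  # "b%271234%27"
direct = quote_plus(value)        # "1234": bytes are percent-encoded as-is

# Point 2: the same text percent-encoded in two character encodings.
text = "café"
as_utf8 = quote_plus(text)                       # "caf%C3%A9" (UTF-8 default)
as_latin1 = quote_plus(text, encoding="latin-1") # "caf%E9"

print(via_str, direct, as_utf8, as_latin1)
```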
I have attached a new version of urlencode() that provides both of the
above fixes/enhancements. It also removes an unused code path from the
existing function and includes some doctests.
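The sketch below shows one way both enhancements could fit together. It is not the contents of the attached new_urlencode.py; the name urlencode_sketch and its safe/encoding/errors parameters are hypothetical and were chosen to mirror quote_plus(). Bytes keys and values are passed to quote_plus() intact (without encoding/errors, which quote_plus() rejects for bytes), while other values are converted to str and encoded with the caller's chosen encoding.

```python
from urllib.parse import quote_plus


def urlencode_sketch(query, doseq=False, safe="", encoding=None, errors=None):
    """Hypothetical urlencode() variant (illustration only)."""
    # Accept a mapping or a sequence of (key, value) pairs.
    if hasattr(query, "items"):
        query = query.items()

    def q(obj):
        # bytes are quoted as raw octets; quote_plus() raises TypeError
        # if encoding/errors are given together with bytes.
        if isinstance(obj, bytes):
            return quote_plus(obj, safe)
        if not isinstance(obj, str):
            obj = str(obj)
        return quote_plus(obj, safe, encoding, errors)

    pairs = []
    for k, v in query:
        if doseq and not isinstance(v, (str, bytes)):
            try:
                elements = iter(v)
            except TypeError:
                # Not iterable: treat as a single value.
                pairs.append(q(k) + "=" + q(v))
            else:
                # Iterable: emit one key=value pair per element.
                for elt in elements:
                    pairs.append(q(k) + "=" + q(elt))
        else:
            pairs.append(q(k) + "=" + q(v))
    return "&".join(pairs)
```

For example, `urlencode_sketch({"data": b"\x00\xff", "name": "café"}, encoding="latin-1")` gives `data=%00%FF&name=caf%E9`, and `urlencode_sketch([("a", [1, 2])], doseq=True)` gives `a=1&a=2`.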
----------
components: Library (Lib)
files: new_urlencode.py
message_count: 1.0
messages: 83434
nosy: dmahn
nosy_count: 1.0
severity: normal
status: open
title: urlencode does not handle "bytes", and could easily handle alternate encodings
type: behavior
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file13294/new_urlencode.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5468>
_______________________________________