[issue5468] urlencode does not handle "bytes", and could easily handle alternate encodings

Tue Mar 10 15:45:17 CET 2009

New submission from Dan Mahn <dan.mahn at digidescorp.com>:

urllib.parse.urlencode() uses quote_plus() extensively to create a
complete query string, but doesn't effectively/properly take advantage
of the flexibility built into quote_plus().  Namely:

1) Instances of type "bytes" are not properly encoded, as str() is used
prior to passing to quote_plus().  This creates a nonsensical string
such as b'1234', while quote_plus() can handle these types properly if
passed intact.  The ability to encode this type is particularly useful
for putting binary data into the query string, or for pre-encoded text
which you may want to encode in a non-standard character encoding.

2) Sometimes it would be desirable to encode query strings entirely in
"latin-1" or possibly "ascii" instead of "utf-8".  Adding the extra
parameters now present on quote_plus() can easily give that extra
functionality.

I have attached a new version of urlencode() that provides both of the
above fixes/enhancements.  Additionally, an unused codepath in the
existing function has been eliminated/cleaned up.  Some doctests are
included as well.

----------
components: Library (Lib)
files: new_urlencode.py
message_count: 1.0
messages: 83434
nosy: dmahn
nosy_count: 1.0
severity: normal
status: open
title: urlencode does not handle "bytes", and could easily handle alternate encodings
type: behavior
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file13294/new_urlencode.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5468>
_______________________________________