[issue2637] urllib.quote() escapes characters unnecessarily and contrary to docs

Senthil report at bugs.python.org
Sun Aug 9 20:24:32 CEST 2009


Senthil <orsenthil at gmail.com> added the comment:

On Sun, Aug 09, 2009 at 03:40:47PM +0000, Nir Soffer wrote:
> for query string. This will break existing code that assumes the default
> safe parameters.
> 
> Other characters may be unsafe in other parts of the url - I did not

I agree with your comments and I had similar thoughts too.

The RFC says that different components of a URL have different
sets of characters that need to be quoted.

The quote function is documented as *intended for the path
component*, and the Python documentation provides only a usage overview
of quote, assuming that the developer knows what he/she is doing. It
does not deal with the specifics of quoting with respect to URL
components.
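
A minimal sketch of that point, assuming Python 3, where the function
lives at urllib.parse.quote rather than urllib.quote. The sample strings
are made up for illustration. The default safe='/' suits a path, while a
query value usually calls for a different safe set:

```python
from urllib.parse import quote

# Hypothetical sample inputs, for illustration only.
path = "/docs/my file.txt"
query_value = "a/b&c=d"

# Default safe='/' keeps the path separators and escapes the space.
quoted_path = quote(path)
print(quoted_path)   # /docs/my%20file.txt

# '&' and '=' delimit pairs in a query string, so a caller will usually
# want them escaped inside a value; safe='' also escapes '/'.
quoted_value = quote(query_value, safe="")
print(quoted_value)  # a%2Fb%26c%3Dd
```

The caller, not quote itself, decides which component is being quoted.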

My comment was biased by the changes made to the urllib.urlopen
function, where we explicitly passed the reserved characters in the safe
parameter of quote and got the expected results. This change has been
there for a few months now without any breakage reports. And that
change was not made according to any RFC but was based on the
practical issues encountered.
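
A rough sketch of that approach, again assuming Python 3's
urllib.parse.quote. The safe string and the URL below are illustrative
assumptions, not necessarily what urlopen actually passes:

```python
from urllib.parse import quote

# Illustrative set of reserved characters to leave alone; an assumption,
# not necessarily the exact string used inside urlopen.
RESERVED_SAFE = "%/:=&?~#+!$,;'@()*[]|"

# Hypothetical URL with a stray space in the path.
url = "http://example.com/a path/?q=1&r=2"

# With the reserved characters marked safe, only the space is escaped.
fixed = quote(url, safe=RESERVED_SAFE)
print(fixed)     # http://example.com/a%20path/?q=1&r=2

# With the default safe='/', the URL delimiters are escaped as well.
mangled = quote(url)
print(mangled)   # http%3A//example.com/a%20path/%3Fq%3D1%26r%3D2
```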

Yes, I agree that a change to the quote function is bound to break
code that relies on the earlier behaviour. I see at least 3 tests in
the stdlib breaking, which can be modified without any loss in meaning
if we want to go with the change.

But I feel it is okay to heed your objection and leave the
function as it is.
The need to change it does not have strong backing in the RFC. It is
not a bug, considering the documentation.

The only thing left to live with will be urlopen's passing of safe
characters.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2637>
_______________________________________

