[New-bugs-announce] [issue2637] urllib.quote() escapes characters unnecessarily and contrary to docs

Tim Lesher report at bugs.python.org
Tue Apr 15 17:09:11 CEST 2008


New submission from Tim Lesher <tlesher at gmail.com>:

The urllib.quote docstring implies that it quotes only characters in RFC
2396's "reserved" set.

However, urllib.quote currently escapes all characters except those in
an "always_safe" list, which consists of alphanumerics and three
punctuation characters, "_.-".
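For illustration, here is a minimal Python 3 sketch that emulates the behavior described above. It is not the actual urllib source. The names ALWAYS_SAFE and quote_always_safe are hypothetical. It also assumes the input is encoded as UTF-8 before escaping, whereas Python 2.5's quote worked on byte strings.

```python
import string

# Hypothetical emulation of the behavior reported above: only ASCII
# letters, digits and "_.-" (plus the caller's `safe` chars) pass through.
ALWAYS_SAFE = frozenset(string.ascii_letters + string.digits + "_.-")

def quote_always_safe(s, safe="/"):
    """Percent-escape every byte of `s` outside ALWAYS_SAFE and `safe`."""
    keep = ALWAYS_SAFE | frozenset(safe)
    # Assumption: text is encoded as UTF-8 before escaping. Bytes >= 128
    # are always escaped, even if `safe` contains non-ASCII characters.
    return "".join(
        chr(b) if b < 128 and chr(b) in keep else "%%%02X" % b
        for b in s.encode("utf-8")
    )

print(quote_always_safe("it's (probably) ~fine!"))
# it%27s%20%28probably%29%20%7Efine%21
```

Under this rule, the RFC "mark" characters ' ( ) ~ ! are all escaped, even though the RFC classes them as unreserved.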

This behavior is contrary to the RFC, which defines "unreserved"
characters as alphanumerics plus the "mark" characters "-_.!~*'()".

The RFC also says:

  Unreserved characters can be escaped without changing the semantics
  of the URI, but this should not be done unless the URI is being used
  in a context that does not allow the unescaped character to appear.

This seems to imply that "always_safe" should correspond to the RFC's
"unreserved" set of "alphanum" | "mark".
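A minimal Python 3 sketch of the proposed rule follows. Every character in RFC 2396's unreserved set ("alphanum" | "mark") passes through unescaped. The function name quote_rfc2396 is hypothetical, and UTF-8 encoding of the input is an assumption. This is not an existing urllib function.

```python
import string

# RFC 2396 unreserved = alphanum | mark, where mark = "-_.!~*'()"
RFC2396_MARK = "-_.!~*'()"
RFC2396_UNRESERVED = frozenset(string.ascii_letters + string.digits + RFC2396_MARK)

def quote_rfc2396(s, safe="/"):
    """Hypothetical quote() variant: escape only bytes that are neither
    RFC 2396 unreserved characters nor listed in `safe`."""
    keep = RFC2396_UNRESERVED | frozenset(safe)
    out = []
    for b in s.encode("utf-8"):  # assumption: UTF-8 input encoding
        ch = chr(b)
        if b < 128 and ch in keep:
            out.append(ch)               # unreserved or caller-safe: keep as-is
        else:
            out.append("%%%02X" % b)     # everything else: %XX escape
    return "".join(out)

print(quote_rfc2396("it's (probably) ~fine!"))
# it's%20(probably)%20~fine!
```

Reserved characters such as ";" are still escaped (for example, "a;b" becomes "a%3Bb"). Only the unnecessary escaping of the "mark" characters goes away.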

----------
components: Library (Lib)
messages: 65518
nosy: tlesher
severity: normal
status: open
title: urllib.quote() escapes characters unnecessarily and contrary to docs
type: behavior
versions: Python 2.5

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2637>
__________________________________

