[issue29651] Inconsistent/undocumented urlsplit/urlparse behavior on invalid inputs
New submission from Vasiliy Faronov:
There is a problem with the standard library's urlsplit and urlparse functions, in Python 2.7 (module urlparse) and 3.2+ (module urllib.parse). The documentation for these functions [1] does not explain how they behave when given an invalid URL. One could try invoking them manually and conclude that they tolerate anything thrown at them:
urlparse('http:////::\\\\!!::!!++///') ParseResult(scheme='http', netloc='', path='//::\\\\!!::!!++///', params='', query='', fragment='')
urlparse(os.urandom(32).decode('latin-1')) ParseResult(scheme='', netloc='', path='\x7f¼â1gdä»6\x82', params='', query='', fragment='\n\xadJ\x18+fli\x9cÛ\x9ak*ÄÅ\x02³F\x85Ç\x18')
Without studying the source code, it is impossible to know that there is a very narrow class of inputs on which they raise ValueError [2]:
urlparse('http://[')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/urllib/parse.py", line 295, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.5/urllib/parse.py", line 345, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
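For callers who want uniform handling of arbitrary input, one option is to treat the ValueError as "could not parse". The following is only a minimal sketch; the helper name try_urlsplit is hypothetical and is not part of urllib.parse.

```python
from urllib.parse import urlsplit


def try_urlsplit(url):
    """Return urlsplit(url), or None if urlsplit raises ValueError.

    try_urlsplit is a hypothetical helper, not a standard library function.
    """
    try:
        return urlsplit(url)
    except ValueError:
        # Raised, for example, when the netloc contains an unmatched
        # square bracket ("Invalid IPv6 URL").
        return None


if __name__ == "__main__":
    print(try_urlsplit('http://['))                # unmatched '[' in netloc
    print(try_urlsplit('http://[::1]:8080/path'))  # balanced brackets parse
```

Returning None is a design choice for this sketch; a caller could equally re-raise a project-specific exception.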
This could be viewed as a documentation issue. But it could also be viewed as an implementation issue. Instead of raising ValueError on those square brackets, urlsplit could simply consider them *invalid* parts of an RFC 3986 reg-name, and lump them into netloc, as it already does with other *invalid* characters:
urlparse('http://\0\0æí\n/') ParseResult(scheme='http', netloc='\x00\x00æí\n', path='/', params='', query='', fragment='')
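Because urlsplit accepts such characters in netloc without complaint, code that needs a syntactically valid host has to check it separately. The sketch below tests the parsed host against the RFC 3986 reg-name grammar (unreserved, pct-encoded and sub-delims characters). The function name host_is_reg_name and the regular expression are assumptions made for illustration; they are not how urllib.parse works internally, and this sketch deliberately rejects IP literals and empty hosts.

```python
import re
from urllib.parse import urlsplit

# RFC 3986, section 3.2.2:
#   reg-name = *( unreserved / pct-encoded / sub-delims )
_REG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def host_is_reg_name(url):
    """Return True if the host of url matches the reg-name grammar.

    host_is_reg_name is a hypothetical helper for illustration only.
    """
    try:
        host = urlsplit(url).hostname
    except ValueError:
        # urlsplit itself rejects unmatched square brackets.
        return False
    if host is None:
        # No host at all; this sketch treats that as invalid.
        return False
    return _REG_NAME.fullmatch(host) is not None


if __name__ == "__main__":
    for candidate in ('http://example.com/', 'http://exa mple/', 'http://['):
        print(repr(candidate), host_is_reg_name(candidate))
```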
Note that the raising behavior was introduced in Python 2.7/3.2. See also issue 8721 [3].
[1] https://docs.python.org/3/library/urllib.parse.html
[2] https://github.com/python/cpython/blob/e32ec93/Lib/urllib/parse.py#L406-L408
[3] http://bugs.python.org/issue8721
----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 288577
nosy: docs@python, vfaronov
priority: normal
severity: normal
status: open
title: Inconsistent/undocumented urlsplit/urlparse behavior on invalid inputs
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue29651>
_______________________________________
Changes by Terry J. Reedy <tjreedy@udel.edu>:
----------
nosy: +orsenthil
stage: -> needs patch
versions: -Python 3.3, Python 3.4, Python 3.5

Raymond Hettinger added the comment:
A note in the docs would be useful. This API is far too well established to make any behavioral changes at this point.
----------
nosy: +rhettinger

Howie Benefiel added the comment:
I'm going to make a note in the documentation. I should have a PR for it in about 1 day.
----------
nosy: +Howie Benefiel

Changes by Roundup Robot <devnull@psf.upfronthosting.co.za>:
----------
pull_requests: +1263

Changes by Berker Peksag <berker.peksag@gmail.com>:
----------
stage: needs patch -> patch review
versions: +Python 3.5

Senthil Kumaran added the comment:
New changeset f6e863d868a621594df2a8abe072b5d4766e7137 by Senthil Kumaran (Howie Benefiel) in branch 'master':
bpo-29651 - Cover edge case of square brackets in urllib docs (#1128)
https://github.com/python/cpython/commit/f6e863d868a621594df2a8abe072b5d4766...

Changes by Senthil Kumaran <senthil@uthcode.com>:
----------
pull_requests: +1690

Changes by Senthil Kumaran <senthil@uthcode.com>:
----------
pull_requests: +1691

Senthil Kumaran added the comment:
New changeset 72e5aa1ef812358b3b113e784e7365fec13dfd69 by Senthil Kumaran in branch '3.5':
bpo-29651 - Cover edge case of square brackets in urllib docs (#1128) (#1597)
https://github.com/python/cpython/commit/72e5aa1ef812358b3b113e784e7365fec13...

Senthil Kumaran added the comment:
New changeset 75b8a54bcad70806d9dcbbe20786f4d9092ab39c by Senthil Kumaran in branch '3.6':
bpo-29651 - Cover edge case of square brackets in urllib docs (#1128) (#1596)
https://github.com/python/cpython/commit/75b8a54bcad70806d9dcbbe20786f4d9092...

Changes by Senthil Kumaran <senthil@uthcode.com>:
----------
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
participants (7)
- Berker Peksag
- Howie Benefiel
- Raymond Hettinger
- Roundup Robot
- Senthil Kumaran
- Terry J. Reedy
- Vasiliy Faronov