[issue36338] urlparse of urllib returns wrong hostname

Thu Oct 24 06:38:18 EDT 2019

STINNER Victor <vstinner at python.org> added the comment:

OMG, parsing a URL is a can of worms... There are so many open issues related to URL parsing!

* bpo-18191: urllib.parse.splitport("::1")
* bpo-20271: urllib.parse.urlparse('http://[::1]spam:80')
* bpo-28841: urlparse.urlparse() parses invalid URI without generating an error (examples provided)
* bpo-33342: urlsplit("//user:[@host")
* bpo-34360: 'http://[::1]]'
* bpo-35377: urlparse doesn't validate the scheme
* bpo-35748: 'http://www.google.com\@xxx.com'
* bpo-36338 (this issue): urlparse('http://demo.com[attacker.com]')
* bpo-37678: urlparse('http://user:pass#?[word@example.com:80/path')
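A quick way to see what the first two reports above mean is to ask urllib.parse.urlsplit() for the hostname of two of the quoted inputs. This sketch was not part of the original comment. The result depends on the Python version. Older releases report 'attacker.com' for the bracketed URL, which is the "wrong hostname" described in this issue. Newer releases may raise ValueError for it instead. The backslash URL from bpo-35748 yields 'xxx.com', while browsers following the WHATWG URL rules treat "\" like "/" for http URLs and would connect to www.google.com. The SAMPLES dict and reported_hostname() helper are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit

# Example inputs quoted from the issue list above (SAMPLES is a hypothetical name).
SAMPLES = {
    "bpo-36338": "http://demo.com[attacker.com]",
    "bpo-35748": "http://www.google.com\\@xxx.com",  # contains one literal backslash
}


def reported_hostname(url):
    """Return urlsplit(url).hostname, or the ValueError message as a string.

    Hypothetical helper: newer CPython releases reject some malformed
    bracketed hosts with ValueError, so the outcome is version dependent.
    """
    try:
        return urlsplit(url).hostname
    except ValueError as exc:
        return f"ValueError: {exc}"


for issue, url in SAMPLES.items():
    print(f"{issue}: {url!r} -> {reported_hostname(url)!r}")
```

In both cases the parser picks the text after the last "@" (or inside the brackets) as the host. A client and a validator that parse the same string differently may therefore disagree about which server is being contacted.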

Related:

* bpo-3647: urlparse - relative url parsing and joins to be RFC3986 compliance
* bpo-16909: urlparse: add userinfo attribute
* bpo-18140: issue with 'http://auser:secr#et@192.168.0.1:8080/a/b/c.html'
* bpo-22234: urllib.parse.urlparse accepts any falsy value as an url
* bpo-22852: "urllib.parse wrongly strips empty #fragment, ?query, //netloc"
* bpo-23328: issue with "http://someuser:a/b@10.11.12.13:1234"
* bpo-23448: "urllib2 needs to remove scope from IPv6 address when creating Host header"
* bpo-23505: [CVE-2015-2104] Urlparse insufficient validation leads to open redirect

There are 124 open issues with "urllib" in their title and 12 open issues with "urlparse" in their title.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36338>
_______________________________________