[Python-Dev] Need help to fix urllib(.parse) vulnerabilities

Fri Jul 21 06:02:39 EDT 2017

Hi,

Recently, two security vulnerabilities were reported in the urllib module:

https://bugs.python.org/issue30500
http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
=> already fixed in Python 3.6.2

https://bugs.python.org/issue29606
http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
=> not fixed yet

I also proposed a more general protection: "Reject newline character
(U+000A) in URLs in urllib.parse":
http://bugs.python.org/issue30713

The problem with the urllib module is how we handle invalid URLs. Right
now, we return the URL unmodified if we cannot parse it. Should we
raise an exception if a URL contains a newline, for example?
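
To make the question concrete, here is a minimal sketch of what
"raise an exception" could look like from the caller's side. The name
strict_urlsplit is hypothetical and the choice of ValueError is an
assumption; this is not the patch attached to issue30713, only an
illustration of the idea.

```python
from urllib.parse import SplitResult, urlsplit


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper: refuse any URL containing a newline (U+000A).

    Assumption: ValueError is used here for illustration only; the
    exception type a real fix would raise is not decided in this thread.
    """
    if "\n" in url:
        raise ValueError("URL contains a newline character: %r" % url)
    # Clean input is handed to the standard parser unchanged.
    return urlsplit(url)
```

A well-formed URL such as "http://example.com/path" is parsed as
usual, while something like "http://example.com/\nHeader: x" is
rejected before it can reach a protocol stream. The backward
compatibility concern is exactly that existing callers may not expect
an exception at this point.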

It's very hard to harden the urllib module without breaking backward
compatibility. That's why it took 3 weeks to fix "urllib connects to a
wrong host": we had to find a way to fix the vulnerability without
breaking backward compatibility.

Another proposed approach is to reject invalid data either earlier or
later in the processing chain, rather than in urllib itself...

So if you understand URLs, HTTP, etc., please join these issues and
help us fix them!

Victor