[Python-Dev] Need help to fix urllib(.parse) vulnerabilities

Serhiy Storchaka storchaka at gmail.com
Sat Jul 22 02:01:57 EDT 2017

21.07.17 13:02, Victor Stinner wrote:
> Recently, two security vulnerabilities were reported in the urllib module:
> https://bugs.python.org/issue30500
> http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
> => already fixed in Python 3.6.2
> https://bugs.python.org/issue29606
> http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> => not fixed yet
> I also proposed a more general protection: "Reject newline character
> (U+000A) in URLs in urllib.parse":
> http://bugs.python.org/issue30713
> The problem with the urllib module is how we handle invalid URLs. Right
> now, we return the URL unmodified if we cannot parse it. Should we
> raise an exception if a URL contains a newline, for example?
> It's very hard to harden the urllib module without breaking backward
> compatibility. That's why it took 3 weeks to fix "urllib connects to a
> wrong host": we had to find a way to fix the vulnerability without
> breaking backward compatibility.
> Another proposed approach is to reject invalid data earlier or later,
> but not in urllib...

Checking a URL in urllib.parse is too early and not enough. The urllib 
module is general, and different protocols have different limitations. 
There are other ways besides urllib to pass invalid parameters to 
low-level protocol implementations.

I think the only reliable way of fixing the vulnerability is rejecting 
or escaping (as specified in RFC 2640) CR and LF inside sent lines. 
Adding support for RFC 2640 is a new feature and can be added only in 
3.7. The feature should also be optional, since not all servers support 
RFC 2640. https://github.com/python/cpython/pull/1214 does the right thing.
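To illustrate the "reject" half of this, here is a minimal sketch of a check applied to each line just before it is sent. The helper name check_command_line is hypothetical and this is not the code from the pull request; it only shows where such a check would sit, assuming commands are built as str and terminated with CRLF by the sender.

```python
def check_command_line(line: str) -> str:
    """Return the line terminated with CRLF, refusing embedded CR or LF.

    Hypothetical helper: a real client would call something like this
    at the single point where protocol lines are written to the socket.
    """
    if "\r" in line or "\n" in line:
        raise ValueError("illegal newline character in protocol line")
    return line + "\r\n"


# A normal command passes through and gets its terminator.
wire = check_command_line("USER anonymous")

# An injected second command is refused instead of being sent.
try:
    check_command_line("CWD /pub\r\nDELE secret.txt")
    injected = True
except ValueError:
    injected = False
```

Doing the check at the point of sending, rather than at URL parsing time, covers every caller regardless of how the arguments reached the protocol layer.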

Another way of hardening the Python stdlib implementation of the FTP 
server is to make it accept only CRLF as a line delimiter, not a lone CR 
or LF.
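A rough sketch of what strict CRLF-only line splitting could look like on the receiving side. The function name split_crlf_lines is hypothetical, and the choice to raise on a bare CR or LF (rather than silently keeping it inside the line) is an assumption made for illustration.

```python
from typing import List, Tuple


def split_crlf_lines(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received bytes into complete lines delimited by CRLF only.

    Returns (complete_lines, remainder). A bare CR or LF inside a
    complete line is treated as an error instead of as a delimiter.
    """
    lines = []
    while True:
        head, sep, tail = buffer.partition(b"\r\n")
        if not sep:
            # No complete line left; keep the rest for the next read.
            break
        if b"\r" in head or b"\n" in head:
            raise ValueError("bare CR or LF inside a line")
        lines.append(head)
        buffer = tail
    return lines, buffer


lines, rest = split_crlf_lines(b"USER a\r\nPASS b\r\nQU")
```

With this approach a lone LF smuggled into an argument cannot start a new command, because it never counts as a line boundary.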

Additional sanity checks can be added in FTP.login() to detect bad 
arguments earlier and raise more specific errors.
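As a sketch of such an early check, the function below validates login arguments before any command is built. validate_login_args is a hypothetical standalone helper, not part of ftplib; the parameter names simply mirror those of FTP.login().

```python
def validate_login_args(user: str = "", passwd: str = "", acct: str = "") -> None:
    """Raise ValueError naming the first argument that contains CR or LF.

    Hypothetical helper. The offending value is deliberately left out
    of the message so that a password is never echoed in a traceback.
    """
    for name, value in (("user", user), ("passwd", passwd), ("acct", acct)):
        if "\r" in value or "\n" in value:
            raise ValueError(f"{name} must not contain CR or LF")


# Well-formed credentials pass silently.
validate_login_args("anonymous", "guest@example.org")

# A bad argument is reported by name, before anything is sent.
try:
    validate_login_args("anonymous", "secret\r\nDELE file")
    message = ""
except ValueError as exc:
    message = str(exc)
```

This does not replace the check on sent lines; it only gives the caller a clearer error at an earlier point.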

Every protocol (FTP, HTTP, telnet, SMTP, POP3, IMAP, etc.) should be 
fixed separately. If a protocol allows escaping special characters, its 
implementation should escape them. Otherwise, such characters should be 
rejected.
