[New-bugs-announce] [issue40338] [Security] urllib and anti-slash (\) in the hostname

STINNER Victor report at bugs.python.org
Mon Apr 20 10:43:38 EDT 2020


New submission from STINNER Victor <vstinner at python.org>:

David Schütz reported the following urllib vulnerability to the PSRT at 2020-03-29.

He wrote an article about a similar vulnerability in Closure (Javascript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

  https://xdavidhu.me\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if server-side checks are used by applications to validate a URLs authority.

The problem emerges from the fact that the RFC and the WHATWG specifications differ, and the RFC does not mention the "\":

* RFC: https://tools.ietf.org/html/rfc3986#appendix-B
* WHATWG: https://url.spec.whatwg.org/#relative-state

This specification difference might cause issues, since David do understand that the parser is implemented by the RFC, but the WHATWG spec is what the browsers are using, who will mainly be the ones opening the URL.

----------
components: Library (Lib)
messages: 366832
nosy: vstinner
priority: normal
severity: normal
status: open
title: [Security] urllib and anti-slash (\) in the hostname
type: security
versions: Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40338>
_______________________________________


More information about the New-bugs-announce mailing list