[issue9061] cgi.escape Can Lead To XSS Vulnerabilities

New submission from Craig Younkins <cyounkins@gmail.com>: The method in question: http://docs.python.org/library/cgi.html#cgi.escape http://svn.python.org/view/python/tags/r265/Lib/cgi.py?view=markup # at the bottom http://code.python.org/hg/trunk/file/3be6ff1eebac/Lib/cgi.py#l1031 "Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character ('"') is also translated; this helps for inclusion in an HTML attribute value, as in <A HREF="...">. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead." cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping. The intended use of this method is unclear to me. Up to and including Mako 0.3.3, this method was the HTML escaping method. Used in this manner, single-quoted attributes with user-supplied data are easily susceptible to cross-site scripting vulnerabilities. While the documentation says "if the value to be quoted might include single- or double-quote characters... [use the] xml.sax.saxutils module instead," it also implies that this method will make input safe for HTML. Because this method escapes 4 of the 5 key XML characters, it is reasonable to expect some will use it for HTML escaping. I suggest rewording the documentation for the method making it more clear what it should and should not be used for. I would like to see the method changed to properly escape single-quotes, but if it is not changed, the documentation should explicitly say this method does not make input safe for inclusion in HTML. This is definitely affecting the security of some Python web applications. I already mentioned Mako, but I've found this type of bug in other frameworks and engines because the creators either called cgi.escape directly or modeled their own after it. Craig Younkins ---------- assignee: docs@python components: Documentation, Library (Lib) messages: 108457 nosy: Craig.Younkins, docs@python priority: normal severity: normal status: open title: cgi.escape Can Lead To XSS Vulnerabilities versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Craig Younkins <cyounkins@gmail.com>: ---------- type: -> security _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Craig Younkins <cyounkins@gmail.com> added the comment: Proof of concept: print """<body class='%s'></body>""" % cgi.escape("' onload='alert(1);' bad='") ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Senthil Kumaran <orsenthil@gmail.com> added the comment: On Wed, Jun 23, 2010 at 03:46:35PM +0000, Craig Younkins wrote:
cgi.escape is for HTML attribute escaping only. I guess, you should explain or point out to resources where 'single quotes' representation in a non-entity format in a HTML page has lead to XSS.
The intended use of this method is unclear to me.
Escape HTML characters (most commonly), >,<, & and ". And mostly when constructing responses where these characters are literally required.
"More suitable" for HTML would be the correct interpretation rather make the "input safe". You might check the reference documentation leading to xml.sax.saxutils.
I suggest rewording the documentation for the method making it more clear what it should and should not be used for.
The very next paragraph seems to address the security considerations while using the cgi module itself, rather than limiting it to cgi.escape. It says that: "To be on the safe side, if you must pass a string gotten from a form to a shell command, you should make sure the string contains only alphanumeric characters, dashes, underscores, and periods." With respect your bug report: 1. Any doc change suggestions you propose? (After pointing out the resources requested in first para) 2. If cgi.escape needs to escape single quotes, what should it be as: lsquo/rsquo (for XHTML) and ' or ' for Others? ---------- nosy: +orsenthil _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Craig Younkins <cyounkins@gmail.com> added the comment:
cgi.escape is for HTML attribute escaping only.
It is not safe for HTML attribute escaping because it does not encode single quotes.
"More suitable" for HTML would be the correct interpretation rather make the "input safe".
"More suitable, but not quite secure" Regardless of the intended use of this method, many many people are using it for insecure HTML entity escaping.
print "<body class='%s'></body>" % cgi.escape("' onload='alert(1);' bad='")
The security concerns related to output on the web are very different from the concerns related sending user input to a shell command. The needed escaping is completely different. Also, the security advice above is woefully inadequate.
Any doc change suggestions you propose?
Convert the characters '&', '<' and '>' in string s to their HTML entity encoded values. If the optional flag quote is true, the double-quotation mark character ('"') is also encoded. Note that the output of this method is not safe to put in an HTML attribute because it does not escape single quotes. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead.
If cgi.escape needs to escape single quotes, what should it be as: lsquo/rsquo (for XHTML) and ' or ' for Others?
Sorry, I should have included that in the OP. It should escape to ' It is also advised to escape the forward slash character ('/') to / See OWASP.org for an explanation of the complexities of the escaping: http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_S... ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Georg Brandl <georg@python.org>: ---------- priority: normal -> release blocker _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Barry A. Warsaw <barry@python.org> added the comment: Unless someone can upload a specific patch to review in the next couple of hours, I'm going to reduce the priority for 2.6.6rc1. ---------- nosy: +barry _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Georg Brandl <georg@python.org> added the comment: Applied doc patch to 2.6 in r83539. ---------- nosy: +georg.brandl priority: release blocker -> critical versions: -Python 2.5, Python 2.6 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Éric Araujo <merwok@netwok.org> added the comment: Are 2.6 docs built by an older Sphinx version? I wonder why the text uses “the :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module” and not “:func:`~xml.sax.saxutils.quoteattr” to get a direct link (or even just “consider using :func:`xml.sax.saxutils.quoteattr`.”). ---------- nosy: +merwok _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Georg Brandl <georg@python.org> added the comment: No, that's just a relic from the olden LaTeX days, and I've not paid attention enough to fix it :) ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Fred L. Drake, Jr. <fdrake@acm.org> added the comment: Such constructs are notoriously tedious to grep for; patches are welcome. ---------- nosy: +fdrake _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Éric Araujo <merwok@netwok.org> added the comment: Markup nit fixed in r83999 (py3k) and r84001 (stupid typo), r84002 (3.1), r84003 (2.7). ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Benjamin Peterson <benjamin@python.org>: ---------- resolution: -> duplicate status: open -> closed superseder: -> Copy cgi.escape() to html _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Craig Younkins <cyounkins@gmail.com>: ---------- type: -> security _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Craig Younkins <cyounkins@gmail.com> added the comment: Proof of concept: print """<body class='%s'></body>""" % cgi.escape("' onload='alert(1);' bad='") ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Senthil Kumaran <orsenthil@gmail.com> added the comment: On Wed, Jun 23, 2010 at 03:46:35PM +0000, Craig Younkins wrote:
cgi.escape is for HTML attribute escaping only. I guess, you should explain or point out to resources where 'single quotes' representation in a non-entity format in a HTML page has lead to XSS.
The intended use of this method is unclear to me.
Escape HTML characters (most commonly), >,<, & and ". And mostly when constructing responses where these characters are literally required.
"More suitable" for HTML would be the correct interpretation rather make the "input safe". You might check the reference documentation leading to xml.sax.saxutils.
I suggest rewording the documentation for the method making it more clear what it should and should not be used for.
The very next paragraph seems to address the security considerations while using the cgi module itself, rather than limiting it to cgi.escape. It says that: "To be on the safe side, if you must pass a string gotten from a form to a shell command, you should make sure the string contains only alphanumeric characters, dashes, underscores, and periods." With respect your bug report: 1. Any doc change suggestions you propose? (After pointing out the resources requested in first para) 2. If cgi.escape needs to escape single quotes, what should it be as: lsquo/rsquo (for XHTML) and ' or ' for Others? ---------- nosy: +orsenthil _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Craig Younkins <cyounkins@gmail.com> added the comment:
cgi.escape is for HTML attribute escaping only.
It is not safe for HTML attribute escaping because it does not encode single quotes.
"More suitable" for HTML would be the correct interpretation rather make the "input safe".
"More suitable, but not quite secure" Regardless of the intended use of this method, many many people are using it for insecure HTML entity escaping.
print "<body class='%s'></body>" % cgi.escape("' onload='alert(1);' bad='")
The security concerns related to output on the web are very different from the concerns related sending user input to a shell command. The needed escaping is completely different. Also, the security advice above is woefully inadequate.
Any doc change suggestions you propose?
Convert the characters '&', '<' and '>' in string s to their HTML entity encoded values. If the optional flag quote is true, the double-quotation mark character ('"') is also encoded. Note that the output of this method is not safe to put in an HTML attribute because it does not escape single quotes. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead.
If cgi.escape needs to escape single quotes, what should it be as: lsquo/rsquo (for XHTML) and ' or ' for Others?
Sorry, I should have included that in the OP. It should escape to ' It is also advised to escape the forward slash character ('/') to / See OWASP.org for an explanation of the complexities of the escaping: http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_S... ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Georg Brandl <georg@python.org>: ---------- priority: normal -> release blocker _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Barry A. Warsaw <barry@python.org> added the comment: Unless someone can upload a specific patch to review in the next couple of hours, I'm going to reduce the priority for 2.6.6rc1. ---------- nosy: +barry _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Georg Brandl <georg@python.org> added the comment: Applied doc patch to 2.6 in r83539. ---------- nosy: +georg.brandl priority: release blocker -> critical versions: -Python 2.5, Python 2.6 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Éric Araujo <merwok@netwok.org> added the comment: Are 2.6 docs built by an older Sphinx version? I wonder why the text uses “the :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module” and not “:func:`~xml.sax.saxutils.quoteattr” to get a direct link (or even just “consider using :func:`xml.sax.saxutils.quoteattr`.”). ---------- nosy: +merwok _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Georg Brandl <georg@python.org> added the comment: No, that's just a relic from the olden LaTeX days, and I've not paid attention enough to fix it :) ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Fred L. Drake, Jr. <fdrake@acm.org> added the comment: Such constructs are notoriously tedious to grep for; patches are welcome. ---------- nosy: +fdrake _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Éric Araujo <merwok@netwok.org> added the comment: Markup nit fixed in r83999 (py3k) and r84001 (stupid typo), r84002 (3.1), r84003 (2.7). ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________

Changes by Benjamin Peterson <benjamin@python.org>: ---------- resolution: -> duplicate status: open -> closed superseder: -> Copy cgi.escape() to html _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue9061> _______________________________________
participants (7)
-
Barry A. Warsaw
-
Benjamin Peterson
-
Craig Younkins
-
Fred L. Drake, Jr.
-
Georg Brandl
-
Senthil Kumaran
-
Éric Araujo