[Python-checkins] cpython (merge 3.3 -> default): #20288: merge with 3.3.

ezio.melotti python-checkins at python.org
Sat Feb 1 20:23:15 CET 2014


http://hg.python.org/cpython/rev/92b3928bfde1
changeset:   88888:92b3928bfde1
parent:      88885:b1f214165471
parent:      88887:32097f193892
user:        Ezio Melotti <ezio.melotti at gmail.com>
date:        Sat Feb 01 21:22:26 2014 +0200
summary:
  #20288: merge with 3.3.

files:
  Lib/html/parser.py          |  6 +++---
  Lib/test/test_htmlparser.py |  6 ++++++
  Misc/NEWS                   |  2 ++
  3 files changed, 11 insertions(+), 3 deletions(-)


diff --git a/Lib/html/parser.py b/Lib/html/parser.py
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -264,9 +264,9 @@
                     i = self.updatepos(i, k)
                     continue
                 else:
-                    if ";" in rawdata[i:]: #bail by consuming &#
-                        self.handle_data(rawdata[0:2])
-                        i = self.updatepos(i, 2)
+                    if ";" in rawdata[i:]:  # bail by consuming &#
+                        self.handle_data(rawdata[i:i+2])
+                        i = self.updatepos(i, i+2)
                     break
             elif startswith('&', i):
                 match = entityref.match(rawdata, i)
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -167,6 +167,12 @@
             ("data", "&#bad;"),
             ("endtag", "p"),
         ])
+        # add the [] as a workaround to avoid buffering (see #20288)
+        self._run_check(["<div>&#bad;</div>"], [
+            ("starttag", "div", []),
+            ("data", "&#bad;"),
+            ("endtag", "div"),
+        ])
 
     def test_unclosed_entityref(self):
         self._run_check("&entityref foo", [
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -41,6 +41,8 @@
   ValueError instead of assert for forbidden subprocess_{shell,exec}
   arguments.  (More to follow -- a convenience API for subprocesses.)
 
+- Issue #20288: fix handling of invalid numeric charrefs in HTMLParser.
+
 - Issue #20424: Python implementation of io.StringIO now supports lone surrogates.
 
 - Issue #20308: inspect.signature now works on classes without user-defined

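The sketch below shows the behaviour the new test pins down. It assumes Python 3.5 or later, where `HTMLParser` accepts `convert_charrefs`. That argument is passed as `False` so that `&#...` reaches the charref branch of `goahead()` patched above. `EventRecorder` and `parse` are hypothetical helpers written for this illustration; they are not part of the stdlib or of `test_htmlparser.py`. Like the test suite's collector, the recorder merges adjacent data events, because the parser hands the text over in pieces (`&#` first, then `bad;`).

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self):
        # Disable automatic charref conversion so that "&#..." is handled
        # by the charref branch of goahead() touched by this commit.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_charref(self, name):
        # Only well-formed references such as "&#65;" arrive here.
        self.events.append(("charref", name))

    def handle_data(self, data):
        # Text may be delivered in several chunks, so merge consecutive
        # data events into one, as the test suite's collector does.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def parse(source):
    parser = EventRecorder()
    parser.feed(source)  # feed the whole string at once, like the [] in the test
    parser.close()       # flush any text still buffered after the "&#" bail-out
    return parser.events


print(parse("<div>&#bad;</div>"))
# [('starttag', 'div', []), ('data', '&#bad;'), ('endtag', 'div')]
```

Before the fix, the bail-out branch emitted `rawdata[0:2]`, which is the first two characters of the whole buffer (`<d` for this input), instead of the `&#` at position `i`. It also set the position to the absolute index 2 rather than `i+2`. The bug therefore only showed up when the invalid reference did not start the buffer, which is why the new test places it after a `<div>` tag.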
-- 
Repository URL: http://hg.python.org/cpython
