[New-bugs-announce] [issue13987] Handling of broken markup in HTMLParser on 2.7
report at bugs.python.org
Fri Feb 10 14:45:58 CET 2012
New submission from Ezio Melotti <ezio.melotti at gmail.com>:
The attached patch fixes a few problems with HTMLParser on 2.7.
Instead of raising an error when invalid markup is detected, the parser now consumes the invalid input and continues. This patch is a partial backport of #1486713.
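The sketch below shows what this tolerant behaviour looks like to a caller. It assumes Python 3's html.parser module, which descends from the 2.7 module (on 2.7 the import is `from HTMLParser import HTMLParser`). It is not taken from the patch. EventCollector is a hypothetical subclass that only records callbacks. The point is that truncated markup produces events instead of an exception.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records start tags and text instead of acting on them."""

    def __init__(self):
        super().__init__()
        self.starttags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


parser = EventCollector()
# The final "</div" is never closed with ">", so the markup is broken.
parser.feed("<div><p>text</div")
parser.close()  # flush whatever is still buffered; no exception is raised
print(parser.starttags)  # ['div', 'p']
```

How the dangling "</div" is reported at close() differs between Python versions. The tag and text events before it are delivered either way.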
After this, two more patches will follow.
The first will get rid of errors raised while parsing declarations and should also solve #13576:
     def unknown_decl(self, data):
-        self.error("unknown declaration: %r" % (data,))
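Once the base unknown_decl() no longer raises, code that still wants to notice unknown declarations can override the hook itself. The following is a minimal sketch of that override pattern, again assuming Python 3's html.parser. DeclLogger is a hypothetical name. Which inputs actually reach unknown_decl() varies between Python versions (conditional-comment sections and CDATA sections in particular), so the sketch does not promise specific output.

```python
from html.parser import HTMLParser


class DeclLogger(HTMLParser):
    """Hypothetical subclass that collects unknown declarations instead of failing."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def unknown_decl(self, data):
        # The old 2.7 base class called self.error() here; record the data instead.
        self.unknown.append(data)


logger = DeclLogger()
logger.feed("<![if !IE]><p>hi</p><![endif]>")
logger.close()
# Depending on the Python version, these sections arrive via unknown_decl()
# or are reported as (bogus) comments; either way parsing continues.
print(logger.unknown)
```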
The second will take care of "bogus comments" (see #13960).
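A "bogus comment" is the HTML5 tokenizer's term for markup such as `<!foo>` or `</ bar>`. Such markup starts like a declaration or an end tag but is not valid, so it is treated as a comment. The sketch below shows the observable result. It assumes Python 3's html.parser, which reports such input through handle_comment(). CommentCollector is a hypothetical name.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical subclass that records every comment the parser reports."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


collector = CommentCollector()
# "<!foo>" is neither a real comment ("<!--") nor a DOCTYPE: a bogus comment.
collector.feed("<p><!foo></p>")
collector.close()
print(collector.comments)  # ['foo']
```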
Once this is done, HTMLParser should be able to parse (almost) any input. I'm planning to commit this before the release of 2.7.3.
components: Library (Lib)
nosy: benjamin.peterson, eric.araujo, ezio.melotti, r.david.murray
stage: patch review
title: Handling of broken markup in HTMLParser on 2.7
versions: Python 2.7
Added file: http://bugs.python.org/file24475/issue13987.diff
Python tracker <report at bugs.python.org>