[New-bugs-announce] [issue8885] markerbase declaration errors aren't recoverable
report at bugs.python.org
Thu Jun 3 13:14:11 CEST 2010
New submission from Mark Nottingham <mnot at mnot.net>:
In markupbase.py's ParserBase.parse_declaration, an unexpected character is handled like this:
    self.error(
        "unexpected %r char in declaration" % rawdata[j])
However, the position (j) isn't updated, so if error() returns instead of raising, the loop sees the same character and calls error() again.
For example, this declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" http://www.w3.org/TR/html4/loose.dtd>
(which I think is generated by MS Office) will trigger this behaviour.
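To make the failure mode concrete, here is a minimal sketch of the loop shape described above. It is a toy stand-in, not the real markupbase code: the function name scan_declaration, the character classes, and the max_steps guard are all assumptions added so the demo terminates.

```python
def scan_declaration(rawdata, start, on_error, max_steps=50):
    """Toy model of a declaration scanner (hypothetical, not markupbase).

    Returns the index just past '>' on success, or -1 if the input is
    incomplete or the demo-only step budget runs out.
    """
    j = start
    steps = 0
    n = len(rawdata)
    while j < n and steps < max_steps:
        steps += 1
        c = rawdata[j]
        if c == ">":
            return j + 1
        if c in "\"'":
            # Skip a quoted string in one go.
            end = rawdata.find(c, j + 1)
            if end < 0:
                return -1
            j = end + 1
        elif c.isalnum() or c in "-_." or c.isspace():
            j += 1
        else:
            on_error("unexpected %r char in declaration" % c)
            # j is deliberately NOT advanced, mirroring the report:
            # a handler that returns is simply called again.
    return -1


decl = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        'http://www.w3.org/TR/html4/loose.dtd>')
messages = []
# Start after the leading "<!"; the handler records instead of raising.
result = scan_declaration(decl, 2, messages.append)
```

With a handler that only records the message, the scanner never gets past the ':' in the unquoted URL: the same message is reported on every iteration until the step guard stops the loop.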
Two possible resolutions:
1) increment j and try the next character in this case
2) document that error() is not recoverable; i.e., it MUST raise an exception.
My preference is strongly for #1 (as HTML parsing should be forgiving, and HTMLParser is based upon markupbase).
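A rough sketch of what resolution #1 could look like, again on a toy scanner rather than the real markupbase code. The name scan_declaration_forgiving and the character classes are assumptions for illustration; the only point is the extra increment after the error callback.

```python
def scan_declaration_forgiving(rawdata, start, on_error):
    """Toy scanner (hypothetical) that skips an unexpected character.

    Returns the index just past '>' on success, or -1 if incomplete.
    """
    j = start
    n = len(rawdata)
    while j < n:
        c = rawdata[j]
        if c == ">":
            return j + 1
        if c in "\"'":
            # Skip a quoted string in one go.
            end = rawdata.find(c, j + 1)
            if end < 0:
                return -1
            j = end + 1
        elif c.isalnum() or c in "-_." or c.isspace():
            j += 1
        else:
            on_error("unexpected %r char in declaration" % c)
            j += 1  # resolution 1: move on to the next character
    return -1


decl = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        'http://www.w3.org/TR/html4/loose.dtd>')
reported = []
end_index = scan_declaration_forgiving(decl, 2, reported.append)
```

Here a non-raising handler is called once per offending character, and the scan still reaches the closing '>'. A handler that raises keeps working as before, since the increment is never reached.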
components: Library (Lib)
title: markerbase declaration errors aren't recoverable
versions: Python 2.6
Python tracker <report at bugs.python.org>