[New-bugs-announce] [issue8885] markerbase declaration errors aren't recoverable

Mark Nottingham report at bugs.python.org
Thu Jun 3 13:14:11 CEST 2010

New submission from Mark Nottingham <mnot at mnot.net>:

In markupbase.py's ParserBase.parse_declaration, an unexpected character is caught like this:

                self.error(
                    "unexpected %r char in declaration" % rawdata[j])

However, the position (j) isn't updated, so if error() returns instead of raising, the loop re-examines the same character and calls error() again, indefinitely.
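
A minimal sketch of the pattern, to make the failure mode concrete. This is a toy loop, not the real markupbase code: the function name scan, the max_calls guard, and the input string are all assumptions made for illustration.

```python
def scan(rawdata, error, max_calls=3):
    """Toy stand-in for the scanning loop; not the real markupbase code."""
    j = 0
    calls = 0
    while j < len(rawdata):
        if rawdata[j].isalpha():
            j += 1  # ordinary character: consume it
        else:
            error("unexpected %r char in declaration" % rawdata[j])
            # j is not advanced, so the next pass sees the same character.
            calls += 1
            if calls >= max_calls:
                break  # demo-only guard; the loop would otherwise never end
    return j


messages = []
stuck_at = scan("http://example", messages.append)
print(stuck_at, messages)
```

With an error callback that merely records the message, the scan stays at the ":" and reports the same character on every pass until the demo guard stops it.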

For example, this declaration:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" http://www.w3.org/TR/html4/loose.dtd>

(which I think is generated by MS Office) will trigger this behaviour.

Two possible resolutions:

1) increment j and try the next character in this case

2) document that error() is not recoverable; i.e., it MUST raise an exception.

My preference is strongly for #1 (as HTML parsing should be forgiving, and HTMLParser is based upon markupbase).
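
Resolution #1 could look roughly like the following. This is a simplified sketch under stated assumptions, not a patch against markupbase.py: the function scan_declaration and the two regular expressions are hypothetical stand-ins that only approximate how names and quoted strings are consumed.

```python
import re

# Hypothetical approximations of the name and string-literal patterns.
NAME = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*")
STRING = re.compile(r"""('[^']*'|"[^"]*")\s*""")

DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
           'http://www.w3.org/TR/html4/loose.dtd>')


def scan_declaration(rawdata, error):
    """Scan a '<!...>' declaration, skipping characters that are reported."""
    j = 2  # skip the leading "<!"
    while j < len(rawdata):
        if rawdata[j] == ">":
            return j + 1  # position just past the declaration
        m = STRING.match(rawdata, j) or NAME.match(rawdata, j)
        if m:
            j = m.end()
        else:
            error("unexpected %r char in declaration" % rawdata[j])
            j += 1  # resolution #1: step past the offending character
    return -1  # declaration is incomplete


messages = []
end = scan_declaration(DOCTYPE, messages.append)
print(end == len(DOCTYPE), len(messages))
```

Because every branch advances j, a non-raising error() no longer stalls the scan: each offending character in the unquoted URL is reported once and the declaration is consumed to its closing ">".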

components: Library (Lib)
messages: 106938
nosy: mnot
priority: normal
severity: normal
status: open
title: markerbase declaration errors aren't recoverable
type: behavior
versions: Python 2.6

Python tracker <report at bugs.python.org>
