Handling bad tags with SGMLParser
Sean 'Shaleh' Perry
shalehperry at attbi.com
Thu Mar 7 12:12:26 EST 2002
>
> The user of SGMLParser needs to be able to handle invalid tags. This
> handling may be complex, or as simple as ignoring the tag and asking
> SGMLParser to skip it and move along. As far as I can tell, this
> is not an option.
>
> As a side note, the text of the error message raised is particularly
> uninformative, as it includes only the first character of the tag, in
> other words always '<'.
>
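On the error-message point: a message that shows only the first character of the tag can never say anything beyond '<'. Below is a minimal sketch of a more useful message, assuming only the raw text and the index of the offending '<' are available (as they are in the parser loop quoted below). The helper name bad_markup_message and its context parameter are hypothetical, not part of sgmllib.

```python
def bad_markup_message(rawdata, i, context=30):
    """Hypothetical helper: describe the markup that starts at rawdata[i].

    Reports the text up to and including the next '>' (capped at
    `context` characters) plus a 1-based line and column, instead of
    the single character at rawdata[i].
    """
    end = rawdata.find(">", i)
    if end < 0 or end - i >= context:
        # No closing '>' nearby: show at most `context` characters.
        end = min(len(rawdata), i + context) - 1
    snippet = rawdata[i:end + 1]
    line = rawdata.count("\n", 0, i) + 1
    col = i - (rawdata.rfind("\n", 0, i) + 1) + 1
    return "bad markup at line %d, column %d: %r" % (line, col, snippet)


rawdata = "<html>\n<!bogus junk>\n"
print(bad_markup_message(rawdata, rawdata.index("<!")))
# -> bad markup at line 2, column 1: '<!bogus junk>'
```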
    match = special.match(rawdata, i)
    if match:
        if self.literal:
            self.handle_data(rawdata[i])
            i = i+1
            continue
        # This is some sort of declaration; in "HTML as
        # deployed," this should only be the document type
        # declaration ("<!DOCTYPE html...>").
        k = self.parse_declaration(i)
        if k < 0: break
        i = k
        continue
That is the offending code. 'special' is defined as re.compile(r'<![^<>]*>'),
so any "<!...>" that contains no '<' or '>' is handed to parse_declaration().
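A quick way to see which inputs take that branch, exercising only the regex (sgmllib itself is not imported here; it was removed in Python 3). The sample strings are illustrative, not taken from the original report.

```python
import re

# The module-level pattern quoted above.
special = re.compile(r'<![^<>]*>')

samples = [
    '<!DOCTYPE html>',  # matches: a real document type declaration
    '<!bogus junk>',    # matches too, so it would reach parse_declaration()
    '<!broken <b>',     # no match: a '<' appears before any closing '>'
]
for s in samples:
    print(bool(special.match(s)), repr(s))

# The loop calls special.match(rawdata, i), i.e. the match is anchored at i.
rawdata = 'text<!bogus junk>more'
print(special.match(rawdata, 4).group())  # -> <!bogus junk>
```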
I see two options:
1) Change the definition of special to a no-op pattern: something that is
relatively cheap to evaluate but can never match.
2) Write your own parse_declaration() method in your SGMLParser subclass.
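Both options, sketched under stated assumptions. For option 1, "(?!)" is an empty negative lookahead, which fails at every position. Rebinding sgmllib.special assumes the parsing loop looks the name up as a module global at call time; and once this branch stops claiming "<!...>", what happens to that text depends on the rest of the loop, which is not shown here. For option 2, LenientDeclarationParser is a hypothetical stand-in that models only the rawdata attribute the quoted loop reads; in Python 2 the method would live in your SGMLParser subclass.

```python
import re

# Option 1: a pattern that is cheap to try and can never match.
NEVER_MATCHES = re.compile(r'(?!)')
# In Python 2 one would then rebind the module global (assumption), e.g.
#     sgmllib.special = NEVER_MATCHES


# Option 2: a lenient parse_declaration() that skips any "<!...>".
class LenientDeclarationParser:
    """Hypothetical stand-in for an SGMLParser subclass."""

    def __init__(self, rawdata):
        self.rawdata = rawdata

    def parse_declaration(self, i):
        """Skip the '<!...>' at rawdata[i], whatever it contains.

        Mirrors the convention of the quoted loop: return the index just
        past the declaration, or -1 if the closing '>' is not there yet.
        """
        j = self.rawdata.find(">", i + 2)
        if j < 0:
            return -1  # incomplete: the caller's loop breaks
        return j + 1   # resume scanning just after the '>'


print(NEVER_MATCHES.search('<!DOCTYPE html>'))  # -> None
p = LenientDeclarationParser('a<!bogus junk>b')
print(p.parse_declaration(1))  # -> 14, the index of the final 'b'
```

Returning -1 for an unterminated declaration reuses the `if k < 0: break` convention visible in the quoted loop, so the loop stops at that point instead of raising.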