Handling bad tags with SGMLParser

Sean 'Shaleh' Perry shalehperry at attbi.com
Thu Mar 7 12:12:26 EST 2002


> 
> The user of SGMLParser needs to be able to handle invalid tags.  This
> handling may be complex or as simple as just ignoring it and asking
> SGMLParser to skip this tag and move along.  As far as I can tell this
> is not an option.
> 
> As a side note, the text of the error message thrown is particularly
> uninformative as it simply includes the first letter of the tag, in
> other words always '<'.
> 

match = special.match(rawdata, i)
if match:
    if self.literal:
        self.handle_data(rawdata[i])
        i = i+1
        continue
    # This is some sort of declaration; in "HTML as
    # deployed," this should only be the document type
    # declaration ("<!DOCTYPE html...>").
    k = self.parse_declaration(i)
    if k < 0: break
    i = k
    continue

is the offending code.  'special' is defined as re.compile(r'<![^<>]*>').
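
To see why malformed markup ends up in parse_declaration(), it can help to try the pattern on its own. This is only a small sketch using the re module; the sample strings are made up for illustration.

```python
import re

# The pattern quoted above from sgmllib.
special = re.compile(r'<![^<>]*>')

# Hypothetical inputs, chosen only to show what the pattern accepts.
samples = ['<!DOCTYPE html>', '<!not a real declaration>', '<b>', '<!unterminated']

# True where the pattern matches at position 0, i.e. where the quoted code
# would hand the text to parse_declaration().
results = [special.match(s, 0) is not None for s in samples]
print(results)
```

Anything of the form <!...> without nested angle brackets matches, whether or not it is a well-formed declaration.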

I see two options:

1) change the definition of special to a no-op pattern: something that is
relatively cheap to test but can never match.

2) write your own parse_declaration() method in a subclass of SGMLParser.
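
Both options could be sketched roughly as follows. This assumes the sgmllib version quoted above, where the parsing loop looks up the module-level name special, and where parse_declaration(i) returns the new index or a negative number when more data is needed. NEVER_MATCH, skip_declaration and LenientParser are names invented for this example. sgmllib only exists in Python 2, so the import is guarded.

```python
import re

try:
    import sgmllib          # Python 2 only; the module was removed in Python 3
except ImportError:
    sgmllib = None

# Option 1: an empty negative lookahead fails at every position, so this
# pattern is cheap to test and can never match.
NEVER_MATCH = re.compile(r'(?!)')

def skip_declaration(rawdata, i):
    """Option 2 helper (hypothetical): skip a '<!...' construct starting at i.

    Follows the contract visible in the quoted code: return the index just
    past the construct, or -1 if the closing '>' has not arrived yet.
    """
    end = rawdata.find('>', i)
    if end < 0:
        return -1
    return end + 1

if sgmllib is not None:
    # Option 1, applied globally: this affects every parser in the process.
    # Uncomment to try it instead of option 2.
    # sgmllib.special = NEVER_MATCH

    class LenientParser(sgmllib.SGMLParser):
        # Option 2: silently skip anything that looks like a declaration.
        def parse_declaration(self, i):
            return skip_declaration(self.rawdata, i)
```

With option 1, what happens to the <!...> text afterwards depends on the rest of the parsing loop, which is not shown here. With option 2 as sketched, valid declarations such as the DOCTYPE are skipped as well, which may or may not be acceptable.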



