stripping HTML comments in the face of programmer errors

Dennis Reinhardt DennisR at dair.com
Fri Nov 8 12:36:03 EST 2002


> Anybody out there got a bit of code which implements a useful heuristic for
> that case?  Ideally, stripping comments from the above would yield

I don't have code, but I do have a heuristic.  Rather than nest, I would
scan for the next token in the input stream using a two-state algorithm:

Step 1 (outside a comment):
   if <!-- is earliest in input, emit unprocessed text to left of <!--
      and go to step 2
   if --> is earliest in input, delete unprocessed text including -->
      and remain in step 1
   if end of file is earliest in input, emit remaining unprocessed text
      and exit

Step 2 (inside a comment):
   if <!-- is earliest in input, delete unprocessed text including <!--
      and remain in step 2
   if --> is earliest in input, delete unprocessed text including -->
      and go to step 1
   if end of file is earliest in input, delete remaining unprocessed text
      and exit
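
For anyone who wants to try the two steps above, here is a minimal Python
sketch of how they might be coded.  It is only an illustration of the
heuristic, not tested production code.  The function name strip_comments is
made up for this example, and it assumes that a stray --> seen in step 1 is
itself consumed along with the text before it, since otherwise the scan
would never advance.

```python
def strip_comments(text):
    """Strip HTML comments with a two-state scan (hypothetical helper)."""
    OPEN, CLOSE = "<!--", "-->"
    out = []            # pieces of text that are emitted (kept)
    pos = 0             # start of the unprocessed text
    in_comment = False  # False = step 1, True = step 2

    while True:
        i_open = text.find(OPEN, pos)
        i_close = text.find(CLOSE, pos)

        if i_open == -1 and i_close == -1:
            # End of file is earliest: emit the rest in step 1,
            # delete the rest in step 2.
            if not in_comment:
                out.append(text[pos:])
            break

        # True when <!-- comes before --> (or --> is absent).
        open_first = i_close == -1 or (i_open != -1 and i_open < i_close)

        if not in_comment:
            if open_first:
                # Step 1: emit text to the left of <!--, go to step 2.
                out.append(text[pos:i_open])
                pos = i_open + len(OPEN)
                in_comment = True
            else:
                # Step 1: stray -->, delete text up to and including it
                # (consuming the --> is an assumption of this sketch).
                pos = i_close + len(CLOSE)
        else:
            if open_first:
                # Step 2: nested <!--, delete through it, stay in step 2.
                pos = i_open + len(OPEN)
            else:
                # Step 2: delete through -->, go back to step 1.
                pos = i_close + len(CLOSE)
                in_comment = False

    return "".join(out)
```

Note that the stray --> rule throws away everything between the previous
comment (or start of file) and the stray -->, so a missing <!-- costs you
some real text.  Whether that trade-off is acceptable depends on the input.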

hth
--

Dennis Reinhardt

http://www.dair.com




