[Tutor] How do I get text from an HTML document.
alan.gauld@bt.com
Wed, 14 Aug 2002 16:06:25 +0100
> I have HTML docs that have text between the comment tags:
> <!--Story-->
> Some text here
> <!--Story-->
>
> What would be the simplest way to get this text?
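In Python, assuming the document is in a file called story.html
(a placeholder name):

    lines = []
    with open("story.html") as f:
        for line in f:                  # skip lines until the opening tag
            # strip() drops the trailing newline so the comparison can match
            if line.strip() == "<!--Story-->":
                break
        for line in f:                  # collect lines up to the closing tag
            if line.strip() == "<!--Story-->":
                break
            lines.append(line)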
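The same two loops can be wrapped in a function so they work on any iterable of lines, not just an open file. This is only a sketch: `extract_story` and its `marker` parameter are made-up names, and it assumes each marker sits on a line by itself, as in your example.

```python
def extract_story(lines, marker="<!--Story-->"):
    """Return the lines between the first pair of marker comments.

    `lines` may be an open file or any iterable of strings. The marker
    lines themselves are not included. If the closing marker is missing,
    everything after the opening marker is returned.
    """
    it = iter(lines)
    for line in it:                 # skip everything up to the opening marker
        if line.strip() == marker:
            break
    else:
        return []                   # no opening marker found at all
    story = []
    for line in it:                 # the same iterator resumes after the marker
        if line.strip() == marker:
            break
        story.append(line)
    return story


sample = """<html><body>
<!--Story-->
<p>Some text here</p>
<!--Story-->
</body></html>""".splitlines(keepends=True)

print(extract_story(sample))  # ['<p>Some text here</p>\n']
```

Because both loops share one iterator, the second loop picks up exactly where the first stopped, so the file is read only once.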
Then use an HTML parser (the HTMLParser module, html.parser in Python 3)
to strip the remaining HTML tags, leaving just the text.
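A minimal sketch of the tag-stripping step using Python 3's `html.parser`; `TextExtractor` and `strip_tags` are hypothetical names. It keeps only the character data the parser reports through `handle_data`, so tags and comments are dropped. Note that the contents of `<script>` or `<style>` elements would still come through as text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the character data of a document, discarding all tags."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True by default, so &amp; becomes &
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags; tags never reach this method.
        self.chunks.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()                  # flush any text still buffered
    return "".join(parser.chunks)


print(strip_tags("<p>Some <b>text</b> here</p>"))  # Some text here
```

Combined with the loops above, `strip_tags("".join(lines))` would give the plain text of the story.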
HTH,
Alan G.