[Tutor] How do I get text from an HTML document.

alan.gauld@bt.com
Wed, 14 Aug 2002 16:06:25 +0100


> I have HTML docs that have text between the comment tags:
> <!--Story-->
> Some text here
> <!--Story-->
> 
> What would be the simplest way to get this text?

Pseudo code:

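lines = []  # collected story lines; assumes file is an open file object

while 1: # find opening tag (stop at end of file too)
   line = file.readline()
   if not line or line.strip() == "<!--Story-->": break

while 1: # collect lines up to closing tag
   line = file.readline()
   # strip() drops the trailing newline before comparing
   if not line or line.strip() == "<!--Story-->":
       break
   lines.append(line)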

Now feed the collected lines to the HTML parser module to
eliminate the HTML tags, leaving just the text.
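
A rough sketch of that last step, assuming a current Python 3 where the
parser lives in html.parser (the module name and layout differ in older
releases). TextCollector, strip_tags and the sample line are made-up
names for illustration only:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps only the text between tags; the tags themselves are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of plain text it finds.
        self.chunks.append(data)


def strip_tags(html_lines):
    """Return the plain text contained in a list of HTML lines."""
    parser = TextCollector()
    parser.feed("".join(html_lines))
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Hypothetical sample standing in for the lines collected by the loops.
sample = ["<p>Some <b>text</b> here</p>\n"]
print(strip_tags(sample))
```

Passing the list built by the second loop to strip_tags() should give
back the story text with the markup removed.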

HTH,

Alan G.