[Tutor] Re: Stripping HTML tags.

Andrei project5 at redrival.net
Fri Apr 16 20:46:24 EDT 2004


Dave S wrote on Saturday 17 April 2004 01:28:

> Alan Gauld wrote:
> 
>>>It works but seems a bit messy. Is there a neater way to do this ?
>>
>>Yes use the html parser module. There is some sample code that shows how
>>to strip all tags to get plain text from an html file. And your
>>code is less reliable (tags spanning lines, nested tags etc) than
>>the html parser code...
> Thanks for that ... I looked at that module but did not think it could
> do what I wanted.
> I will take another look :-)

I think it would do the job just fine (the SGML parser might be even easier
in this case, since you don't care about the tags). Manual parsing means
you'll have to do quite a lot of debugging to catch special cases, as
Danny and Alan mentioned before me. I've tried parsing HTML with regexes
too (for a different purpose, though) and gave up on it in favor of a parser
from the batteries :).
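
A minimal sketch of the parser approach, for illustration only. It assumes
a current Python 3, where html.parser.HTMLParser stands in for the SGML
parser mentioned above (sgmllib was removed in Python 3). The names
TagStripper and strip_tags are hypothetical and not from the thread.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps the text between tags, drops the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text outside a tag
        self.chunks.append(data)


def strip_tags(html):
    """Return the plain text of an HTML string (sketch, not hardened)."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


if __name__ == "__main__":
    # A tag spanning two lines and nested tags, the cases named above
    sample = '<p>Hello <a\n href="x"><b>big</b> world</a></p>'
    print(strip_tags(sample))
```

Because the parser tracks where each tag starts and ends, a tag broken
across lines or nested inside another needs no special handling here. Note
that this sketch also keeps the text inside <script> and <style> elements;
skipping those would need extra handling in handle_starttag and
handle_endtag.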

-- 
Yours,

Andrei

=====
Real contact info (decode with rot13):
cebwrpg5 at jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.



