Newby: How do I strip HTML tags?
Andy McKay
amckay at merlintechnologies.com
Mon Jun 10 13:08:48 EDT 2002
> Standalone "<" and ">" indicate invalid HTML code, one should use &lt;
> and &gt; instead. You are of course right, in the end a use of
> predefined classes is almost always better than reinventing the wheel
> yourself.
Of course the main problem is there is a lot of invalid HTML out there ;)
I can't remember where the article was, but I found a good one a while ago
that showed a list of valid but admittedly rare situations that can trip up
a simple HTML parser. It should make a good set of unit tests for any parser,
actually...
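
For what it's worth, here is a rough sketch of the "use a predefined class" approach with a few of those awkward cases written as checks. It assumes a current Python 3 and its standard html.parser module (not what was available when this thread started), and the names TagStripper and strip_tags are made up for illustration. The sample inputs are my own guesses at tricky cases, not the list from the article.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keep text content, drop the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into "<"
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; may arrive in several pieces
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of `html` with markup removed (a sketch only)."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered
    return "".join(parser.chunks)


if __name__ == "__main__":
    # Plain nested tags
    assert strip_tags("<p>Hello <b>world</b></p>") == "Hello world"
    # Entities are decoded into the characters they stand for
    assert strip_tags("1 &lt; 2") == "1 < 2"
    # A ">" inside a quoted attribute value must not end the tag early
    assert strip_tags('<a title="x > y">link</a>') == "link"
    print("all checks passed")
```

Note that this sketch keeps the text inside script and style elements, since the parser hands those bodies to handle_data as well; a real stripper would probably want to skip them.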
--
Andy McKay
Merlin Technologies