Newby: How do I strip HTML tags?

Andy McKay amckay at merlintechnologies.com
Mon Jun 10 19:08:48 CEST 2002


> Standalone "<" and ">" indicate invalid HTML code, one should use &lt;
> and &gt; instead. You are of course right, in the end a use of
> predefined classes is almost always better than reinventing the wheel
> yourself.

Of course the main problem is there is a lot of invalid HTML out there ;)

I can't remember where the article was, but I found a good one a while ago
that showed a list of valid, but admittedly rare, situations that can trip up
a simple HTML parser. It should make a good set of unit tests for any parser,
actually...
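
As a rough sketch of what such a test set could look like (not the list from
that article, which I can't reproduce here), the snippet below strips tags
with the standard library's html.parser.HTMLParser and checks a few awkward
inputs. The names TagStripper, strip_tags and CASES are made up for
illustration, and the cases are my own picks.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text content, drops tags and comments."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into plain text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; comments and tags never reach here.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Illustrative (assumed) edge cases: each pair is (input, expected text).
CASES = [
    ("<p>Hello <b>world</b></p>", "Hello world"),
    ('<img alt="a > b" src="x.png">after', "after"),  # ">" inside an attribute
    ("before<!-- <b>hidden</b> -->after", "beforeafter"),  # tags in a comment
    ("1 &lt; 2", "1 < 2"),  # escaped "<" is text, not a tag
]

for source, expected in CASES:
    assert strip_tags(source) == expected, (source, strip_tags(source))
```

A naive regular expression like "<[^>]*>" would fail the second case, which
is the kind of thing such a test list is good for catching. Note that this
sketch still returns the contents of script and style elements as text.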

-- 
  Andy McKay
  Merlin Technologies




