Newby: How do I strip HTML tags?

Andy McKay amckay at
Mon Jun 10 19:08:48 CEST 2002

> Standalone "<" and ">" indicate invalid HTML code, one should use &lt;
> and &gt; instead. You are of course right, in the end a use of
> predefined classes is almost always better than reinventing the wheel
> yourself.

Of course the main problem is there is a lot of invalid HTML out there ;)

I can't remember where the article was, but I found a good one a while ago
that showed a list of valid but admittedly rare situations that can trip up
a simple HTML parser. It should make a good set of unit tests for any parser.
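
As a rough illustration of what such tests might look like, here is a minimal
sketch of a tag stripper built on the standard library's HTMLParser, with a
few awkward-but-valid inputs as test cases. The names TagStripper, strip_tags
and CASES are made up for this example, the cases are my own guesses rather
than the ones from that article, and it assumes a current Python 3 with the
html.parser module.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keep text content, drop tags and comments."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into "<"
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; may fire several times per document.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Valid but slightly unusual inputs that a naive regex stripper can get wrong.
CASES = [
    ("<p>Hello <b>world</b></p>", "Hello world"),
    ('<a title="a > b">link</a>', "link"),        # ">" inside an attribute
    ("<!-- <b>not text</b> -->visible", "visible"),  # tags inside a comment
    ("1 &lt; 2", "1 < 2"),                        # escaped "<" in text
]

if __name__ == "__main__":
    for source, expected in CASES:
        assert strip_tags(source) == expected, source
    print("all cases passed")
```

Text inside script and style elements is still passed to handle_data here,
so a real stripper would probably want to skip those as well.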

  Andy McKay
  Merlin Technologies

More information about the Python-list mailing list