How do I get to *all* of the groups of an re search?

Daniel Yoo dyoo at
Fri Jan 10 22:55:54 CET 2003

Kyler Laird <Kyler at> wrote:

:>    Get an HTML parser--then be ready to
:>    tweak it to accept all the junk that roams
:>    around in the wild.

: Exactly.  I think I've thrown up my hands most times I've
: attempted to use an HTML parser.  I considered it for this
: task but after thinking about it for awhile I decided that an
: RE would be far more elegant.

Hi Kyler,

I know this isn't quite addressing your question, but have you seen
HTML Tidy?

This utility can impose a kind of structure on even badly malformed
HTML, so that you can more easily use 'sgmllib' or an HTML parser on
the tidied output.  Of course, it's not perfect, but it works
admirably well.
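As a rough sketch of the parser route, here is a link extractor built on the standard library's html.parser module. This is a swapped-in assumption: 'sgmllib' was removed in Python 3, and html.parser is its usual modern replacement. The class LinkCollector, the helper collect_links, and the sample markup are all hypothetical, and no Tidy pass is done here. The point is that the parser, not a hand-written regex, deals with case differences, unquoted attributes, and unclosed tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us;
        # attrs is a list of (name, value) pairs, value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Feed a chunk of (possibly messy) HTML and return all hrefs found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Messy input: mixed case, an unquoted attribute, and unclosed tags.
sample = '<P>See <A HREF=one.html>one<a href="two.html">two</p>'
print(collect_links(sample))  # ['one.html', 'two.html']
```

Every match is collected in order, which is what a regex user would otherwise reach for re.finditer to get, but without having to anticipate every quoting and casing variant in the pattern.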

There is a Python interface to HTML-Tidy by the author of the

Good luck to you!

More information about the Python-list mailing list