Which HTMLParser?

Tuang tuanglen at hotmail.com
Thu Dec 18 17:27:22 EST 2003


The library docs show that there is an HTMLParser module and an
htmllib module, both of which apparently contain classes named
"HTMLParser". There is a bit of description of the differences, but it
still doesn't seem clear to me what the intent is.

Which one is the better choice for parsing arbitrary real-life Web
pages? I get the feeling that maybe the HTMLParser module is the more
recent, more practical utility, while the htmllib version is the older
one, retained for backward compatibility, but I'm not sure. The docs
don't exactly say that.

Any recommendations or clarifications of what's going on would be
helpful.

Thanks.



