reply.in.the.newsgroup at my.address.is.invalid
Sat Dec 20 01:36:11 CET 2003
>The library docs show that there is an HTMLParser module and an
>htmllib module, both of which apparently contain classes named
>"HTMLParser". There is a bit of description of differences, but it
>still doesn't seem clear to me what the intent is.
I think the intent is to use HTMLParser. It's newer, and its documentation
doesn't scare you off with phrases like "HTML 2.0" and "SGML" :-)
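
For a feel of how the HTMLParser class is used, here is a minimal sketch
that collects link targets. It assumes a current Python 3, where the
module is called html.parser (in the library discussed here it is the
HTMLParser module). LinkCollector and collect_links are names made up for
this example, not part of the library.

```python
from html.parser import HTMLParser  # Python 3 name; assumption noted above


class LinkCollector(HTMLParser):
    """Hypothetical helper: remember the href of every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Feed a complete document to the parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

You subclass the parser, override the handle_* methods you care about,
and feed() it text; everything else is ignored.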
>Which one is the best choice for parsing arbitrary real-life Web pages?
Neither! Real-life web pages are typically not HTML-parseable. Try tidying
them up a bit first. See http://groups.google.nl/groups?th=58cd394d2e71137f