Difference between HTMLParser and htmllib.HTMLParser

Emile van Sebille emile at fenx.com
Thu Jun 27 10:14:35 EDT 2002


"Dave Swegen" <dswegen at software.plasmon.com> wrote in message
news:mailman.1025178934.31606.python-list at python.org...
> I've just started playing with these, and I'm curious which is the
> better of the two (I noticed the HTMLParser module only came in in
> 2.2).
> And is there any reason why the names are so confusingly similar? I
> tried searching in Google for info on this, but the daft naming made
> it rather hard ;)
>

A quick look at the code shows that HTMLParser is part of a family that
subclasses markupbase.ParserBase and doesn't need to specify each of the
individual tags it understands, allowing it to adapt more easily.
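
To make that concrete, here is a minimal sketch of the generic-callback
style. It assumes a current Python 3 interpreter, where the module is
imported as html.parser (in 2.2 it was the top-level HTMLParser module).
The TagCounter class and count_tags helper are hypothetical names made up
for illustration, not part of the library.

```python
from html.parser import HTMLParser  # top-level "HTMLParser" module in Python 2.x


class TagCounter(HTMLParser):
    """Hypothetical example class: counts every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # A single generic callback receives every start tag, so the
        # subclass does not have to define one method per known tag.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html_text):
    """Hypothetical helper: feed a string and return {tag: count}."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    print(count_tags("<p>Hi <b>there</b></p><p>again</p>"))
```

By contrast, the sgmllib-based htmllib.HTMLParser dispatches to per-tag
methods named like start_p or do_br, with anything else falling through to
unknown_starttag. Note that htmllib and sgmllib no longer exist in
Python 3, so the sketch above only exercises the newer parser.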

From the initial CVS checkin just over a year ago:
(ref http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/HTMLParser.py )

"""
A much improved HTML parser -- a replacement for sgmllib.  The API is
derived from but not quite compatible with that of sgmllib, so it's a
new file.  I suppose it needs documentation, and htmllib needs to be
changed to use this instead of sgmllib, and sgmllib needs to be
declared obsolete.  But that can all be done later.

This code was first published as part of TAL (part of Zope Page
Templates), but that was strongly based on sgmllib anyway.  Authors
are Fred Drake and Guido van Rossum.
"""

HTH,

--

Emile van Sebille
emile at fenx.com
