[Web-SIG] Are both htmllib and HTMLParser needed?

Brett Cannon brett at python.org
Wed Feb 20 21:27:34 CET 2008


On Feb 20, 2008 7:43 AM, Fred Drake <fdrake at gmail.com> wrote:
> On Feb 20, 2008 9:35 AM, Guido van Rossum <guido at python.org> wrote:
> > ISTR that HTMLParser was the preferred one. It is certainly newer, and
> > doesn't carry the baggage of sgmllib (which I would discard together
> > with htmllib). Maybe Fred Drake remembers (he's listed as the
> > co-author on the initial checkin message).
>
> I was thinking I'd said something on the stdlib-sig list, but I can't
> find it in the archive, so I must be having a senior moment (brought
> on early by kids).
>
> I'd be in favor of keeping only HTMLParser, with a compliant module
> name ("htmlparser" doesn't seem unreasonable).  The code was
> originally derived from htmllib for the Grail web browser, mostly to
> make things like attribute handling less painful.
>
> Merging _markupbase into HTMLParser to create htmlparser would be
> pretty mechanical.  Removing sgmllib and htmllib does not depend on
> that, and can be done at any time if there's agreement.
>

Works for me. Then the current plan is:

HTMLParser -> html.parser
htmlentitydefs -> html.entities

And remove both htmllib and sgmllib.
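
For illustration only, here is a minimal sketch of what user code might look like under the new names, assuming the renames land as listed above (html.parser and html.entities are the names Python 3 ships with). The LinkCollector class and the sample markup are hypothetical, made up for this example; only the HTMLParser class, its handle_starttag hook, and the name2codepoint table come from the modules themselves.

```python
from html.parser import HTMLParser          # formerly the HTMLParser module
from html.entities import name2codepoint    # formerly htmlentitydefs


class LinkCollector(HTMLParser):
    """Hypothetical example subclass: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, which is the
        # attribute handling Fred mentions as less painful than htmllib's.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p><a href="http://python.org">Python</a> and <a href="/doc">docs</a></p>')
collector.close()
links = collector.links

# The entity table maps an entity name to its Unicode code point.
amp = chr(name2codepoint["amp"])
```

Code that has to run on both the old and the new layout could wrap the imports in a try/except ImportError and fall back to the old module names.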

I will run this by the stdlib-sig as well.

-Brett

