Fwd: Re: [Web-SIG] Client-side support: what are we aiming for?
janssen at parc.com
Fri Oct 24 18:41:27 EDT 2003
> There has been an HTML parser in the standard library for *YEARS*. I don't
> think there is an action item here.
It's not a particularly *good* HTML parser, though. It's just a simple
syntax framework. It doesn't know about things like block elements,
which elements take IDs and which don't, etc. When I was working on
the Plucker distiller (a web crawler and HTML parser), I had to add
oodles of code to it.
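
To give a rough idea of the kind of code that has to be layered on top, here is a minimal sketch. It assumes the present-day html.parser.HTMLParser rather than the 2.3-era class, and the names BLOCK_ELEMENTS, BlockAwareParser and split_blocks are made up for illustration. The block-element set is a small hand-picked subset of HTML 4.01, not the full list, and this is not the Plucker code.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of HTML 4.01 block-level elements.
# The parser itself has no such table, so the caller has to supply one.
BLOCK_ELEMENTS = {
    "p", "div", "ul", "ol", "li", "table", "blockquote", "pre",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class BlockAwareParser(HTMLParser):
    """Hypothetical subclass that groups text by enclosing block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished text blocks, in document order
        self._buf = []     # text collected since the last block boundary

    def flush(self):
        # Collapse runs of whitespace and store the block if it is non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # A block-level start tag ends whatever text came before it.
        if tag in BLOCK_ELEMENTS:
            self.flush()

    def handle_endtag(self, tag):
        # A block-level end tag closes the current block.
        if tag in BLOCK_ELEMENTS:
            self.flush()

    def handle_data(self, data):
        # Inline elements such as <b> fall through; only their text is kept.
        self._buf.append(data)


def split_blocks(html):
    """Return the text of each block in the given HTML string."""
    parser = BlockAwareParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up any text after the last block boundary
    return parser.blocks
```

Calling split_blocks("<p>Hello <b>world</b></p><div>Second <i>block</i></div>") gives one string per block. Even this much says nothing about which elements take IDs, which may nest inside which, and so on; all of that is still left to the caller.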
Looking at the documentation for 2.3, I see "class HTMLParser: This is
the basic HTML parser class. It supports all entity names required by
the HTML 2.0 specification (RFC 1866). It also defines handlers for
all HTML 2.0 and many HTML 3.0 and 3.2 elements." We can do better
than that: HTML 4.01, at least.