On Wed, 2004-08-25 at 14:56, Chuck Bearden wrote:
> On Mon, Aug 23, 2004 at 10:31:11AM -0600, Uche Ogbuji wrote:
> >
> > Honestly, I don't think DOM is the way I would personally go about
> > processing HTML, which is why I was trying to get at whether there was
> > another way for you to meet your needs.
> I think I understand what you are getting at, but personally I have
> found twisted.web.microdom with 'beExtremelyLenient=True', with perhaps
> an mx.Tidying stage beforehand, to be invaluable in mining data from
> database-generated webpages built with crappy HTML.  Consider the pages
> displaying individual patent records at the USPTO, e.g. [1].  If you
> need to treat such pages as if they were XML records to be parsed and
> loaded into a database, something like twisted.web.microdom is a big
> help.

Is twisted.web.microdom available without installing all of Twisted?

