Python web client anyone?

Paul Rubin phr-n2001d at nightsong.com
Sun Oct 14 20:25:54 EDT 2001


Richard Jones <richard at bizarsoftware.com.au> writes:
> > Thanks, this appears to include an HTTP client, which is a start, but
> > I was looking for something that actually parses the HTML on the
> > retrieved page like LWP does.  I wonder if there's some way to do that
> > with the XML libraries (though HTML is generally not well-formed
> > XML--for example it usually has unterminated <P> tags).  Any
> > thoughts?
> 
> htmllib?
> 
> If you want quick and simple DOM extraction, I have a module that extends 
> HTMLParser...

Perl LWP is a module for easily writing robot web clients.  It doesn't
exactly make a DOM, but it's the same idea, so DOM extraction would be
fine.  What I *really* want is to be able to easily find link objects
(anchor tags) based on the anchor text, which LWP for some reason
doesn't do, but DOM extraction would be a start.  By "anchor text" I
mean the text in <a href=blah.html>this is the anchor text</a>.  The
client should be able to find some "underlined" text on the page it
retrieves, and "click" on the linked document.
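
A minimal sketch of the anchor-text lookup described above, assuming
Python 3's standard html.parser and urllib.parse modules.  The names
AnchorCollector and find_link are hypothetical, made up for this
illustration; they are not part of LWP, htmllib, or any module mentioned
in this thread.  The parser is event-based, so it tolerates unterminated
<P> tags and unquoted attribute values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (anchor text, href) pairs from an HTML page (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []      # list of (anchor text, resolved href)
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so matching is not layout-sensitive.
            text = " ".join("".join(self._text).split())
            self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


def find_link(html, anchor_text, base_url=""):
    """Return the URL of the first link whose anchor text matches, else None."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    for text, href in parser.links:
        if text == anchor_text:
            return href
    return None


page = "<p>Intro<p>See <a href=blah.html>this is the anchor text</a> here"
target = find_link(page, "this is the anchor text", "http://example.com/dir/")
```

"Clicking" the link would then amount to fetching the returned URL, for
example with urllib.request.urlopen(target).  Exact matching on the
whitespace-normalized text is a design choice of this sketch; a substring
or case-insensitive match may suit a robot client better.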

I may not have read the htmllib docs carefully enough, but it looks more
intended for formatting/displaying HTML than parsing it.  Are your
DOM extensions available?

Thanks



