Python web client anyone?
Paul Rubin
phr-n2001d at nightsong.com
Sun Oct 14 20:25:54 EDT 2001
Richard Jones <richard at bizarsoftware.com.au> writes:
> > Thanks, this appears to include an HTTP client, which is a start, but
> > I was looking for something that actually parses the HTML on the
> > retrieved page like LWP does. I wonder if there's some way to do that
> > with the XML libraries (though HTML is generally not well-formed
> > XML--for example it usually has unterminated <P> tags). Any
> > thoughts?
>
> htmllib?
>
> If you want quick and simple DOM extraction, I have a module that extends
> HTMLParser...
Perl LWP is a module for easily writing robot web clients. It doesn't
exactly make a DOM, but it's the same idea, so DOM extraction would be
fine. What I *really* want is to be able to easily find link objects
(anchor tags) based on the anchor text, which LWP for some reason
doesn't do, but DOM extraction would be a start. By "anchor text" I
mean the text in <a href=blah.html>this is the anchor text</a>. The
client should be able to find some "underlined" text on the page it
retrieves and "click" through to the linked document.
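For what it's worth, here is a minimal sketch of "find a link by its anchor text, then follow it". It assumes a modern Python 3 standard library (`html.parser.HTMLParser`, `urllib.parse.urljoin`, `urllib.request.urlopen`), which post-dates this thread. The names `AnchorCollector`, `find_link` and `click` are hypothetical, and matching on a case-insensitive substring of the anchor text is just one possible design choice. The parser is event-driven rather than a DOM, but it tolerates sloppy HTML such as unterminated `<p>` tags and unquoted attribute values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect (href, anchor_text) for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) tuples, in page order
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "foo\n  bar" matches "foo bar".
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_link(html, base_url, anchor_text):
    """Return the absolute URL of the first link whose text contains anchor_text."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if anchor_text.lower() in text.lower():
            return urljoin(base_url, href)  # resolve relative links
    return None


def click(url):
    """Hypothetical 'click': fetch the linked document and return it as text."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset)


if __name__ == "__main__":
    page = '<p>Intro <a href=blah.html>this is the anchor text</a><p>More'
    print(find_link(page, "http://example.com/dir/index.html", "anchor text"))
    # prints http://example.com/dir/blah.html
```

A robot would then call `click()` on the returned URL, feed the new page back into `find_link()`, and so on. Collecting every link first (rather than stopping at the first match) makes it easy to swap in a different matching rule, such as an exact match or a regular expression.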
I may not have read the htmllib docs carefully enough, but it looks more
intended for formatting and displaying HTML than for parsing it. Are your
DOM extensions available?
Thanks
More information about the Python-list mailing list