Python web client anyone?

Paul Rubin phr-n2001d at nightsong.com
Mon Oct 15 14:38:13 EDT 2001


ngps at madcap.dyndns.org (Ng Pheng Siong) writes:
> According to Paul Rubin  <phr-n2001d at nightsong.com>:
> > What I *really* want is to be able to easily find link objects
> > (anchor tags) based on the anchor text, which LWP for some reason
> > doesn't do, but DOM extraction would be a start.  By "anchor text" I
> > mean the text in <a href=blah.html>this is the anchor text</a>.  The
> > client should be able to find some "underlined" text on the page it
> > retrieves, and "click" on the linked document.
> 
> Surely, you find the tags by parsing "<a href=blah.html>" (sic), not by
> looking for "this is the anchor text"?

No, I meant looking for "this is the anchor text".  I don't understand
why LWP doesn't do that.  If I tell you that you can read about LWP by
browsing www.cpan.org and clicking on the link that says "The LWP Web
Client", that's natural and you can follow the instructions very
easily.  So why should it be hard to program a robot to do that?

> htmllib parses fine enough. Here's a demo from M2Crypto. It seems to work,
> too. ;-)

Thanks, that's the kind of thing I have in mind--a more convenient
interface would have been nice, but it's a start.
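
For illustration, here is a minimal sketch of the "find a link by its
anchor text" idea described above. It is not the M2Crypto demo and not
anything LWP or htmllib provides: it assumes a modern Python 3 and uses
the standard library's html.parser module, and the names AnchorTextParser
and find_link are hypothetical, chosen only for this example. Matching
here is a case-insensitive, whitespace-normalized exact comparison, which
is one possible choice among several.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorTextParser(HTMLParser):
    """Collect (anchor text, absolute URL) pairs from an HTML page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (text, url) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text may arrive in several pieces; keep them all.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so line breaks do not matter.
            text = " ".join("".join(self._text).split())
            self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


def find_link(html, anchor_text, base_url=""):
    """Return the URL of the first link with this text, or None."""
    parser = AnchorTextParser(base_url)
    parser.feed(html)
    parser.close()
    wanted = " ".join(anchor_text.split()).lower()
    for text, url in parser.links:
        if text.lower() == wanted:
            return url
    return None
```

With something like this, "clicking" a link amounts to fetching the page
(for example with urllib.request.urlopen), passing its decoded text and
URL to find_link, and then fetching whatever URL comes back.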


