Extract links from html-page

Tony J Ibbs (Tibs) tony at lsl.co.uk
Thu May 11 07:48:04 EDT 2000


5[HH575-UAZWKVVP-7H2H48V3 (or so it appears in Outlook <fx:spit>) wrote:
> I'd like to extract just the links in a web page or HTML document. It
> would be nice if relative links [were expanded]

> The result should be a list of links.

Hmm. Well, the attached Python script reads an HTML file and writes out a
file containing the links it finds. It doesn't expand relative links, and it
doesn't fetch pages itself, so if you also want it to *retrieve* the page
first you'd need to glue its innards into something else. That shouldn't be
hard to do (and I'm fairly sure there are plenty of "grab that URL" scripts
around - I certainly have several variants).
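For anyone who wants the whole thing in one place, here is a minimal sketch. It is NOT the attached anchors.py, whose source isn't reproduced here. It uses the modern Python 3 standard library (html.parser, urllib.parse.urljoin, urllib.request) and assumes that "links" means the href attributes of <a> tags. The names LinkCollector, extract_links and links_from_url are made up for illustration. Relative links are expanded only when a base URL is supplied, and links_from_url is hypothetical glue for the "retrieve the page first" step:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self, base_url=None):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case,
        # so <A HREF=...> is matched as well as <a href=...>.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Expand relative links only if a base URL was given.
                if self.base_url:
                    value = urljoin(self.base_url, value)
                self.links.append(value)


def extract_links(html_text, base_url=None):
    """Return a list of the links in html_text, expanded against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def links_from_url(url):
    """Hypothetical glue: fetch a page, then extract its links.

    Assumes UTF-8 when the server doesn't declare a charset.
    """
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
        # Use the final URL (after any redirects) as the base.
        base = response.geturl()
    return extract_links(html_text, base)


if __name__ == "__main__":
    sample = '<p><a href="/docs/">Docs</a> <A HREF="faq.html">FAQ</A></p>'
    print(extract_links(sample, "http://www.python.org/doc/index.html"))
    # -> ['http://www.python.org/docs/', 'http://www.python.org/doc/faq.html']
```

Note that this sketch ignores any <base href="..."> element in the page, which would change how relative links ought to resolve. It also ignores links in other tags such as <area> or <link>.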
-------------- next part --------------
A non-text attachment was scrubbed...
Name: anchors.py
Type: application/octet-stream
Size: 4076 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20000511/0e38792d/attachment.obj>


More information about the Python-list mailing list