web crawler help?
Syver Enstad
syver-en+usenet at online.no
Thu Sep 12 16:19:48 EDT 2002
s_gherman at yahoo.com (Sorin Gherman) writes:
> "koko" <kokohh at hotmail.com> wrote in message
> news:<JARd9.6927$yt3.3340577 at newssrv26.news.prodigy.com>...
>
> > is there any sample for basic web crawler, that ask for a starting url
> > and log the url and extract the hyperlinks?
> > thx
>
> There's a very simple one in Mark Pilgrim's "Dive into Python" book,
> whose text is freely available at: http://diveintopython.org/
> Check the "HTML processing" chapter. It contains a 9-line program,
> urllister.py, followed by a 7-line usage example which does just
> that: given a URL for an HTML file, it lists the hyperlinks inside it.
>
> /sorin gherman
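To give a rough idea of what such a link lister looks like, here is a
minimal sketch. It is not the urllister.py from the book; it assumes
Python 3 and the standard library's html.parser, and the names LinkLister
and list_links are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkLister(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones where possible.
                self.urls.append(urljoin(self.base_url, value))


def list_links(html, base_url=""):
    """Return the hyperlinks found in an HTML string, in document order."""
    parser = LinkLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

To run it against a live page, the HTML string could come from
urllib.request.urlopen(url).read() decoded to text, with the same url
passed as base_url.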
You could also check out Aahz's slides about multithreading; they show
the development of different threading architectures for a spider.
http://starship.python.net/crew/aahz/OSCON2001/index.html
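As a rough illustration of one such architecture (a generic worker-pool
sketch, not code taken from the slides), the following assumes Python 3.
The function crawl and its parameters are hypothetical, and fetch_links is
a caller-supplied function that returns the URLs found on a page.

```python
import queue
import threading


def crawl(start_url, fetch_links, num_workers=4, max_pages=50):
    """Visit pages reachable from start_url using a pool of worker threads.

    fetch_links(url) is supplied by the caller (an assumption of this
    sketch) and must return an iterable of URLs found on that page.
    Returns the set of URLs that were queued for a visit.
    """
    todo = queue.Queue()
    seen = {start_url}
    lock = threading.Lock()  # protects the shared 'seen' set
    todo.put(start_url)

    def worker():
        while True:
            url = todo.get()
            if url is None:  # sentinel: no more work
                todo.task_done()
                break
            try:
                links = list(fetch_links(url))
            except Exception:
                links = []  # a failed page simply contributes no links
            with lock:
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        todo.put(link)
            # New links are queued before this page is marked done,
            # so todo.join() cannot return while work is still pending.
            todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    todo.join()  # wait until every queued URL has been processed
    for _ in threads:
        todo.put(None)  # tell each worker to exit
    for t in threads:
        t.join()
    return seen
```

Passing the page fetcher in as a function keeps the threading logic
separate from the HTTP and HTML handling, so the pool can be tried out on
an in-memory link graph before pointing it at real sites.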
--
Kind regards,
Syver Enstad