web crawler help?

Syver Enstad syver-en+usenet at online.no
Thu Sep 12 16:19:48 EDT 2002


s_gherman at yahoo.com (Sorin Gherman) writes:

> "koko" <kokohh at hotmail.com> wrote in message
> news:<JARd9.6927$yt3.3340577 at newssrv26.news.prodigy.com>...
> 
> > Is there any sample of a basic web crawler that asks for a starting
> > URL, logs the URL, and extracts the hyperlinks?
> > thx
> 
>     There's a very simple one in Mark Pilgrim's "Dive into Python" book,
> whose text is freely available at: http://diveintopython.org/
>     Check the "HTML processing" chapter. It contains urllister.py, a
> 9-line program, followed by a 7-line usage example which does just
> that: given the URL of an HTML file, it lists the hyperlinks inside it.
> 
> /sorin gherman
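
For a rough idea of what such a link lister looks like, here is a
minimal sketch. It is not Pilgrim's urllister.py; it assumes Python 3
and the standard html.parser module, and the names LinkLister,
list_links and fetch are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkLister(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def list_links(html, base_url=""):
    """Return the hyperlinks found in an HTML string, in document order."""
    parser = LinkLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch(url):
    """Download a page as text (hypothetical helper, no error handling)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like list_links(fetch(url), url) would then give the links
on one page; logging each URL is a print() away.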

You could also check out Aahz's slides about multithreading; they show
the development of different threading architectures for a spider.

http://starship.python.net/crew/aahz/OSCON2001/index.html
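
To give a flavour of one such architecture, here is a minimal sketch
of a worker-pool spider built on a shared queue. It is my own
illustration and not taken from the slides. It assumes Python 3, and
fetch_links is a hypothetical callable you supply that takes a URL and
returns the list of links found on that page.

```python
import queue
import threading


def crawl(start_url, fetch_links, num_workers=4, max_pages=50):
    """Crawl outward from start_url; return the set of URLs visited."""
    todo = queue.Queue()
    seen = {start_url}
    seen_lock = threading.Lock()
    todo.put(start_url)

    def worker():
        while True:
            url = todo.get()
            try:
                if url is None:
                    # Sentinel: no more work, let the thread exit.
                    return
                try:
                    links = fetch_links(url)
                except Exception:
                    # A page that fails to load contributes no links.
                    links = []
                for link in links:
                    with seen_lock:
                        if link in seen or len(seen) >= max_pages:
                            continue
                        seen.add(link)
                    todo.put(link)
            finally:
                # Runs for real URLs and for the sentinel alike.
                todo.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    todo.join()  # Blocks until every queued URL has been processed.
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()
    return seen
```

New links are queued before the current URL is marked done, so the
queue's unfinished count cannot reach zero while work remains. The
max_pages cap is only there to keep a sketch like this from wandering
across the whole web.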

-- 

Kind regards,

Syver Enstad


