Not necessarily related to python Web Crawlers

defn noob circularfunc at yahoo.se
Sat Jul 5 11:07:04 CEST 2008


just crawling is supereasy. its how to index and search that is hard.
just start at yahoo.com, scrape out all the links and then for every
site visit every link.
i wrote a crawler in 15 lines of code. but then it all it did was
visit the sites, not indexing them or anything.

you could write a faster one in C++ probably but if you are new to it
doing it in python will let you experiment and learn faster.

some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html



http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/spider_simpleCrawler.asp



More information about the Python-list mailing list