crawler in python and mysql
Adam Pletcher
adam at volition-inc.com
Mon Nov 12 14:36:30 EST 2007
In the standard Python install (Windows 2.5, at least), there are a couple of example scripts you might find useful:
<python>\Tools\webchecker\webchecker.py
Crawls specified URL, checking for broken links.
<python>\Tools\webchecker\websucker.py
Variant on the above that archives the specified site locally, including images, though you could probably limit it to HTML easily enough.
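For a sense of what the core of such a script looks like, here is a minimal sketch of fetching one page and collecting its links. It is not taken from webchecker.py or websucker.py; it assumes the Python 3 standard library module names (html.parser, urllib.request), and the names LinkCollector, extract_links and fetch_html are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest; images, scripts etc. are skipped.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_html(url, timeout=10):
    """Download one URL and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If all you need is the HTML of a single URL, fetch_html(url) on its own is enough; extract_links only matters once you want to follow links to further pages.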
I haven't used either extensively, but they appear to work as advertised. It should be easy to modify one and tie it into the MySQLdb extensions:
http://sourceforge.net/projects/mysql-python
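Tying a crawler into MySQLdb could look roughly like the sketch below. Treat it as an outline under assumptions rather than tested production code: the table name "pages", its columns, and the helper names connect_mysql and save_page are all hypothetical, and it assumes a MySQLdb module is installed and a database already exists.

```python
# Hypothetical schema: one row per fetched page.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS pages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(2048) NOT NULL,
    html MEDIUMTEXT NOT NULL
)
"""

# MySQLdb uses %s placeholders; values are passed separately, never
# formatted into the SQL string by hand.
INSERT_SQL = "INSERT INTO pages (url, html) VALUES (%s, %s)"


def connect_mysql(host, user, password, database):
    """Open a connection (assumes the MySQLdb module is installed)."""
    import MySQLdb  # imported here so the rest works without the driver

    return MySQLdb.connect(
        host=host, user=user, passwd=password, db=database, charset="utf8mb4"
    )


def save_page(conn, url, html):
    """Store one page through any DB-API connection that uses %s parameters."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
        cursor.execute(INSERT_SQL, (url, html))
        conn.commit()
    finally:
        cursor.close()
```

Usage would be something like conn = connect_mysql("localhost", "user", "secret", "crawl") followed by save_page(conn, url, html) for each page fetched; the host, user, password and database names there are placeholders.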
--
Adam Pletcher
Technical Art Director
Volition/THQ <http://www.volition-inc.com/>
From: python-list-bounces+adam=volition-inc.com at python.org [mailto:python-list-bounces+adam=volition-inc.com at python.org] On Behalf Of Fabian López
Sent: Monday, November 12, 2007 12:33 PM
To: Python-list at python.org
Subject: crawler in python and mysql
Hi,
I would like to write code that crawls a URL and retrieves all of its HTML. I have noticed that there are various open-source web crawlers, but they are far more extensive than what I need. I only need to crawl a URL, and I don't know whether it is as easy as using an HTML parser. Is it? Which libraries would you recommend?
Thanks!!
Fabian