HTML Parsing
Stefan Behnel
stefan_ml at behnel.de
Sun Jun 29 01:26:05 EDT 2008
disappearedng at gmail.com wrote:
> I am trying to build my own web crawler for an experiment and I don't
> know how to access the HTTP protocol with Python.
>
> Also, are there any open-source parsing engines for HTML documents
> available in Python? That would be great.
Try lxml.html. It parses broken HTML, can load documents over HTTP, is much
faster than BeautifulSoup, and can be used from multiple threads, all of which
should be helpful for your crawler.
http://codespeak.net/lxml/
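
As a rough illustration of the link-extraction step a crawler needs, here is a minimal sketch. It assumes lxml is installed; the helper name extract_links, the sample markup, and the example.com/example.org URLs are made up for illustration. Fetching the page is left out, so the markup is passed in as a string.

```python
from lxml import html


def extract_links(markup, base_url):
    """Return the absolute URLs of all <a href> links in an HTML string.

    extract_links is a hypothetical helper for this sketch, not part of lxml.
    """
    # document_fromstring tolerates malformed markup and always returns
    # the root <html> element of the repaired tree.
    doc = html.document_fromstring(markup)
    # Rewrite relative links in place against the page's own URL.
    doc.make_links_absolute(base_url)
    return [str(href) for href in doc.xpath("//a/@href")]


# Deliberately broken markup: unclosed <p> and <b>, no <html> or <body>.
BROKEN = (
    "<p>Unclosed paragraph <a href='/docs'>docs</a>"
    "<p>Another <b>bold <a href='http://example.org/x'>x</a>"
)

links = extract_links(BROKEN, "http://example.com/")
for link in links:
    print(link)
```

A crawler would typically push the returned URLs onto its queue of pages still to visit, after filtering out the ones it has already seen.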
Stefan