HTML Parsing

Sebastian "lunar" Wiesner basti.wiesner at
Sun Jun 29 11:23:09 CEST 2008

Stefan Behnel <stefan_ml at>:

> disappearedng at wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to use the HTTP protocol from Python.
>> Also, are there any open-source parsing engines for HTML documents
>> available in Python, too? That would be great.
> Try lxml.html. It parses broken HTML, supports HTTP, is much faster than
> BeautifulSoup and threadable, all of which should be helpful for your
> crawler.

You should mention its powerful features like XPath and CSS selector
support and its easy API here, too ;)
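
For the original poster, here is a minimal sketch of what that looks like
in practice. The markup, the base URL and the function names
(extract_links, intro_paragraphs) are made up for illustration. It assumes
lxml is installed, and the CSS selector part additionally assumes the
separate cssselect package is available, which newer lxml releases need.

```python
from lxml import html

# Deliberately sloppy markup: no <html>/<body>, unclosed <li> and <p> tags.
# (Made-up sample data, not taken from any real site.)
BROKEN_PAGE = """
<ul id="nav">
  <li><a href="/a.html">First</a>
  <li><a href="/b.html">Second</a>
</ul>
<p class="intro">Hello <b>world
"""


def extract_links(markup, base_url):
    """Return every link target in the markup as an absolute URL (XPath)."""
    doc = html.fromstring(markup)        # tolerant parser, repairs the tree
    doc.make_links_absolute(base_url)    # resolve relative hrefs in place
    return doc.xpath("//a/@href")


def intro_paragraphs(markup):
    """Return the text of all <p class="intro"> elements (CSS selector).

    Assumption: needs the cssselect package alongside lxml.
    """
    doc = html.fromstring(markup)
    return [p.text_content().strip() for p in doc.cssselect("p.intro")]


if __name__ == "__main__":
    for link in extract_links(BROKEN_PAGE, "http://example.com/"):
        print(link)
```

A crawler would typically feed the result of extract_links back into its
queue of URLs to fetch. How the pages are downloaded is left out here on
purpose, since that depends on which HTTP library you settle on.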

Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)
