Sebastian "lunar" Wiesner
basti.wiesner at gmx.net
Sun Jun 29 11:23:09 CEST 2008
Stefan Behnel <stefan_ml at behnel.de>:
> disappearedng at gmail.com wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to access the HTTP protocol with Python.
>> Also, are there any open-source parsing engines for HTML documents
>> available in Python too? That would be great.
> Try lxml.html. It parses broken HTML, supports HTTP, is much faster than
> BeautifulSoup and threadable, all of which should be helpful for your
You should mention its powerful features like XPath and CSS selector
support and its easy API here, too ;)
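
A minimal sketch of those two features, for illustration only. It assumes
lxml is installed (recent lxml releases also need the separate cssselect
package for the CSS part). The sample markup, the URLs and the function
names are made up for this example; the markup is deliberately broken to
show that the parser still copes with it.

```python
from urllib.parse import urljoin

import lxml.html

# Hypothetical, deliberately sloppy markup: unclosed <div>, <p> and <a>.
SAMPLE = """
<html><body>
<div id="nav"><a href="/about">About</a> <a href="http://example.org/">Ext</a>
<p class="intro">Unclosed paragraph
<a class="more" href="page2.html">Next
</body>
"""


def links_via_xpath(markup, base_url):
    """Return every href in the document, resolved against base_url."""
    doc = lxml.html.fromstring(markup)
    # The XPath expression selects the href attribute of all <a> elements.
    return [urljoin(base_url, href) for href in doc.xpath("//a/@href")]


def links_via_css(markup, selector="a.more"):
    """Return the href of every element matching a CSS selector."""
    doc = lxml.html.fromstring(markup)
    # cssselect() translates the CSS selector to XPath internally.
    return [a.get("href") for a in doc.cssselect(selector)]


if __name__ == "__main__":
    print(links_via_xpath(SAMPLE, "http://example.com/dir/index.html"))
    print(links_via_css(SAMPLE))
```

For a crawler, the XPath variant is probably the one to start with, since
resolving each href against the page URL gives the next URLs to visit.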
Freedom is always the freedom of dissenters.