HTML Parsing
Sebastian "lunar" Wiesner
basti.wiesner at gmx.net
Sun Jun 29 05:23:09 EDT 2008
Stefan Behnel <stefan_ml at behnel.de>:
> disappearedng at gmail.com wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to access the HTTP protocol with Python.
>>
>> Also, are there any open-source parsing engines for HTML documents
>> available in Python? That would be great.
>
> Try lxml.html. It parses broken HTML, supports HTTP, is much faster than
> BeautifulSoup and threadable, all of which should be helpful for your
> crawler.
You should mention its powerful features like XPath and CSS selector
support and its easy API here, too ;)
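
To make that concrete, here is a minimal sketch of what XPath and CSS
selection with lxml.html can look like. The markup, the "ext" class and the
two helper functions are made up for illustration. The CSS variant assumes
the separate cssselect package is installed, which newer lxml releases need
for this, and falls back to an equivalent XPath expression otherwise.

```python
import lxml.html

# Deliberately sloppy markup (hypothetical): unclosed <li> and <p> tags,
# no <html> or <body> element.
BROKEN_HTML = """
<div id="nav">
  <ul>
    <li><a href="/a.html">Page A</a>
    <li><a href="/b.html" class="ext">Page B</a>
  </ul>
</div>
<p>Some text
<p>More text
"""


def extract_links(markup):
    """Return (href, link text) pairs for every <a href=...>, using XPath."""
    root = lxml.html.fromstring(markup)
    return [(a.get("href"), a.text_content().strip())
            for a in root.xpath("//a[@href]")]


def extract_ext_links(markup):
    """Return the href of every <a class="ext">, using a CSS selector."""
    root = lxml.html.fromstring(markup)
    try:
        anchors = root.cssselect("a.ext")
    except ImportError:
        # cssselect package not available: match the class with XPath instead.
        anchors = root.xpath(
            "//a[contains(concat(' ', normalize-space(@class), ' '), ' ext ')]"
        )
    return [a.get("href") for a in anchors]


if __name__ == "__main__":
    print(extract_links(BROKEN_HTML))
    print(extract_ext_links(BROKEN_HTML))
```

In a crawler the markup would of course come from an HTTP response rather
than a string literal; the extracted hrefs are what you would feed back into
the queue of pages to visit.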
--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)