Parsing HTML/XML documents

Max M maxm at
Thu Apr 26 21:57:47 CEST 2007

Stefan Behnel wrote:
> pabloski at wrote:
>> I need to parse real world HTML/XML documents and I found two nice Python
>> solutions: BeautifulSoup and Tidy.
> There's also lxml, in case you want a real XML tool.

I have used both BeautifulSoup and lxml. They are both good tools.

lxml is blindingly fast compared to BeautifulSoup though.

A simple tool for importing contact information from 6000 XML files of
23 MBytes into Zope runs in about 30 seconds. No optimisations at all,
just inefficient XPath expressions.

That is pretty good in my book.
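
For anyone who wants to try this kind of import, here is a rough sketch of the pattern, not the actual tool. The <contact>, <name> and <email> element names and the extract_contacts helper are made up for illustration, since the real file layout is not shown here. It sticks to the path subset that both lxml and the standard library's ElementTree understand, so it falls back to the stdlib if lxml is not installed.

```python
import io

try:
    from lxml import etree  # fast C-backed parser, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a compatible API

# Hypothetical contact file; the element names are assumptions for this sketch.
SAMPLE = b"""<contacts>
  <contact><name>Ada</name><email>ada@example.org</email></contact>
  <contact><name>Bob</name><email>bob@example.org</email></contact>
</contacts>"""


def extract_contacts(source):
    """Parse one XML document and return a list of (name, email) tuples."""
    tree = etree.parse(source)
    contacts = []
    # ".//contact" searches the whole tree: simple, not tuned for speed.
    for node in tree.findall(".//contact"):
        contacts.append((node.findtext("name"), node.findtext("email")))
    return contacts


contacts = extract_contacts(io.BytesIO(SAMPLE))
print(contacts)
```

With real files you would pass each filename to extract_contacts in a loop. With lxml you can also call node.xpath(...) when you need full XPath rather than the ElementPath subset used above.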


hilsen/regards Max M, Denmark
IT's Mad Science
