Parsing HTML/XML documents

pabloski at pabloski at
Thu Apr 26 12:41:29 CEST 2007

I need to parse real-world HTML/XML documents, and I found two nice Python
solutions: BeautifulSoup and Tidy.
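
To make "real-world" concrete, here is a minimal sketch of the kind of
tolerant parsing I mean. It does not use BeautifulSoup or Tidy; as a stand-in
it uses only the standard library's html.parser module, and the names
LinkCollector, extract_links and BAD_HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags without requiring well-formed input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Return every link target found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Hypothetical broken input: unclosed <p> and <b>, an unquoted attribute,
# a stray </div>, upper-case tag names, and no closing tags at the end.
BAD_HTML = (
    "<html><body><p>First <b>bold<p>"
    "<a href=one.html>one</a></div>"
    "<A HREF='two.html'>two"
)

if __name__ == "__main__":
    print(extract_links(BAD_HTML))
```

This only reports start tags as they appear and builds no repaired tree,
which is exactly the part where I would expect a browser engine to do better.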

However, I also found pyXPCOM, which exposes Mozilla's XPCOM components (and
so Gecko) to Python. So I was thinking that Gecko surely handles bad HTML in
a more consistent and error-tolerant way than BS and Tidy.

I'm interested in using the Mozilla DOM from inside a Python script, but
I'm a bit confused about how to use pyXPCOM to accomplish this.

Any suggestions?

More information about the Python-list mailing list