Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

Stefan Behnel stefan_ml at behnel.de
Wed Jul 30 13:40:42 EDT 2008


Simon Willison wrote:
> Follow up question: what's the best way of incrementally consuming XML
> in Python that's character encoding aware?

iterparse(), as implemented in (c)ElementTree and lxml. Note that ElementTree
and cElementTree have been part of the standard library since Python 2.5, in
the xml.etree package.
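
A minimal sketch of what incremental, encoding-aware parsing with iterparse()
can look like. It assumes a current Python, where the module is imported as
xml.etree.ElementTree (on 2.5 the C version was xml.etree.cElementTree);
lxml.etree offers an iterparse() with a similar interface. The sample document,
the element names "entry" and "title", and the helper iter_titles() are made up
for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are made up for illustration.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<entries>'
    '<entry><title>Caf\u00e9</title></entry>'
    '<entry><title>Na\u00efve</title></entry>'
    '</entries>'
).encode("utf-8")


def iter_titles(source):
    """Yield the text of each <title> inside <entry>, one entry at a time."""
    # Pass a filename or a file object opened in binary mode, so the parser
    # can pick up the encoding from the XML declaration itself.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            yield elem.findtext("title")
            # Drop the entry's children and text once it has been handled.
            elem.clear()


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

The parser is handed bytes and does the decoding, so the strings that come out
are already text. For a real file, pass the filename instead of the BytesIO
object.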


> I have a very large file to
> consume but I'd rather not have to fall back to the raw SAX API.

"Large" is fairly relative. Both cElementTree and lxml are pretty
memory-friendly, even when parsing the whole document into an in-memory tree.

Stefan


