Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

Wed Jul 30 13:40:42 EDT 2008

Simon Willison wrote:
> Follow up question: what's the best way of incrementally consuming XML
> in Python that's character encoding aware?

iterparse(), as implemented in (c)ElementTree and lxml. Note that ElementTree
and cElementTree have been part of the standard library since Python 2.5, in
the xml.etree package.
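
A minimal sketch of what that can look like, using the standard library's
xml.etree.ElementTree. The sample document, the "title" tag name, and the
iter_titles() helper are made up for illustration. The stream is opened as
bytes so the parser can pick the encoding from the XML declaration itself.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(source):
    """Yield the text of each <title> element as soon as it is complete.

    `source` is a filename or a file object opened in binary mode.
    (Hypothetical helper, not part of ElementTree.)
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            yield elem.text
            # Drop the element's content once handled, so that a large
            # file does not accumulate in memory.
            elem.clear()


# Hypothetical sample document, declared and encoded as ISO-8859-1.
sample = (
    u'<?xml version="1.0" encoding="iso-8859-1"?>'
    u"<feed><title>caf\xe9</title><title>na\xefve</title></feed>"
).encode("iso-8859-1")

titles = list(iter_titles(io.BytesIO(sample)))
```

The text comes back as decoded Unicode strings regardless of the encoding
declared in the file. For a real file, pass the filename or an open binary
file object instead of the BytesIO wrapper.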

> I have a very large file to
> consume but I'd rather not have to fall back to the raw SAX API.

"Large" is fairly relative. Both cElementTree and lxml are pretty memory
friendly, even when parsing into an in-memory tree.

Stefan