XML parsing: SAX/expat & yield

Peter Otten __peter__ at web.de
Wed Aug 4 13:22:54 EDT 2010


kj wrote:

> I want to write code that parses a file that is far bigger than
> the amount of memory I can count on.  Therefore, I want to stay as
> far away as possible from anything that produces a memory-resident
> DOM tree.
> 
> The top-level structure of this XML is very simple: it's just a
> very long list of "records".  All the complexity of the data is at
> the level of the individual records, but these records are tiny in
> size (relative to the size of the entire file).
> 
> So the ideal would be a "parser-iterator", which parses just enough
> of the file to "yield" (in the generator sense) the next record,
> thereby returning control to the caller; the caller can process
> the record, delete it from memory, and return control to the
> parser-iterator; once the parser-iterator regains control, it repeats
> this sequence starting where it left off.

How about ElementTree's iterparse()? The incremental parsing pattern is
described here:

http://effbot.org/zone/element-iterparse.htm#incremental-parsing
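
A minimal sketch of that pattern wrapped in a generator might look like the
following. It is untested against your data and makes assumptions: the
function name iter_records is made up for illustration, and the tag name
"record" is a guess at what your file uses (you did not say). It also assumes
records are not nested inside other elements with the same tag.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield each completed <tag> element, then drop it from memory.

    `source` is a filename or a binary file object. The tag name
    "record" is an assumption; substitute the real element name.
    """
    # Ask for "start" events as well, so the root element can be
    # captured as soon as its opening tag has been read.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event: start of the root element

    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Control is back from the caller: discard everything
            # attached to the root so far, so memory use stays flat.
            root.clear()


if __name__ == "__main__":
    # Tiny in-memory stand-in for the large file (hypothetical data).
    sample = io.BytesIO(
        b"<records>"
        b"<record id='1'><name>a</name></record>"
        b"<record id='2'><name>b</name></record>"
        b"</records>"
    )
    for record in iter_records(sample):
        print(record.get("id"), record.findtext("name"))
```

The caller has to finish with each record before asking for the next one,
because root.clear() empties the tree as soon as the generator resumes.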

Peter


