10GB XML Blows out Memory, Suggestions?

fuzzylollipop jarrod.roberson at gmail.com
Thu Jun 8 12:30:37 EDT 2006


Fredrik Lundh wrote:
> fuzzylollipop wrote:
>
> > SAX style or a pull-parser has to be used when the data is "large" or
> > when you don't really need to process every element and attribute.
> >
> > This problem looks like it is just a data export / import problem. In
> > that case you will either have to use a SAX-style parser and parse the
> > 10GB file or, as I suggested in another reply, export the data in
> > smaller chunks.
>
> or use a parser that can do the chunking for you, on the way in...
>
> in Python, incremental parsers like cET's iterparse and the one in Amara
> give you *better* performance than SAX (including "raw" pyexpat) in
> many cases, and offer a much simpler programming model.
>
> </F>

That's good to know. I haven't worked with cET yet; I haven't had time to
get it installed :-(
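
For anyone following along, here is a minimal sketch of the incremental
approach described above. It is not taken from the original poster's data
or code. It assumes the standard-library xml.etree.ElementTree module
(the descendant of ElementTree/cET), a hypothetical element name "record",
and that the records are direct children of the root element. The helper
name iter_records is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # stdlib descendant of ElementTree/cET


def iter_records(source, tag="record"):
    """Yield each complete <tag> element from a large XML stream, one at a time.

    Assumes the <tag> elements are direct children of the root element.
    'source' may be a filename or a binary file-like object.
    """
    # Request "start" events as well, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Once the caller is done with this record, drop everything
            # attached to the root so already-processed records are freed.
            root.clear()


if __name__ == "__main__":
    # Tiny in-memory stand-in for the 10GB file (hypothetical data).
    sample = b"<export><record id='1'>a</record><record id='2'>b</record></export>"
    for rec in iter_records(io.BytesIO(sample)):
        print(rec.get("id"), rec.text)
```

Each record has to be fully processed before the next one is requested,
because the root is cleared when the generator resumes. If the records
sit deeper in the tree, the cleanup step would need to change.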



