10GB XML Blows out Memory, Suggestions?

fuzzylollipop jarrod.roberson at gmail.com
Wed Jun 7 11:27:11 EDT 2006


Fredrik Lundh wrote:
> fuzzylollipop wrote:
>
> > you got no idea what you are talking about, anyone knows that something
> > like this is IO bound.
>
> which of course explains why some XML parsers for Python are a 100 times
> faster than other XML parsers for Python...
>

It depends on the CODE and the SIZE of the file. In this case,
processing a 10GB file, unless that file is heavily encrypted or
compressed, will be IO bound, PERIOD!

And in the case of XML, unless the PARSER is extremely inefficient,
which I assume would be an edge case, the parser is NOT the bottleneck
here.

The relative performance of Python XML parsers is irrelevant to this
being an IO bound process: no parser, fast or slow, can process the
data any faster than it can be read off the disk.
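
One way to check a claim like this on your own data is to time a plain
read of the input against a streaming parse of the same input. The
following is only a minimal sketch under stated assumptions: the helper
names time_raw_read and time_parse are made up for illustration, and
xml.etree.ElementTree.iterparse is just one standard-library parser
picked as a stand-in. If the two timings come out close, the job is
bound by reading; if the parse takes far longer, the parser is where
the time goes.

```python
import io
import time
import xml.etree.ElementTree as ET


def time_raw_read(stream, chunk_size=1 << 20):
    """Read a binary stream to the end in chunks; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def time_parse(stream):
    """Stream-parse XML from a binary stream; return (elements_seen, seconds)."""
    start = time.perf_counter()
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        count += 1
        # Free this element's text and children once it has closed,
        # so the in-memory tree does not keep the full document data.
        elem.clear()
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Tiny in-memory document (an assumption, so the sketch runs anywhere).
    # For a real measurement, pass open(path, "rb") instead.
    data = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
    nbytes, read_s = time_raw_read(io.BytesIO(data))
    nelems, parse_s = time_parse(io.BytesIO(data))
    print("read  %d bytes in %.6f s" % (nbytes, read_s))
    print("parse %d elements in %.6f s" % (nelems, parse_s))
```

When timing a real file, keep in mind that a second pass over the same
file may be served from the operating system's cache rather than the
disk, so the numbers are indicative at best.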

Anyone who says that using C instead of Python will be faster, when 99%
of the time in this case is spent just waiting on the disk to feed a
buffer, has no idea what they are talking about.

I work with terabytes of files, and all our Python code is just as fast
as equivalent C code for IO bound processes.



