[Tutor] Trying to parse a HUGE(1gb) xml file in python

Tue Dec 21 15:11:24 CET 2010

"Stefan Behnel" <stefan_ml at behnel.de> wrote

>> And I thought a 1G file was extreme... Do these people stop to think
>> that with XML as much as 80% of their "data" is just description
>> (i.e. the tags)?
>
> As I already said, it compresses well. In run-length compressed XML 
> files, the tags can easily take up a negligible amount of space 
> compared to the more widely varying data content

I understand how compression helps with the data transmission aspect.
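To see why the tags cost so little once compressed, here is a minimal sketch that builds a made-up, tag-heavy XML document and compresses it. It uses gzip (DEFLATE) from the standard library in place of the run-length compression mentioned above. The record layout and tag names are hypothetical, and the actual ratio will depend on the real data.

```python
import gzip

def make_sample_xml(n_records: int) -> bytes:
    """Build a hypothetical tag-heavy XML document with n_records entries."""
    rows = [
        f"<record><customer_name>name{i}</customer_name>"
        f"<customer_id>{i}</customer_id></record>"
        for i in range(n_records)
    ]
    return ("<records>" + "".join(rows) + "</records>").encode("utf-8")

raw = make_sample_xml(10_000)
packed = gzip.compress(raw)

# The repeated tag names are cheap for the compressor, so the packed
# size ends up a small fraction of the raw size.
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes, "
      f"ratio: {len(packed) / len(raw):.1%}")
```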

> compress rather well). And depending on how fast your underlying 
> storage is, decompressing and parsing the file may still be faster 
> than parsing a huge uncompressed file directly.

But I don't understand how decompressing a file before parsing it can
be faster than parsing the original uncompressed file.
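Stefan's argument is about I/O. If the disk is slower than the CPU can decompress, reading a much smaller compressed file and decompressing it in memory can finish sooner than reading the full-size file. The important part is that decompression and parsing happen in the same pass, with no intermediate uncompressed file. Below is a minimal sketch of that streaming setup using only the standard library (`gzip.open` plus `xml.etree.ElementTree.iterparse`). The `record` tag and the function names are hypothetical, and whether this actually beats parsing the uncompressed file depends on the storage and CPU involved.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def count_records(fileobj, tag="record"):
    """Stream-parse XML from a file-like object, counting <tag> elements.

    iterparse pulls the input incrementally instead of loading the whole
    document; clearing each counted element frees its text and children.
    """
    count = 0
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()
    return count

def count_records_gz(path, tag="record"):
    # gzip.open decompresses on the fly as iterparse reads bytes, so the
    # uncompressed document is never written back to disk.
    with gzip.open(path, "rb") as f:
        return count_records(f, tag)

# Usage with an in-memory stand-in for a .xml.gz file:
sample = gzip.compress(b"<records>" + b"<record>x</record>" * 3 + b"</records>")
print(count_records(gzip.GzipFile(fileobj=io.BytesIO(sample))))  # prints 3
```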

There are ways of processing XML to reduce the tag space (a bit like
what TinyURL does with long URLs), but then the parsing code has to know
about the tag translations too, and usually the savings are small.
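A minimal sketch of the tag-translation idea, assuming a hypothetical translation table: the writer shortens tag names, and the reader has to apply the reverse table before the data means anything, which is the coupling described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation table; writer and reader must share it.
LONG_TO_SHORT = {"record": "r", "customer_name": "n", "customer_id": "i"}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}

def rename_tags(root, table):
    """Rename, in place, every element tag that appears in table."""
    for elem in root.iter():
        elem.tag = table.get(elem.tag, elem.tag)
    return root

def shrink(xml_text):
    """Replace long tag names with their short codes."""
    root = rename_tags(ET.fromstring(xml_text), LONG_TO_SHORT)
    return ET.tostring(root, encoding="unicode")

def expand(xml_text):
    """Restore the original tag names; impossible without the table."""
    root = rename_tags(ET.fromstring(xml_text), SHORT_TO_LONG)
    return ET.tostring(root, encoding="unicode")
```

If the file is also compressed, the compressor already encodes repeated long tag names cheaply, so the shortened version often ends up only slightly smaller after compression.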

Curious,

Alan G.