[Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel stefan_ml at behnel.de
Tue Dec 21 16:41:14 CET 2010


David Hutto, 21.12.2010 16:11:
> On Tue, Dec 21, 2010 at 10:03 AM, Stefan Behnel wrote:
>> I meant
>> uncompressing the data *while* parsing it. Just like you have to decode it
>> for parsing, it's just an additional step to decompress it before decoding.
>> Depending on the performance relation between I/O speed and decompression
>> speed, it can be faster to load the compressed data and decompress it into
>> the parser on the fly. lxml.etree (or rather libxml2) internally does that
>> for you, for example, if it detects compressed input when parsing from a
>> file.
>>
>> Note that these performance differences are tricky to prove in benchmarks,
>
> Tricky and proven? Then tell me what real-time applications Python is
> used in (this is in reference to a recent C++ discussion), and how it
> could be utilized in, say, an aviation system to avoid a collision when
> milliseconds are on the line?

I doubt that there are many aviation systems that send around gigabytes of 
compressed XML data milliseconds before a collision.

I even doubt that airplane collision detection is time critical anywhere
in the milliseconds range. After all, there's a pilot who has to react to
the collision warning, and he or she will certainly need more than a couple
of milliseconds to react, not to mention the time that it takes for the
airplane to change its flight direction. If you plan the system in a way
that makes milliseconds count, you can just as well replace it with a
jack-in-the-box. Oh, and that might even speed up the reaction of the pilot. ;)

So, no, if these systems ever come close to a somewhat recent state of 
technology, I wouldn't mind if they were written in Python. The CPython 
runtime is pretty predictable in its performance characteristics, after all.
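
Coming back to the on-the-fly decompression from the quoted text above: a
minimal sketch of the idea, using only the standard library rather than
lxml. It assumes gzip compression, and the function name count_elements and
the small in-memory sample document are made up for illustration; with a
real file you would pass its path to gzip.open() instead.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_elements(gz_source, tag):
    """Count elements named `tag` in gzip-compressed XML.

    `gz_source` may be a file name or a binary file-like object.
    (Hypothetical helper, for illustration only.)
    """
    count = 0
    # gzip.open() returns a file-like object that decompresses as it is
    # read, so the parser pulls uncompressed bytes chunk by chunk.
    with gzip.open(gz_source, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
            # Release the text and children of elements we are done with.
            elem.clear()
    return count


# Small stand-in for the real 1 GB file: a gzipped document held in memory.
xml_bytes = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
compressed = io.BytesIO(gzip.compress(xml_bytes))
print(count_elements(compressed, "item"))
```

Whether this beats parsing an uncompressed file depends on how your disk
speed compares to the decompression speed, so measure it on your own data.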

Stefan


