[Tutor] Trying to parse a HUGE (1 GB) XML file in Python

David Hutto smokefloat at gmail.com
Tue Dec 21 09:55:45 CET 2010


On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Chris Fuller, 21.12.2010 03:27:
>>
>> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
>> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
>> give a general solution, but there are two parameters / four cases that
>> could make the code simpler; I'll just point them out at the end.
>>
>> Iterate over the file descriptor, reading it line by line.  This will be
>> slow on a huge file, but probably not so bad if you're only doing it once.
>
> Note that it's not unlikely that this is actually *slower* than using a real
> XML parser:
>

Or use a 'real' language like C or C++ to gain speed, or, in Python's
case, to bypass the interpreter?


> http://effbot.org/zone/celementtree.htm#benchmarks
>
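For reference, here is a minimal sketch of the streaming "real XML parser" approach, using the standard library's xml.etree.ElementTree.iterparse. It assumes the file is well-formed XML, which the thread suggests may not hold for the original file; if it is not, iterparse will raise a ParseError. The function name iter_tag_text and the tag name "item" are hypothetical. In Python 3, xml.etree.ElementTree uses the C accelerator (the successor of the cElementTree module benchmarked at the link above) automatically, so the parsing itself already runs in C.

```python
import io
import xml.etree.ElementTree as ET


def iter_tag_text(source, tag):
    """Yield the text of every element named `tag`, streaming the input.

    `source` may be a filename or a binary file-like object. The tree is
    pruned as we go so memory stays roughly flat even on very large files.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the document's root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Detach everything parsed so far from the root so it can be freed.
            root.clear()


# Hypothetical usage on a tiny in-memory document:
sample = b"<root><item>a</item><other/><item>b</item></root>"
print(list(iter_tag_text(io.BytesIO(sample), "item")))
```

On a real file you would pass the filename instead of a BytesIO object. Clearing the root after each match is what keeps a 1 GB input from being built up as one huge tree in memory.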

