[Tutor] Trying to parse a HUGE(1gb) xml file in python

David Hutto smokefloat at gmail.com
Tue Dec 21 12:41:10 CET 2010


On Tue, Dec 21, 2010 at 6:19 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> David Hutto, 21.12.2010 12:02:
>>
>> On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld wrote:
>>>
>>> 8 bytes to describe an int which could be represented in
>>> a single byte in binary (or even in CSV).
>
> Well, "CSV" indicates that there's at least one separator character
> involved, so make that an asymptotic 2 bytes on average. But obviously,
> compression applies to CSV and other 'readable' formats as well.
>
>
>> But that byte can't describe the tag
>
> Yep, that's an argument that Alan already presented.

I didn't see that, but it would make the minimal format one that uses a
comma, or any other single-character marker, as the delimiter; even at
its smallest, the format still needs some specific marker in the file.
That doesn't answer my question about mapping tags to another file's
variables, though.
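To put rough numbers on the size argument above (verbose tags versus a
one-character separator versus raw binary, and how compression narrows
the gap), here is a small sketch. The sample data is invented purely for
illustration, and the exact sizes printed will depend on the data.

```python
import gzip
import struct

# Hypothetical sample data: 10,000 small integers (0-255), for illustration only.
values = [i % 256 for i in range(10_000)]

# Verbose XML: every value wrapped in its own opening and closing tag.
xml_bytes = ("<data>" + "".join(f"<value>{v}</value>" for v in values) + "</data>").encode("ascii")

# CSV: the decimal digits plus one separator character per value.
csv_bytes = ",".join(str(v) for v in values).encode("ascii")

# Binary: exactly one unsigned byte per value ("B" format code).
bin_bytes = struct.pack(f"{len(values)}B", *values)

for name, data in [("xml", xml_bytes), ("csv", csv_bytes), ("binary", bin_bytes)]:
    # Compare raw size with gzip-compressed size for each representation.
    print(f"{name:>6}: raw={len(data):>7} bytes, gzip={len(gzip.compress(data)):>6} bytes")
```

The binary form is fixed at one byte per value, while gzip shrinks the
highly repetitive XML tags dramatically, which is the point about
compression applying to the readable formats too.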

If file a.xml uses simple tags like <a>, and file b.config holds longer
tag names that the a.xml tags stand for (e.g. <a> = <antonym>), does this
pattern optimize the process? That is, does keeping the tags in the XML
short reduce the parsing work, with the short tags that are found then
converted to their b.config values for the simple single-letter <a>
through <z> format?
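One way to picture the pattern in the question is a sketch like the
following. It assumes a hypothetical b.config format of one "short = long"
pair per line, and inlines tiny made-up versions of both files so it runs
on its own. It streams a.xml with xml.etree.ElementTree.iterparse, so a
large file never has to sit in memory as one tree, and looks up each short
tag in a dict built from the config. Note that the parser still reads
every byte of a.xml; short tags mainly shrink the file and the I/O, while
the renaming step is only a dictionary lookup per element.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical contents of b.config: one "short = long" mapping per line.
CONFIG_TEXT = """\
a = antonym
s = synonym
w = word
"""

# Hypothetical contents of a.xml, using only the short tags.
XML_TEXT = b"""<root>
  <w>hot<a>cold</a><s>warm</s></w>
  <w>big<a>small</a><s>large</s></w>
</root>"""


def load_tag_map(config_text):
    """Parse 'short = long' lines into a dict, skipping blank lines and # comments."""
    tag_map = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        short, _, long_name = line.partition("=")
        tag_map[short.strip()] = long_name.strip()
    return tag_map


def iter_records(source, tag_map, record_tag="w"):
    """Stream one dict per record element, keyed by the long tag names."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Rename the record tag and each child tag via the config mapping,
        # falling back to the short tag if it has no entry.
        record = {tag_map.get(elem.tag, elem.tag): (elem.text or "").strip()}
        for child in elem:
            record[tag_map.get(child.tag, child.tag)] = (child.text or "").strip()
        yield record
        # Drop the finished record's children and text to keep memory bounded.
        elem.clear()


tag_map = load_tag_map(CONFIG_TEXT)
for record in iter_records(io.BytesIO(XML_TEXT), tag_map):
    print(record)
```

For a real 1 GB file you would pass the file path (or an open binary file)
to iter_records instead of the BytesIO wrapper.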

