[Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel stefan_ml at behnel.de
Tue Dec 21 13:26:56 CET 2010


David Hutto, 21.12.2010 13:09:
> On Tue, Dec 21, 2010 at 6:59 AM, Stefan Behnel wrote:
>> David Hutto, 21.12.2010 12:45:
>>>>
>>>> If file a.xml has simple tagged xml like <a>, and file b.config has
>>>> tags that represent the a.xml (i.e. <a> = <antonym>) as greater tags,
>>>> does this pattern optimize the process by limiting the size of the
>>>> tags to be parsed in the xml, then converting those simpler tags that
>>>> are found to the b.config values for the simple <a-z> simple format?
>>>
>>> In other words I'm lazy and asking for the experiment to be performed
>>> for me (or, more importantly, if it has been), but since I'm not new to
>>> this, if no one has a specific case, I'll timeit when I get to it.
>>
>> I'm still not sure I understand what you are trying to describe here.
>
> a.xml has tags with simplistic forms, like was argued above, with <a>,
> or <b>. b.config has variables for the simple tags in a.xml so that
> <a> = <alpha> in b.config.
>
> So when parsing a.xml, you parse it, then use more complex tags to
> define with b.config. I'll review the URLs a little later.

Ok, I'd call that simple renaming. That's what I meant by "indirection"
and "mapping" (basically the two concepts that computer science is all
about ;).
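
For illustration, here is a minimal sketch of what such a renaming could look like while streaming through a large file. It assumes the contents of b.config have already been loaded into a plain dict (the TAG_MAP and iter_renamed names are made up for this example), and it uses the standard library's xml.etree.ElementTree.iterparse so the whole document never has to sit in memory at once. It is not a benchmark and says nothing about whether short tag names actually parse faster.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for b.config: short tag -> descriptive name.
TAG_MAP = {"a": "alpha", "b": "beta"}


def iter_renamed(source, tag_map):
    """Yield (mapped_name, text) for each element as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # The mapping is just a dict lookup; unknown tags pass through as-is.
        yield tag_map.get(elem.tag, elem.tag), elem.text
        # Discard the element's text, attributes and children once handled,
        # so memory use stays low on big inputs.
        elem.clear()


# Tiny in-memory sample standing in for a.xml.
sample = io.BytesIO(b"<root><a>1</a><b>2</b><c>3</c></root>")
result = list(iter_renamed(sample, TAG_MAP))
```

The renaming itself is one dictionary lookup per element, so whatever cost there is comes from the parsing, not from the mapping step.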

Sure, run your own benchmarks, but don't expect anyone to be interested in 
the results. If your interest is to obfuscate the tag names, why not just 
use a binary (or less readable) format? That gives you much better 
obfuscation in the first place.

Stefan


