[Tutor] Trying to parse a HUGE(1gb) xml file in python
stefan_ml at behnel.de
Tue Dec 21 13:26:56 CET 2010
David Hutto, 21.12.2010 13:09:
> On Tue, Dec 21, 2010 at 6:59 AM, Stefan Behnel wrote:
>> David Hutto, 21.12.2010 12:45:
>>>> If file a.xml has simple tagged xml like <a>, and file b.config has
>>>> tags that represent the a.xml (i.e. <a> = <antonym>) as greater tags,
>>>> does this pattern optimize the process by limiting the size of the
>>>> tags to be parsed in the xml, then converting those simpler tags that
>>>> are found to the b.config values for the simple <a-z> simple format?
>>> In other words I'm lazy and asking for the experiment to be performed
>>> for me (or, more importantly, if it has been), but since I'm not new to
>>> this, if no one has a specific case, I'll timeit when I get to it.
>> I'm still not sure I understand what you are trying to describe here
> a.xml has tags with simplistic forms, like was argued above, with <a>,
> or <b>. b.config has variables for the simple tags in a.xml so that
> <a> = <alpha> in b.config.
> So when parsing a.xml, you parse it, then use more complex tags to
> define with b.config.. I'll review the url's a little later.
Ok, I'd call that simple renaming. That's what I meant by "indirection"
and "mapping" (basically the two concepts that computer science is all
about).

Sure, run your own benchmarks, but don't expect anyone to be interested in
the results. If your interest is to obfuscate the tag names, why not just
use a binary (or less readable) format? That gives you much better
obfuscation in the first place.
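
For anyone who does want to time it, the renaming described above could be
sketched roughly as follows. This is only an illustration, not a benchmark:
TAG_MAP is a hypothetical stand-in for whatever b.config would contain,
iter_renamed is a made-up helper name, and the tiny in-memory document
stands in for the real 1 GB file. It uses the standard library's
xml.etree.ElementTree.iterparse so that the whole tree is never held in
memory at once.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from the short tags in a.xml to the longer names
# that b.config would define (assumed contents, for illustration only).
TAG_MAP = {"a": "alpha", "b": "beta"}


def iter_renamed(source, tag_map):
    """Yield (mapped_tag, text) for every element as its end tag is seen."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Plain dict lookup; tags without a mapping keep their own name.
        yield tag_map.get(elem.tag, elem.tag), elem.text
        # Drop the element's text, attributes and children once consumed.
        # The emptied element itself stays attached to its parent.
        elem.clear()


# A tiny in-memory document standing in for the real file.
sample = io.BytesIO(b"<root><a>1</a><b>2</b><c>3</c></root>")
result = list(iter_renamed(sample, TAG_MAP))
print(result)
```

The mapping is one dict lookup per element, so any difference in parse time
would have to come from the shorter tag names in the file itself, which is
exactly what a timeit run on real data would need to show.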