[Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel stefan_ml at behnel.de
Tue Dec 21 12:59:58 CET 2010


David Hutto, 21.12.2010 12:45:
>> If file a.xml has simple tagged xml like <a>, and file b.config has
>> tags that represent the a.xml (i.e. <a> = <antonym>) as greater tags,
>> does this pattern optimize the process by limiting the size of the
>> tags to be parsed in the xml, then converting those simpler tags that
>> are found to the b.config values for the simple <a-z> simple format?
>
> In other words I'm lazy and asking for the experiment to be performed
> for me (or, more importantly, if it has been), but since I'm not new to
> this, if no one has a specific case, I'll timeit when I get to it.

I'm still not sure I understand what you are trying to describe here, but I 
think you want to look into the Wikipedia articles on indexing, hashing and 
compression.

http://en.wikipedia.org/wiki/Index_%28database%29
http://en.wikipedia.org/wiki/Index_%28information_technology%29
http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Data_compression

Terms like "indirection" and "mapping" also come to my mind when I try to 
make sense out of your hints.
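
To make the "mapping" idea concrete, here is a minimal sketch. It assumes the
question is about expanding short tag names in a.xml to longer names taken from
b.config. The TAG_NAMES dict, the iter_expanded() helper and the sample
document are hypothetical and only stand in for the real files. The stream is
read with xml.etree.ElementTree.iterparse(), so the whole document never has to
be held in memory as text at once. Whether short tags actually parse faster
than long ones is something only a measurement on the real data can answer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping, standing in for whatever b.config would contain.
TAG_NAMES = {"a": "antonym", "s": "synonym"}

def iter_expanded(source, tag_names):
    """Yield (long_name, text) for each mapped element while streaming."""
    for event, elem in ET.iterparse(source, events=("end",)):
        long_name = tag_names.get(elem.tag)  # plain dict (hash) lookup
        if long_name is not None:
            yield long_name, elem.text
            # Free the element's text and children. The emptied element
            # itself stays attached to its parent.
            elem.clear()

# Tiny in-memory sample instead of the real 1 GB file.
sample = io.BytesIO(b"<root><a>cold</a><s>warm</s><x>skip</x></root>")
result = list(iter_expanded(sample, TAG_NAMES))
print(result)
```

For a real file, pass the file name or an open binary file object instead of
the BytesIO sample.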

Stefan


