Trying to parse a HUGE (1 GB) XML file

Nobody nobody at nowhere.com
Thu Dec 23 15:27:59 EST 2010


On Wed, 22 Dec 2010 23:54:34 +0100, Stefan Sonnenberg-Carstens wrote:

> Normally (what is normal, anyway?) such files are auto-generated,
> and bear an apparent similarity to a database query
> result, encapsulated in XML.
> Most of the time the structure is the same for every "row" that's in there.
> So a very unpythonic, but fast, way would be to let awk reassemble the
> records and write them in CSV format to stdout.

awk works well if the input is formatted such that each line is a record;
it's not so good otherwise. XML isn't a line-oriented format; in
particular, there are many places where both newlines and spaces are just
whitespace. A number of XML generators will "word wrap" the resulting XML
to make it more human readable, so line-oriented tools aren't a good idea.
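
To illustrate, here is a minimal sketch, not a definitive implementation. The
sample data, the <row>/<id>/<name> tag names and the helper names iter_rows
and count_line_records are all made up for the example; the only library call
relied on is the standard xml.etree.ElementTree.iterparse. It assumes the
records are direct children of the root element. The same two records are
given once with one record per line and once word wrapped; a naive
line-oriented check only finds records in the first form, while an
incremental XML parser yields the same records from both.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in two layouts.
ONE_PER_LINE = (
    "<rows>\n"
    "<row><id>1</id><name>spam</name></row>\n"
    "<row><id>2</id><name>eggs</name></row>\n"
    "</rows>\n"
)
WRAPPED = (
    "<rows>\n"
    "<row><id>1</id>\n"
    "  <name>spam</name></row>\n"
    "<row>\n"
    "  <id>2</id><name>eggs</name>\n"
    "</row>\n"
    "</rows>\n"
)


def iter_rows(source, tag="row"):
    """Yield one dict per record element, parsing the input incrementally.

    Assumes each record is a direct child of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            root.clear()  # discard finished records so memory stays bounded


def count_line_records(text, tag="row"):
    """Naive line-oriented check: count lines holding one complete record."""
    start, end = "<%s>" % tag, "</%s>" % tag
    return sum(
        1
        for line in text.splitlines()
        if line.startswith(start) and line.endswith(end)
    )


if __name__ == "__main__":
    for label, text in (("one per line", ONE_PER_LINE), ("wrapped", WRAPPED)):
        rows = list(iter_rows(io.BytesIO(text.encode("utf-8"))))
        print(label, count_line_records(text), rows)
```

For a real file, iter_rows can be given a filename or an open binary file
instead of the in-memory buffer. Because finished records are cleared from
the root as parsing goes on, the whole document never has to be held in
memory at once.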




