[Tutor] Trying to parse a HUGE(1gb) xml file in python

Alan Gauld alan.gauld at btinternet.com
Tue Dec 21 10:46:15 CET 2010

"David Hutto" <smokefloat at gmail.com> wrote

>> And from what I recall XML is intended for data transfer in respect to
>> HTML (from a recent brushup, nothing more),
> Apologies that is browser based transfer,

I'm not sure what that last bit means.
XML is a self-describing data format. It is usually used for files
but can be used in data streams or in-memory strings.
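
As a rough illustration of the string and stream cases, here is a
minimal sketch using the standard library's xml.etree.ElementTree.
The sample document and its tag names are made up for the example.
For a really big file the iterparse() form is the one to look at,
since it hands you each element as it is read rather than building
the whole tree first.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag and attribute names are invented.
xml_text = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

# In-memory string: the whole tree is built in one go.
root = ET.fromstring(xml_text)
titles_from_string = [book.findtext("title") for book in root.findall("book")]

# Stream or file: iterparse yields each element when its end tag is read.
# BytesIO stands in for a real file object here.
titles_from_stream = []
stream = io.BytesIO(xml_text.encode("utf-8"))
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "book":
        titles_from_stream.append(elem.findtext("title"))
        elem.clear()  # release the children we have finished with

print(titles_from_string)
print(titles_from_stream)
```

With a genuine file you would pass the filename or an open file
object to iterparse() in place of the BytesIO wrapper.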

Its natural competitors are TLV (Tag, Length, Value) and
CSV (Comma Separated Value) files but neither is as rich
in structure.  Alternative options include ASN.1, Edifact and
IDL, but these are not self-describing(*). They are all more
compact and faster to parse than XML, although only IDL is free.
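
To make the comparison concrete, here is a small sketch that writes
the same two-field record as XML, CSV and TLV. The record, the tag
numbers and the TLV layout (one byte of tag, two bytes of length) are
all assumptions chosen for the example, not any standard encoding.

```python
import csv
import io
import struct

# Hypothetical record and tag numbers, invented for this sketch.
TAG_NAME, TAG_AGE = 1, 2
name, age = "Ann", "42"

# XML: every value carries its own label, so the data describes itself.
xml_text = "<person><name>%s</name><age>%s</age></person>" % (name, age)

# CSV: values only; the reader must already know what each column means.
buf = io.StringIO()
csv.writer(buf).writerow([name, age])
csv_text = buf.getvalue()

def tlv_encode(tag, value):
    # One byte of tag, two bytes of big-endian length, then the value.
    return struct.pack(">BH", tag, len(value)) + value

def tlv_decode(data):
    # Walk the buffer, reading a 3-byte header and then 'length' bytes.
    fields = []
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        fields.append((tag, data[pos:pos + length]))
        pos += length
    return fields

tlv_bytes = tlv_encode(TAG_NAME, name.encode()) + tlv_encode(TAG_AGE, age.encode())

print(len(xml_text), len(csv_text), len(tlv_bytes))
print(tlv_decode(tlv_bytes))
```

The XML version is the longest of the three but is the only one
you could make sense of without being told the layout beforehand.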

>> sure has been displayed as a data transfer mechanism,

You don't have to use it for data transfer - e.g. MS's use of it
as a document storage format in Office - but frankly if
you use XML to store large volumes of data you are mad.
A database is a much more sensible option, being far more
space efficient and faster to work with.
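
If you are handed a big XML file anyway, one option is to read it
once and load it into a database. A minimal sketch, assuming
SQLite via the standard sqlite3 module and a made-up XML layout
(the element, attribute, table and column names are all invented):

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data; a real job would open the file on disk instead.
xml_bytes = (
    b"<readings>"
    b"<r sensor='a' value='1.5'/>"
    b"<r sensor='b' value='2.5'/>"
    b"<r sensor='a' value='3.0'/>"
    b"</readings>"
)

conn = sqlite3.connect(":memory:")  # use a filename for a persistent database
conn.execute("CREATE TABLE reading (sensor TEXT, value REAL)")

# Stream through the XML once, inserting one row per <r> element.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "r":
        conn.execute(
            "INSERT INTO reading VALUES (?, ?)",
            (elem.get("sensor"), float(elem.get("value"))),
        )
        elem.clear()
conn.commit()

# From here on, queries go to the database rather than back to the XML.
total_a = conn.execute(
    "SELECT SUM(value) FROM reading WHERE sensor = ?", ("a",)
).fetchone()[0]
print(total_a)
```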

(*)ASN.1, IDL etc. all rely on a shared definition, and
often a shared code library, at both sender and receiver.
The library is a compiled version of the data definition
which enables complex data structures to be read from
the file in a single chunk very efficiently.
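
The same idea can be shown in miniature with Python's struct
module. This is only a stand-in for ASN.1 or IDL, not an
implementation of either, and the record layout is invented.
The point is that the format string is the shared definition:
both ends must have it, and the bytes mean nothing without it.

```python
import struct

# The "shared definition": sender and receiver must agree on this layout.
# Hypothetical record: 4-byte unsigned id, 8-byte float, 8-byte name field.
RECORD = struct.Struct(">Id8s")

def write_record(rec_id, value, name):
    # Names shorter than 8 bytes are padded with NUL bytes by pack().
    return RECORD.pack(rec_id, value, name.encode("ascii"))

def read_record(chunk):
    # The whole record is decoded from one fixed-size chunk in one call.
    rec_id, value, raw_name = RECORD.unpack(chunk)
    return rec_id, value, raw_name.rstrip(b"\x00").decode("ascii")

payload = write_record(7, 2.5, "probe")
print(RECORD.size, read_record(payload))
```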


Alan Gauld
Author of the Learn to Program web site
