[Tutor] Trying to parse a HUGE(1gb) xml file in python

Steven D'Aprano steve at pearwood.info
Tue Dec 21 14:54:51 CET 2010


Alan Gauld wrote:

> XML is a self-describing data format. It is usually used for files
> but can be used in data streams or in-memory strings.
> 
> Its natural competitors are TLV (Tag, Length, Value) and
> CSV (Comma Separated Value) files but neither is as rich
> in structure.  Alternative options include ASN.1, Edifact and
> IDL but these are not self-describing(*) (although they are all
> more compact and faster to parse, but only IDL is free.)

I would have thought that both JSON and YAML are competitors to XML, 
although of course it depends on exactly what you are using XML for. For 
example, Gnome uses XML files extensively for their poor-man's Registry, 
which is a shame as (in my opinion) simple Windows-style INI files or 
Unix/Linux style config files would be a far better and more natural choice.
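
To make that comparison concrete, here is a rough sketch that reads the same 
two settings from an INI-style string and from an XML string using only the 
standard library. The setting names and the XML layout are invented for 
illustration; they are not Gnome's actual schema.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical settings, made up for this example only.
INI_TEXT = """\
[desktop]
theme = dark
font_size = 11
"""

# The same (hypothetical) settings in an XML layout.
XML_TEXT = """\
<config>
  <section name="desktop">
    <entry name="theme" type="string">dark</entry>
    <entry name="font_size" type="int">11</entry>
  </section>
</config>
"""


def read_ini(text):
    """Return {section: {key: value}} from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def read_xml(text):
    """Return {section: {key: value}} from the XML layout above."""
    root = ET.fromstring(text)
    return {
        section.get("name"): {
            entry.get("name"): entry.text
            for entry in section.findall("entry")
        }
        for section in root.findall("section")
    }


ini_settings = read_ini(INI_TEXT)
xml_settings = read_xml(XML_TEXT)
print(ini_settings == xml_settings)
```

Both readers produce the same nested dict of strings, so the difference is 
mostly in how much markup a person has to wade through to change one value.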

Basically, people shouldn't make the mistake of thinking that because 
XML is text-based it is meant as a human-readable (let alone 
human-editable) format. It's not. It's a machine format that happens to 
be *just barely* human-readable and -editable in simple cases due to 
using ASCII text.


-- 
Steven

