Trying to parse a HUGE (1 GB) XML file

Tim Harig usernet at ilthio.net
Mon Dec 27 18:02:44 EST 2010


On 2010-12-27, Alan Meyer <ameyer2 at yahoo.com> wrote:
> On 12/26/2010 3:15 PM, Tim Harig wrote:
> ...
>> The problem is that XML has become such a de facto standard that it
>> is used automatically, without thought, even when there are much better
>> alternatives available.
>
> I agree with you, but as you say, it has become a de facto standard.  As
> a result, we often need to use it unless there is some strong reason to
> use something else.

XML should be used where it makes sense to do so.  As always, use the
proper tool for the proper job.  XML became a de facto standard in part
because it was abused for many uses in the first place, so using it just
because it is a de facto standard piles more and more mistakes on top of
each other.

> The same thing can be said about relational databases.  There are 
> applications for which a hierarchical database makes more sense, is more 
> efficient, and is easier to understand.  But anyone who recommends a 
> database that is not relational had better be prepared to defend his 
> choice with some powerful reasoning because his management, his 
> customers, and the other programmers on his team are probably going to 
> need a LOT of convincing.

I have no particular problem with using other database models in
theory.  In practice, at least until recently, there were few decent
implementations of alternative-model databases.  That is starting to
change with the advent of the so-called NoSQL databases.  There are a few
models that I really do like, but there are also a lot of failed models.
A large part of the problem was the push towards object databases, which
are one of the failed models IMNSHO.  Their failure tended to give some
of the other database models a bad name.

> And of course there are many applications where XML really is the best.
> It excels at representing complex textual documents while still
> allowing programmatic access to individual items of information.

Much agreed.  There are many things that XML does very well.  It works
great for XML-RPC style interfaces.  I prefer it over binary formats
for documents.  It is suitable for exporting discrete amounts of
information.

There are, however, a number of things that it does poorly.  I don't
condone its use for configuration files.  I don't condone its use as a
data store, and when you have data approaching gigabytes, that is exactly
how you are using it.
