Trying to parse a HUGE(1gb) xml file
Roy Smith
roy at panix.com
Tue Dec 28 11:02:59 EST 2010
In article <ifcmru$abn$1 at news.eternal-september.org>,
"BartC" <bc at freeuk.com> wrote:
> Still, that's 27 times as much as it need be. Readability is fine, but why
> does the full, expanded, human-readable textual format have to be stored on
> disk too, and for every single instance?
Well, I know the answer to that one. The particular XML feed I'm
working with is a dump from an SQL database. The element names in the
XML are exactly the same as the column names in the SQL database.
The difference being that in the database, the string
"Parental-Advisory" appears in exactly one place, in some schema
metadata table. In the XML, it appears (doubled!) once per row.
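To put a rough number on that redundancy, here is a small sketch (the row
data is made up; only the column name "Parental-Advisory" comes from the
post) comparing the bytes spent on the repeated tag name per row with the
bytes of actual data:

```python
# Sketch of the redundancy described above, with made-up row data.
# In the database, the column name "Parental-Advisory" is stored once,
# in a schema metadata table; in the XML dump it is repeated twice
# per row, as an open tag and a close tag.
row_xml = "<row><Parental-Advisory>false</Parental-Advisory></row>"

tag_overhead = len("<Parental-Advisory>") + len("</Parental-Advisory>")
payload = len("false")

print(tag_overhead, payload)  # 39 bytes of tag name per row vs 5 of data
```

Multiply that 39-byte-per-row overhead by millions of rows and you get the
bloat being complained about.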
It's still obscene. The fact that I understand the cause of the
obscenity doesn't make it any less so.
Another problem with XML is that some people don't use real XML tools to
write their XML files. DTD? What's that? So you end up with tag soup
that the real XML tools can't parse on the other end.
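For what it's worth, a file that size can still be handled without loading
it all into memory. A minimal sketch using the stdlib's
xml.etree.ElementTree.iterparse (the "row" element name is an assumption;
substitute whatever per-record tag the actual dump uses):

```python
# Minimal streaming-parse sketch for an XML file too large to load
# whole, using xml.etree.ElementTree.iterparse.  The tag name "row"
# and the in-memory sample data are assumptions for illustration.
import xml.etree.ElementTree as ET
from io import BytesIO

# Stand-in for a huge file opened in binary mode.
data = BytesIO(
    b"<dump>"
    b"<row><Parental-Advisory>false</Parental-Advisory></row>"
    b"<row><Parental-Advisory>true</Parental-Advisory></row>"
    b"</dump>"
)

count = 0
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "row":
        count += 1            # ...process the completed row here...
        elem.clear()          # drop the subtree so memory stays bounded

print(count)  # 2
```

Calling elem.clear() after each record is what keeps memory roughly
constant instead of proportional to file size.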