Writing big XML files where beginning depends on end.
Laurent Pointal
laurent.pointal at limsi.fr
Mon Nov 28 07:34:30 EST 2005
Magnus Lycka wrote:
> We're using DOM to create XML files that describes fairly
> complex calculations. The XML is structured as a big tree,
> where elements in the beginning have values that depend on
> other values further down in the tree. Imagine something
> like below, but much bigger and much more complex:
>
> <node sum="15">
> <node sum="10">
> <leaf>7</leaf'>
> <node sum="3">
> <leaf>2</leaf'>
> <leaf>1</leaf>
> </node>
> </node>
> <node sum="5">
> <leaf>5</leaf>
> </node>
> </node>
>
> We have to stick with this XML structure for now.
>
> In some cases, building up a DOM tree in memory takes up
> several GB of RAM, which is a real showstopper. The actual
> file is maybe a magnitute smaller than the DOM tree. The
> app is using libxml2. It's actually written in C++. Some
> library that used much less memory overhead could be
> sufficient.
>
> We've thought of writing a file that looks like this...
>
> <node sum="#1">
> <node sum="#1.1">
> <leaf>7</leaf'>
> <node sum="#1.1.1">
> <leaf>2</leaf'>
> <leaf>1</leaf>
> </node>
> </node>
> <node sum="#1.2">
> <leaf>5</leaf>
> </node>
> </node>
>
> ...and store {"#": "15", "#1.1", "10" ... } in a map
> and then read in a piece at a time and performs some
> simple change and replace to get the correct values in.
> Then we need something that allows parts of the XML file
> to be written to file and purged from RAM to avoid the
> memory problem.
>
> Suggestions for solutions are appreciated.
An idea.
Put spaces in your names <node sum="#1 ">
Store { "#1": <its offset in the result file> }.
When you have collected all your final data, go throught your stored #,
seek at their position in the file, and replace the data (writting same
amount of chars than reserved).
A kind of direct access to nodes in an XML document file.
A+
Laurent.
More information about the Python-list
mailing list