[XML-SIG] Large xml databases and python

Mon Aug 21 13:16:37 CEST 2006

Hi,

I'm a newbie at Python, but we use it daily in our research, so I'm getting the hang of it.

My RA and I are developing about 30 large Unicode XML databases, averaging about 100 MB each. They have to retain their XML format because they are used by a specialised program designed to read them in as they are.

We use Python for our programming logic, which so far is mainly data manipulation. For example, each day we read through the data many times, processing various values to modify existing values and create new ones. *For the time being* we keep the data in Python dictionaries (and thus they are about 10 MB each!).

Our question is this: when we finish porting our 300 MB of "Python" data into 3 GB of XML data, how can we continue to read it from disk in its XML format and manipulate it?

We are looking at Berkeley XML with the Python API, but are concerned this is not the best solution. We have also dabbled with Amara and ElementTree, but the size of our XML is giving us problems.

We want to focus on the programming logic, which is all we do in Python. Working with Python structures is great, but as far as we can tell this is not a viable option once we move to XML.

Or is there a way to read large XML files into Python structures and then write them out again?
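
To make the question concrete, here is a minimal sketch of one streaming approach, using ElementTree's iterparse so that only one record is held in memory at a time. It assumes each file is a flat list of record elements directly under the root, which may not match our real layout. The names stream_records, rewrite and update, and the <records>/<record> tags, are hypothetical and only there for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield each completed <tag> element, then discard it.

    Assumes the <tag> elements are direct children of the root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Drop the records already handled so memory does not grow
            # with the size of the file.
            root.clear()


def rewrite(source, dest, tag, update):
    """Copy records from source to dest, applying update() to each.

    dest is a text stream. The root name "records" is a placeholder.
    Returns the number of records written.
    """
    dest.write("<records>\n")
    count = 0
    for elem in stream_records(source, tag):
        update(elem)       # modify the element in place
        elem.tail = "\n"   # one record per line in the output
        dest.write(ET.tostring(elem, encoding="unicode"))
        count += 1
    dest.write("</records>\n")
    return count
```

With real files, source would be an input file opened in binary mode and dest a separate output file opened in text mode, so the original is never modified in place. Whether this is fast enough for many passes per day over 3 GB is exactly what we are unsure about.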

Our ideal solution (we think!) would be an XML database whose elements we could modify directly from Python scripts, with the disk and memory handling done for us somehow.

Any advice would be appreciated. 

Regards,
Matthew
Postdoctoral Fellow, Charles Sturt University