[XML-SIG] Using xpath/xslt on proprietary object structures.

Xhaus Main Account pyxml@xhaus.com
Thu, 20 Sep 2001 14:10:39 +0100


Greetings all,

I'm using Python, PyXML and 4Suite to process a collection of XML files to
generate a web site (www.paratuberculosis.org), and I'm running into
problems with memory usage. I have several hundred XML files which
cross-reference each other (they are all XLinked together), so I have to
load them all into memory to process them.

I'm thinking that one solution would be to eliminate the DOM entirely,
and store the information in a hierarchy of Python objects which I build
myself from the XML files, using SAX.
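A minimal sketch of that SAX approach, written for modern Python 3's standard-library xml.sax rather than the 2001-era PyXML API. The Element and TreeBuilder classes and the load() helper are hypothetical names. Declaring __slots__ on the node class avoids a per-instance __dict__, which is one of the cheaper ways to cut per-node overhead in pure Python.

```python
import xml.sax


class Element:
    """Lightweight element node (hypothetical); __slots__ avoids a per-instance __dict__."""
    __slots__ = ("name", "attrs", "children", "parent", "text")

    def __init__(self, name, attrs, parent=None):
        self.name = name
        self.attrs = attrs      # plain dict: attribute name -> value
        self.children = []      # child Element objects only
        self.parent = parent
        self.text = ""          # all character data, including whitespace between children


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a tree of Element objects from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []        # currently open elements

    def startElement(self, name, attrs):
        parent = self._stack[-1] if self._stack else None
        node = Element(name, dict(attrs.items()), parent)
        if parent is None:
            self.root = node
        else:
            parent.children.append(node)
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it
        if self._stack:
            self._stack[-1].text += content


def load(xml_text):
    """Parse an XML string and return the root Element."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

For example, `doc = load('<member id="m1"><name>Alan</name></member>')` followed by `print(doc.name, doc.attrs["id"], doc.children[0].text)` prints `member m1 Alan`.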

I know that it should be possible to keep using XPath and XSLT on such a
hierarchy by making it present a full DOM interface, i.e. implement the
Node, Element, Attr, etc. interfaces.
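To give a feel for what presenting a DOM interface involves, here is a hypothetical LiteElement class exposing a few DOM attribute and method names (nodeType, nodeName, tagName, parentNode, childNodes, firstChild, nextSibling, getAttribute, hasAttribute) on top of plain Python lists and dicts. Which subset a particular XPath/XSLT engine actually touches is an assumption here and would need checking against that engine's source; text nodes, Attr nodes and namespaces are omitted.

```python
from xml.dom import Node


class LiteElement:
    """Hypothetical element exposing a small subset of DOM Node/Element names."""
    __slots__ = ("nodeName", "parentNode", "childNodes", "_attrs")
    nodeType = Node.ELEMENT_NODE  # class attribute shared by all instances

    def __init__(self, name, attrs=None, parent=None):
        self.nodeName = name
        self.parentNode = parent
        self.childNodes = []
        self._attrs = dict(attrs or {})
        if parent is not None:
            parent.childNodes.append(self)

    @property
    def tagName(self):
        return self.nodeName

    @property
    def firstChild(self):
        return self.childNodes[0] if self.childNodes else None

    @property
    def nextSibling(self):
        # Linear scan of the parent's children; fine for a sketch, O(n) per call
        if self.parentNode is None:
            return None
        siblings = self.parentNode.childNodes
        i = siblings.index(self)
        return siblings[i + 1] if i + 1 < len(siblings) else None

    def getAttribute(self, name):
        # The DOM returns an empty string, not None, for a missing attribute
        return self._attrs.get(name, "")

    def hasAttribute(self, name):
        return name in self._attrs
```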

However, implementing all of the DOM methods would take a large amount
of effort, so I'm looking for possible shortcuts, if there are any.

A few months ago, someone posted on this list (I think it was Eliot
Kimber) that a few small modifications could make XPath expressions
usable against proprietary object hierarchies/models. But there was no
indication given of how to achieve that.
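One way to read the "small modifications" idea is that a path evaluator only needs a few navigation primitives, so it can be written against callables instead of a concrete DOM. The select() function below is a hypothetical toy, not 4XPath: it supports only name tests, "*", "/" and "//", treats a leading "//" as "descendants of the starting node" (like ".//" in XPath), and has no predicates, other axes or document-order sorting.

```python
import re


def _descendants(node, children):
    """Yield all descendants of node in pre-order."""
    for c in children(node):
        yield c
        yield from _descendants(c, children)


def select(node, path, children, name):
    """Evaluate a tiny XPath-like subset relative to node.

    The tree is reached only through children(node) -> iterable of nodes
    and name(node) -> str, so any object model can be plugged in.
    """
    context = [node]
    axis = "/"
    # e.g. "a//b/c" -> ['a', '//', 'b', '/', 'c']
    for tok in (t for t in re.split(r"(//|/)", path) if t):
        if tok in ("/", "//"):
            axis = tok
            continue
        seen, result = set(), []
        for ctx in context:
            candidates = children(ctx) if axis == "/" else _descendants(ctx, children)
            for c in candidates:
                # de-duplicate by identity, since nodes may be unhashable
                if (tok == "*" or name(c) == tok) and id(c) not in seen:
                    seen.add(id(c))
                    result.append(c)
        context, axis = result, "/"
    return context
```

Plugging in a different object model only means passing different functions, e.g. `lambda n: n.children` and `lambda n: n.name` for objects that store those as attributes.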

Has anyone got any pointers as to what those modifications might be, or
how I would go about implementing such a system?

Failing that, some pointers to documentation on the architecture of
4XPath and 4XSLT would be most useful.

Thanks in advance for any help rendered.

BTW, to the authors of the excellent Python XML tools: Please don't take
any of this as criticism. I continue to be amazed by the flexibility and
power of Python and the available collection of XML processing tools. A
huge Thank You to all involved in the Python XML effort!

Regards,

Alan Kennedy.

P.S. Out of interest, I have approximately 1335 XML files, taking 2.1 Meg
of space. Not all of these files need to be processed at the same time,
although things would be much faster if they could be. I'm finding that
once I've loaded, say, 250 members (each of which is represented by two
or more XML files), and constructed a membership directory XML file
(listing all of the members), I'm hitting about 130 Meg of memory usage
(!), and my poor little PII-233/160 Meg RAM/NT4 machine starts to
thrash. I fully realise that my design is not the most efficient, but
the enormous memory requirements of the DOM don't help. That's why I'm
trying to find an alternative in-memory representation for the
documents. I could do a total rewrite, but the web site is an unpaid,
voluntary effort, and I can't spare it that much time.