[XML-SIG] Optimising/strategies for DOM/XSL lookup/parsing
Sun, 21 Mar 1999 14:42:20 +0000 (/etc/localtime)
> One thing that I've noticed is that the initial DOM 'parsing' is slow
> relative to the XSL pattern matching. On my iMac 266, DOM parsing a 76k XML
> file took 4-5 seconds (utils.FileReader), whilst the XSL pattern matching
> took 1.8 seconds (tp = Parser(pattern), topics =
> tp.select(reader.document)). By the way, is there any way of telling how
> much memory a DOM tree is occupying?
There are several parsers available, and they differ in (among other
things) speed.
For example, "pyexpat" is a Python interface to James Clark's "expat"
parser, which is written in C. It should be quite fast.
"xmlproc", on the other hand, is written purely in Python; one
would expect this parser to be considerably slower.
Which one are you using?
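As a point of comparison, here is a minimal sketch of driving the expat binding directly (the module was called "pyexpat" then; the modern stdlib exposes it as xml.parsers.expat). The XML snippet is a made-up stand-in for your real 76k file:

```python
import xml.parsers.expat

# Hypothetical sample data standing in for the real document.
xml_data = "<topics><topic id='1'>DOM</topic><topic id='2'>XSL</topic></topics>"

seen = []

def start_element(name, attrs):
    seen.append(name)          # record each element as it is opened

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(xml_data, True)   # True: this is the final chunk of input
```

Since expat is event-based, this does no tree-building at all; a DOM layer on top of it adds the cost of creating one Python object per node, which is where much of the 4-5 seconds likely goes.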
> The way I think things are likely to happen is that there will be large
> numbers of XSL queries and very few DOM creations. However, there are
> something like 140 documents that need to be 'available' for XSL querying
> and subsequent transformation into HTML/RTF. In addition, there will be
> times when an XSL query across all 140 documents will definitely happen.
> Would one strategy be to load up all 140 documents into memory on startup,
> do the DOM processing then and then when an XSL query comes along, 'route'
> it to the appropriate DOM tree (now in memory)?
Neither Python nor PyDOM will prevent you from doing this;
only your Mac (or any other computer, though Macs are especially
sensitive to high memory consumption) may feel uncomfortable.
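A sketch of that strategy, using the stdlib minidom for illustration (your setup uses PyDOM's utils.FileReader; the cache and function names here are hypothetical):

```python
from xml.dom import minidom

# Hypothetical cache keyed by filename: each document is DOM-parsed
# exactly once, and later queries are routed to the in-memory tree.
_dom_cache = {}

def get_document(path):
    """Parse the file on first request; afterwards return the cached tree."""
    if path not in _dom_cache:
        _dom_cache[path] = minidom.parse(path)
    return _dom_cache[path]

def preload(paths):
    """Pay the DOM-parsing cost up front, e.g. for all 140 documents."""
    for path in paths:
        get_document(path)
```

With roughly 140 documents of that size the cache is a few tens of megabytes of trees, so whether preloading is practical depends entirely on available RAM.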
> If this isn't possible, is it possible to 'save' a DOM tree to an external
> file and re-read it in once a relevant XSL query is ready to be acted upon?
You could try to pickle the documents with "pickle" or "cPickle".
These are standard modules that allow you to serialize (i.e. write
out and later read back) complex Python objects ("cPickle" is
implemented in C and is much faster than "pickle"). However, objects
must obey some restrictions to be picklable. I am not sure whether
DOM objects fulfill them; just try it.
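"Just try it" can look like the following round-trip, shown here with the stdlib minidom (in modern Python the plain pickle module already uses the fast C implementation, so the cPickle/pickle split no longer applies):

```python
import pickle
from xml.dom import minidom

# Hypothetical small document standing in for one of the 140 real ones.
doc = minidom.parseString("<topics><topic>DOM</topic></topics>")

try:
    blob = pickle.dumps(doc)          # serialize the whole tree to bytes
    restored = pickle.loads(blob)     # ...and rebuild it from those bytes
    round_trip_ok = (restored.documentElement.tagName
                     == doc.documentElement.tagName)
except Exception:
    round_trip_ok = False             # this DOM flavour is not picklable
```

If the round-trip succeeds, you can write the bytes to a file per document and unpickle on demand, trading the one-time parse cost for a (usually cheaper) deserialization cost.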