Evgeny Turnaev, 27.09.2011 15:27:
2011/9/27 Stefan Behnel:
Evgeny Turnaev, 27.09.2011 12:09:
My question is related to the XSLT document() function and processing
multiple input documents in XSLT.
Currently our application fetches 3 to 7 separate XML documents, merges
all of them into a single tree using append() or SubElement(), and passes
the merged tree into an XSLT transformation.
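The merge-then-transform approach described above can be sketched roughly like this (the element names, input documents and stylesheet are made up for illustration):

```python
from lxml import etree

# made-up input documents standing in for the 3-7 fetched XML sources
fetched = [b'<user id="1"/>', b'<settings theme="dark"/>']

merged = etree.XML(b'<root/>')
for raw in fetched:
    # append() moves the parsed root element into the merged tree
    merged.append(etree.XML(raw))

# SubElement() creates and attaches a new child element in one step
etree.SubElement(merged, 'generated-at').text = '2011-09-27'

# a trivial placeholder stylesheet that just counts the merged children
transform = etree.XSLT(etree.XML(
    b'<xsl:stylesheet version="1.0"'
    b' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    b'<xsl:template match="/">'
    b'<out><xsl:value-of select="count(/root/*)"/></out>'
    b'</xsl:template></xsl:stylesheet>'))

# the merged tree goes through XSLT as one single input document
print(transform(merged))
```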
Is it possible in lxml to pass multiple trees into an XSLT
transformation and access them,
for example using the document() function? If so: will a document accessed
by the document() function be parsed on each access?
It will be cached during the lifetime of one XSLT execution.
So it will be parsed the first time?
Yes.
Or can I pass an already parsed tree using a custom resolver?
No, not currently. It could work to enable that, but given the way libxslt
works here, it would always have to get deep copied internally.
See the function _xslt_resolve_from_python() in xslt.pxi.
Hmm. There seems to be no method like Resolver.resolve_document().
What is the result of resolve_string()? Is it a parsed tree?
The return values are opaque reference objects that should only be passed
back from the user provided resolver method. They do not contain documents.
Are you suggesting to cache the
result of resolve_string() and return the cached tree for calls to
document('my_doc')?
No, I was saying that libxslt caches the documents it parses during an XSLT
run. They will be discarded afterwards.
Will I have to save the document to disk to be
able to load it in the document() function, or can I load an already
parsed tree from memory?
You can use custom resolvers (see the docs) to pass arbitrary sources into
lxml's parsers and XSLT engine.
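As a sketch of that resolver mechanism: a resolver registered with the parser of the stylesheet is also consulted when document() is evaluated during a transformation, so an in-memory string can be served without touching the disk. The URL scheme 'memory://extra' and the document contents below are invented for illustration:

```python
from io import BytesIO
from lxml import etree

# hypothetical in-memory document, served under a made-up URL
EXTRA_XML = b'<extra><item>cached</item></extra>'

class MemoryResolver(etree.Resolver):
    def resolve(self, url, pubid, context):
        if url == 'memory://extra':
            # resolve_string() hands the raw text back to libxslt,
            # which parses (and caches) it for the current XSLT run
            return self.resolve_string(EXTRA_XML, context)
        return None  # fall through to the default resolvers

XSLT_SRC = b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <result>
      <xsl:copy-of select="document('memory://extra')/extra/item"/>
    </result>
  </xsl:template>
</xsl:stylesheet>'''

parser = etree.XMLParser()
parser.resolvers.add(MemoryResolver())
transform = etree.XSLT(etree.parse(BytesIO(XSLT_SRC), parser))
result = transform(etree.XML(b'<doc/>'))
print(str(result))
```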
Will it be faster to use document() than appending 3-4 XML trees
of 40KB and 4-5 small ones (1KB)?
Maybe not, but it depends on what you do. You should benchmark it.
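A minimal timeit sketch for such a benchmark (the stylesheet, tree sizes and iteration count are placeholders; only a measurement against the real documents and stylesheets is meaningful):

```python
import copy
import timeit

from lxml import etree

# placeholder stylesheet that just counts all elements in the input
transform = etree.XSLT(etree.XML(
    b'<xsl:stylesheet version="1.0"'
    b' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    b'<xsl:template match="/">'
    b'<out><xsl:value-of select="count(//*)"/></out>'
    b'</xsl:template></xsl:stylesheet>'))

# a shared subtree that would otherwise be merged into every input document
shared = etree.XML(b'<extra>' + b'<item/>' * 100 + b'</extra>')

def merge_then_transform():
    root = etree.XML(b'<root/>')
    # append() would *move* the shared subtree out of 'shared',
    # so a deep copy is needed to keep it reusable across runs
    root.append(copy.deepcopy(shared))
    return transform(root)

print(timeit.timeit(merge_then_transform, number=1000))
```

Timing a document()-based variant with the same stylesheet and inputs would then show which approach wins for these particular sizes.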
One other reason why I am asking: we do a lot of merging of the
same trees (<1KB) into different
documents, and a few merges of 40KB trees. So I thought: why can't lxml
use the same tree via document()
instead of explicitly appending it into each XML document before the transformation?
Yes, it sounds like you could simplify your processing. However, whether that
makes it any faster, cleaner or 'better' by whatever metric depends
entirely on your exact code.
Are there any other performance hints?
First question: do you really have a performance problem? If so, where?
Or is your question more about refactoring the code to keep more of it in
XSLT for some design reason?
No, we don't have any performance issues. Our application is IO bound
(mostly waiting), although in some situations
fetching is done from memcache (around 2ms), and in that case XSLT
transform time matters.
The application code is a bit chaotic and I am investigating
how I can rewrite
the whole thing and maybe also speed it up. Profiling says that about
half of the actual CPU time
is spent in the XSLT transformation (not much in absolute value), and I am
wondering if I can "cache" subtrees and
pass them into XSLT instead of appending to each XML document individually. I
will surely benchmark. (I think it will be
faster than tree merging, although maybe less readable and more
complicated on the Python side.)
Ok, I take it that your focus is on code cleanup rather than optimisation.
As I said, passing in multiple subtrees isn't guaranteed to be any faster
than what you currently have, and it may just as well be slower. It may be a
way to clean up the code, though, but since I don't see the code, I can
only guess.
Stefan