Evgeny Turnaev, 27.09.2011 15:27:
> 2011/9/27 Stefan Behnel:
>> Evgeny Turnaev, 27.09.2011 12:09:
>>> My question is related to the XSLT document() function and processing
>>> multiple input documents in XSLT.
>>> Currently our application fetches 3 to 7 separate XML documents, merges
>>> all of them into a single tree using append() or SubElement, and passes
>>> the merged tree into an XSLT transformation.
>>> Is it possible in lxml to pass multiple trees into an XSLT
>>> transformation and access them,
>>> for example using the document() function? If so, will a document
>>> accessed by the document() function be parsed for each access?
>> It will be cached during the lifetime of one XSLT execution.
> So it will be parsed the first time it is accessed?
> Or can I pass an already parsed tree using a custom resolver?
No, not currently. It could work to enable that, but given the way libxslt
works here, it would always have to get deep copied internally.
See the function _xslt_resolve_from_python() in xslt.pxi.
> Hmm. There seems to be no method like Resolver.resolve_document().
> What is the result of resolve_string()? Is it a parsed tree?
The return values are opaque reference objects that should only be passed
back from the user provided resolver method. They do not contain documents.
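
A minimal sketch of how an in-memory tree could still be served to document() under these constraints: serialize it once and hand the bytes back through resolve_string(). The names MemoryResolver and SHARED_DOCS and the "mem:" URL scheme are made up for illustration. The resolver has to be registered on the parser that parses the stylesheet, and the bytes are still parsed again by the XSLT engine when it loads them.

```python
from lxml import etree

# An already parsed tree that several transformations want to share.
menu_tree = etree.fromstring(
    b"<menu><item>Home</item><item>About</item></menu>")

# Serialize once up front. The "mem:" scheme is invented for this sketch.
SHARED_DOCS = {"mem:menu": etree.tostring(menu_tree)}


class MemoryResolver(etree.Resolver):
    """Hypothetical resolver that answers document() from SHARED_DOCS."""

    def resolve(self, url, pubid, context):
        data = SHARED_DOCS.get(url)
        if data is None:
            return None  # not ours: fall back to the default loading
        # The return value is an opaque reference, not a parsed tree.
        return self.resolve_string(data, context)


STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/page">
    <out>
      <title><xsl:value-of select="title"/></title>
      <first><xsl:value-of select="document('mem:menu')/menu/item[1]"/></first>
      <count><xsl:value-of select="count(document('mem:menu')/menu/item)"/></count>
    </out>
  </xsl:template>
</xsl:stylesheet>
"""

# Resolvers used by document() come from the stylesheet's parser.
parser = etree.XMLParser()
parser.resolvers.add(MemoryResolver())
transform = etree.XSLT(etree.fromstring(STYLESHEET, parser))

page = etree.fromstring(b"<page><title>Hello</title></page>")
result = transform(page)
print(str(result))
```

With this in place, the small shared trees no longer need to be appended to every input document before the transformation.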
> Are you suggesting to cache the
> result of resolve_string() and return the cached tree for calls to
> document('my_doc')?
No, I was saying that libxslt caches the documents it parses during an XSLT
run. They will be discarded afterwards.
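
One way to observe this behaviour is to count how often a resolver gets asked for the same URL. The following is only a sketch: CountingResolver and the "mem:shared" URL are hypothetical names, and the counts it prints depend on the installed lxml and libxslt versions, so it is better run than taken on faith.

```python
from lxml import etree


class CountingResolver(etree.Resolver):
    """Hypothetical resolver that records how often it is consulted."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def resolve(self, url, pubid, context):
        if url != "mem:shared":
            return None  # leave everything else to the default loading
        self.calls += 1
        return self.resolve_string(b"<shared><v>1</v></shared>", context)


# The stylesheet accesses the same document twice within one run.
XSL = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out>
      <a><xsl:value-of select="document('mem:shared')/shared/v"/></a>
      <b><xsl:value-of select="document('mem:shared')/shared/v"/></b>
    </out>
  </xsl:template>
</xsl:stylesheet>
"""

resolver = CountingResolver()
parser = etree.XMLParser()
parser.resolvers.add(resolver)
transform = etree.XSLT(etree.fromstring(XSL, parser))

doc = etree.fromstring(b"<root/>")

transform(doc)
calls_after_first_run = resolver.calls

transform(doc)
calls_after_second_run = resolver.calls

# Compare the counter after one run and after two runs.
print(calls_after_first_run, calls_after_second_run)
```

If the counter grows with every run, the document is being loaded again for each transformation, which is the per-run cost to keep in mind when comparing this against merging the trees up front.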
>>> Will I have to save the document to disk to be
>>> able to load it in the document() function, or can I load an already
>>> parsed tree from memory?
>> You can use custom resolvers (see the docs) to pass arbitrary sources into
>> lxml's parsers and XSLT engine.
>>> Will it be faster to use document() than appending 3-4 XML trees
>>> of 40 kB each and 4-5 small ones (1 kB)?
>> Maybe not, but it depends on what you do. You should benchmark it.
>>> One other reason why I am asking: we do a lot of merging of the
>>> same trees (<1 kB) into different
>>> documents and some merging of 40 kB trees. So I thought: why can't lxml
>>> use the same tree through document()
>>> instead of explicitly appending it to each XML document before the
>>> transformation?
>> Yes, that sounds like you could simplify your processing. However, if that
>> makes it any faster, cleaner or 'better' by whatever metric, depends
>> entirely on your exact code.
>>> Are there any other performance hints?
>> First question: do you really have a performance problem? If so, where?
>> Or is your question more about refactoring the code to keep more of it in
>> XSLT for some design reason?
> No, we don't have any performance issues. Our application is I/O bound
> (mostly waiting, although in some situations
> fetching is done from memcache (around 2 ms), and in this case the XSLT
> transform time matters).
> The application code is a bit chaotic, and I am investigating
> how I can rewrite
> the whole thing and maybe also speed it up. Profiling says that about
> half of the actual CPU time
> is spent in the XSLT transformation (not much in absolute terms), and I am
> wondering if I can "cache" subtrees and
> pass them into XSLT instead of appending them to each XML document
> individually. I will surely benchmark. (I think it will be
> faster than tree merging, although maybe less readable and more
> complicated in the Python part.)
Ok, I take it that your focus is on code cleanup rather than optimisation.
As I said, passing in multiple subtrees isn't guaranteed to be any faster
than what you currently have, and it may just as well be slower. It may be a
way to clean up the code, though, but since I don't see the code, I can