Re: [lxml] Multiple input documents
Did I ask something wrong or obvious?
2011/9/27 Evgeny Turnaev <turnaev.e@gmail.com>:
2011/9/27 Stefan Behnel <stefan_ml@behnel.de>:
Evgeny Turnaev, 27.09.2011 12:09:
Hi. My question is related to the XSLT document() function and processing multiple input documents in XSLT.
Currently our application fetches 3 to 7 separate XML documents, merges all of them into a single tree using append() or SubElement, and passes the merged tree into an XSLT transformation.
Is it possible in lxml to pass multiple trees into an XSLT transformation and access them, for example, using the document() function? If so, will a document accessed by the document() function be parsed on each access?
It will be cached during the lifetime of one XSLT execution.
So it will be parsed the first time? Or can I pass an already parsed tree using a custom resolver? Hmm. There seems to be no method like Resolver.resolve_document(). What is the result of resolve_string()? Is it a parsed tree? Are you suggesting that I cache the result of resolve_string() and return the cached tree for calls to document('my_doc')?
Will I have to save the document to disk to be able to load it in the document() function, or can I load an already parsed tree from memory?
You can use custom resolvers (see the docs) to pass arbitrary sources into lxml's parsers and XSLT engine.
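A minimal sketch of the resolver approach, for illustration only. The class name InMemoryResolver, the "mem:" URL scheme and the sample documents are assumptions made up for this example; etree.Resolver, resolve(), resolve_string() and parser.resolvers.add() are the documented lxml hooks. Note that the resolver has to be registered on the parser that parses the stylesheet, and that resolve_string() takes serialized XML, not a parsed tree.

```python
from io import BytesIO
from lxml import etree


class InMemoryResolver(etree.Resolver):
    """Hypothetical resolver serving serialized XML for 'mem:<name>' URLs."""

    def __init__(self, documents):
        super().__init__()
        self.documents = documents  # name -> serialized XML (bytes)
        self.requests = []          # URLs this resolver was asked for

    def resolve(self, url, pubid, context):
        if not url.startswith("mem:"):
            return None  # let other resolvers / the default loader handle it
        self.requests.append(url)
        return self.resolve_string(self.documents[url[len("mem:"):]], context)


STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <page>
      <xsl:copy-of select="/root/item"/>
      <xsl:copy-of select="document('mem:menu')/menu"/>
      <entries>
        <xsl:value-of select="count(document('mem:menu')/menu/entry)"/>
      </entries>
    </page>
  </xsl:template>
</xsl:stylesheet>
"""

# A tree that is already in memory; it is serialized once for the resolver.
menu = etree.XML(b"<menu><entry>home</entry><entry>about</entry></menu>")
resolver = InMemoryResolver({"menu": etree.tostring(menu)})

# The resolver must live on the parser used for the stylesheet itself.
parser = etree.XMLParser()
parser.resolvers.add(resolver)
transform = etree.XSLT(etree.parse(BytesIO(STYLESHEET), parser))

result = transform(etree.XML(b"<root><item>1</item></root>"))
print(str(result))
print(resolver.requests)
```

The stylesheet calls document('mem:menu') twice on purpose. If the per-execution caching described above applies, resolver.requests should show one lookup per transformation run rather than one per call.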
Will it be faster to use document() than to append 3-4 XML trees of 40 KB each and 4-5 small ones (1 KB)?
Maybe not, but it depends on what you do. You should benchmark it.
One other reason why I am asking: we merge the same small trees (<1 KB) into many different documents, and merge 40 KB trees only a few times. So I thought: why can't lxml use the same tree via document() instead of it being explicitly appended to each XML document before the transformation?
Yes, that sounds like you could simplify your processing. However, whether that makes it any faster, cleaner or 'better' by whatever metric depends entirely on your exact code.
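One way to run such a benchmark is sketched below with timeit. Everything named here (SHARED, SharedResolver, the "mem:menu" URL, the document sizes and the iteration count) is a made-up assumption and not taken from the application discussed in this thread. The sketch compares copying a shared tree into each document against pulling it in through document().

```python
import copy
import timeit
from io import BytesIO
from lxml import etree

# Hypothetical shared fragment and main document; sizes are illustrative only.
SHARED = etree.XML(b"<menu>" + b"<entry>x</entry>" * 50 + b"</menu>")
SHARED_BYTES = etree.tostring(SHARED)
MAIN_BYTES = b"<root><item>1</item></root>"

XSL_TEMPLATE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <count><xsl:value-of select="count(%s/entry)"/></count>
  </xsl:template>
</xsl:stylesheet>
"""


class SharedResolver(etree.Resolver):
    """Hypothetical resolver that serves the shared fragment as 'mem:menu'."""

    def resolve(self, url, pubid, context):
        if url.startswith("mem:"):
            return self.resolve_string(SHARED_BYTES, context)
        return None


def build_transform(select):
    # Resolvers are looked up on the parser that parsed the stylesheet.
    parser = etree.XMLParser()
    parser.resolvers.add(SharedResolver())
    return etree.XSLT(etree.parse(BytesIO(XSL_TEMPLATE % select), parser))


merge_transform = build_transform(b"/root/menu")
document_transform = build_transform(b"document('mem:menu')/menu")


def run_merged():
    main = etree.XML(MAIN_BYTES)
    # append() moves an element out of its old tree, so a tree that is
    # reused across documents has to be copied before each merge.
    main.append(copy.deepcopy(SHARED))
    return str(merge_transform(main))


def run_document():
    main = etree.XML(MAIN_BYTES)
    return str(document_transform(main))


if __name__ == "__main__":
    for fn in (run_merged, run_document):
        print(fn.__name__, timeit.timeit(fn, number=200))
```

In this sketch the document() variant hands serialized XML to the XSLT engine in every run, so it trades a deep copy for a parse. Which side wins is likely to depend on document sizes and on what the stylesheet does, so numbers from real data are what matter.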
Are there any other performance hints?
First question: do you really have a performance problem? If so, where?
Or is your question more about refactoring the code to keep more of it in XSLT for some design reason?
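To see where time goes inside a transformation, lxml can ask libxslt for per-template profiling data via the profile_run keyword and the xslt_profile property of the result. The sketch below assumes a toy stylesheet and input invented for illustration; the contents of the profile depend on the libxslt build, so the code only prints whatever comes back.

```python
from lxml import etree

# Toy stylesheet and input, made up for this example.
transform = etree.XSLT(etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:apply-templates select="//item"/></out>
  </xsl:template>
  <xsl:template match="item">
    <seen><xsl:value-of select="."/></seen>
  </xsl:template>
</xsl:stylesheet>
"""))

doc = etree.XML(b"<root><item>1</item><item>2</item></root>")

# profile_run=True asks the XSLT engine to collect per-template statistics.
result = transform(doc, profile_run=True)
profile = result.xslt_profile  # an ElementTree, or None if nothing was collected

if profile is not None:
    print(etree.tostring(profile, pretty_print=True).decode())
```

This only covers time spent in templates. Time spent parsing, copying or merging trees on the Python side would still need a Python profiler such as cProfile.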
No, we don't have any performance issues. Our application is IO bound (mostly waiting, although in some situations fetching is done from memcache (around 2 ms), and in this case the XSLT transform time matters). The application's code is a bit chaotic, and I am investigating how I can rewrite the whole thing and maybe also speed it up. Profiling says that about half of the actual CPU time is spent in the XSLT transformation (not much in absolute terms), and I am wondering if I can "cache" subtrees and pass them into XSLT instead of appending them to each XML document individually. I will surely benchmark. (I think it will be faster than tree merging, although maybe less readable and more complicated in the Python part.)
Stefan _________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
-- -------------------------------------------- Турнаев Евгений Викторович +7 906 875 09 43 --------------------------------------------