Re: [lxml] Multiple input documents - lxml - The Python XML Toolkit

29 Sep 2011


      Did I ask something wrong or obvious?

2011/9/27 Evgeny Turnaev :
...
2011/9/27 Stefan Behnel :
...
Evgeny Turnaev, 27.09.2011 12:09:
...
Hi.
    My question is related to the XSLT document() function and processing
multiple input documents in XSLT.
Currently our application fetches 3 to 7 separate XML documents, merges
all of them into a single tree using append() or SubElement, and passes the merged tree
into an XSLT transformation.
    Is it possible in lxml to pass multiple trees into an XSLT
transformation and access them,
for example using the document() function? If so, will a document accessed by
the document() function be parsed on each access?
It will be cached during the lifetime of one XSLT execution.
So it will be parsed on the first access? Or can I pass an already parsed
tree using a custom resolver?
Hmm. There seems to be no method like Resolver.resolve_document().
What is the result of resolve_string()? Is it a parsed tree? Are you
suggesting that I cache
the result of resolve_string() and return the cached tree for calls to
document('my_doc')?
...
...
Will I have to save
the document to disk to be
able to load it in the document() function, or can I load an already parsed
tree from memory?
You can use custom resolvers (see the docs) to pass arbitrary sources into
lxml's parsers and XSLT engine.
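A minimal sketch of that approach, assuming the shared fragment is kept in memory as serialized bytes. The "mem:menu" URL, the FRAGMENTS dict and the MemoryResolver class are made-up names for illustration; only etree.Resolver, resolve_string() and parser.resolvers.add() are lxml API. Note that resolve_string() does not return a parsed tree. It returns an opaque input description, and the fragment is then parsed when document() first asks for it during a transformation.

```python
from io import BytesIO

from lxml import etree

# Hypothetical in-memory store of serialized fragments, keyed by a made-up URL.
FRAGMENTS = {
    "mem:menu": b"<menu><item>Home</item><item>About</item></menu>",
}


class MemoryResolver(etree.Resolver):
    """Serve document() lookups from memory instead of disk or network."""

    def resolve(self, url, public_id, context):
        data = FRAGMENTS.get(url)
        if data is None:
            return None  # not ours: let other resolvers or the default handle it
        # Hands the bytes to the parser; the result is not an Element.
        return self.resolve_string(data, context)


# The resolver must be registered on the parser that reads the stylesheet.
parser = etree.XMLParser()
parser.resolvers.add(MemoryResolver())

stylesheet = etree.parse(BytesIO(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out>
      <xsl:value-of select="count(document('mem:menu')/menu/item)"/>
    </out>
  </xsl:template>
</xsl:stylesheet>
"""), parser)

transform = etree.XSLT(stylesheet)
result = transform(etree.XML(b"<page/>"))
print(str(result))
```

Nothing is written to disk here. The trade-off is that the fragment is stored serialized and reparsed for each transformation run that touches it, rather than being passed in as an existing tree.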
...
Will it be faster to use document() than to append 3-4 XML trees
of 40 KB each and 4-5 small ones (1 KB)?
Maybe not, but it depends on what you do. You should benchmark it.
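One way such a benchmark could be set up, as a rough sketch only. The tiny documents, the "mem:menu" URL and all helper names are assumptions for illustration; real numbers only mean something with the application's own stylesheets and 1 KB / 40 KB inputs. The merge variant uses copy.deepcopy() because append() moves an element out of its previous tree rather than sharing it.

```python
import copy
import timeit
from io import BytesIO

from lxml import etree

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'
SHARED = b"<menu><item>Home</item><item>About</item></menu>"


class SharedResolver(etree.Resolver):
    """Hypothetical resolver that serves one fragment from memory."""

    def resolve(self, url, public_id, context):
        if url == "mem:menu":
            return self.resolve_string(SHARED, context)
        return None


def make_transform(select):
    """Build a stylesheet that counts the nodes matched by `select`."""
    parser = etree.XMLParser()
    parser.resolvers.add(SharedResolver())
    xsl = (
        '<xsl:stylesheet version="1.0" %s>'
        '<xsl:template match="/">'
        '<out><xsl:value-of select="count(%s)"/></out>'
        '</xsl:template></xsl:stylesheet>' % (XSL_NS, select)
    ).encode("utf-8")
    return etree.XSLT(etree.parse(BytesIO(xsl), parser))


merged_transform = make_transform("/page/menu/item")
document_transform = make_transform("document('mem:menu')/menu/item")
shared_tree = etree.fromstring(SHARED)


def run_merged():
    # Variant 1: copy the cached subtree into each input document.
    page = etree.XML(b"<page/>")
    page.append(copy.deepcopy(shared_tree))
    return merged_transform(page).getroot().text


def run_document():
    # Variant 2: leave the input alone and let document() fetch the fragment.
    page = etree.XML(b"<page/>")
    return document_transform(page).getroot().text


merged_seconds = timeit.timeit(run_merged, number=200)
document_seconds = timeit.timeit(run_document, number=200)
print("merge + transform:    %.4f s" % merged_seconds)
print("document() transform: %.4f s" % document_seconds)
```

Both variants should produce the same output, so the comparison measures only the cost of getting the shared data into the transformation.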
...
One other reason why I am asking: we do a lot of merging of the
same trees (<1 KB) into different
documents and a little merging of 40 KB trees. So I thought: why can't lxml
use the same tree via document()
instead of explicitly appending it into each XML document before transformation?
Yes, that sounds like you could simplify your processing. However, whether that
makes it any faster, cleaner or 'better' by whatever metric depends
entirely on your exact code.
...
Are there any other performance hints?
First question: do you really have a performance problem? If so, where?
Or is your question more about refactoring the code to keep more of it in
XSLT for some design reason?
No, we don't have any performance issues. Our application is IO bound
(mostly waiting, although in some situations
fetching is done from memcache (around 2 ms), and in this case the XSLT
transform time matters).
The application code is a bit chaotic, and I am
investigating how I can rewrite
the whole thing and maybe also speed it up. Profiling says that about
half of the actual CPU time
is spent in the XSLT transformation (not much in absolute terms), and I am
wondering if I can "cache" subtrees and
pass them into XSLT instead of appending them to each XML document individually. I
will surely benchmark. (I think it will be
faster than tree merging, although maybe less readable and more
complicated in the Python part.)
...
Stefan
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
--
--------------------------------------------
Турнаев Евгений Викторович
+7 906 875 09 43
--------------------------------------------
