[lxml-dev] etree.XSLT gets stuck

Hello everybody,

I'm a very happy lxml user, just in case :) But as my project is not public yet, I'd better not give the address yet. Anyway, we're close to the release. When it's ready, I promise to give my testimonials too, maybe along with the usage statistics and performance overview.

this has been a suspiciously calm week on the list

Well, I wanted to write this later, but as I see this line... Yes, there is a problem I encounter.

I'm using lxml-2.0beta1 at the moment. I did not upgrade to beta2, as I've been waiting for the release. As I have mentioned before, I use XSLT for HTML generation out of XML.

I have a bunch of templates and several machines with one and the same configuration: libxml2-2.6.30, libxslt-1.1.22, python 2.5.1, apache 2.2/mod_python 3.3.1, amd64, FreeBSD 6.2.

Every machine runs the same routine when my process is initialised: read a template, compile it and store it in the dictionary. The same code, the same data. It looks like this:

    xslt_parser = etree.XMLParser(no_network=False, resolve_entities=True, load_dtd=True)
    xslt_doc = etree.parse(urllib2.urlopen(xslt_path), xslt_parser)
    transformations[xslt_path] = etree.XSLT(xslt_doc)

Normally etree.XSLT runs in ~0.05 seconds. Nevertheless, on one machine it sometimes gets stuck for 3-5 seconds. I'm sure it's not because of the urlopen, as I've measured the compilation time itself.

Could there be any visible reasons for this?

Cheers,
Dmitri

Hi Dmitri,

Dmitri Fedoruk wrote:
I'm a very happy lxml user, just in case :)
:)
this has been a suspiciously calm week on the list
Well, I wanted to write this later, but as I see this line... Yes, there is a problem I encounter.
I just knew I shouldn't have said that... :)
I'm using lxml-2.0beta1 at the moment. Did not upgrade to beta2 as I've waited for the release. As I have mentioned before, I use xslt for html generation out of xml.
I have a bunch of templates and several machines with one and the same configuration: libxml2-2.6.30, libxslt-1.1.22, python 2.5.1, apache 2.2/mod_python 3.3.1, amd64, FreeBSD 6.2.
Note that libxml2 2.6.30 has a security relevant bug, just in case you cannot control where your XML files come from.
Every machine runs the same routine when my process is initialised - read a template, compile it & store it in the dictionary. The same code, the same data. It looks like this:
    xslt_parser = etree.XMLParser(no_network=False, resolve_entities=True, load_dtd=True)
    xslt_doc = etree.parse(urllib2.urlopen(xslt_path), xslt_parser)
If "xslt_path" is a local filename or an HTTP/FTP URL, then there's no need to deploy urlopen() as libxml2 can handle those for you. Just pass the path or URL right in, that's simpler and should be quite a bit faster (it also frees the GIL, although that might not help you here).
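The suggestion above could look roughly like the following sketch, which hands the path straight to etree.parse() and caches the compiled stylesheet by path. The tiny stylesheet, the load_transformation() helper and the demo() function are illustrative assumptions and not code from this thread. The sketch uses a local temporary file and current Python syntax rather than the Python 2.5 setup described above.

```python
import os
import tempfile

from lxml import etree

# Hypothetical minimal stylesheet, only here to keep the sketch self-contained.
XSL_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/page">
    <p><xsl:value-of select="title"/></p>
  </xsl:template>
</xsl:stylesheet>
"""

# Cache of compiled stylesheets, keyed by path (same idea as in the thread).
transformations = {}


def load_transformation(xslt_path):
    """Compile the stylesheet at xslt_path once and cache it by path."""
    if xslt_path not in transformations:
        parser = etree.XMLParser(
            no_network=False, resolve_entities=True, load_dtd=True
        )
        # Pass the path straight to etree.parse(); no urlopen() wrapper.
        xslt_doc = etree.parse(xslt_path, parser)
        transformations[xslt_path] = etree.XSLT(xslt_doc)
    return transformations[xslt_path]


def demo():
    """Write the sample stylesheet to a temp file, compile it, and apply it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.xsl")
        with open(path, "wb") as f:
            f.write(XSL_SOURCE)
        transform = load_transformation(path)
        result = transform(etree.XML("<page><title>Hello</title></page>"))
        return str(result)
```

Calling demo() compiles the stylesheet on first use, and later calls to load_transformation() with the same path reuse the cached etree.XSLT object.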
transformations[ xslt_path ] = etree.XSLT(xslt_doc)
Normally etree.XSLT runs in ~0.05 seconds. Nevertheless, on one machine it sometimes gets stuck for 3-5 seconds. I'm sure it's not because of the urlopen, as I've measured the compilation time itself.
Hmmm, I wouldn't know where the XSLT() call could hang. I mean, you said you are using processes, not threads, right? And the stylesheet is always the exact same one?

When I run 1000 XSLT() calls over one and the same tree, I get almost predictable numbers, somewhere within 160-210 msecs for a 400KB XSL file with some 5500 XSL Elements (actually I didn't even know libxslt was *that* fast). And I don't see any spikes anywhere.

But you said "on one machine". Is that always the same one? Maybe there's something wrong with the setup, or some background task is running, or it swaps, or ...

Stefan
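A measurement like the 1000-call run described above might be sketched as follows. This is not the script used in the thread. The helper names time_calls() and find_spikes() are made up for illustration, and the "10 times the median" threshold is an arbitrary assumption.

```python
import time


def time_calls(func, repeat=1000):
    """Call func() `repeat` times and return the per-call durations in seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def find_spikes(durations, factor=10.0):
    """Return (index, seconds) pairs for calls slower than factor * median."""
    ordered = sorted(durations)
    median = ordered[len(ordered) // 2]
    return [(i, d) for i, d in enumerate(durations) if d > factor * median]
```

With an already parsed stylesheet tree, the call would be something like find_spikes(time_calls(lambda: etree.XSLT(xslt_doc))). An empty result on most machines and occasional entries on one would point at that machine's setup rather than at the stylesheet.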
participants (2)
- Dmitri Fedoruk
- Stefan Behnel