xpath suddenly slow on new laptop/install?
Hi,

I just got a new laptop, and after setting it up, my XPath queries with lxml run much, much slower. On my old laptop a run over a lot of XML files finishes in 30 seconds; on the new one I wait for hours without it completing. Both are Core i7 machines with lots of memory, just with 5 years between them. The XML files are a couple of megabytes each. On the new laptop I've tried lxml from Ubuntu, lxml from Anaconda, and building it myself.

Are there any obvious things to check (a fallback to a Python implementation if a package is missing, or similar), or do I need to dig deeper? If necessary I'll debug further by moving installs back and forth between the machines, building old versions of lxml, or profiling lxml... I haven't done that yet.

Dag Sverre
Do the libxml2/libxslt versions (that are actually used by lxml) on the machines differ? That's probably the first thing I'd look at. Then I'd try pure libxml2/libxslt (xmllint/xsltproc) and see if the performance problems manifest without the lxml layer.

Holger
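One way to try the xmllint suggestion from a script is sketched below. It assumes an `xmllint` build that supports the `--xpath` option; the helper names and the file name `sample.xml` in the usage note are hypothetical, not something from this thread. The measured time includes process start-up and parsing, so it is only a rough comparison.

```python
import shutil
import subprocess
import time


def build_xmllint_cmd(xpath_expr, xml_path):
    """Build the argument list for one xmllint XPath evaluation."""
    return ["xmllint", "--xpath", xpath_expr, xml_path]


def time_xmllint(xpath_expr, xml_path):
    """Run xmllint once and return (seconds, returncode).

    Returns None when xmllint is not on PATH, so the sketch still
    imports cleanly on machines without libxml2's command-line tools.
    """
    if shutil.which("xmllint") is None:
        return None
    start = time.perf_counter()
    proc = subprocess.run(
        build_xmllint_cmd(xpath_expr, xml_path),
        stdout=subprocess.DEVNULL,  # the matches themselves are not needed
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, proc.returncode
```

Calling `time_xmllint('//A[C = "foo"]/B', "sample.xml")` on both machines with the same file would show whether the slowdown appears without the lxml layer.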
Holger Joukl wrote on 08.06.2016 at 12:18:
> Do the libxml2/libxslt versions (that are actually used by lxml) on the machines differ?
> That's probably the first thing I'd look at and then try to use pure-libxml2/libxslt (xmllint/xsltproc) and see if the performance problems manifest without the lxml layer.
Yes, that's the first thing that comes to mind. There were major algorithmic improvements in the XPath implementation of libxml2 2.9 (IIRC) which could easily explain such a difference.

Dag, make sure lxml is using the latest libxml2 release on your side:

http://lxml.de/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do

If not, try upgrading, or use a static build:

STATIC_DEPS=true pip install lxml

Stefan
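A minimal sketch of the version check, for comparing the two machines. It reads the version tuples that `lxml.etree` exposes; the helper names are made up for illustration. The import is guarded so the snippet also runs where lxml is missing. `decode_libxml_number` assumes the packed integer form that `xmllint --version` prints (major * 10000 + minor * 100 + micro).

```python
try:
    from lxml import etree  # third-party; everything else here is stdlib
except ImportError:
    etree = None


def format_version(version):
    """Turn a version tuple such as (2, 9, 2) into '2.9.2'."""
    return ".".join(str(part) for part in version)


def decode_libxml_number(number):
    """Split a packed libxml2 version number into (major, minor, micro)."""
    major, rest = divmod(number, 10000)
    minor, micro = divmod(rest, 100)
    return (major, minor, micro)


def lxml_version_report():
    """Return the versions lxml was built against and is running with."""
    if etree is None:
        return {}
    return {
        "lxml": format_version(etree.LXML_VERSION),
        "libxml2 (runtime)": format_version(etree.LIBXML_VERSION),
        "libxml2 (compiled against)": format_version(etree.LIBXML_COMPILED_VERSION),
        "libxslt (runtime)": format_version(etree.LIBXSLT_VERSION),
        "libxslt (compiled against)": format_version(etree.LIBXSLT_COMPILED_VERSION),
    }


for name, version in lxml_version_report().items():
    print(f"{name}: {version}")
```

If the runtime and compiled-against versions differ, or the runtime version differs between the machines, that is the first thing to rule out.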
Thanks for your pointers! Just to conclude this thread for the archives:

After playing with xmllint, it seems that libxml2 version 20902 (2.9.2) has a performance regression relative to 20901 (2.9.1) for my particularly stupid query

//A/B[parent::*[C = "foo"]]

which takes nearly 2 minutes, while the equivalent

//A[C = "foo"]/B

takes 0.09 seconds. On my old laptop with libxml2 version 20901, both queries execute in 0.09 seconds.

Anyway, the solution is to not write a stupid query...

Dag Sverre

________________________________________
From: lxml <lxml-bounces@lxml.de> on behalf of Stefan Behnel <stefan_ml@behnel.de>
Sent: 10 June 2016 08:38
To: lxml@lxml.de
Subject: Re: [lxml] xpath suddenly slow on new laptop/install?
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
On 06/19/2016 09:13 PM, Dag Sverre Seljebotn wrote:
> Thanks for your pointers! Just to conclude this thread for the archives: After playing with xmllint it seems that libxml v 20902 had a performance degradation over 20901 for my particularly stupid query of
>
> //A/B[parent::*[C = "foo"]]
>
> which takes nearly 2 minutes, while the equivalent
>
> //A[C = "foo"]/B
>
> takes 0.09 seconds

Sorry, I made a mistake here. The slow query was in fact

A//B[parent::*[C = "foo"]]

so the two are not the same, although the second works in my particular case.

Dag Sverre

> On my old laptop with libxml version 20901, both queries execute in 0.09 seconds. Anyway, the solution is to not write a stupid query...
>
> Dag Sverre
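The three queries discussed above can be compared side by side with a small timing sketch. The sample document is invented for illustration (the `root` and `D` elements and the text values are assumptions, not Dag's data), and the lxml import is guarded so the snippet still runs without it. Timings depend heavily on the libxml2 version in use, so no particular result is implied.

```python
import time

try:
    from lxml import etree  # third-party; the only non-stdlib dependency
except ImportError:
    etree = None


def build_sample(n_items):
    """Return a small XML document as a string (hypothetical structure)."""
    parts = ["<root>"]
    for i in range(n_items):
        value = "foo" if i % 2 == 0 else "bar"
        parts.append(
            f"<A><C>{value}</C><B>direct {i}</B>"
            f"<D><C>foo</C><B>nested {i}</B></D></A>"
        )
    parts.append("</root>")
    return "".join(parts)


QUERIES = {
    "predicate on parent": '//A/B[parent::*[C = "foo"]]',
    "predicate on A": '//A[C = "foo"]/B',
    "corrected slow query": 'A//B[parent::*[C = "foo"]]',
}


def time_queries(xml_text):
    """Evaluate each query once; return {label: (match_count, seconds)}."""
    root = etree.fromstring(xml_text)
    results = {}
    for label, expr in QUERIES.items():
        start = time.perf_counter()
        matches = root.xpath(expr)
        results[label] = (len(matches), time.perf_counter() - start)
    return results


if etree is not None:
    for label, (count, seconds) in time_queries(build_sample(1000)).items():
        print(f"{label}: {count} matches in {seconds:.4f} s")
```

On this sample the first two queries select the same nodes, while the corrected query also matches `B` elements nested deeper under `A`, which is why it is not equivalent in general.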
participants (3)
- Dag Sverre Seljebotn
- Holger Joukl
- Stefan Behnel