-----Original Message----- From: lxml [mailto:lxml-bounces@lxml.de] On Behalf Of Stefan Behnel Sent: 26 November 2013 07:13 To: lxml@lxml.de Subject: Re: [lxml] use of libxml2 xmlXPathOrderDocElems routine? Paul Tremberth, 25.11.2013 13:47:
I've read in a few places that libxml2's xmlXPathOrderDocElems can help speed up XPath queries when doing read-only lookups on documents. http://www.xmlsoft.org/html/libxml-xpath.html#xmlXPathOrderDocElems
Has anyone tried it? Would it make sense to have a method like .xpath_speedup() or something on the _Element class?
Hadn't heard about it yet. I could imagine having a static method on the XPath class, something like "XPath.freeze_tree(element_or_tree)". Although with a big, fat warning that any later modification to the tree will break XPath. (Ok, lxml could also remember this operation in the _Document and run through the tree to clean it up before doing any modifications to it, but I don't mind requiring users to be aware of the tradeoff...) That being said, I would also guess that using the find*() methods instead of XPath would provide a similar speedup in many cases. Their path language is a lot simpler, but they don't need to do any node sorting by design and can avoid tag name string comparisons during deep subtree traversal (using el.iter()). So, in cases where your plain path expression is more selective than any "[...]" conditions (e.g. attribute value or text comparisons), the .find*() methods should win. Plus, they use iterators instead of one-shot collectors, so you can nicely short-circuit your search with them. Stefan _________________________________________________________________ If it's just a matter of calling a method on the root node and you get a speed increase (subject to the restrictions you mentioned) then I'm +1 on this (anything that helps with speed is good :) If the tradeoff is documented with the method and it's entirely optional then why not? Having said that, in a quick test I've found that using findtext() is definitely quicker than using xpath() - does anyone know would xmlXPathOrderDocElems give a comparable speedup (better/worse)? And would it speed up the find* methods too? Brian