Paul Tremberth, 25.11.2013 13:47:
I've read in a few places that libxml2's xmlXPathOrderDocElems can help speed up XPath queries when doing read-only lookups on documents. http://www.xmlsoft.org/html/libxml-xpath.html#xmlXPathOrderDocElems
Has anyone tried it? Would it make sense to have a method like .xpath_speedup() or something on the _Element class?
Perl LibXML has it: http://search.cpan.org/~shlomif/XML-LibXML-2.0107/lib/XML/LibXML/Document.po... Mentioned here: http://code.activestate.com/lists/perl-xml/8161/
So you see, the greatest performance boost is achieved by indexing the elements and only after that there is a small but noticeable benefit of pre-compiling the XPath expressions.
This Zori project (which I know nothing about) is using it: https://www.assembla.com/spaces/Zori/wiki https://www.assembla.com/code/Zori/subversion/nodes/372/trunk/src/ens/ens_re...
    ens_read_xml_popen( &doc, &ctx, voff_proc, ens_opt );
    //This supposedly speeds up XPath evaluation for static documents
    //Update: not just supposed, but vast improvement for reading (many) walkers
    xmlXPathOrderDocElems( ctx->doc );
Hadn't heard about it yet. I could imagine having a static method on the XPath class, something like "XPath.freeze_tree(element_or_tree)". Although with a big, fat warning that any later modification to the tree will break XPath. (Ok, lxml could also remember this operation in the _Document and run through the tree to clean it up before doing any modifications to it, but I don't mind requiring users to be aware of the tradeoff...)

That being said, I would also guess that using the find*() methods instead of XPath would provide a similar speedup in many cases. Their path language is a lot simpler, but they don't need to do any node sorting by design and can avoid tag name string comparisons during deep subtree traversal (using el.iter()). So, in cases where your plain path expression is more selective than any "[...]" conditions (e.g. attribute value or text comparisons), the .find*() methods should win. Plus, they use iterators instead of one-shot collectors, so you can nicely short-circuit your search with them.

Stefan
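
To make the find*() suggestion concrete, here is a minimal sketch of short-circuiting a search with iterfind(), compared with a one-shot XPath call. The sample document and its element names (root, item, name) are made up for illustration, and nothing here measures speed. The fallback import of the standard library's ElementTree is only there so the snippet runs without lxml; it offers the same find*() methods but no xpath().

```python
from itertools import islice

try:
    from lxml import etree          # the library discussed in this thread
except ImportError:                 # fallback so the sketch still runs
    import xml.etree.ElementTree as etree

# Made-up sample document: 1000 <item> elements, each with a <name> child.
xml = "<root>" + "".join(
    "<item id='%d'><name>n%d</name></item>" % (i, i) for i in range(1000)
) + "</root>"
root = etree.fromstring(xml)

# iterfind() returns an iterator, so the search can stop after a few hits.
first_three = list(islice(root.iterfind(".//item"), 3))
first_three_ids = [el.get("id") for el in first_three]

# find() returns only the first match (or None if there is none).
first_name = root.find(".//item/name").text

# iter() does a plain deep traversal filtered by tag name.
item_count = sum(1 for _ in root.iter("item"))

# XPath (lxml only) collects the complete result list in one shot.
if hasattr(root, "xpath"):
    all_items = root.xpath(".//item")
    assert len(all_items) == item_count

print(first_three_ids, first_name, item_count)
```

Whether this actually beats XPath for a given document and expression would need to be timed on real data, e.g. with timeit.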