[lxml-dev] XML_PARSE_COMPACT

Hi there, I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it, but it creates a more compact tree which can lead to performance gains, apparently. This seems to be describing how it works: http://mail.gnome.org/archives/xml/2005-August/msg00138.html Regards, Martijn

Hi Martijn, Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently
one of which is that it is not available before 2.6.21, so we'd need a compile work-around for older versions.
so we should carefully check whether lxml can work with it, but it creates a more compact tree which can lead to performance gains, apparently.
This seems to be describing how it works:
http://mail.gnome.org/archives/xml/2005-August/msg00138.html
"Seems to describe" sounds like the right way of putting it. From what I read, it requires code to access text content through the API, which is at least partially the case in lxml. I actually added the COMPACT option when I rewrote the parser and removed it when I saw that a) it did not seem to have an immediate effect and b) it was not available in 2.6.16, which we still support AFAICT. I did not check any memory effects, though. I'll have a look at how to integrate it. It may not become a parser option: if it works, we should just always switch it on, as lxml's API is not text node aware anyway and according to Daniel's Mail, the effect should never be negative. The obvious patch against the trunk is attached. If you like, you can run some performance benchmarks with it to see if it makes any difference (it should, since it reduces the number of expensive malloc calls). Note, however, that the XPath tests currently crash when you apply it. I'll take a look at that. The normal API tests (elementtree, etree) seem to pass nicely. Stefan

Stefan Behnel wrote:
Hi Martijn,
Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently
one of which is that it is not available before 2.6.21, so we'd need a compile work-around for older versions.
Of course. At some point we will want to change the required libxml2 version to a newer release. I like the situation where we are now compatible with libxml2 versions shipped with OS X 10.4 and common Linux distributions, so this a requirements change that should be managed very carefully.
so we should carefully check whether lxml can work with it, but it creates a more compact tree which can lead to performance gains, apparently.
This seems to be describing how it works:
http://mail.gnome.org/archives/xml/2005-August/msg00138.html
"Seems to describe" sounds like the right way of putting it. From what I read, it requires code to access text content through the API, which is at least partially the case in lxml.
Right, that's my understanding as well.
I actually added the COMPACT option when I rewrote the parser and removed it when I saw that a) it did not seem to have an immediate effect and b) it was not available in 2.6.16, which we still support AFAICT. I did not check any memory effects, though.
Ah, so you already knew about it. That was the main intent of my email, to make others aware. :)
I'll have a look at how to integrate it. It may not become a parser option: if it works, we should just always switch it on, as lxml's API is not text node aware anyway and according to Daniel's Mail, the effect should never be negative.
Agreed, always on is the right approach.
The obvious patch against the trunk is attached. If you like, you can run some performance benchmarks with it to see if it makes any difference (it should, since it reduces the number of expensive malloc calls).
Note, however, that the XPath tests currently crash when you apply it. I'll take a look at that. The normal API tests (elementtree, etree) seem to pass nicely.
Don't have time to look into it now unfortunately! Thanks for the patch, though! Regards, Martijn

Hi again, Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it
No, it cannot. The problem is a bug in libxml2's xmlReconciliateNs that's in 2.6.24 and most likely all previous versions. That function is not COMPACT aware and simply crashes when called on the respective nodes. I'll file a bug report on this. Stefan

Hi again, Stefan Behnel wrote:
Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it
No, it cannot. The problem is a bug in libxml2's xmlReconciliateNs that's in 2.6.24 and most likely all previous versions. That function is not COMPACT aware and simply crashes when called on the respective nodes.
Since a bug fix in libxml2 will not help us with lxml in the near future, I quickly rewrote the xmlReconciliateNs function in Pyrex. It may be a little faster also as it only checks element nodes (and their attributes). With this used to replace xmlReconciliateNs, the test suite passes nicely with COMPACT enabled by default. I post this here with the question if I miss anything important if I only fix element nodes. I know that there are also things like XINCLUDE_START/END, but I don't know if they need fixing. And I'm pretty sure that's not tested in the test suite. Any comments? Stefan cdef void _fixNamespaces(xmlNode* c_element): """Fix the xmlNs pointers of a node and its subtree that were moved. Mainly copied from libxml2's xmlReconciliateNs, except for certain bugs. """ cdef xmlDoc* c_doc cdef xmlNode* c_start_node cdef xmlNode* c_node cdef xmlNs** c_ns_new_cache cdef xmlNs** c_ns_old_cache cdef xmlNs* c_ns cdef int i, c_cache_size, c_cache_last c_doc = c_element.doc c_start_node = c_element c_ns_new_cache = NULL c_ns_old_cache = NULL c_cache_size = 0 c_cache_last = 0 tree.BEGIN_FOR_EACH_ELEMENT_FROM(c_element, c_element, 1) c_node = c_element while c_node is not NULL: if c_node.ns is not NULL: c_ns = c_node.ns for i from 0 <= i < c_cache_last: if c_ns is c_ns_old_cache[i]: c_node.ns = c_ns_new_cache[i] c_ns = NULL break if c_ns is not NULL: # not in cache, must find a replacement from this document c_ns = tree.xmlNewReconciliedNs(c_doc, c_start_node, c_ns) if c_ns is not NULL: # otherwise we don't care ... :) if c_cache_last >= c_cache_size: # must resize cache if c_cache_size == 0: c_cache_size = 20 else: c_cache_size = c_cache_size * 2 c_ns_new_cache = <xmlNs**> python.PyMem_Realloc( c_ns_new_cache, c_cache_size * sizeof(xmlNs*)) if c_ns_new_cache is NULL: python.PyMem_Free(c_ns_old_cache) PyErr_NoMemory() c_ns_old_cache = <xmlNs**> python.PyMem_Realloc( c_ns_old_cache, c_cache_size * sizeof(xmlNs*)) if c_ns_old_cache is NULL: python.PyMem_Free(c_ns_new_cache) PyErr_NoMemory() c_ns_new_cache[c_cache_last] = c_ns c_ns_old_cache[c_cache_last] = c_node.ns c_cache_last = c_cache_last + 1 c_node.ns = c_ns if c_node is c_element: c_node = <xmlNode*>c_element.properties else: c_node = c_node.next tree.END_FOR_EACH_ELEMENT_FROM(c_element) if c_ns_new_cache is not NULL: python.PyMem_Free(c_ns_new_cache) if c_ns_old_cache is not NULL: python.PyMem_Free(c_ns_old_cache)

Stefan Behnel wrote:
Hi again,
Stefan Behnel wrote:
Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it No, it cannot. The problem is a bug in libxml2's xmlReconciliateNs that's in 2.6.24 and most likely all previous versions. That function is not COMPACT aware and simply crashes when called on the respective nodes.
Since a bug fix in libxml2 will not help us with lxml in the near future, I quickly rewrote the xmlReconciliateNs function in Pyrex. It may be a little faster also as it only checks element nodes (and their attributes). With this used to replace xmlReconciliateNs, the test suite passes nicely with COMPACT enabled by default.
I post this here with the question if I miss anything important if I only fix element nodes. I know that there are also things like XINCLUDE_START/END, but I don't know if they need fixing. And I'm pretty sure that's not tested in the test suite.
Any comments?
I think Kasimier would be someone who could give a good comment to this (if he could read Pyrex). I don't know whether he's still reading any of this though. You could try discussing this with him on the libxml2 list though. I assume this is all trunk work by the way - let's not risk any of this in 1.0. (I'm only saying this because I know you really really like adding performance improvements - which is great in itself :) Regards, Martijn

Hi Martijn, Martijn Faassen wrote:
Stefan Behnel wrote:
I quickly rewrote the xmlReconciliateNs function in Pyrex. I post this here with the question if I miss anything important if I only fix element nodes. I know that there are also things like XINCLUDE_START/END, but I don't know if they need fixing. And I'm pretty sure that's not tested in the test suite.
Any comments?
I think Kasimier would be someone who could give a good comment to this (if he could read Pyrex). I don't know whether he's still reading any of this though. You could try discussing this with him on the libxml2 list though.
Sure, we talked about this function a while ago, I have to look up the links he posted. He had some comments about the implementation in libxml2.
I assume this is all trunk work by the way - let's not risk any of this in 1.0.
Sure, this is totally 1.1 stuff. 1.0 is in maintenance mode, as the changelog of 1.0.1 suggests. (even the .attrib bit was a long waiting fix that didn't work without the modified GC stuff). Stefan

Hi again, Stefan Behnel wrote:
I post this here with the question if I miss anything important if I only fix element nodes. I know that there are also things like XINCLUDE_START/END, but I don't know if they need fixing. And I'm pretty sure that's not tested in the test suite.
I checked the XInclude code in libxml2 and found that this is not a problem. The XInclude node itself is converted to an XINCLUDE_START node and a new END node is added as a sibling. The new fragment is then included between the two. Tree traversal will simply ignore these non-element nodes, which is the right thing to do. We do not care about them anymore. Since this was my main concern, I'm convinced now that the _fixNamespace function should work as expected. Stefan

Stefan Behnel wrote:
Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it
No, it cannot. The problem is a bug in libxml2's xmlReconciliateNs that's in 2.6.24 and most likely all previous versions. That function is not COMPACT aware and simply crashes when called on the respective nodes.
I'll file a bug report on this.
Answer: --------------- XML_PARSE_COMPACT should really not be used for document which are modified programmatically after parsing ! This may get fixed, but really you should not do that, back to normal priority ! --------------- I wouldn't know why, though. As long as the libxml2 API can handle compacted nodes (i.e. doesn't free strings without checking), everything should be fine. Stefan

Hi all, I tested the modified trunk with the PARSE_COMPACT option and found that there are certain speed-ups in the parser, although not huge ones. With COMPACT: lxe: write_utf8_parse_stringIO (S- T1 ) 193.6010 msec/pass lxe: write_utf8_parse_stringIO (SA T1 ) 192.3040 msec/pass lxe: write_utf8_parse_stringIO (U- T1 ) 192.4012 msec/pass lxe: write_utf8_parse_stringIO (UA T1 ) 193.8337 msec/pass lxe: write_utf8_parse_stringIO (S- T2 ) 200.5041 msec/pass lxe: write_utf8_parse_stringIO (SA T2 ) 223.5265 msec/pass lxe: write_utf8_parse_stringIO (U- T2 ) 204.4489 msec/pass lxe: write_utf8_parse_stringIO (UA T2 ) 219.6537 msec/pass lxe: write_utf8_parse_stringIO (S- T3 ) 10.3241 msec/pass lxe: write_utf8_parse_stringIO (SA T3 ) 57.1146 msec/pass lxe: write_utf8_parse_stringIO (U- T3 ) 10.3838 msec/pass lxe: write_utf8_parse_stringIO (UA T3 ) 56.8747 msec/pass lxe: write_utf8_parse_stringIO (S- T4 ) 0.4999 msec/pass lxe: write_utf8_parse_stringIO (SA T4 ) 1.6819 msec/pass lxe: write_utf8_parse_stringIO (U- T4 ) 0.5105 msec/pass lxe: write_utf8_parse_stringIO (UA T4 ) 1.7604 msec/pass Without COMPACT: lxe: write_utf8_parse_stringIO (S- T1 ) 195.6990 msec/pass lxe: write_utf8_parse_stringIO (SA T1 ) 201.1346 msec/pass lxe: write_utf8_parse_stringIO (U- T1 ) 196.2893 msec/pass lxe: write_utf8_parse_stringIO (UA T1 ) 195.1105 msec/pass lxe: write_utf8_parse_stringIO (S- T2 ) 203.9327 msec/pass lxe: write_utf8_parse_stringIO (SA T2 ) 224.8351 msec/pass lxe: write_utf8_parse_stringIO (U- T2 ) 206.2940 msec/pass lxe: write_utf8_parse_stringIO (UA T2 ) 226.3990 msec/pass lxe: write_utf8_parse_stringIO (S- T3 ) 10.5374 msec/pass lxe: write_utf8_parse_stringIO (SA T3 ) 63.3481 msec/pass lxe: write_utf8_parse_stringIO (U- T3 ) 10.5375 msec/pass lxe: write_utf8_parse_stringIO (UA T3 ) 63.6911 msec/pass lxe: write_utf8_parse_stringIO (S- T4 ) 0.5175 msec/pass lxe: write_utf8_parse_stringIO (SA T4 ) 1.9483 msec/pass lxe: write_utf8_parse_stringIO (U- T4 ) 0.5276 msec/pass lxe: write_utf8_parse_stringIO (UA T4 ) 1.9449 msec/pass So, there are no cases where this gets slower and the performance improvement is somewhere between 1-10%. Note, however, that this is on a 64bit machine, which can compactly store strings of up to 15 bytes. May be worth switching on if we can figure out if it is safe... Stefan

Stefan Behnel wrote: [snip]
So, there are no cases where this gets slower and the performance improvement is somewhere between 1-10%. Note, however, that this is on a 64bit machine, which can compactly store strings of up to 15 bytes.
May be worth switching on if we can figure out if it is safe...
Yeah, but in my opinion not worth it right now, until we get confirmation from libxml2 that it is safe. Apparently, seeing your other message, libxml2 developers right now consider it unsafe. :) One could also imagine possible speedsups elsewhere in the benchmarks, due to better processor cache behavior. Did you notice any? Regards, Martijn

Hi Martijn, Martijn Faassen wrote:
One could also imagine possible speedsups elsewhere in the benchmarks, due to better processor cache behavior. Did you notice any?
Sadly, bench.py is not usable here as it does not parse the trees. It builds them 'by hand' - without compaction. I tried the ot/findall benchmark and it goes down from 258 msec to 254 msec. That's some 2%. ot/xpath stays the same, ot/xpath_all goes from 210 to 205 msec in some runs. But this is a natural text dominated XML file. If you were to run the same with an XML file that is full of small numbers, you could expect to see a bigger difference. As I said, if it's "safe enough", then it's a benefit that comes for free and it's worth switching on. 1.1 will most likely also have the threading stuff integrated, so I'm already for having alpha *and* beta versions before releasing it anyway. That should catch the bigger problems. Also, the difference between having it in or not is a comment in one line, so I'll just commit it for now - we can always take it back out if we see problems (famous last words). Regards, Stefan

Hi Martijn, Martijn Faassen wrote:
I just ran into the XML_PARSE_COMPACT option for libxml2. There are some caveats to using it apparently so we should carefully check whether lxml can work with it, but it creates a more compact tree which can lead to performance gains, apparently.
I finally decided to revert this back to the original implementation. COMPACT will not be used by lxml. The performance gain is not substantial enough to accept the increase in crash probability. At least my bug report got Kasimier to update the docs so that changing a compacted tree is now officially discouraged. Stefan
participants (2)
-
Martijn Faassen
-
Stefan Behnel