Discrepancies between a pickled objectified object and the source objectified object?
Dear all,

I have found some severe discrepancies between an objectified tree and the same tree after it has been dumped to and loaded from a pickle cache. It might already be reported somewhere, but I could not find it.

My original issue, if it interests anyone, was found in https://github.com/Capitains/MyCapytain/blob/improvement/buggy.sh as I was benchmarking the effect of pickle. The pickled object did not behave like the original object: there is a loop in there which checks for siblings to reconstruct a partial tree, and with the pickled object it would not stop "normally" and would return much more than was originally asked for.

The problem with the original code, which you can check out, is that it is a "big" codebase and some parts of it are undocumented (I unfortunately forgot to document two or three functions when I pushed them and now find myself deciphering them). Most of all, it is a long loop and I am not sure how easy it would be for other people to debug. But please, if you wish, feel free to do so ;)

To show the bug in a limited, readable form, I made a repo here: https://github.com/PonteIneptique/test-lxml. It runs on lxml 3.6.4 (the latest release) on Python 3.5 (I did not test on Python 2.7). When I loop over the same tree in three different forms (etree.parse, objectify, pickled objectify), the first and the second are all right, but the third shows differences very quickly (at the node level).

I would love to know whether this is "expected" behavior (i.e. no effort has gone into checking that pickling works, which I would totally understand) or an unknown bug.

Best,
Thibault
Hi,
IMHO you're using the wrong parser. lxml.objectify uses its own dedicated XML parser to support its specialized element lookup. See lxml.objectify.pyx: https://github.com/lxml/lxml/blob/master/src/lxml/lxml.objectify.pyx#L1735-L... So you should use X = objectify.makeparser() instead of X = XMLParser, and then use this parser for objectify.parse(), etree.parse(), ... for the comparison. I haven't looked too closely, but I think pickle support uses objectify's default parser, so only Tree3 is actually an "objectified" tree.
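The parser point above can be checked with a small sketch. It is not the code from the test repo: the XML snippet, the variable names, and the helper `is_objectified` are made up for illustration. It only uses `objectify.makeparser()`, `objectify.fromstring()`, `etree.fromstring()`, and `pickle`. Depending on the lxml version, unpickling may hand back either an element or an ElementTree, so the sketch normalizes that case rather than assuming one.

```python
import pickle

from lxml import etree, objectify

# Illustrative document, not taken from the original test repo.
XML = b"<root><a>1</a><a>2</a><b>3</b></root>"


def is_objectified(element):
    """Return True if the element was created through objectify's lookup."""
    return isinstance(element, objectify.ObjectifiedElement)


# Plain etree parser: ordinary lxml.etree elements.
plain_root = etree.fromstring(XML, etree.XMLParser())

# objectify's dedicated parser, usable with etree.fromstring()/etree.parse() too.
obj_parser = objectify.makeparser()
obj_root = etree.fromstring(XML, obj_parser)

# objectify.fromstring() without a parser argument uses objectify's default parser.
default_root = objectify.fromstring(XML)

# Passing a plain XMLParser to objectify.fromstring() bypasses the objectify lookup.
mixed_root = objectify.fromstring(XML, etree.XMLParser())

# Pickle round trip of an objectified element.
restored = pickle.loads(pickle.dumps(obj_root))
if isinstance(restored, etree._ElementTree):
    # Some versions may return a tree; take its root in that case.
    restored = restored.getroot()

for label, root in [
    ("etree.XMLParser", plain_root),
    ("objectify.makeparser", obj_root),
    ("objectify default", default_root),
    ("objectify.fromstring + XMLParser", mixed_root),
    ("pickled objectify", restored),
]:
    print(label, type(root).__name__, is_objectified(root))
```

If the pickled tree is the only objectified one in a comparison, any difference in behavior comes from the element classes rather than from pickle itself.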
In other words, you wouldn't want to mix "standard" and "objectified" lxml trees. It's usually a bad idea.

Holger

Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart
Hi,
Unfortunately, that does not reproduce the bug from my original code base, so I need to figure out exactly where the pickled object's behavior diverges from that of the original object. I expect to be back later with a clearer explanation of my issue.

Thanks,
Thibault
Dear all, dear Holger,

Thank you for your pointer. It made me move away from the idea that pickle was the issue. Reading the docs further, now that pickle is no longer the suspect, I discovered that iter(node) does not behave the same way:

- for node in lxml.Element: iterates over the children of said Element
- for node in lxml.ObjectifiedElement: iterates over the siblings of said Element

Knowing that, I was able to fix my implementation for objectified elements. This seems like an odd difference between the two, and if someone from the team could explain it, I would be eager to know more.

Anyway, thanks for your help.

Best,
Thibault

(In reply to Holger Joukl's message of 09/19/2016, 01:35 PM.)
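The iteration difference described above can be reproduced with a minimal sketch. The XML snippet and variable names are illustrative assumptions, not taken from the original code base. As far as I understand lxml's behavior, iterating an objectified element yields the element itself together with its siblings of the same tag, while `iterchildren()` still walks the children in both flavours.

```python
from lxml import etree, objectify

# Illustrative document, not taken from the original code base.
XML = b"<root><a>1</a><a>2</a><b>3</b></root>"

plain_root = etree.fromstring(XML)
obj_root = objectify.fromstring(XML)

# Plain etree element: iteration walks the children.
plain_children = [child.tag for child in plain_root]

# Objectified element: iteration walks the element and its same-tag siblings.
# The root has no siblings, so only the root itself comes back.
obj_iter_root = [node.tag for node in obj_root]

# Iterating the first <a> yields both <a> elements, but not <b>.
obj_iter_a = [node.text for node in obj_root.a]

# To walk children of an objectified element, ask for them explicitly.
obj_children = [child.tag for child in obj_root.iterchildren()]

print(plain_children)
print(obj_iter_root)
print(obj_iter_a)
print(obj_children)
```

Code that has to work on both kinds of tree can call `iterchildren()` everywhere instead of relying on `for node in element`.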
Hi Thibault,
The reason for this behavioural difference is objectify's intended list-like behaviour of siblings with the same name, which makes dot-operator attribute access (getattr/setattr) behave much like it does on any regular Python object.
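The example that originally followed here is not preserved in the archive. As a stand-in, here is a minimal sketch of the list-like behaviour, with an invented document and invented names (`item`, `title`). It relies only on attribute access, indexing, `len()`, `pyval`, and `countchildren()` on objectified elements.

```python
from lxml import objectify

# Illustrative document; the tag names are made up for this sketch.
root = objectify.fromstring(
    b"<root><item>1</item><item>2</item><title>old</title></root>"
)

# Dot access returns the first child with that tag ...
first_item = root.item

# ... which also acts as a list of all same-named siblings.
second_item = root.item[1]
item_count = len(root.item)
item_values = [item.pyval for item in root.item]

# len() on an element counts same-tag siblings, not children;
# countchildren() gives the number of children.
root_len = len(root)
child_count = root.countchildren()

# Attribute assignment replaces the value of the existing child.
root.title = "new"

print(first_item.text, second_item.text, item_count, item_values)
print(root_len, child_count, root.title.text)
```

Because `root.item` has to serve both as "the item" and as "the list of items", iteration and `len()` on an objectified element operate on same-tag siblings rather than on children.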
Best,
Holger
Participants (2):
- Holger Joukl
- Thibault Clerice