[lxml-dev] [lxml][objectify] optimization questions

Hi, sorry for the inconvenience, I now put this into a new thread. And I'd have gotten back to that sooner but have been ill.
I probably have some misunderstandings how the reuse of elements works. When I "visit" a node, like:
the Python Element object for "i" is being created. Will that Python Element be garbage-collected afterwards, if I do not explicitly delete "i" from the xml tree? I thought this element survived in the element proxy.
I see, but why would "manual access" of the nodes not have the same effect: Runs slow: ========== python2.4 -m timeit -v -s""" from lxml import etree from lxml import objectify parser = etree.XMLParser(remove_blank_text=True) lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup()) parser.setElementClassLookup(lookup) objectify.setDefaultParser(parser) objectify.enableRecursiveStr() root = objectify.Element('root') root.i = 17 root.f = 238.3343 root.s = 'what' root.d = '2006-03-03' print root.i print root.f print root.s print root.d """ "n = root.i; n = root.f; n = root.s; n = root.d" 17 238.3343 what 2006-03-03 10 loops -> 0.0102 secs 17 238.3343 what 2006-03-03 100 loops -> 0.101 secs 17 238.3343 what 2006-03-03 1000 loops -> 1.02 secs 17 238.3343 what 2006-03-03 17 238.3343 what 2006-03-03 17 238.3343 what 2006-03-03 raw times: 1.03 1.02 1.02 1000 loops, best of 3: 1.02 msec per loop Runs fast: ========== python2.4 -m timeit -v -s""" from lxml import etree from lxml import objectify parser = etree.XMLParser(remove_blank_text=True) lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup()) parser.setElementClassLookup(lookup) objectify.setDefaultParser(parser) objectify.enableRecursiveStr() root = objectify.Element('root') root.i = 17 root.f = 238.3343 root.s = 'what' root.d = '2006-03-03' print root """ "n = root.i; n = root.f; n = root.s; n = root.d" root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] 10 loops -> 0.00109 secs root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] 100 loops -> 0.00928 secs root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] 1000 loops -> 0.0897 secs root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] 10000 loops -> 0.905 secs root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] root = None [ObjectifiedElement] i = 17 [IntElement] f = 238.33430000000001 [FloatElement] s = 'what' [StringElement] d = '2006-03-03' [StringElement] raw times: 0.893 0.911 0.911 10000 loops, best of 3: 89.3 usec per loop Recursively outputting root before accessing its child elements really speeds things up, even though I accessed all elements in the slow example, too. Why is this? I'm clueless. Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
I think I can give an answer here. The difference lies in the two cleanup modes in the Python interpreter: GC and ref counting. Ref-counted objects disappear immediately after loosing the last reference, however, when there are circular references between elements, the GC is required to clean them up. These objects can be garbage collected at any time, but they are usually kept until there is a good opportunity to clean them up, i.e. enough time has passed to merit the GC overhead or memory is filling up so it has to run. Now, the way recursive dumping is currently implemented instantiates an additional object for each element: _Attrib. This generates circular references between the element and its attribute proxy which enforces use of the GC instead of the normal ref-count algorithm. So elements that were recursively printed stay alive until the next run of the GC. Elements that do not have an _Attrib dictionary proxy can be deleted when ref-counting them out. You should be able to reproduce the behaviour observed after recursive printing with elements for which you called ".attrib". You should not rely on either behaviour as this deals with implementation details in both lxml and Python. However, I would not object to patches that make the behaviour more predictable. Stefan

are circular references between elements, the GC is required to clean
Hello Stefan, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 23.10.2006 22:53:51: there them up. that do
Thanks for this good explanation! This of course also explains why my experiments with an additional "visit" function (a stripped down _dump, essentially) only worked when there was a call to element.items() included. Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi again, I rewrote the current recursive string printing implementation to use a real iterator for attribute access, which also lead to much shorter code in _Attrib (after a cleanup). This should remove the difference you see, although moving towards the slower variant. However, if it worked, this means that the elements are immediately garbage collected, which is the right thing to do from a memory perspective. Please test on your machine a) if the two code snippets still differ in performance and b) if the new implementation resulted in any noticeable slow down. If you feel ambitious, take a look at the benchmark directory and try to come up with a new benchmark suite "bench_objectify.py". The benchmark framework makes new benchmarks extremely easy to write and the four test XML trees should be well suited for objectify already. Thanks, Stefan

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 24.10.2006 00:23:54:
I can confirm a) no performance difference between recursive element printing and "manual element access" any more b) no significant slow down using the little timeit snippets for benchmarking. framework
makes new benchmarks extremely easy to write and the four test XML trees should be well suited for objectify already.
Will take a look. Some more need for clarification: If I understand correctly the lxml element proxy only speeds up things if - I hold a python reference to the element object or - a circular reference to the element in question prevents it from being gc-ed To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree. Would it also be possible to modify objectify in a way that the lifespan of the python _Element, once it has been instantiated, is tied to the existence of the underlying _c_node (xmlNode)? Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
Good, thanks.
Correct. However, as I said: do not rely on the second thing. GC runs are unpredictable (unless you run it by hand).
To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree.
... the fastest approach likely being cache[root] = list(root.getiterator())
Hmm, I don't know if that's a good thing in general. It eats substantially more memory than the C-tree does already. I mean, feel free to fill a cache like the above when XML comes in and delete it when it goes back out during processing. It should not be that much slower than doing it inside objectify, but it's simple enough to not require a dedicated API and it gives you absolute control over the trade-off between space and speed. Stefan

Hi Holger, Holger Joukl wrote:
To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree.
I added a FAQ entry on performance tweaking in objectify. If you find other things to add, I'd be happy to hear about them. Stefan

Hi Stefan, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 25.10.2006 08:49:08:
Very helpful! Thanks a lot. I think the first two hints might also be helpful for someone using classic lxml.etree. Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
Well, it wasn't quite that well suited after all. I added some smaller benchmarks myself and adapted the benchmark trees to simplify their use through objectify. Tree 3 still doesn't fit, but trees 1,2 and 4 can be used for benchmarking. Just look at bench_objectify.py to see how that works. So, as you're testing anyway, I'd be happy if you could come up with some additional benchmarks. That way, we could put some more results up on the performance web page. Stefan

Hi Holger, Holger Joukl wrote:
I think I can give an answer here. The difference lies in the two cleanup modes in the Python interpreter: GC and ref counting. Ref-counted objects disappear immediately after loosing the last reference, however, when there are circular references between elements, the GC is required to clean them up. These objects can be garbage collected at any time, but they are usually kept until there is a good opportunity to clean them up, i.e. enough time has passed to merit the GC overhead or memory is filling up so it has to run. Now, the way recursive dumping is currently implemented instantiates an additional object for each element: _Attrib. This generates circular references between the element and its attribute proxy which enforces use of the GC instead of the normal ref-count algorithm. So elements that were recursively printed stay alive until the next run of the GC. Elements that do not have an _Attrib dictionary proxy can be deleted when ref-counting them out. You should be able to reproduce the behaviour observed after recursive printing with elements for which you called ".attrib". You should not rely on either behaviour as this deals with implementation details in both lxml and Python. However, I would not object to patches that make the behaviour more predictable. Stefan

are circular references between elements, the GC is required to clean
Hello Stefan, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 23.10.2006 22:53:51: there them up. that do
Thanks for this good explanation! This of course also explains why my experiments with an additional "visit" function (a stripped down _dump, essentially) only worked when there was a call to element.items() included. Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi again, I rewrote the current recursive string printing implementation to use a real iterator for attribute access, which also lead to much shorter code in _Attrib (after a cleanup). This should remove the difference you see, although moving towards the slower variant. However, if it worked, this means that the elements are immediately garbage collected, which is the right thing to do from a memory perspective. Please test on your machine a) if the two code snippets still differ in performance and b) if the new implementation resulted in any noticeable slow down. If you feel ambitious, take a look at the benchmark directory and try to come up with a new benchmark suite "bench_objectify.py". The benchmark framework makes new benchmarks extremely easy to write and the four test XML trees should be well suited for objectify already. Thanks, Stefan

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 24.10.2006 00:23:54:
I can confirm a) no performance difference between recursive element printing and "manual element access" any more b) no significant slow down using the little timeit snippets for benchmarking. framework
makes new benchmarks extremely easy to write and the four test XML trees should be well suited for objectify already.
Will take a look. Some more need for clarification: If I understand correctly the lxml element proxy only speeds up things if - I hold a python reference to the element object or - a circular reference to the element in question prevents it from being gc-ed To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree. Would it also be possible to modify objectify in a way that the lifespan of the python _Element, once it has been instantiated, is tied to the existence of the underlying _c_node (xmlNode)? Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
Good, thanks.
Correct. However, as I said: do not rely on the second thing. GC runs are unpredictable (unless you run it by hand).
To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree.
... the fastest approach likely being cache[root] = list(root.getiterator())
Hmm, I don't know if that's a good thing in general. It eats substantially more memory than the C-tree does already. I mean, feel free to fill a cache like the above when XML comes in and delete it when it goes back out during processing. It should not be that much slower than doing it inside objectify, but it's simple enough to not require a dedicated API and it gives you absolute control over the trade-off between space and speed. Stefan

Hi Holger, Holger Joukl wrote:
To speed up my usecase I could force-create and hold python references to every node before starting to operate on the tree.
I added a FAQ entry on performance tweaking in objectify. If you find other things to add, I'd be happy to hear about them. Stefan

Hi Stefan, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> schrieb am 25.10.2006 08:49:08:
Very helpful! Thanks a lot. I think the first two hints might also be helpful for someone using classic lxml.etree. Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
Well, it wasn't quite that well suited after all. I added some smaller benchmarks myself and adapted the benchmark trees to simplify their use through objectify. Tree 3 still doesn't fit, but trees 1,2 and 4 can be used for benchmarking. Just look at bench_objectify.py to see how that works. So, as you're testing anyway, I'd be happy if you could come up with some additional benchmarks. That way, we could put some more results up on the performance web page. Stefan
participants (2)
-
Holger Joukl
-
Stefan Behnel