[lxml-dev] [lxml][objectify] optimization questions

Oct. 23, 2006

Hi,
sorry for the inconvenience; I have now moved this into a new thread.
I would have gotten back to this sooner, but I have been ill.
...
Then: what you observe are most likely GC 'issues'. The thing is: if the
element already exists as a Python object, it is reused, which is much
faster than creating a new one. So in the cases where your code runs
faster, you can assume that the object survived a larger portion of your
code without being re-instantiated.
I probably misunderstand how the reuse of elements works.
When I "visit" a node, like:
...
...
...
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.i
<Element i at 1e94b0>
the Python Element object for "i" is created.
Will that Python Element be garbage-collected afterwards if I do not
explicitly delete "i" from the XML tree? I thought this element survived
in the element proxy.
...
Especially recursive printing instantiates the entire tree, so if the
objects are not deleted directly afterwards, this has a performance effect
on code that runs afterwards.
I see, but why would "manual access" of the nodes not have the same effect:

Runs slow:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root.i
print root.f
print root.s
print root.d
""" "n = root.i; n = root.f; n = root.s; n = root.d"
17
238.3343
what
2006-03-03
10 loops -> 0.0102 secs
17
238.3343
what
2006-03-03
100 loops -> 0.101 secs
17
238.3343
what
2006-03-03
1000 loops -> 1.02 secs
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
17
238.3343
what
2006-03-03
raw times: 1.03 1.02 1.02
1000 loops, best of 3: 1.02 msec per loop

Runs fast:
==========
python2.4 -m timeit -v -s"""
from lxml import etree
from lxml import objectify
parser = etree.XMLParser(remove_blank_text=True)
lookup = etree.ElementNamespaceClassLookup(objectify.ObjectifyElementClassLookup())
parser.setElementClassLookup(lookup)
objectify.setDefaultParser(parser)
objectify.enableRecursiveStr()
root = objectify.Element('root')
root.i = 17
root.f = 238.3343
root.s = 'what'
root.d = '2006-03-03'
print root
""" "n = root.i; n = root.f; n = root.s; n = root.d"
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
10 loops -> 0.00109 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
100 loops -> 0.00928 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
1000 loops -> 0.0897 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
10000 loops -> 0.905 secs
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
root = None [ObjectifiedElement]
    i = 17 [IntElement]
    f = 238.33430000000001 [FloatElement]
    s = 'what' [StringElement]
    d = '2006-03-03' [StringElement]
raw times: 0.893 0.911 0.911
10000 loops, best of 3: 89.3 usec per loop
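
To compare the two runs directly, the per-loop figures can be recomputed from
the raw times reported above. This is just a small arithmetic sketch; the
helper name best_per_loop is made up for illustration, and it assumes that
"best of 3" means the minimum raw time divided by the loop count.

```python
# Raw times (seconds) and loop counts copied from the two timeit runs above.
slow_raw, slow_loops = [1.03, 1.02, 1.02], 1000
fast_raw, fast_loops = [0.893, 0.911, 0.911], 10000


def best_per_loop(raw_times, loops):
    """Best (minimum) total time divided by the number of loops, in seconds."""
    return min(raw_times) / loops


slow = best_per_loop(slow_raw, slow_loops)  # about 1.02 msec per loop
fast = best_per_loop(fast_raw, fast_loops)  # about 89.3 usec per loop
ratio = slow / fast                         # slow run costs roughly 11x more

print("slow: %.3g s, fast: %.3g s, ratio: %.1f" % (slow, fast, ratio))
```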

Recursively outputting root before accessing its child elements
really speeds things up, even though I accessed all elements in
the slow example, too.
Why is this? I'm clueless.

Holger


