Hi,
Over the last few days, I had some fun implementing low-level object
freelist support in Cython and gave it a try in lxml. It's implemented as a
class decorator, so all you have to do is add a "@cython.freelist(8)"
declaration to get an 8-item freelist for the class. Even relatively short
freelists tend to help a lot because they cover typical usage patterns such
as iteration, where objects are created and discarded in quick succession.
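To illustrate the idea for readers who haven't met freelists before, here is
a conceptual sketch in plain Python. It is not the code that Cython
generates: Cython does the recycling in C when instances are allocated and
deallocated, without any explicit release call. The names FreeList, acquire,
release and Attrib are hypothetical and exist only for this illustration.

```python
class Attrib:
    """Hypothetical stand-in for a small, short-lived object."""
    __slots__ = ("element",)


class FreeList:
    """Keep up to max_size discarded instances around for reuse."""

    def __init__(self, cls, max_size=8):
        self._cls = cls
        self._max_size = max_size
        self._free = []  # instances waiting to be reused

    def acquire(self, element):
        # Reuse a parked instance if there is one, otherwise allocate.
        if self._free:
            obj = self._free.pop()
        else:
            obj = self._cls.__new__(self._cls)
        obj.element = element
        return obj

    def release(self, obj):
        # Drop the reference to the payload, then park the instance
        # unless the freelist is already full.
        obj.element = None
        if len(self._free) < self._max_size:
            self._free.append(obj)


pool = FreeList(Attrib, max_size=8)
first = pool.acquire("el")
pool.release(first)
second = pool.acquire("el")  # gets the instance that was just released
```

In a "create and throw away" loop, only the first request allocates; every
later one reuses the same instance, which is where the savings come from.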
For benchmarking, I initially used "Element.attrib", which is a property
that instantiates a new object on each request (it is not cached in order
to avoid reference cycles). Most uses of this class are one-shot operations
that read some XML attributes, so it's a classic "create and throw away"
case and an ideal candidate for a freelist.
The code that I ran through timeit is this:
Setup:
'import lxml.etree as et; el = et.Element("tag", a="1", b="2")'
Benchmarked code:
'el.attrib'
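For anyone who prefers to run this from a script instead of the timeit
command line, a small helper along these lines should give comparable "best
of 3" numbers. The helper name best_usec_per_loop is made up for this
sketch, and the loop count is kept lower than in the runs reported here so
that it finishes quickly.

```python
import timeit


def best_usec_per_loop(stmt, setup="pass", number=1_000_000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


# Usage with the strings from this post (requires lxml to be installed):
#
#   setup = 'import lxml.etree as et; el = et.Element("tag", a="1", b="2")'
#   print(best_usec_per_loop('el.attrib', setup))
```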
Initial timings:
10000000 loops, best of 3: 0.0856 usec per loop
Replacing "_Attrib(element)" by "_Attrib.__new__(_Attrib, element)":
10000000 loops, best of 3: 0.0781 usec per loop
Adding only the freelist decorator to the _Attrib class:
10000000 loops, best of 3: 0.0737 usec per loop
Combining both "__new__()" and the freelist:
10000000 loops, best of 3: 0.0608 usec per loop
That cuts the time per access by almost 30% overall (0.0856 -> 0.0608
usec), just by changing two lines of code.
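The percentages can be checked from the timings above. The snippet below
uses only the numbers reported in this post; the function name
time_saved_percent is just an illustrative choice.

```python
def time_saved_percent(before, after):
    """Relative reduction in run time, in percent."""
    return (before - after) / before * 100.0


baseline = 0.0856  # usec per loop, initial timing

timings = {
    "__new__() only": 0.0781,
    "freelist only": 0.0737,
    "__new__() + freelist": 0.0608,
}

for label, usec in timings.items():
    saved = time_saved_percent(baseline, usec)
    print(f"{label}: {saved:.1f}% less time per loop")

# The combined change saves about 29% of the time per loop, which
# corresponds to a throughput ratio of roughly 1.4x.
ratio = baseline / timings["__new__() + freelist"]
```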
The main Element proxy class in lxml is another case where it really makes
sense to apply this decorator. A proxy gets instantiated whenever Python
code accesses an XML element, e.g. during iteration. Here are the timings
for iterating over a large XML document:
Benchmarked code:
'list(islice(tree.iter(), 10000000, 10000001))'
Initial timings:
100 loops, best of 3: 15.6 msec per loop
Enabling the freelist:
100 loops, best of 3: 13.2 msec per loop
That's about 15% less run time (15.6 -> 13.2 msec), even though the
measurement covers a complete, non-trivial tree iteration, and it takes
just one additional declaration in the code.
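In case the islice() expression in the iteration benchmark looks odd: it
advances the iterator up to the given start index and keeps at most one
item, so the timing is dominated by the iteration itself and not by
building a large result list. The sketch below shows this with a plain
generator standing in for tree.iter(); counting_iter is a hypothetical
helper and no lxml is involved.

```python
from itertools import islice


def counting_iter(n, counter):
    """Yield 0..n-1 and count how many items were actually produced."""
    for i in range(n):
        counter[0] += 1
        yield i


# Start index inside the sequence: everything before it is consumed and
# discarded, and exactly one item is kept.
produced = [0]
kept = list(islice(counting_iter(1000, produced), 500, 501))

# Start index beyond the end: the whole sequence is consumed and the
# result is an empty list.
produced_all = [0]
kept_none = list(islice(counting_iter(100, produced_all), 500, 501))
```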
Note that this change will require Cython 0.19, which isn't currently close
to release (we just recently released 0.18). So this change may take a bit
of time to make it into an lxml release.
Stefan