[Cython] [cython-users] freelist benchmarks

Sun Feb 24 22:45:13 CET 2013

mark florisson, 24.02.2013 18:56:
> On 24 February 2013 17:50, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 24 February 2013 15:58, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 24.02.2013 15:50:
>>>> On 24 February 2013 13:52, Stefan Behnel wrote:
>>>>> for those who haven't notice my other e-mail, I implemented a new extension
>>>>> type decorator "@cython.freelist(N)" that replaces the normal object
>>>>> creation and deallocation with a freelist of N recently freed objects.
>>>>> Currently, it's only supported for types that do not have a base class (and
>>>>> lifting that restriction is not all that easy).
>>>>
>>>> Very cool! I've been wanting that for a while now :)
>>>
>>> So did I.
>>>
>>>
>>>> What's the hurdle with base classes?
>>>
>>> The problem is that the current way types are being instantiated is
>>> recursive. The top-most base class calls tp_alloc() and then each step in
>>> the hierarchy adds its bit of initialisation. If you want to introduce a
>>> freelist into this scheme, then it's still the top-most class that does the
>>> allocation, so it would need to manage all freelists of all of its children
>>> in order to return the right object struct for a given instantiation request.
>>>
>>> This cannot really be done at compile time. Imagine a subtype in a
>>> different module, for example, for which the code requests a freelist. The
>>> compilation of the base type wouldn't even know that it's supposed to
>>> manage a freelist at all, only the subtype knows it.
>>>
>>> There are a couple of ways to deal with this. One is to replicate the
>>> freelist in the base type for all subtypes that it finds at runtime. That
>>> might actually be the easiest way to do it, but it requires a bit of memory
>>> management in order to add a new freelist when a new subtype is found at
>>> runtime. It also means that we'd have to find the right freelist before we
>>> can get an object from it (or not, if it's empty), which would likely be an
>>> operation that's linear with the number of subtypes. And the freelist set
>>> would better be bounded in size to prevent users from flooding it with lots
>>> and lots of subtypes.
>>>
>>> Another option would be to split the initialisation up into two functions,
>>> one that allocates *and* initialises the instance and one that *only*
>>> initialises it. That would allow each hierarchy level to manage its own
>>> freelists and to take its own decision about where to get the object from.
>>> This approach comes with a couple of tricky details, as CPython doesn't
>>> provide support for this. So we'd need to find a way to handle type
>>> hierarchies that are implemented across modules.
>>
>> Thanks for the explanation Stefan, this is the one I was thinking of,
>> but I suppose it'd need an extra pointer to the pure init function in
>> the type.
> 
> Hm, since extension types don't do multiple inheritance (and excluding
> Python subclasses), couldn't you import those init functions across
> modules through capsules?

Well, yes, I suppose you could. However, that's quite some overhead. I
think it's way easier to just provision a couple of freelists in advance
and assign them to different subtype sizes as they come in. Even in
somewhat large hierarchies, I doubt that the object structs will have all
that many different sizes. Remember, the size only changes when you add
cdef attributes, and only once when you start adding cdef methods. And even
structs that appear in different subtrees of the hierarchy and that carry
different attributes may end up having the same struct size due to layout
coincidences. I would expect that even a type hierarchy of, say, 20 types,
would have at most some 4-8 different struct sizes. Most of the time,
subtypes are there to change behaviour, not state.

The only real drawback is that you need to enable the base type to do all
that's necessary, which you may not have control over in a few cases. But
then again, if it's worth using a freelist on one subtype, it's probably
worth using it in general, so it's best to fix the base type in any way.

>>> Maybe the best approach would be to let the base type manage everything and
>>> just statically limit the maximum number of subtypes for which it provides
>>> separate freelists, at a first come, first serve basis. And the freelist
>>> selection could be based on the object struct size (tp_basicsize) instead
>>> of the specific type. As long as we don't support inheriting from variable
>>> size objects (like tuple/bytes), that would cut down the problem quite
>>> nicely. I think I should just give it a try at some point.

I changed the current type pointer check to look at tp_basicsize instead.
That made it work for almost all classes in lxml's own Element hierarchy,
with only a couple of exceptions in lxml.objectify that have one additional
object field. So, just extending the freelist support to use two different
lists for different struct sizes instead of just one would make it work for
all of lxml already. Taking a look at Sage to see how the situation appears
over there would be interesting, I guess.

Stefan