[Cython] [cython-users] freelist benchmarks

Robert Bradshaw robertwb at gmail.com
Wed Feb 27 09:54:24 CET 2013


On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 26.02.2013 21:16:
>> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
>>> David Roe, 25.02.2013 00:00:
>>>> I changed the current type pointer check to look at tp_basicsize instead.
>>>>
>>>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>>>> with only a couple of exceptions in lxml.objectify that have one additional
>>>>> object field. So, just extending the freelist support to use two different
>>>>> lists for different struct sizes instead of just one would make it work for
>>>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>>>> over there would be interesting, I guess.
>>>>
>>>> I found some chains of length 5.  This could be shortened to 4 by putting
>>>> the freelist at the level of Element (which is where you most care about
>>>> speed of object creation).
>>>
>>> It's substantially easier to keep it in the top-level base class, though.
>>> Otherwise, we'd need a new protocol between inheriting types as I
>>> previously described. That add a *lot* of complexity.
>>>
>>>
>>>> SageObject
>>>>     -> Element (_parent attribute and cdef methods)
>>>>     -> Vector (_degree)
>>>>     -> FreeModuleElement (_is_mutable)
>>>>     -> FreeModuleElement_generic_dense (_entries)
>>>>
>>>> SageObject
>>>>     -> Element (_parent attribute and cdef methods)
>>>>     ->sage.structure.element.Matrix (_nrows)
>>>>     -> sage.matrix.matrix.Matrix (_base_ring)
>>>>     -> Matrix_integer_dense (_entries)
>>
>> I don't know that (expensive) matrices are the best example, and often
>> the chains are larger for elements one really cares about.
>>
>> sage: def base_tree(x): return [] if x is None else [x] +
>> base_tree(x.__base__)
>>    ...:
>>
>> sage: base_tree(Integer)
>>  [<type 'sage.rings.integer.Integer'>, <type
>> 'sage.structure.element.EuclideanDomainElement'>, <type
>> 'sage.structure.element.PrincipalIdealDomainElement'>, <type
>> 'sage.structure.element.DedekindDomainElement'>, <type
>> 'sage.structure.element.IntegralDomainElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>>
>> sage: base_tree(RealDoubleElement)
>>  [<type 'sage.rings.real_double.RealDoubleElement'>, <type
>> 'sage.structure.element.FieldElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>>
>> sage: base_tree(type(mod(1, 10)))
>>  [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type
>> 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type
>> 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>
> My original question was if they have differently sized object structs or
> not. Those that don't would currently go into the same freelist.

They all add to the struct at the leaf subclass.

>>> Ok, so even for something as large as Sage, we'd apparently end up with
>>> just a couple of freelists for a given base type. That really makes it
>>> appear reasonable to make that number a compile time constant as well. I
>>> mean, even if you *really* oversize it, all you loose is the static memory
>>> for a couple of pointers. On a 64 bit system, if you use a freelist size of
>>> 8 objects and provision freelists for 8 differently sized subtypes, that's
>>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
>>> times that size shouldn't hurt anyone. Unused subtype freelists really take
>>> almost no space and won't hurt performance either.
>>
>> Elements in Sage are typically larger than 8 bytes
>
> I wasn't adding up the size of the objects, only of the pointers in the
> freelists. If the objects end up in the freelist, they'll also be used on
> the next instantiation, so their size doesn't really matter.

It does if you use a lot of them, then they just sit around forever,
but I suppose if you use them once you're willing to pay the price. It
also doesn't make sense for a lot of them that are rather expensive
anyways (e.g. every kind of matrix or polynomial specialization).

>> and our
>> experiments for Integer showed that the benefit (for this class)
>> extended well beyond 8 items. On the other hand lots of elements are
>> so expensive that they don't merit this at all.
>>
>> I think one thing to keep in mind is that Python's heap is essentially
>> a "freelist" of objects of every size up to 128(?) bytes, so what are
>> we trying to save by putting it at the base type and going up and down
>> the __cinit__/__dealloc__ chain?
>
> Allocation still seems to take its time.

Yes, it does.

>> I suppose we save the zero-ing out of
>> memory and a function call or two, but that's not the expensive part.
>
> And I noticed now that we still have to do the zeroing out in order to
> initialise C typed attributes.

Yep.

> And that it's actually not trivial to figure
> out in what cases we can safely put a subtype into the freelist. There are
> a couple of special cases in CPython's object allocation, e.g. for heap
> types. Their instances own a reference to the type, which is not the case
> for static types.
>
>
>> For our Integer free list, we save going up the __cinit__/__dealloc__
>> call, initializing a couple of members, setting the vtable pointers,
>> which turns out to be the bulk of the cost.
>
> And your hierarchy examples above show that that they are implemented
> across multiple modules. I can imagine that being a major problem as the C
> compiler can't inline the tp_new calls in that case, can't really reorder
> the struct field assignments, etc.

Yes. And some of those modules are already 1000s of lines, so it's not
like we should just put them all in one (though perhaps someday we'll
support some kind of static linking...).

> I imagine that the freelist could leave the initial vtable untouched in
> some cases, but that would mean that we need a freelist per actual type,
> instead of object struct size.
>
> Now, if we move the freelist handling into each subtype (as you and Mark
> proposed already), we'd get some of this for free, because the objects that
> get freed are already properly set up for the specific type, including
> vtable etc. All that remains to be done is to zero out the (known) C typed
> attributes, set the (known) object attributes to None, and call any
> __cinit__() methods in the super types to do the rest for us. We might have
> to do it in the right order, i.e. initialise some attributes, call the
> corresponding __cinit__() method, initialise some more attributes, ...
>
> So, basically, we'd manually inline the bottom-up aggregation of all tp_new
> functions into the current one, skipping those operations that we don't
> consider necessary in the freelist case, such as the vtable setup.
>
> Now, the only remaining issue is how to get at the __cinit__() functions if
> the base type isn't in the same module, but as Mark proposed, that could
> still be done if we require it to be exported in a C-API (and assume that
> it doesn't exist if not?). Would be better to know it at compile time,
> though...

Yes, and that's still going to (potentially) be expensive. I'd rather
have a way of controlling what, if anything, gets zero'd out/set to
None, as most of that (in Sage's case at least) will still be valid
for the newly-reused type or instantly over-written (though perhaps
the default could be to call __dealloc__/__cinit__). With this we
could skip going up and down the type hierarchy at all.

- Robert


More information about the cython-devel mailing list