[Python-Dev] Optimization Clue

Sat Aug 16 08:53:07 EDT 2003

> There is a 2:1 speed difference between dict.has_key() 
> and dict.__contains__():
> 
> >>> setup = """
> d = dict.fromkeys(range(1000))
> hk = d.has_key
> cn = d.__contains__
> """
> >>> import timeit
> >>> min(timeit.Timer('hk(500)', setup).repeat(5))
> 0.55875391810121755
> >>> min(timeit.Timer('cn(500)', setup).repeat(5))
> 1.0645585745654671

OTOH, on my machine, with the same setup:

    >>> # repeat the above for calibration
    >>> min(timeit.Timer('hk(500)', setup).repeat(5))
    0.86988496780395508
    >>> min(timeit.Timer('cn(500)', setup).repeat(5))
    1.8778510093688965
    >>> # try some other things
    >>> min(timeit.Timer('d.has_key(500)', setup).repeat(5))
    1.2841780185699463
    >>> min(timeit.Timer('500 in d', setup).repeat(5))
    0.83904504776000977
    >>> 

i.e. while explicitly calling __contains__ is slower than calling
has_key, using the 'in' operator is faster than any of these.

This is because when you explicitly use __contains__ in Python, you
call a wrapper that adds extra overhead, while if the operator is
used, everything stays at the C level.

> The implementations are almost identical except that has_key()
> is implemented as a METH_O PyCFunction attached to the method
> table and __contains__() is a regular C function attached to
> the sq_contains slot which, AFAICT, gets wrapped in a
> PyCFunction_Type.  Then both has_key() and __contains__()
> get bundled in a method-wrapper object.
> 
> I'm at the outer limits of my understanding here, but the above
> indicates to me that *all* of the slots with a (PyCFunction) signature
> could halve their access time, if instead of being wrapped in a 
> PyCFunction_Type, they could just be added to the method
> table with the appropriate name for the slot, a METH_O flag,
> and a generic docstring.

My question is, is it worth it?  As I showed above, when you call the
operation directly rather than spelling out its special name in
Python, you don't pay the wrapper overhead.

> My first thought for an implementation is to append a forwarding 
> function, and append it to the method definition table (manually
> at first, but using PyType_Ready later to do it automagically):

The wrappers that are currently used are exactly what you propose:
PyType_Ready automagically puts the right wrappers in the class dict.

The wrappers are in typeobject.c; __contains__ uses wrap_objobjproc.
But there's a second-tier wrapper used by the machinery that I can't be
bothered to look up and describe right now; this probably causes the
slowdown, and if you can find a way to avoid this, I'd be happy to
consider it!

[...]
> static PyMethodDef mapp_methods[] = {
>  {"__contains__", (PyCFunction)directcontains,      METH_O,
>   directcontains__doc__},
>   .  .  .
> 
> In theory, this makes the two methods equally fast.
> In practice, the new function doesn't get run when d.__contains__()
> is called.
> 
> Somewhere, I need to tell it that __contains__ refers to the 
> new function instead of the old wrapper.  If anyone knows
> where or has a better idea, please let me know.

That's because when there's a C slot, a wrapper gets stuffed into the
class dict even if there is already something there.  (There's an
exception somewhere if the C slot is a wrapper for whatever's in the
class dict.)

--Guido van Rossum (home page: http://www.python.org/~guido/)