[Python-Dev] Optimization Clue

Raymond Hettinger raymond.hettinger at verizon.net
Sat Aug 16 04:52:37 EDT 2003

There is a 2:1 speed difference between dict.has_key() 
and dict.__contains__():

>>> setup = """
d = dict.fromkeys(range(1000))
hk = d.has_key
cn = d.__contains__
>>> import timeit
>>> min(timeit.Timer('hk(500)', setup).repeat(5))
>>> min(timeit.Timer('cn(500)', setup).repeat(5))

The implementations are almost identical except that has_key()
is implemented as a METH_O PyCFunction attached to the method
table and __contains__() is a regular C function attached to
the sq_contains slot which, AFAICT, gets wrapped in a
PyCFunction_Type.  Then both has_key() and __contains__()
get bundled in a method-wrapper object.

I'm at the outer limits of my understanding here, but the above
indicates to me that *all* of the slots with a (PyCFunction) signature
could halve their access time, if instead of being wrapped in a 
PyCFunction_Type, they could just be added to the method
table with the appropriate name for the slot, a METH_O flag,
and a generic docstring.

My first thought for an implementation is to append a forwarding 
function, and append it to the method definition table (manually
at first, but using PyType_Ready later to do it automagically):

static PyObject *
directcontains(PyObject *mp, PyObject *key)
    register long ok = mp->ob_type->tp_as_sequence->sq_contains(mp, key);
    return PyInt_FromLong(ok);  /* Note this really should be PyBool */

"builtin __contains__ method");

static PyMethodDef mapp_methods[] = {
 {"__contains__", (PyCFunction)directcontains,      METH_O,
  .  .  .

In theory, this makes the two methods equally fast.
In practice, the new function doesn't get run when d.__contains__()
is called.

Somewhere, I need to tell it that __contains__ refers to the 
new function instead of the old wrapper.  If anyone knows
where or has a better idea, please let me know.

Raymond Hettinger

P.S.  Here's the source for dict.has_key() and dict.__contains__():

static PyObject *
dict_has_key(register dictobject *mp, PyObject *key)
    long hash;
    register long ok;
    if (!PyString_CheckExact(key) ||
        (hash = ((PyStringObject *) key)->ob_shash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return NULL;
    ok = (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
    return PyBool_FromLong(ok);

static int
dict_contains(dictobject *mp, PyObject *key)
    long hash;

    if (!PyString_CheckExact(key) ||
        (hash = ((PyStringObject *) key)->ob_shash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    return (mp->ma_lookup)(mp, key, hash)->me_value != NULL;

More information about the Python-Dev mailing list