[Python-Dev] Optimization Clue
Raymond Hettinger
raymond.hettinger at verizon.net
Sat Aug 16 04:52:37 EDT 2003
There is a 2:1 speed difference between dict.has_key()
and dict.__contains__():
>>> setup = """
d = dict.fromkeys(range(1000))
hk = d.has_key
cn = d.__contains__
"""
>>> import timeit
>>> min(timeit.Timer('hk(500)', setup).repeat(5))
0.55875391810121755
>>> min(timeit.Timer('cn(500)', setup).repeat(5))
1.0645585745654671
The implementations are almost identical except that has_key()
is implemented as a METH_O PyCFunction attached to the method
table and __contains__() is a regular C function attached to
the sq_contains slot which, AFAICT, gets wrapped in a
PyCFunction_Type. Then both has_key() and __contains__()
get bundled in a method-wrapper object.
I'm at the outer limits of my understanding here, but the above
indicates to me that *all* of the slots with a (PyCFunction) signature
could halve their access time, if instead of being wrapped in a
PyCFunction_Type, they could just be added to the method
table with the appropriate name for the slot, a METH_O flag,
and a generic docstring.
My first thought for an implementation is to append a forwarding
function, and append it to the method definition table (manually
at first, but using PyType_Ready later to do it automagically):
static PyObject *
directcontains(PyObject *mp, PyObject *key)
{
register long ok = mp->ob_type->tp_as_sequence->sq_contains(mp, key);
return PyInt_FromLong(ok); /* Note this really should be PyBool */
}
PyDoc_STRVAR(directcontains__doc__,
"builtin __contains__ method");
static PyMethodDef mapp_methods[] = {
{"__contains__", (PyCFunction)directcontains, METH_O,
directcontains__doc__},
. . .
In theory, this makes the two methods equally fast.
In practice, the new function doesn't get run when d.__contains__()
is called.
Somewhere, I need to tell it that __contains__ refers to the
new function instead of the old wrapper. If anyone knows
where or has a better idea, please let me know.
Raymond Hettinger
----------------------------------------------------------------------
P.S. Here's the source for dict.has_key() and dict.__contains__():
static PyObject *
dict_has_key(register dictobject *mp, PyObject *key)
{
long hash;
register long ok;
if (!PyString_CheckExact(key) ||
(hash = ((PyStringObject *) key)->ob_shash) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
}
ok = (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
return PyBool_FromLong(ok);
}
static int
dict_contains(dictobject *mp, PyObject *key)
{
long hash;
if (!PyString_CheckExact(key) ||
(hash = ((PyStringObject *) key)->ob_shash) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return -1;
}
return (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
}
More information about the Python-Dev
mailing list