[Python-Dev] pystone(object)

Mon, 19 Aug 2002 11:18:35 -0400

> > Slots can get you back most of this, but not all.  Dict lookup is
> > already extremely tight code, and when I profiled this, most of the
> > time was spent there -- twice as many lookup calls using new-style
> > classes than for classic classes.
> 
> As I've said, and as Oren later demonstrated with code, the cost of a
> namespace dict lookup now is more in the layers of function call overhead
> than in the actual lookup.  We could whittle that down in Oren-like ways,
> although I'd rather we spent whatever time we can devote to stuff like this
> on advancing one of the more-general optimization schemes that were a hot
> topic before the Python conference.

Here's a some info taken from a profile of a program that requests an
instance attribute of a new-style class without slots or properties ten
million times (using a for-loop over xrange(100000) and then 100
attribute lookups (a.foo) in the for-loop body).

The following functions are called for each attribute lookup:

#calls  seconds name

3	1.72    lookdict_string
1	1.17    PyObject_GenericGetAttr
1	1.10    _PyType_Lookup
3	1.00    PyDict_GetItem
1       0.45    _PyObject_GetDictPtr
1       0.38    PyObject_GetAttr

10	5.82	Subtotal

	3.28	eval_frame (one call!)

	9.10	Total

Here, "seconds" is the total time spent in 10 million times the number
of calls.  In addition, the program spent 3.28 seconds in 500 calls to
eval_frame, I assume nearly all of it in the one call that corresponds
to the body of the test function, so I've added that.

The call graph is as follows:

eval_frame -> (10 million times)
    PyObject_GetAttr ->
        PyObject_GenericGetAttr ->
             _PyObject_GetDictPtr
	     _PyType_Lookup ->
	         PyDict_GetItem ->
		      lookdict_string
	         PyDict_GetItem ->
		      lookdict_string
	      PyDict_GetItem ->
	          lookdict_string

If we want to be really aggressive about this, I suppose we could
inline all of that in PyObject_GenericGetAttr, for the case that the
name passes the PyString_CheckExact test and has a pre-calculated
hash.  In particular, PyDict_GetItem then pretty much boils down to
"mp->ma_lookup(mp, key, hash)->me_value".  That should cut out 5
function calls.

A quick small gain would be to inline just the call to
_PyObject_GetDictPtr.  (I tried this; it saves about 2% on the total
running time of this particular test when not using the profiler.)

An intermediate gain would be to inline the call to _PyType_Lookup.

Here's the code I profiled:

============================================================================
class C(object):
    pass

def main():
    a = C()
    a.foo = 42
    for i in xrange(100000):
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo

main()
============================================================================

If I add __slots__ = ['foo'] to the class definition, here's what I
get this call graph (prefixed with the total seconds for each
function; each function is called exactly once per attribute lookup in
this case):

3.22    eval_frame -> (10 million times)
0.33        PyObject_GetAttr ->
1.05            PyObject_GenericGetAttr ->
0.35                PyDescr_IsData
0.36                member_get ->
0.15                     descr_check ->
0.27                         PyObject_IsInstance
0.44                     PyMember_GetOne
0.49                 _PyType_Lookup ->
0.35                     PyDict_GetItem ->
1.17                          lookdict_string

8.18    Total

This profile points out a bug in descr_check!  It calls
PyObject_IsInstance, which is a very general routine and hence
relatively expensive.  But descr_check's call to it always passes a
genuine PyTypeObject as the second argument, and we can in-line this
by writing PyObject_TypeCheck(obj, descr->d_type); that's a macro that
may call PyType_IsSubtype but in this case never needs to, saving
about 6% on the total running time of this particular test when not
using the profiler.

--Guido van Rossum (home page: http://www.python.org/~guido/)