[Python-ideas] Exporting dict Items for direct lookups of specific keys

Eyal Lotem eyal.lotem at gmail.com
Mon Jun 11 08:18:19 CEST 2007

On 6/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> > On 6/10/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> > > > On 6/10/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > > Only access of exported items is O(1) time (when accessed via your
> > > > PyDictItem_obj->value), other items must be accessed normally and they
> > > > take just as much time (or as I explained and you reiterated, a tad
> > > > longer, as it requires a bitmap check and in the case of exported
> > > > items another dereference).
> > >
> > > But you still don't explain *how* these exported keys are going to be
> > > accessed.  Walk me through the steps required to improve access times in
> > > the following case:
> > >
> > > def foo(obj):
> > >     return obj.foo
> > >
> > >
> > I think you missed what I said - I said that the functionality should
> > probably not be exported to Python - as Python has little to gain from
> > it (it would have to getattr a C method just to request the exported
> > item -- which will nullify the speed benefit).
> >
> > It is the C code which can suddenly do direct access to access the
> > exported dict items - not Python code.
> Maybe my exposure to C extensions is limited, but I haven't seen a whole
> lot of C doing the equivalent of obj.attrname outside of the Python
> standard library. And even then, it's not "I'm going to access attribute
> Y of object X a million times", it's "I'm going to access some
> attributes on some objects".  The only exception that I've seen happen
> really at all is when someone converts their pure Python library that
> interacts with other libraries into Pyrex.  But even then, repeated
> access is generally uncommon except in wxPython uses; wx.<attrname>
> (which I've never seen converted to Pyrex), and in those cases, repeated
> access is generally rare.
> I'm curious as to what kind of C code you are seeing in which these
> cached lookups will help in a substantial way.
While extensions are an optimization target, the main target is
global/builtin/attribute accessing code.

> > > If I have a dictionary X, and X has exported keys, then whenever I
> > > access exported values in the dictionary via X[key], your proposed
> > > indirection will necessarily be slower than the current implementation.
> >
> > That is true, I acknowledged that. It is even true also that access to
> > X[key] even when key is not exported is slower.  When I have a few
> > spare moments, I'll try and benchmark how much slower it is.
> I await your benchmarks.
I have started work on this. I am still struggling to understand some
nuances of dict's implementation in order to be able to make such a

> > > > Attribute lookups in the class dict are all literal/static key lookups
> > > > in a static dict (though in order for a code object to know that it is
> > > > a static dict, a psyco-like system is required. If such a system is
> > > > used, all of those dict lookups can be made faster as well).
> > >
> > > No, attribute lookups are not all literal/static key lookups.  See
> > > getattr/setattr/delattr, operations on cls.__dict__, obj.__dict__, etc.
> >
> > I may have oversimplified a bit for the sake of explaining.  I was
> > referring to the operations that are taken by LOAD_ATTR, as an
> > example.
> > Lets analyze the LOAD_ATTR bytecode instruction:
> >  * Calls PyObject_GetAttr for the requested attribute name on the
> > request object.
> >  * PyObject_GetAttr redirects it to the type's tp_getattr[o].
> >  * When tp_getattr[o] is not overridden, this calls PyObject_GenericGetAttr.
> >  * PyObject_GenericGetAttr first looks for a method descriptor in
> > dicts of every class in the entire __mro__. If it found a
> > getter/setter descriptor, it uses that. If it didn't, it tries the
> > instance dict, and then uses the class descriptor/attr.
> >
> > I believe this implementation to be very wasteful (specifically the
> > last step) and I have posted a separate post about this in python-dev.
> Due to the lack of support on the issue in python-dev (it would break
> currently existing code, and the time for Python 3.0 PEPs is past), I
> doubt you are going to get any changes in this area unless the resulting
> semantics are unchanged.

Well, I personally find those semantics (involving the question of
whether the class attribute has a __set__ or not) to be "inelegant",
at best, but since I realized that the optimization I am proposing is
orthogonal to that change, I have lost interest there.

> > Since a "static lookup" costs a dereference and a conditional, and a
> > dynamic lookup entails at least 4 C function calls (including C stack
> > setup/unwinds), a few C assignments and C conditionals, I believe it
> > is likely that this will pay off as a serious improvement in Python's
> > performance, when combined with a psyco-like system (not an
> > architecture-dependent ones).
> It's really only useful if you are accessing fixed attributes of a fixed
> object many times.  The only case I can think of where this kind of
> thing would be useful (sufficient accesses to make a positive difference)
> is in the case of module globals, but in that case, we can merely change
> how module globals are implemented (more or less like self.__dict__ = ...
> in the module's __init__ method).
That's not true.

As I explained, getattr accesses the types's mro dicts as well. So
even if you are accessing a lot of different instances, and those have
a shared (fixed) type, you can speed up the type-side dict lookup
(even if you still pay for a whole instance-side lookup).  Also,
"fixed-object" access can occur when you have a small number of
objects whose attributes are looked up many times. In such a case, a
psyco-like system can create a specialized code object specifically
for _instances_ (not just for types), each code object using "static
lookups" on the instance's dict as well, and not just on the class's

> > > From what I can gather (please describe the algorithm now that I have
> > > asked twice), the only place where there exists improvement potential is
> > > in the case of global lookups in a module.  That is to say, if a
> > > function/method in module foo is accessing some global variable bar, the
> > > compiler can replace LOAD_GLOBAL/STORE_GLOBAL/DEL_GLOBAL with an opcode
> > > to access a special PyDictItem object that sits in the function/method
> > > cell variables, rather than having to look in the globals dictionary
> > > (that is attached to every function and method).
> >
> > As I explained above, there is room for improvement in normal
> > attribute lookups, however that improvement requires a psyco-like
> > system in order to be able to deduce which dicts are going to be
> > accessed by the GetAttr mechanism and then using static lookups to
> > access them directly.
> Insights into a practical method of such optimizations are not leaping
> forth from my brain (aside from using a probabilistic tracking mechanism
> to minimize overhead), though my experience with JIT compilers is
> limited.  Maybe someone else has a good method (though I suspect that
> this particular problem is hard enough to make it not practical to make
> it into Python).

Implementing a psyco-like system in CPython is indeed not an easy
task. But it is possible. The simple idea is that you create
specialized code objects that are specific to an instance or to a type
in the code object when the code object is first run with that
instance or type, and use an exact-type check to invoke the right code
object.  The specialized code object can use "static lookups" in
dicts, and perhaps even avoid using obj->ob_type->slotname (instead
use slotname directly, as its already specific to a type).

> > With access to globals and builtins, this optimization is indeed
> > easier, and your description is correct, I can be a little more
> > specific:
> > * Each code object can contain a "co_globals_dict_items" and
> > "co_builtins_dict_items" attributes that refer to the
> > exported-dict-items for that literal name in both the globals and
> > builtins dict.
> >
> > * Each access to a literal name in the globals/builtins namespace, at
> > the compilation stage, will request the globals dict and builtins dict
> > to create an exported item for that literal name. This exported item
> > will be put into the co_globals_dict_items/co_builtins_dict_items in
> > the code object.
> >
> > * LOAD_GLOBAL will not be used when literal name are accessed.
> > Instead, a new bytecode instruction "LOAD_LITERAL_GLOBAL" with an
> > index to the "co_globals_dict_items/co_builtins_dict_items" tuples in
> > the code object.
> You may want to change the name.  "Literal" implies a constant, like 1
> or "hello", as in 'x = "hello"'.  LOAD_GLOBAL_FAST would seem to make
> more sense to me, considering that is what it intends to do.
Well, LOAD_GLOBAL_FAST can only be used when the string that's being
looked up is known at the code-object creation time, which means that
the attribute name was indeed literal.


More information about the Python-ideas mailing list