[Python-ideas] Exporting dict Items for direct lookups of specific keys

Tue Jun 12 14:56:07 CEST 2007

On 6/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> > On 6/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> > > > On 6/10/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > > > "Eyal Lotem" <eyal.lotem at gmail.com> wrote:
> > > > > > On 6/10/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > > > > Only access of exported items is O(1) time (when accessed via your
> > > > > > PyDictItem_obj->value), other items must be accessed normally and they
> > > > > > take just as much time (or as I explained and you reiterated, a tad
> > > > > > longer, as it requires a bitmap check and in the case of exported
> > > > > > items another dereference).
> > > > >
> > > > > But you still don't explain *how* these exported keys are going to be
> > > > > accessed.  Walk me through the steps required to improve access times in
> > > > > the following case:
> > > > >
> > > > > def foo(obj):
> > > > >     return obj.foo
> > > > >
> > > > >
> > > > I think you missed what I said - I said that the functionality should
> > > > probably not be exported to Python - as Python has little to gain from
> > > > it (it would have to getattr a C method just to request the exported
> > > > item -- which will nullify the speed benefit).
> > > >
> > > > It is the C code which can suddenly do direct access to access the
> > > > exported dict items - not Python code.
> [snip]
> > While extensions are an optimization target, the main target is
> > global/builtin/attribute accessing code.
>
> Or really, module globals and __builtin__ accessing.  Arbitrary
> attribute access is one of those "things most commonly done in Python".
> But just for the sake of future readers of this thread, could you
> explicitly enumerate *which* things you intend to speed up with this
> work.

As for optimizing globals/builtins, this will be the effect of my patch:

global x
x = 5 # Direct access write instead of dict write
x # Direct access read
globals()['x'] = 5 # Same speed as before.
globals()['x'] # Same speed as before.
min # Two direct access reads, instead of 2 dict reads.

As for attribute access in classes, the speedup I can gain depends on
a psyco-like system. Lets assume that we have a system that utilizes a
new TYPE_FORK opcode that jumps to use different code according to a
map of types, for example, if we have:

def f(x, y):
  x.hello()

Then we will create a TYPE_FORK opcode before x.hello() that takes 'x'
as an argument, and a map of types (initially empty).  When the exact
type of 'x' isn't in the map, then the rest of the code in the code
object after TYPE_FORK will have a specialized version created for x's
current type [only if that type doesn't override tp_getattr/o], and
inserted in the map.
The specialized version of the code will contain, instead of a
LOAD_ATTR for the string "hello", a FAST_LOAD_ATTR for the string
"hello" (associated with the direct-access dict item in the mro dict
(if there's no mro cache, I actually have a problem here, because I
don't know which dicts I need to export dict items from - and worse,
that list may change with time. The simplest solution is to use an
exported item from an mro cache dict)).

FAST_LOAD_ATTR will not call PyObject_GetAttr, but instead use the
exported dict items to find the descriptor/classattr using direct
access.
If it found a descriptor with __get__/__set__, it will return its get-call.
Otherwise, it will do a normal expensive lookup on the instance dict
(for "hello") (unless __slots__ is defined in which case there is no
instance dict).
If it found that, it will return that.
Otherwise, it will return the descriptor's get-call if it has one or
the descriptor itself as a classattr.

In other words, I am reimplementing PyObject_GenericGetAttr here, but
for mro-side lookups, using my direct lookup.
The result is:

class X(object):
  def method(self, arg):
    self.x = arg # One direct-lookup+dict lookup instead of two dict lookups
    self.other_method() # One direct-lookup+dict lookup instead of two
dict lookups
class Y(object):
  __slots__ = ["x"]
  def method(self, arg):
    self.x = arg # One direct-lookup instead of one dict lookup
    self.other_method() # One direct-lookup instead of one dict lookup

A direct lookup is significantly cheaper than a dict lookup (as
optimized as dict is, it still involves C callstack setups/unwinds,
more conditionals, assignments, potential collisions and far more
instructions).
Therefore, with the combination of a psyco-like system I can eliminate
one of two dict lookup costs, and with the combination of __slots__ as
well, I can eliminate one of one dict lookup costs.

> > > > Since a "static lookup" costs a dereference and a conditional, and a
> > > > dynamic lookup entails at least 4 C function calls (including C stack
> > > > setup/unwinds), a few C assignments and C conditionals, I believe it
> > > > is likely that this will pay off as a serious improvement in Python's
> > > > performance, when combined with a psyco-like system (not an
> > > > architecture-dependent ones).
> > >
> > > It's really only useful if you are accessing fixed attributes of a fixed
> > > object many times.  The only case I can think of where this kind of
> > > thing would be useful (sufficient accesses to make a positive difference)
> > > is in the case of module globals, but in that case, we can merely change
> > > how module globals are implemented (more or less like self.__dict__ = ...
> > > in the module's __init__ method).
> >
> > That's not true.
> >
> > As I explained, getattr accesses the types's mro dicts as well. So
> > even if you are accessing a lot of different instances, and those have
> > a shared (fixed) type, you can speed up the type-side dict lookup
> > (even if you still pay for a whole instance-side lookup).  Also,
>
> That's MRO caching, which you have already stated is orthogonal to this
> particular proposal.
I may have made a mistake before - its not completely orthogonal as an
MRO cache dict which can export items for direct access in psyco'd
code is a clean and simple solution, while the lack of an MRO cache
means that finding which class-side dicts to take exported items from
may be a difficult problem which may involve a cache of its own.

> > "fixed-object" access can occur when you have a small number of
> > objects whose attributes are looked up many times. In such a case, a
> > psyco-like system can create a specialized code object specifically
> > for _instances_ (not just for types), each code object using "static
> > lookups" on the instance's dict as well, and not just on the class's
> > dict.
>
> If you re-read my last posting, which you quoted above and I re-quote,
> you can easily replace 'fixed attributes of a fixed object' with 'fixed
> attributes of a small set of fixed objects' and get what you say.  Aside
> from module globals, when is this seen?
Its seen when many calls are made on singletons.
Its seen when an inner loop is not memory-intensive but is
computationally intensive - which would translate to having an
instance calling methods on itself or other instances.
You may only have 100 instances relevant in your inner loop which is
running many millions of times. In such a case, you will benefit
greatly if for every code object in the methods of every instance, you
create specialized code for every one of the types it is called with
(say, 3 types per code object), so you take perhaps a factor of 3 of
space for code objects (which  I believe are not a significant portion
of memory consumption in Python), but your performance will involve
_no_ dict access at all for attribute lookups, and instead will just
compare instance pointers and then use direct access.

> > > You may want to change the name.  "Literal" implies a constant, like 1
> > > or "hello", as in 'x = "hello"'.  LOAD_GLOBAL_FAST would seem to make
> > > more sense to me, considering that is what it intends to do.
> >
> > Well, LOAD_GLOBAL_FAST can only be used when the string that's being
> > looked up is known at the code-object creation time, which means that
> > the attribute name was indeed literal.
>
> A literal is a value.  A name/identifier is a reference.
>
> In:
>     a = "hello"
> ... "hello" is a literal.
>
> In:
>     hello = 1
> ... hello is a name/identifier.
>
> In:
>     b.hello = 1
> ... hello is a named attribute of an object named/identified by b.
Then I agree, the use of the word literal here is inappropriate,
constant/static may be more appropriate.

Eyal