[Python-Dev] Accessing globals without dict lookup

Tim Peters tim.one@comcast.net
Sat, 09 Feb 2002 18:57:26 -0500

> Do you think it's PEP time yet?

If the ideas aren't written down in an editable form while they're fresh on
at least one person's mind, I'm afraid they'll just get lost.  If it's a
PEP, at least there will be a nagging reminder that someone once had a
promising idea <wink>.

>> 2. Presumably the first "the cell" in this sentence refers to a
>>    different cell than the second "the cell" intends.

> No, they are the same.  See __getitem__ pseudo code.

I persist in my delusion.  Original text:

    When you use its getitem method, the PyObject * in the cell is
    dereferenced, and if a NULL is found, getitem raises KeyError
    even if the cell exists.

Since we're doing something with "the PyObject* in the cell", surely "the
cell" *must* exist.  So what is the "even if the cell exists" trying to say?
I believe it means to say

    even if the cell's cellptr is not NULL

and "the cell's cellptr is not NULL" is quite different from "the cell exists".
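
To make the distinction concrete, here's a toy Python sketch of the scheme as I understand it (the Cell/CellDict names and attributes are mine, just for illustration, with None standing in for NULL):

```python
class Cell:
    """Toy cell: a value slot plus a link to a shadowed builtin cell."""
    def __init__(self, value=None, cellptr=None):
        self.value = value      # plays the role of the PyObject* slot
        self.cellptr = cellptr  # link to __builtin__'s cell, if any

class CellDict:
    """Minimal celldict __getitem__: the cell may exist while its value
    slot is empty, in which case lookup still raises KeyError."""
    def __init__(self):
        self.cells = {}
    def __getitem__(self, key):
        cell = self.cells[key]   # the cell itself exists...
        if cell.value is None:   # ...but its PyObject* slot is "NULL"
            raise KeyError(key)
        return cell.value

d = CellDict()
d.cells["len"] = Cell()          # cell exists, value slot empty
try:
    d["len"]
except KeyError:
    print("KeyError even though the cell exists")
```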

> ...
> To avoid the NULL check at the top, we could stuff func_cells with
> empty cells and do the special-case check at the end (just before we
> would raise NameError).

That would be better -- getting cycles out of the most-frequent paths is my
only goal here.

> Then there still needs to be a check for STORE and DELETE, because we
> don't want to store into the dummy cells.  Sounds like a hack to assess
> separately later.

Another idea:  a celldict could contain a "real dict" pointer, normally
NULL, and pointing to a plain dict when a real dict is given.  The celldict
constructor would populate the cells from the realdict's contents when not
NULL.  Then getitem wouldn't have to do anything special (realdict==NULL and
realdict!=NULL would be the same to it).  setitem and delitem would
propagate mutations immediately into the realdict too when non-NULL.  Since
mutations are almost certainly much rarer than accesses, this makes the
rarer operations pay.  The eval loop would always see a celldict.
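
A sketch of what I mean, again in toy Python (the realdict attribute and write-through behavior are the idea above, not anything implemented):

```python
class Cell:
    def __init__(self, value=None):
        self.value = value

class CellDict:
    """Write-through sketch: a real dict, when given, seeds the cells,
    and mutations are mirrored back into it immediately, so the rarer
    operations (store/delete) pay instead of the frequent lookups."""
    def __init__(self, realdict=None):
        self.realdict = realdict
        self.cells = {}
        if realdict is not None:
            for k, v in realdict.items():
                self.cells[k] = Cell(v)
    def __getitem__(self, key):
        # Fast path is identical whether realdict is None or not.
        cell = self.cells[key]
        if cell.value is None:
            raise KeyError(key)
        return cell.value
    def __setitem__(self, key, value):
        self.cells.setdefault(key, Cell()).value = value
        if self.realdict is not None:     # propagate mutation at once
            self.realdict[key] = value
    def __delitem__(self, key):
        self.cells[key].value = None
        if self.realdict is not None:
            del self.realdict[key]

real = {"x": 1}
d = CellDict(real)
d["y"] = 2
print(real)   # the real dict sees the mutation immediately
```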

> (Another hack probably not worth it right now is to make the module's
> cell.cellptr point to itself if it's not shadowing a builtin cell --
> then the first NULL check for cell.cellptr can be avoided in the case
> of finding a builtin name successful.)

I don't think I followed this.  If, e.g., a module's "len" cell is normally

    {NULL, pointer to __builtin__'s "len" cell}

under the original scheme, how would that change?

    {NULL, pointer to this very cell}

wouldn't make sense.

    {builtin len, pointer to this very cell}

would make sense, but then the pointer to self is useless -- except as a
hint that we copied the value up from the builtins?  But then a change to
__builtin__.len wouldn't be visible to the module.
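
That last objection in miniature (plain dicts standing in for cells, purely illustrative):

```python
# If the module's "len" cell copies the builtin's value up front, a later
# rebinding of __builtin__.len is invisible to the module.
builtins_cell = {"value": len}
module_cell = {"value": builtins_cell["value"]}   # value copied up

builtins_cell["value"] = (lambda s: 999)          # rebind __builtin__.len
print(module_cell["value"]("abc"))                # still the old len -> 3
```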

> ...
> The problem is that *during* the execution accessing the dict doesn't
> give the right results.  I don't care about this case being fast
> (after all it's exec and if people want it faster they can switch to
> using a celldict).  I do care about not changing corners of the
> semantics.

I expect that a write-through realdict (see above) attached to a celldict in
such cases would address this, keeping the referencing code uniform and
fast, and moving the special-casing into the celldict implementation for
mutating operations.

>> 	i = hash;
>> 	ep = &ep0[i];
>> 	if (ep->me_key == NULL || ep->me_key == key)
>> 		return ep;
>> Win or lose, that's usually the end of a dict lookup.  That is,
>> I'm certain we're paying significantly more for layers of C-level
>> function call overhead today than for what the dict implementation
>> actually does now in the usual cases.

> This should be tried!!!

It's less promising after more thought.  The chief snag is that "usually the
end" relies on our usually looking for things that are there.  But
when looking for a builtin, it's usually not in the module's dict, where we
look first.  In that case, about half the time we'll find an occupied
irrelevant slot in the module's dict, and then we need the rest of
lookdict_string to do a (usually brief, but there's no getting away from the
loop because we can't know how brief in advance) futile chase down the
collision chain.
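
A toy open-addressing table shows the asymmetry (small ints hash to themselves in CPython, which makes the collision easy to stage; real lookdict_string also mixes in a perturb term, omitted here):

```python
def lookup(table, key, mask):
    """Return (entry_or_None, probes): a hit usually ends at the first
    slot, but a miss landing on an occupied, irrelevant slot must walk
    the probe sequence until an empty slot proves the key is absent."""
    i = hash(key) & mask
    probes = 0
    while table[i] is not None and table[i][0] != key:
        i = (i * 5 + 1) & mask   # simplified probe sequence, no perturb
        probes += 1
    return table[i], probes

mask = 7
table = [None] * (mask + 1)
table[hash(0) & mask] = (0, "value")

print(lookup(table, 0, mask))   # hit: ends at the first slot, 0 probes
print(lookup(table, 8, mask))   # miss: 8 collides with 0, chases the chain
```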

>>> This avoids the need for more global analysis,

>> Don't really care about that.

> I do.  The C code in compiler.c is already at a level of complexity
> that nobody understands it in its entirety!  (I don't understand what
> Jeremy added, and Jeremy has to ask me about the original code. :-( )

I don't care because I care about something else <wink>:  it would add to
the pressure to refactor this code mercilessly, and that would be a Good
Thing over the long term.  The current complexity isn't inherent, it's an
artifact of outgrowing the original concrete-syntax-tree direct-to-bytecode
one-pass design.  Now we've got multiple passes crawling over a now-
inappropriate program representation, glued together more by "reliable
accidents" <wink> than sensible design.  That's all curable, and the
pressures *to* cure it will continue to multiply over time (e.g., it would
take a certain insanity to even think about folding pychecker-like checks
into the current architecture).

> Switching to the compiler.py package is unrealistic for 2.3; there's a
> bootstrap problem, plus it's much slower.  I know that we cache the
> bytecode, but there are enough situations where we can't and the
> slowdown would kill us (imagine starting Zope for the first time from
> a fresh CVS checkout).

I'm a fan of fast compilation.  Heck, I was upset in 1982 when Cray's
compiler dropped below 100,000 lines/minute for the first time <wink>.

>>>   and allows adding code to a module using exec that introduces new
>>>   globals without having to fall back on a less efficient scheme.

>> That is indeed lovely.

> Forgot a <wink> there?  It seems a pretty minor advantage to me.

No, it's lovely, not major.  It's simply a good sign when the worst semantic
nightmares "just work".  It's also a lovely sign.  Most flowers aren't
terribly important either, but they are lovely.

> I would like to be able to compare the two schemes more before
> committing to any implementation.  Unfortunately there's no
> description of Jeremy's scheme that we can compare easily (though I'm
> glad to see he put up his slides on the web:
> http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html).
> I guess there's so much handwaving in Jeremy's proposal about how to
> deal with exceptional cases that I'm uncomfortable with it.  But that
> could be fixed.

I agree it needs more detail, but at the start I'm more interested in the
normal cases.  I'll reattach my no-holds-barred description of resolving
normal-case "len" in this scheme.  Perhaps Jeremy could do the same for his.
Jeremy is also aiming at speeding things like math.pi (global.attribute) as
a whole (not just speeding the "math" part of it).


If I'm reading this right, then in the normal case of resolving "len" in

def mylen(s):
    return len(s)

1. We test func_cells for NULL and find out it isn't.
2. A pointer to a cell object is read out of func_cells at a fixed (wrt
   this function) offset.  This points to len's cell object in the
   module's celldict.
3. The cell object's PyObject* pointer is tested and found to be NULL.
4. The cell object's cellptr pointer is tested and found not to be NULL.
   This points to len's cell object in __builtin__'s celldict.
5. The cell object's cellptr's PyObject* is tested and found not to be NULL.
6. The cell object's cellptr's PyObject* is returned.

For a module global, the same description applies, but the outcome of #3 is
not NULL, and the lookup ends there.

For global.attr, step #3 yields the global, and then attr lookup is the same
as today.

Jeremy, can you do the same level of detail for your scheme?  Skip?