[Python-ideas] Optimizing global names via dict references

Franklin? Lee leewangzhong+python at gmail.com
Sat Dec 19 15:07:08 EST 2015


On Sat, Dec 19, 2015 at 9:10 AM, Franklin? Lee
<leewangzhong+python at gmail.com> wrote:
> Dictionary References: An Idea in Three Emails
*snip*

I found some earlier threads while looking for info on PyPy's
"celldicts.py". (I knew it couldn't have been a new idea!) I'm taking
a break from reading them, but I was writing some notes, and the fact
that they exist should be known early.

====

Subject: [Python-Dev] Accessing globals without dict lookup
From: Guido van Rossum
Date: Fri, 08 Feb 2002 11:50:31 -0500
Link: https://mail.python.org/pipermail/python-dev/2002-February/019846.html
PEP: https://www.python.org/dev/peps/pep-0280/

It goes into many of the same details I had worked out this week, including
- pre-creation of empty cells
- globals() > __builtins__ relationship
- can't figure out a use for deeper relationships
- Tim Peters lays out an idea of "celldict pointing to a realdict"
which is similar to how my refs table is a dict parallel to the
original dict, except that it being part of the same object means that
I can do a single lookup to find both an entry and its cell.
(https://mail.python.org/pipermail/python-dev/2002-February/019893.html)

It also has a basic implementation in Python. (Victor, maybe you'd
like reading that one better.)

The PEP, along with PEP 266 and PEP 267, was deferred because "no-one
has yet emerged to do the work of hashing out the differences between
[them.]"


Key differences:
- He stores cells as values. I store them parallel to values, to
reduce overhead on normal dict operations.
- He uses a subclass. I want to make it the default, and dynamically
convert from normal dict to celldict and back.
- He packs __builtins__ cells into module cells when the module is
created. I am lazier: a module cell will only request a __builtins__
cell when it fails to resolve to a value.
- He converts the lookups to indexed lookups, with new corresponding
bytecode ops. I... don't actually know how to get the interpreter to
use the cells.
- Since he uses a subclass, he doesn't have functions load cells from
the module if the functions have a regular dict as a globals() scope.
I want a function to always be able to request cells from its
globals(), even if it is a user dict.
- He uses a single-parent relationship: a cell either has a parent or
it doesn't. I allow for multiple parents, but since I've already
realized that scopes don't have multiple direct parents, let's take
away the loop and go with a single optional parent.


On semantics:

""" I think this faithfully implements the current
semantics (where a global can shadow a builtin), but without the need
for any dict lookups when accessing globals, except in cases where an
explicit dict is passed to exec or eval(). """


Tim Peters points out how it might be more expensive than a lookup.
(https://mail.python.org/pipermail/python-dev/2002-February/019874.html)

"""
    Note that a chain of 4 test+branches against NULL in "the usual case" for
    builtins may not be faster on average than inlining the first few useful
    lines of lookdict_string twice (the expected path in this routine became
    fat-free for 2.2):

        i = hash;
        ep = &ep0[i];
        if (ep->me_key == NULL || ep->me_key == key)
        return ep;

    Win or lose, that's usually the end of a dict lookup.  That is, I'm certain
    we're paying significantly more for layers of C-level function call overhead
    today than for what the dict implementation actually does now (in the usual
    cases).
"""


He also wanted to optimize the globals() -> __builtins__ relationship
by loading the __builtins__ value into the globals() cell as a fast
path, with an expensive notification to all modules' cells when a name
in __builtins__ changes.
(https://mail.python.org/pipermail/python-dev/2002-February/019904.html)


====


Subject: [Python-ideas] Fast global cacheless lookup
From: Neil Toronto
Date: Thu Nov 22 16:40:49 CET 2007
Link: https://mail.python.org/pipermail/python-ideas/2007-November/001212.html

A proposal to modify `dict` instead of subclassing.

"""
What if a frame could maintain an
array of pointers right into a dictionary's entry table? A global lookup
would then consist of a couple of pointer dereferences, and any value
change would show up immediately to the frame.
"""

Unfortunately, he makes the dict responsible for notifying all
registered functions, instead of using a ref-counted indirection. So
his resizes are an additional O(len(functions)), while mine are an
additional O(len(table) + #(refs)).


Subject: [Python-Dev] PATCH: Fast globals/builtins lookups for 2.6
From: Neil Toronto
Date: Thu Nov 29 11:26:37 CET 2007
Link: http://mail.python.org/pipermail/python-ideas/2007-November/001212.html

The patch.

He ended up using versioning, because the notification was expensive.
(*Possibly* not as much of an issue for me.) The patch slightly
changed the behavior of the __builtins__ dict itself.


More information about the Python-ideas mailing list