[Python-Dev] Idea: Dictionary references

Franklin? Lee leewangzhong+python at gmail.com
Thu Dec 17 10:38:50 EST 2015


On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee <leewangzhong+python at gmail.com>:
>> My suggestion should improve *all* function calls which refer to
>> outside names.
>
> Ok, I now think that you should stop hijacking the FAT Python thread.
> I start a new thread. IMHO your dictionary reference idea is completly
> unrelated to FAT Python.
>
> FAT Python is about guards and specialized bytecode.

Yeah, maybe it's out of the scope of bytecode optimization. But I
think this would make guards unnecessary, since once a name is found,
there's a quick way to refer to it.


>> Each function keeps an indirect, automagically updated
>> reference to the current value of the names they use, and will never
>> need to look things up again.[*]
>
> Indirections, nested dictionaries, creation of new "reference"
> objects... IMHO you are going to have major implementation issues :-/
> The design looks *very* complex. I'm quite sure that you are going to
> make namespace lookups *slower*.

The nested dictionaries are only for nested scopes (and inner
functions don't create nested scopes). Nested scopes will already
require multiple lookups in parents. I think this is strictly an
improvement, except perhaps in memory. Guards would also have an issue
with nested scopes. You have a note on your website about it:
(https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)

    "The optimization is disabled when the builtin function is
modified or if a variable with the same name is added to the global
namespace of the function."

With a NestedRefCell, it would check globals() (a simple dereference
and `pointer != NULL`) and then check __builtin__. If it finds it in
__builtin__, it will save a reference to that. It will only do
repeated lookups in __builtin__ if each of the previous lookups fail.
As far as I know, these repeated lookups are already necessary, and
anything that can be used to avoid them (e.g. guards) can be used for
repeated failed lookups, too.

For non-nested scopes, it will look things up once, costing an extra
RefCell creation if necessary, and the only other costs are on
resizing, deletion of the dict, and working with a larger dict in
general.

The important parts of the design is pretty much in the code that I
posted. We keep an extra hash table for refs, and keep it the same
size as the original hash table, so that we pay a single lookup cost
to get the index in both.


> It reminds me Python before the with statement and PyPy garbage
> collector. Many applications relied on the exact behaviour of CPython
> garbage collector. For example, they expected that a file is written
> on disk when the last reference to the file object is destroyed. In
> PyPy, it wasn't (it isn't) true, the write can be delayed.

It should not affect the semantic. Things should still happen as they
used to, as far as I can figure. Or at least as far as the rules of
the interpreter are concerned. (That is, values might live a little
longer in PyPy, but can't be forced to live longer than they were
formerly allowed to.)


> I guess that with all your complex machinery for dict lookups,

The only cost to a normal getitem (again, other than from looking it
up in a bigger dict) is to make sure the return value isn't NULL. The
machinery is involved in function creation and resolution of names: On
function creation, get refs to each name used. When the name is used,
try to resolve the refs.


>                                                                you
> will have similar issues of object lifetime. It's unclear to me when
> and how "reference" objects are destroyed, nor when dict values are
> destroyed.

RefCells are ref-counted PyObjects. That is not an issue. A RefCell
will live while it is useful (= it has an outside reference) or while
it's not useful but its owner dict hasn't been resized/deleted yet (at
which time RefCells without outside references will be deleted).

RefCells "know" whether they're part of a living dict. (The dict marks
them as such upon its death.) If they are not, they will decref their
value upon their death. They do not hold a reference to their owner
dict.

If it's part of a living dict, we have a choice: the dict can be
responsible for deletion, or the RefCell can be responsible for
deletion. It doesn't really matter which design we go with.


> What happens if a dict key is removed and a reference object is still
> alive? Is the dict value immediatly destroyed? Does the reference
> object contain a strong or a weak reference to the value?

If a dict key is removed, the inner dict will still have the key
(which is true in the current implementation), but the value will be
decref'd and the value pointer will be NULL'd. The RefCell will not
need to be updated, since (as part of a living dict) it's pointing at
the pointer to the value, not the object itself.

If it is detached from its owner dict (due to owner death), it will
own a strong reference to the value. This is necessarily the case,
since things that have a (strong) reference to the RefCell expect to
find the value (or lack of a value) there.


More information about the Python-Dev mailing list