[Python-Dev] Idea: Dictionary references

Victor Stinner victor.stinner at gmail.com
Thu Dec 17 11:50:35 EST 2015


2015-12-17 16:38 GMT+01:00 Franklin? Lee <leewangzhong+python at gmail.com>:
> Yeah, maybe it's out of the scope of bytecode optimization. But I
> think this would make guards unnecessary, since once a name is found,
> there's a quick way to refer to it.

FAT Python requires and supports various kinds of guards:
* type of a function argument
* dictionary key
* function (func.__code__)

I guess that you are talking about the dictionary key guards. In fact,
they are 4 subtypes of dict guards:
* guard on builtins.__dict__['key']
* guard on globals()['key']
* guard on MyClass.__dict__['key']
* guard on dict['key']

The implementation of the check if this guard currently relies on a
new "version" (global dict version, incremented at each change) to
avoid lookup if possible. The guard stores a strong reference to the
key and the value. If the value is different, the guard checks returns
False.

I don't understand how you plan to avoid guards. The purpose of guards
is to respect the Python semantic by falling back to the "slow"
bytecode if something changes. So I don't understand your idea of
avoiding completly guards. Again, that's why I guess that it's
unrelated to FAT Python...

Or it's just that your idea is an alternative to dict versionning to
get efficient guards on dict keys, right?

> Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>
>     "The optimization is disabled when the builtin function is
> modified or if a variable with the same name is added to the global
> namespace of the function."

FAT Python doesn't emit a specialized version if it requires a builtin
function, but a local variable with the same name is found.

The check is done in the current function but also in the parent
namespaces, up to the global namespace. I'm talking about the static
analysis of the code.

If the specialized version is built, a guard is created on the builtin
namespace and another on the global namespace.

Sorry, I don't understand the "problem" you are referring to. Can you
maybe show an example of code where FAT Python doesn't work or where
you idea can help?

> RefCells "know" whether they're part of a living dict. (The dict marks
> them as such upon its death.) If they are not, they will decref their
> value upon their death. They do not hold a reference to their owner
> dict.

The dict contains the list all of its "own" RefCell objects, right?

> If it's part of a living dict, we have a choice: the dict can be
> responsible for deletion, or the RefCell can be responsible for
> deletion. It doesn't really matter which design we go with.

I see your RefCell idea like dictionary views. Views are linked to the
dict. If the dict is modified, views are "updated" too.

It would be confusing to have a view disconnected from its container.

In short, RefCell is a view on a single dict entry, to be able to
"watch" a dict entry without the cost of a lookup? And a RefCell can
be created, even if the dict entry doesn't exist right?

Hum, creating many RefCell introduces an overhead somewhere. For
example, deleting a key has to update N RefCell objects linked to this
key, right? So del dict[x] takes O(number of RefCell), right?

> If a dict key is removed, the inner dict will still have the key
> (which is true in the current implementation), but the value will be
> decref'd and the value pointer will be NULL'd. The RefCell will not
> need to be updated, since (as part of a living dict) it's pointing at
> the pointer to the value, not the object itself.

Ok, so deleting a dict key always destroy the value, but the key may
stay alive longer than expected (until all RefCell objects are
destroyed). Usually, dict keys are constants, so keeping them alive
doesn't matter so much. It's rare to have large keys.

> If it is detached from its owner dict (due to owner death), it will
> own a strong reference to the value. This is necessarily the case,
> since things that have a (strong) reference to the RefCell expect to
> find the value (or lack of a value) there.

I don't  understand this part.

You said that deleting a key destroys the value. Destroying a dict
means clearing all keys, so destroying all values. No?

What is the use case of having a RefCell no more connected to the dict?

Victor


More information about the Python-Dev mailing list