[Python-Dev] Third milestone of FAT Python

Franklin? Lee leewangzhong+python at gmail.com
Tue Dec 15 06:23:02 EST 2015


On Sat, Dec 04, 2015 at 7:49 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:

> Versionned dictionary
> =====================
>
> In the previous milestone of FAT Python, the versionned dictionary was a
> new type inherited from the builtin dict type which added a __version__
> read-only (global "version" of dictionary, incremented at each change),
> a getversion(key) method (version of a one key) and it added support for
> weak references.

I was thinking (as an alternative to versioning dicts) about a
dictionary which would be able to return name/value pairs, which would
also be internally used by the dictionary. This would be way less
sensitive to irrelevant changes in the scope dictionary, but cost an
extra pointer to each key.

Here's how it would work:
    pair = scope.item(name)
    scope[name] = newval
    assert pair.value is newval
    assert pair.key is name
    assert pair is scope.item(name)
    # Alternatively, to only create pair objects when `item` is
called, have `==` compare the underlying pair.

    del scope[name]
    assert pair.key is None
    # name-dicts can't have `None` keys
    assert pair.value is None
    # Alternatively, pair.value is scope.NULL

This dict will allow one to hold references to its entries (with the
caller promising not to change them, enforced by exceptions). You
won't have to keep looking up keys (unless the name is deleted), and
functions are allowed to change. For inlining, you can detect whether
the function has been redefined by testing the saved pair.value
against the saved function, and go into the slow path if needed (or
recompile the inlining).

I am not sure whether deleting from the dict and then readding the
same key should reuse the pair container. I think the only potential
issue for the Python version is memory use. There aren't going to be
THAT many names being deleted, right? So I say that deleted things in
the scope dict should not be removed from the inner dict. I predict
that this will simplify a lot of other things, especially when
deleting and readding the same name: if you save a pair, and it
becomes invalid, you don't have to do another lookup to make sure that
it's REALLY gone.

If memory is a real concern, deleted pairs can be weakrefed (and saved
in a second dict?) until they are reused. This way, pairs which aren't
saved by something outside will be removed.

For implementation, a Python implementation of the idea has probably
already been done. Here are some details:
- set: Internally store d._d[k] = k,v.
- get: Reject k if d._d[k].key is None. (Names must be strings.)
- del: Set d._d[k].key = None and .val = d.NULL to invalidate this entry.

For the CPython version, CPython's dict already stores its entries as
PyDictKeyEntry (hash, *key, *value), but those entries can move around
on resizing. Two possible implementations:
1. Fork dict to store {hash, *kv_pair}.
2. Use an inner dict (like in the Python suggestion) where values are
kv_pair. Write the indirection code in C, because scope dicts must be
fast.

For exposing a pair to Python code, here are two possibilities:
1. Make them Python objects in the first place.
2. Keep a second hash table in lockstep with the first (so that you
can do a lookup to find the index in the first, and then use that same
index with the second). In this table, store pair objects that have
been created. (They can be weakrefed, as before. Unless it's
impossible to weakref something you're returning?) This will save
memory for pairs that aren't ever exposed. If compact dictionaries are
implemented, the second hash table will be a second array instead.

To use this kind of scopedict, functions would have to store a list of
used names, which is memory overhead. But for what you're doing, some
overhead will be necessary anyway.


More information about the Python-Dev mailing list