[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

Jim J. Jewett jimjjewett at gmail.com
Thu Aug 24 00:32:45 EDT 2017

In https://mail.python.org/pipermail/python-dev/2017-August/148869.html
Nick Coghlan wrote:

> * what we want to capture at generator creation time is
>   the context where writes will happen, and we also
>   want that to be the innermost context used for lookups

So when creating a generator, we want a new (empty) ImplicitContext
map to be the head of the ChainMap.  Each generator should have one of
its own, just as each generator has its own frame.  And the ChainMap
delegation goes up the call stack, just as an exception would.
Eventually, it hits the event loop (or other Executor) which is
responsible for ensuring that the ChainMap eventually defers to the
proper (Chain)Map for this thread or Task.

> While the context is defined conceptually as a nested chain of
> key:value mappings, we avoid using the mapping syntax because of the
> way the values can shift dynamically out from under you based on who
> called you
> instead of having the problem of changes inside the
> generator leaking out, we instead had the problem of
> changes outside the generator *not* making their way in

I still don't see how this is different from a ChainMap.

If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C,
d_mine]  maps as your implicit context, then a change to d_thread map
(that some other code could make) will be visible unless it is masked.

Similarly, if the fallback for d_C changes from d_B to d_B1 (which
points directly to d_thread), that will be visible for any keys that
were previously resolved in d_A or d_B, or are now resolved in dB1.

Those seem like exactly the cases that would (and should) cause
"shifting values".

This does mean that you can't cache everything in the localmost map,
but that is a problem with the optimization regardless of how the
implementation is done.

In https://mail.python.org/pipermail/python-dev/2017-August/148873.html
Yury Selivanov wrote:

> Any code that uses EC will not see any difference
[between mutable vs immutable but replaced LC maps],
> because it can only work with the top LC.

> Back to generators. Generators have their own empty LCs when created
> to store their *local* EC modifications.

OK, so just as they have their own frame, they have their own
ChainMap, and the event loop is responsible for resetting the fallback
when it schedules them.

> When a generator is *being* iterated, it pushes its LC to the EC. When
> the iteration step is finished, it pops its LC from the EC.

I'm not sure it helps to think of a single stack.  When the generator
is active, it starts with its own map.  When it is in the call chain
of the active generator, its map will be in the chain of delegations.
When neither it nor a descendant are active, no code will end up
delegating to it.

If the delegation graph has 543 generators delegating directly to the
thread-wide map, there is no reason to pop/push an execution stack
every time a different generator is scheduled, since only that
generator itself (and code it calls) will even care.

> HAMT is a way to efficiently implement immutable mappings ...
> using regular dicts and copy, set() would be O(log N)

Using a ChainMap, set affects only the localmost map and is therefore
O(1).  get could require stacksize lookups, but ...

(A)  How many values do you expect a typical generator to use?  The
django survey suggested mostly 0, sometimes 1, occasionally 2.  So
caching the values of all possible keys probably won't pay off.

(B)  Other than the truly global context and thread-level context, how
many of these maps do you expect to be non-empty?

(C)  How deep do you expect the stack to get?  Are we talking about
100 layers of mappings to check between the current generator and the
thread-wide defaults?  Even if we are, verifying that there hasn't
been a change in some mid-level layer requires tracking the versions
of each mid-level layer.  (If version is globally unique, that would
also ensure that the stack hasn't changed.)  Is that really faster
than checking that the map is empty?

And, of course, using a ChainMap means that the keys do NOT have to be
predefined ... so the Key class really can be skipped.


More information about the Python-Dev mailing list