PEP 550 leak-in vs leak-out, why not just a ChainMap

In https://mail.python.org/pipermail/python-dev/2017-August/148869.html Nick Coghlan wrote:
* what we want to capture at generator creation time is the context where writes will happen, and we also want that to be the innermost context used for lookups
So when creating a generator, we want a new (empty) ImplicitContext map to be the head of the ChainMap. Each generator should have one of its own, just as each generator has its own frame. And the ChainMap delegation goes up the call stack, just as an exception would. Eventually, it hits the event loop (or other Executor) which is responsible for ensuring that the ChainMap eventually defers to the proper (Chain)Map for this thread or Task.
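The delegation chain described here can be sketched with collections.ChainMap (this is a sketch of the analogy only, not the PEP's implementation; names like `thread_context` are made up for illustration):

```python
from collections import ChainMap

thread_context = {'precision': 28}
root = ChainMap(thread_context)

# new_child() pushes a fresh empty map -- analogous to the generator
# getting its own empty ImplicitContext map at creation time
gen_ctx = root.new_child()

gen_ctx['precision'] = 4          # write lands only in the generator's own map
assert gen_ctx['precision'] == 4  # innermost map wins on lookup
assert thread_context['precision'] == 28  # nothing leaked out to the caller
```

Writes go to the head map; lookups fall back through the chain, which mirrors how an unhandled exception propagates up the call stack.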
While the context is defined conceptually as a nested chain of key:value mappings, we avoid using the mapping syntax because of the way the values can shift dynamically out from under you based on who called you ... instead of having the problem of changes inside the generator leaking out, we instead had the problem of changes outside the generator *not* making their way in
I still don't see how this is different from a ChainMap. If you are using a stack (chain) of [d_global, d_thread, d_A, d_B, d_C, d_mine] maps as your implicit context, then a change to the d_thread map (which some other code could make) will be visible unless it is masked. Similarly, if the fallback for d_C changes from d_B to d_B1 (which points directly to d_thread), that change will be visible for any keys that were previously resolved in d_A or d_B, or are now resolved in d_B1. Those seem like exactly the cases that would (and should) cause "shifting values". This does mean that you can't cache everything in the localmost map, but that is a problem with the optimization regardless of how the implementation is done.

In https://mail.python.org/pipermail/python-dev/2017-August/148873.html Yury Selivanov wrote:
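The visible-unless-masked behaviour Jim describes is exactly what collections.ChainMap gives; a minimal sketch (the map names are illustrative):

```python
from collections import ChainMap

d_thread = {'x': 1}
d_mine = {}
ctx = ChainMap(d_mine, d_thread)  # localmost map first

assert ctx['x'] == 1    # resolved in d_thread
d_thread['x'] = 2       # some other code mutates the thread map
assert ctx['x'] == 2    # the change is visible through the chain...
d_mine['x'] = 99        # ...until a localmost entry masks it
assert ctx['x'] == 99
```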
Any code that uses EC will not see any difference [between mutable vs immutable but replaced LC maps], because it can only work with the top LC.
Back to generators. Generators have their own empty LCs when created to store their *local* EC modifications.
OK, so just as they have their own frame, they have their own ChainMap, and the event loop is responsible for resetting the fallback when it schedules them.
When a generator is *being* iterated, it pushes its LC to the EC. When the iteration step is finished, it pops its LC from the EC.
I'm not sure it helps to think of a single stack. When the generator is active, it starts with its own map. When it is in the call chain of the active generator, its map will be in the chain of delegations. When neither it nor a descendant are active, no code will end up delegating to it. If the delegation graph has 543 generators delegating directly to the thread-wide map, there is no reason to pop/push an execution stack every time a different generator is scheduled, since only that generator itself (and code it calls) will even care.
HAMT is a way to efficiently implement immutable mappings ... using regular dicts and copy, set() would be O(log N)
Using a ChainMap, set affects only the localmost map and is therefore O(1). get could require stacksize lookups, but ...

(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.

(B) Other than the truly global context and thread-level context, how many of these maps do you expect to be non-empty?

(C) How deep do you expect the stack to get? Are we talking about 100 layers of mappings to check between the current generator and the thread-wide defaults? Even if we are, verifying that there hasn't been a change in some mid-level layer requires tracking the versions of each mid-level layer. (If the version is globally unique, that would also ensure that the stack hasn't changed.) Is that really faster than checking that the map is empty?

And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.

-jJ

Hi Jim, Sorry, I don't answer all questions/points directly. We are working on a new version of the PEP that will hopefully address most of them. Some comments inlined below: On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjewett@gmail.com> wrote: [..]
I still don't see how this is different from a ChainMap.
Because conceptually it's not different. In the next version of the PEP it will become more apparent. [..]
HAMT is a way to efficiently implement immutable mappings ... using regular dicts and copy, set() would be O(log N)
The key requirement for using immutable datastructures is to make "get_execution_context" operation fast. Currently, the PEP doesn't do a good job at explaining why we need that operation and why it will be used by asyncio.Task and call_soon, so I understand the confusion. This will be fixed in the next PEP version.
Using a ChainMap, set affects only the localmost map and is therefore O(1). get could require stacksize lookups, but ...
Again, the PEP essentially is implemented through a very specialized version of ChainMap, although written in C with a few extra optimizations and properties.
(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.
Not many, but caching is still as important, because some API users want the "get()" operation to be as fast as possible under all conditions.
(B) Other than the truly global context and thread-level context, how many of these maps do you expect to be non-empty?
It's likely that most of them will be empty in most cases. Although we can imagine a degenerate case of a recursive generator that modifies EC on each iteration.
(C) How deep do you expect the stack to get? Are we talking about 100 layers of mappings to check between the current generator and the thread-wide defaults? Even if we are, verifying that there hasn't been a change in some mid-level layer requires tracking the versions of each mid-level layer. (If version is globally unique, that would also ensure that the stack hasn't changed.) Is that really faster than checking that the map is empty?
The stack can be as deep as recursion limit.
And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.
The first version of the PEP had no ContextKey object, and the most popular complaint about it was that key names will clash. The ContextKey design avoids the clashing issue entirely and enables fast Python and C APIs (because of the cache). Yury

On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
The key requirement for using immutable datastructures is to make "get_execution_context" operation fast.
Do you really need the whole execution context, or do you just need the current value of a specific key? (Or, sometimes, the writable portion of the context.)
Currently, the PEP doesn't do a good job at explaining why we need that operation and why it will be used by asyncio.Task and call_soon, so I understand the confusion.
OK, the schedulers need the whole context, but if implemented as a ChainMap (instead of per-key), isn't that just a single constant? As in, don't they always schedule to the same thread? And when they need another map, isn't that because the required context is already available from whichever code requested the scheduling?
(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.
Not many, but caching is still as important, because some API users want the "get()" operation to be as fast as possible under all conditions.
Sure, but only because they view it as a hot path; if the cost of that speedup is slowing down another hot path, like scheduling the generator in the first place, it may not be worth it. According to the PEP timings, HAMT doesn't beat a copy-on-write dict until over 100 items, and never beats a regular dict. That suggests to me that it won't actually help the overall speed for a typical (as opposed to worst-case) process.
And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.
The first version of the PEP had no ContextKey object and the most popular complaint about it was that the key names will clash.
That is true of any global registry. Thus the use of keys with prefixes like com.sun. The only thing pre-declaring a ContextKey buys in terms of clashes is that a sophisticated scheduler would have less difficulty figuring out which clashes will cause thrashing in the cache. Or are you suggesting that the key can only be declared once (as opposed to once per piece of code), so that the second framework to use the same name will see a RuntimeError? -jJ

On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
The key requirement for using immutable datastructures is to make "get_execution_context" operation fast.
Do you really need the whole execution context, or do you just need the current value of a specific key? (Or, sometimes, the writable portion of the context.)
We really need a snapshot of all keys/values. If we have the following "chain":

    EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

we need sys.get_execution_context() to return this:

    {'spam': 'ham', 'a': 'foo', 'num': 42}

To do this with a chain map and mutable maps, you need to traverse the whole chain and merge all dicts -- this is very expensive. Immutable datastructures make it possible to avoid the merge and traversal (in most cases), so the operation becomes cheap. Don't forget that the EC is a *dynamic* stack -- a generator starts, it pushes its LogicalContext, etc. Without a way to get a full "snapshot" of the EC we can't support asynchronous tasks. Look at this small example:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await nested()
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo" key when it's awaited. Now let's say we want to refactor this snippet and run the "nested()" coroutine with a timeout:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await asyncio.wait_for(nested(), 10)  # !!!
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new asynchronous task to run `nested()`. That task will now execute on its own, separately from the task that runs `outer()`. So we need to somehow capture the full EC at the moment `wait_for()` was called, and use that EC to run `nested()` within it. If we don't do this, the refactored code would print "1000" instead of "1".

If the asynchronous tasks topic is not enough: suppose you want to write a wrapper on top of concurrent.futures.ThreadPoolExecutor to offload some calculation to a threadpool.
If you can "snapshot" the EC, you can do it: when you schedule a job, the EC is captured, and it is a guarantee that it will not be mutated before the job is executed. And it's also guaranteed that the job will not randomly mutate the EC of the code that scheduled it.
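The merge that get_execution_context() must perform over a mutable chain can be sketched with plain dicts (a toy illustration, not the PEP's C implementation):

```python
# Chain of LogicalContext mappings, outermost first (from Yury's example)
EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

def get_execution_context(chain):
    """Merge the chain into one snapshot; inner maps override outer ones."""
    snapshot = {}
    for lc in chain:          # iterate outermost first
        snapshot.update(lc)   # later (inner) entries win
    return snapshot

assert get_execution_context(EC) == {'spam': 'ham', 'a': 'foo', 'num': 42}
```

With mutable dicts this merge is O(total items) on every capture, which is exactly the cost the immutable HAMT design is trying to avoid.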
Currently, the PEP doesn't do a good job at explaining why we need that operation and why it will be used by asyncio.Task and call_soon, so I understand the confusion.
OK, the schedulers need the whole context, but if implemented as a ChainMap (instead of per-key),
I think you are a bit confused here (or I don't follow you) with the "per-key" ContextKey design. With ContextKeys:

    key = new_context_key('spam')
    key.set('a')
    # EC = [..., {key: 'a'}]

without:

    ec.set('key', 'a')
    # EC = [..., {'key': 'a'}]

ContextKey is an extra indirection layer that allows for granular caching and avoids the name-clashing problem.
isn't that just a single constant? As in, don't they always schedule to the same thread? And when they need another map, isn't that because the required context is already available from whichever code requested the scheduling?
(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.
Not many, but caching is still as important, because some API users want the "get()" operation to be as fast as possible under all conditions.
Sure, but only because they view it as a hot path; if the cost of that speedup is slowing down another hot path, like scheduling the generator in the first place, it may not be worth it.
According to the PEP timings, HAMT doesn't beat a copy-on-write dict until over 100 items, and never beats a regular dict. That suggests to me that it won't actually help the overall speed for a typical (as opposed to worst-case) process.
It's a bit more complicated, unfortunately. The "100 items" figure is true when we can just memcpy all keys/values of the dict when we want to clone it. I've proposed a patch to do that, and it works. The old implementation of "dict.copy()" created a new dict, iterated over the items of the old dict, and inserted them into the new one. This is 5x slower than memcpy. The subtle difference is that the memcpy implementation does not resize the dict. The resize is needed when we delete items from it (which will happen with the EC). I've updated the chart to show what happens when we implement an immutable dict with the current dict.copy() semantics: https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png

The fundamental problem, though, is that the EC is hidden from the user. Suppose you have a huge application (let's say a new YouTube) that uses a multitude of libraries. You can then be in a situation where some of those libraries use the EC, and you actually have 100s of items in it. Now at least one LC in the chain is big, and copying it becomes expensive. You start to experience overhead -- sometimes setting a variable in the EC is expensive, sometimes capturing the context takes longer, etc.

HAMT guarantees that immutable dicts will have O(log N) performance for set() and merge (we will need to merge long chains from time to time, btw). This means that it's a future-proof solution that works in all edge cases.
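The copy-on-write discipline being benchmarked here can be sketched in pure Python (a toy model; `lc_set` is an invented name, and the real proposal uses a HAMT implemented in C):

```python
def lc_set(lc, key, value):
    # Each write copies the topmost LogicalContext instead of mutating
    # it in place; a dict copy is O(n), a HAMT set() would be O(log n).
    new = dict(lc)
    new[key] = value
    return new

lc1 = {'a': 1}
lc2 = lc_set(lc1, 'b', 2)
assert lc1 == {'a': 1}            # earlier snapshot stays untouched
assert lc2 == {'a': 1, 'b': 2}    # new snapshot carries the write
```

Because earlier snapshots are never mutated, capturing the context is just keeping a reference, which is what makes get_execution_context() cheap.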
And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.
The first version of the PEP had no ContextKey object and the most popular complaint about it was that the key names will clash.
That is true of any global registry. Thus the use of keys with prefixes like com.sun.
Such prefixes aren't considered Pythonic though. In the end, ContextKey is needed to give us O(1) get() operation (because of caching). This is a hard requirement, otherwise we can't migrate libraries like decimal to use this new API.
The only thing pre-declaring a ContextKey buys in terms of clashes is that a sophisticated scheduler would have less difficulty figuring out which clashes will cause thrashing in the cache.
I don't understand what you mean by "sophisticated scheduler".
Or are you suggesting that the key can only be declared once (as opposed to once per piece of code), so that the second framework to use the same name will see a RuntimeError?
ContextKey is declared once for the code that uses it. Nobody else will use that key. Keys have names only for introspection purposes; the implementation doesn't use them, iow:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]
    # Note that the EC has the "var" object itself as the key in the
    # mapping, not "aaaaaa".

Yury
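The identity-based keying described above can be sketched as follows (this `ContextKey` class is a stand-in for illustration, not the PEP's actual type):

```python
class ContextKey:
    def __init__(self, name):
        self.name = name  # for introspection only; never used for lookup

# Default object hashing/equality is by identity, so two keys with the
# same name are still distinct entries in a LogicalContext mapping.
k1 = ContextKey('spam')
k2 = ContextKey('spam')
lc = {k1: 1, k2: 2}
assert len(lc) == 2
assert lc[k1] == 1 and lc[k2] == 2
```

This is why a second framework picking the same key name cannot clash: it necessarily holds a different key object.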

On Aug 24, 2017 11:02 AM, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote:

On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
If you look at this small example:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await nested()
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo" key when it's awaited. Now let's say we want to refactor this snippet and run the "nested()" coroutine with a timeout:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await asyncio.wait_for(nested(), 10)  # !!!
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new asynchronous task to run `nested()`. That task will now execute on its own, separately from the task that runs `outer()`. So we need to somehow capture the full EC at the moment `wait_for()` was called, and use that EC to run `nested()` within it. If we don't do this, the refactored code would print "1000" instead of "1".

I would expect 1000 to be the right answer! By the time it runs, 1000 (or mask_errors=false, to use a less toy example) is what its own controlling scope requested. If you are sure that you want the value frozen earlier, please make this desire very explicit ... this example is the first place I noticed it. And please explain what this means for things like signal or warning masking.

ContextKey is declared once for the code that uses it. Nobody else will use that key. Keys have names only for introspection purposes; the implementation doesn't use them, iow:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]
    # Note that the EC has the "var" object itself as the key in the
    # mapping, not "aaaaaa".

This I had also not realized.
So effectively, the keys are based on object identity, with some safeguards to ensure that even starting with the same (interned) name will *not* produce the same object unless you passed it around explicitly, or are in the same code unit (file, typically). This strikes me as reasonable, but still surprising. (I think of variables as typically named, rather than identified by address.) So please make this more explicit as well. -jJ

On Wed, Aug 23, 2017 at 9:32 PM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
While the context is defined conceptually as a nested chain of key:value mappings, we avoid using the mapping syntax because of the way the values can shift dynamically out from under you based on who called you ... instead of having the problem of changes inside the generator leaking out, we instead had the problem of changes outside the generator *not* making their way in
I still don't see how this is different from a ChainMap.
If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C, d_mine] maps as your implicit context, then a change to d_thread map (that some other code could make) will be visible unless it is masked.
Similarly, if the fallback for d_C changes from d_B to d_B1 (which points directly to d_thread), that will be visible for any keys that were previously resolved in d_A or d_B, or are now resolved in d_B1.
Those seem like exactly the cases that would (and should) cause "shifting values".
This does mean that you can't cache everything in the localmost map, but that is a problem with the optimization regardless of how the implementation is done.
It's crucial that the only thing that can affect the result of calling ContextKey.get() is other method calls on that same ContextKey within the same thread. That's what enables ContextKey to efficiently cache repeated lookups, which is an important optimization for code like decimal or numpy that needs to access its local context *extremely* quickly (like, a dict lookup is too slow). In fact this caching trick just preserves what decimal does right now in its thread-local handling (because accessing the threadstate dict is too slow for it). So we can't expose any kind of mutable API to individual maps, because then someone might call __setitem__ on some map that's lower down in the stack, and break caching.
And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.
The motivations for the Key class are to eliminate the need to worry about accidental key collisions between unrelated libraries, to provide some important optimizations (as per above), and to make it easy and natural to provide convenient APIs, like for saving and restoring the state of a value inside a context manager. Those are all orthogonal to whether the underlying structure is implemented as a ChainMap or as something more specialized.

But I tend to agree with your general argument that the current PEP is trying a bit too hard to hide away all this structure where no-one can see it. The above constraints mean that simply exposing a ChainMap as the public API is probably a bad idea. Even if there are compelling performance advantages to a fancy immutable-HAMT implementation (I'm in wait-and-see mode on this myself), there are still a lot more introspection-y operations that could be provided, like:

- make a LocalContext out of a {ContextKey: value} dict
- get a {ContextKey: value} dict out of a LocalContext
- get the underlying list of LocalContexts out of an ExecutionContext
- create an ExecutionContext out of a list of LocalContexts
- given a ContextKey and a LocalContext, get the current value of the key in the context
- given a ContextKey and an ExecutionContext, get out a list of values at each level

-n

-- Nathaniel J. Smith -- https://vorpus.org
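A couple of the introspection operations in that list can be sketched with plain Python containers (the function names here are made up for illustration; they model an ExecutionContext as an outermost-first tuple of LocalContext dicts):

```python
def ec_from_lcs(lcs):
    # "create an ExecutionContext out of a list of LocalContexts"
    return tuple(lcs)

def values_at_each_level(key, ec):
    # "given a ContextKey and an ExecutionContext, get out a list of
    # values at each level" -- only levels that define the key appear
    return [lc[key] for lc in ec if key in lc]

ec = ec_from_lcs([{'k': 1}, {}, {'k': 3}])
assert values_at_each_level('k', ec) == [1, 3]
```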
participants (3):
- Jim J. Jewett
- Nathaniel Smith
- Yury Selivanov