[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

Yury Selivanov yselivanov.ml at gmail.com
Thu Aug 24 11:02:55 EDT 2017


On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjewett at gmail.com> wrote:
> On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
>> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjewett at gmail.com> wrote:
>
>> The key requirement for using immutable datastructures is to make
>> "get_execution_context" operation fast.
>
> Do you really need the whole execution context, or do you just need
> the current value of a specific key?  (Or, sometimes, the writable
> portion of the context.)

We really need a snapshot of all keys/values.  If we have the following "chain":

   EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

we need sys.get_execution_context() to return this:

  {'spam': 'ham', 'a': 'foo', 'num': 42}

To do this with a chain map over mutable dicts, you need to traverse
the whole chain and merge all of the dicts -- this is very expensive.
Immutable datastructures make it possible to avoid the merge and
traversal (in most cases), so the operation becomes cheap.
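
For illustration, here's roughly what that merge looks like with a
plain collections.ChainMap over mutable dicts (just a sketch; the cost
is proportional to the total number of items in the chain):

    from collections import ChainMap

    EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

    # ChainMap gives priority to the *first* mapping, so the chain is
    # reversed to let later LogicalContexts shadow earlier ones.
    snapshot = dict(ChainMap(*reversed(EC)))
    # snapshot == {'spam': 'ham', 'a': 'foo', 'num': 42}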

Don't forget that the EC is a *dynamic* stack -- a generator starts,
it pushes its LogicalContext, etc.

Without a way to get a full "snapshot" of the EC we can't support
asynchronous tasks.

Look at this small example:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await nested()
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo" key
when it's awaited.

Now let's say we want to refactor this snippet and run the "nested()"
coroutine with a timeout:

    foo = new_context_key()

    async def nested():
        await asyncio.sleep(1)
        print(foo.get())

    async def outer():
        foo.set(1)
        await asyncio.wait_for(nested(), 10)  # !!!
        foo.set(1000)

    l = asyncio.get_event_loop()
    l.create_task(outer())
    l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new
asynchronous task to run `nested()`.  That task will now execute on
its own, separately from the task that runs `outer()`.  So we need to
somehow capture the full EC at the moment `wait_for()` was called, and
use that EC to run `nested()` within it.  If we don't do this, the
refactored code would print "1000" instead of "1".

If the asynchronous tasks topic is not enough: suppose you want to write a
wrapper on top of concurrent.futures.ThreadPoolExecutor to offload
some calculation to a threadpool.  If you can "snapshot" the EC, you
can do it: when you schedule a job, the EC is captured, and it is
guaranteed that it will not be mutated before the job is executed.
It's also guaranteed that the job will not randomly mutate the EC of
the code that scheduled it.
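
A rough sketch of such a wrapper (sys.get_execution_context() is the
function discussed above; the ec.run() call is my assumption about how
a captured context would be invoked):

    import sys
    from concurrent.futures import ThreadPoolExecutor

    class ECThreadPoolExecutor(ThreadPoolExecutor):
        """Run every submitted job inside the EC captured at submit() time."""

        def submit(self, fn, *args):
            # Capture an immutable snapshot of the EC right now...
            ec = sys.get_execution_context()
            # ...and execute the job inside that snapshot in the worker thread.
            return super().submit(ec.run, fn, *args)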

>>  Currently, the PEP doesn't do
>> a good job at explaining why we need that operation and why it will be
>> used by asyncio.Task and call_soon, so I understand the confusion.
>
> OK, the schedulers need the whole context, but if implemented as a
> ChainMap (instead of per-key),

I think you are a bit confused here (or I don't follow you) about the
"per-key" ContextKey design.

With ContextKeys:

    key = new_context_key('spam')
    key.set('a')
    # EC = [..., {key: 'a'}]

without:

    ec.set('key', 'a')
    # EC = [..., {'key': 'a'}]

ContextKey is an extra indirection layer that allows for granular
caching and avoids name clashes.
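
For example (same draft new_context_key() API as above), two libraries
can both pick the name 'spam' without clashing, because the key
*object*, not its string name, is what ends up in the mapping:

    # Purely illustrative: the string name is just a label for introspection.
    lib1_key = new_context_key('spam')
    lib2_key = new_context_key('spam')

    lib1_key.set('a')
    lib2_key.set('b')
    # EC = [..., {lib1_key: 'a', lib2_key: 'b'}]  -- two distinct entries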

> isn't that just a single constant?  As
> in, don't they always schedule to the same thread?  And when they need
> another map, isn't that because the required context is already
> available from whichever code requested the scheduling?
>
>>> (A)  How many values do you expect a typical generator to use?  The
>>> django survey suggested mostly 0, sometimes 1, occasionally 2.  So
>>> caching the values of all possible keys probably won't pay off.
>
>> Not many, but caching is still as important, because some API users
>> want the "get()" operation to be as fast as possible under all
>> conditions.
>
> Sure, but only because they view it as a hot path; if the cost of that
> speedup is slowing down another hot path, like scheduling the
> generator in the first place, it may not be worth it.
>
> According to the PEP timings, HAMT doesn't beat a copy-on-write dict
> until over 100 items, and never beats a regular dict.    That suggests
> to me that it won't actually help the overall speed for a typical (as
> opposed to worst-case) process.

It's a bit more complicated, unfortunately.

The "100 items" it true when we can just memcpy all keys/values of the
dict when we want to clone it.  I've proposed a patch to do that, and
it works.

The old implementation of "dict.copy()" created a new dict, iterated
over items of the old dict, and inserted them to the new one.  This 5x
slower than memcpy.
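
To make the two strategies concrete, here's what "copy by re-insertion"
vs. dict.copy() looks like from pure Python (an illustration only, not
a benchmark of the C-level patch):

    import timeit

    d = {i: str(i) for i in range(100)}

    def copy_by_insertion(src):
        new = {}
        for k, v in src.items():
            new[k] = v          # re-hashes and re-inserts every item
        return new

    print(timeit.timeit(lambda: d.copy(), number=100000))
    print(timeit.timeit(lambda: copy_by_insertion(d), number=100000))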

The subtle difference is that the memcpy implementation does not
resize the dict.  A resize is needed when we delete items from it
(which will happen with the EC).

I've updated the chart to show what happens when we implement
immutable dict with the current dict.copy() semantics:

https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png

The fundamental problem, though, is that the EC is hidden from the user.
Suppose you have a huge application (let's say a new YouTube) that uses
a multitude of libraries.  You can then end up in a situation where
some of those libraries use the EC, and you actually have hundreds of
items in it.  Now at least one LC in the chain is big, and copying it
becomes expensive.  You start to experience overhead -- sometimes
setting a variable in the EC is expensive, sometimes capturing the
context takes longer, etc.

HAMT guarantees that immutable dicts will have O(log N) performance
for set() and merge (and we will need to merge long chains from time
to time, btw).  This means that it's a future-proof solution that
works in all edge cases.
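
The important property, semantically, is that set() never mutates the
existing mapping: every update produces a new mapping, and anyone
holding a reference to the old one keeps an unchanged snapshot for
free.  A toy model of those semantics (deliberately simplified: it does
a full O(N) copy per set(), which is exactly what the HAMT's structural
sharing avoids):

    class ImmutableDictSketch:
        """Toy model of immutable-mapping semantics; a real HAMT shares structure."""

        def __init__(self, data=None):
            self._data = dict(data or {})

        def set(self, key, value):
            new = dict(self._data)       # a HAMT would share most of this
            new[key] = value
            return ImmutableDictSketch(new)

        def get(self, key, default=None):
            return self._data.get(key, default)

    lc1 = ImmutableDictSketch({'a': 'b'})
    lc2 = lc1.set('spam', 'ham')
    print(lc1.get('spam'))   # None -- lc1 is an unchanged snapshot
    print(lc2.get('spam'))   # 'ham'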

>
>>> And, of course, using a ChainMap means that the keys do NOT have to be
>>> predefined ... so the Key class really can be skipped.
>
>> The first version of the PEP had no ContextKey object and the most
>> popular complaint about it was that the key names will clash.
>
> That is true of any global registry.  Thus the use of keys with
> prefixes like com.sun.

Such prefixes aren't considered Pythonic though.

In the end, ContextKey is needed to give us an O(1) get() operation
(because of caching).  This is a hard requirement; otherwise we can't
migrate libraries like decimal to use this new API.
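
To show why the key object matters for that O(1) path, here's a sketch
of the caching idea -- the version counter and ec.lookup() bits are my
shorthand, not the PEP's exact machinery:

    class ContextKeySketch:
        """Illustration of per-key caching; not the proposed implementation."""

        def __init__(self, name):
            self.name = name                  # kept for introspection only
            self._cached_value = None
            self._cached_version = -1

        def get(self, ec):
            # Assume the EC exposes a counter bumped on every modification.
            if ec.version == self._cached_version:
                return self._cached_value     # O(1) fast path on a cache hit
            value = ec.lookup(self)           # O(log N) HAMT lookup on a miss
            self._cached_value = value
            self._cached_version = ec.version
            return value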

>
> The only thing pre-declaring a ContextKey buys in terms of clashes is
> that a sophisticated scheduler would have less difficulty figuring out
> which clashes will cause thrashing in the cache.

I don't understand what you mean by "sophisticated scheduler".

>
> Or are you suggesting that the key can only be declared once (as
> opposed to once per piece of code), so that the second framework to
> use the same name will see a RuntimeError?

ContextKey is declared once, by the code that uses it.  Nobody else
will use that key.  Keys have names only for introspection purposes;
the implementation doesn't use them.  In other words:

    var = new_context_key('aaaaaa')
    var.set(1)
    # EC = [..., {var: 1}]

    # Note that the EC has the "var" object itself as the key in
    # the mapping, not the string "aaaaaa".


Yury

