[Python-ideas] PEP 550 v2

Yury Selivanov yselivanov.ml at gmail.com
Wed Aug 16 17:07:41 EDT 2017


On Wed, Aug 16, 2017 at 4:12 PM, Antoine Pitrou <antoine at python.org> wrote:
>
>
> Hi,
>
>> * ``sys.get_execution_context()`` function.  The function returns a
>>   copy of the current EC: an ``ExecutionContext`` instance.
>
> Can you explain the requirement for it being a copy?

When the execution context is used to schedule a function call in a
thread, or an asyncio callback in the event loop, we want to take a
snapshot of all items in the EC.  In general the recommendation will
be to store immutable data in the context (the same recommendation as
in the .NET EC implementation, or generally whenever you have
potentially shared state).
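
For example, here is a minimal sketch of what capturing such a
snapshot looks like from the API side (the ``schedule_with_ec``
helper is made up, and ``ec.run()`` stands in for whatever invocation
API the PEP ends up with):

   import sys

   def schedule_with_ec(loop, callback, *args):
       # Take an immutable snapshot of the current EC; later writes
       # to the caller's local context won't leak into the callback.
       ec = sys.get_execution_context()
       loop.call_soon(lambda: ec.run(callback, *args))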

>  What do you call a copy exactly?  Does it shallow-copy the stack or does it deep copy the
> context items?

The Execution Context is conceptually a stack of Local Contexts, each
of which is a weak-key mapping.  We need a shallow copy of the EC,
which is semantically equivalent to the snippet below:

   def get_execution_context():
       # Merge all LCs into one mapping, from the outermost LC to
       # the innermost, so that inner values shadow outer ones.
       new_lc = {}
       for lc in execution_context:
           new_lc.update(lc)
       return ExecutionContext(new_lc)

>
>> * ``uint64_t PyThreadState->unique_id``: a globally unique
>>   thread state identifier (we can add a counter to
>>   ``PyInterpreterState`` and increment it when a new thread state is
>>   created.)
>
> How does this interact with sub-interpreters? (same question for rest of
> the PEP :-))

As long as PyThreadState_Get() works with sub-interpreters, all of the
PEP machinery will work too.
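
At the Python level, the counter idea from the quoted text amounts to
roughly this sketch (names are hypothetical; the real counter would
live in C on ``PyInterpreterState``):

   import itertools

   _tstate_counter = itertools.count(1)   # one counter per interpreter

   class ThreadState:
       def __init__(self):
           # Assigned once at creation and never reused, so the id
           # is unique for the lifetime of the interpreter.
           self.unique_id = next(_tstate_counter)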

>
>> * O(N) for ``sys.get_execution_context()``, where ``N`` is the
>>   total number of items in the current **execution** context.
>
> Right... but if this is a simple list copy, we are talking about an
> extremely fast O(N):
>
> >>> l = [None] * 1000
> >>> %timeit l.copy()
> 3.76 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>
> (what is "number of items"? number of local contexts? number of
> individual context items?)

"Number of items in the current **execution** context" =
   sum(len(local_context) for local_context in current_execution_context)

Yes, even though making a new list + merging all LCs is a relatively
fast operation, it would need to be performed on *every*
asyncio.call_soon and create_task.  The immutable stack/mappings
solution simply eliminates the problem, because you can copy by
reference, which is fast.
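
To make "copy by reference" concrete, here is a toy sketch (not the
PEP's actual implementation) that models the EC as an immutable tuple
of mappings:

   class EC:
       __slots__ = ('_stack',)

       def __init__(self, stack=()):
           self._stack = stack        # tuple of immutable mappings

       def push(self, lc):
           # Pushing never mutates; the new EC shares the old tail.
           return EC(self._stack + (lc,))

       def copy(self):
           # O(1): the stack is immutable, so sharing it is safe.
           return EC(self._stack)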

Approach #3 is implementable with regular dicts + copy() too; it
will just be slower in some cases (explained below).

>
>> We believe that approach #3 enables an efficient and complete Execution
>> Context implementation, with excellent runtime performance.
>
> What about the maintenance and debugging cost, though?

Compared to Python dicts, the implementation scope for a HAMT
mapping is much smaller -- we only need get, set, and merge
operations.  No split dicts, no ordering, etc.  With the help of
fuzz testing and our ref-counting test mode, I hope we'll be able
to catch most of the bugs.

Any solution adds to the total debugging and maintenance cost, but I
believe that in this specific case, the benefits outweigh that cost:

1. Sometimes we'll need to merge many dicts in places like
asyncio.call_soon or async Task objects.

2. A "set" operation might resize the dict, making it slower.

3. The "dict.copy()" optimization that the PEP mentions won't always
be able to help us, as we will likely need to resize the dict often.
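
A quick way to observe the costs behind points 2 and 3 (a sketch; the
exact numbers will vary by machine and CPython version):

   import timeit

   setup = "d = {i: i for i in range(8)}"
   # Plain copy of a small dict: the fast path.
   print(timeit.timeit("d.copy()", setup=setup))
   # Copy followed by inserts that force a resize: noticeably slower.
   print(timeit.timeit("c = d.copy()\nfor i in range(64): c[i] = i",
                       setup=setup))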

>
>> Immutable mappings implemented with HAMT have O(log32 N) performance
>> for both set(), get(), and merge() operations, which is essentially O(1)
>> for relatively small mappings
>
> But, for relatively small mappings, regular dicts would also be fast
> enough, right?

If all mappings are relatively small, then the answer is close to "yes".

We might want to periodically "squash" (or merge, or compact) the
chain of Local Contexts, in which case merging dicts will be more
expensive than merging HAMTs.
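
For illustration, a HAMT-style mapping API looks like this (using the
third-party ``immutables`` package as a stand-in here; it is not part
of this PEP):

   from immutables import Map

   m1 = Map(a=1, b=2)
   m2 = m1.set('c', 3)   # returns a new Map in O(log32 N),
                         # sharing structure with m1
   assert 'c' in m2 and 'c' not in m1   # m1 is unchanged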

>
> It would be helpful for the PEP to estimate reasonable parameter sizes:
> - reasonable number of context items in a local context

I assume that the number of context items will be relatively low. It's
hard for me to imagine having more than a thousand of them.

> - reasonable number of local contexts in an execution stack

In simple multi-threaded code we will have only one local context
per execution context.  Every time you run a generator or an
asynchronous task, you push a local context onto the stack.

Generators will have an optimization: they will push NULL to the
stack, and it will stay NULL until the generator writes to its local
context.  It's possible to imagine a degenerate case where a
generator recurses inside, say, a 'decimal context' with-block,
which can potentially create a long chain of LCs.

Long chains of LCs are not a problem in general -- once the generator
is done, it pops its LCs, thus decreasing the stack size.

Long chains of LCs might become a problem if, deep into the
recursion, a generator needs to capture the execution context (say,
it makes an asyncio.call_soon() call).  In that case the solution is
simple -- we squash chains that grow longer than some predefined
threshold (say, 5-10 entries).
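
A sketch of that squashing step (the threshold value and helper name
are made up):

   MAX_LC_CHAIN = 8   # some predefined threshold

   def capture_ec(lc_stack):
       # Called on EC capture (e.g. by asyncio.call_soon): if deep
       # generator recursion produced a long chain of LCs, squash it
       # into a single LC before taking the snapshot.
       if len(lc_stack) > MAX_LC_CHAIN:
           merged = {}
           for lc in lc_stack:
               if lc is not None:     # skip the generators' NULLs
                   merged.update(lc)
           lc_stack = [merged]
       return tuple(lc_stack)         # snapshot, shared by reference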

In general, though, the EC is something that is just there, and you
can't really control it.  If a thousand libraries manipulate context
items (like the decimal context) in your next YouTube-killer website,
you will have a large number of items in your Execution Context.  You
will inevitably start experiencing slowdowns of your code that you
can't fix (or maybe even explain).  In this case, HAMT is a safer
bet -- it guarantees O(log32 N) performance for LC-stack squashing
and set operations.  This is the strongest argument in favour of the
HAMT mapping: we implement it once, and it works for all use cases,
even the unlikely ones.

Yury

