[Python-ideas] PEP 550 v2

Wed Aug 23 20:26:19 EDT 2017

On Wed, Aug 23, 2017 at 8:41 AM, Guido van Rossum <guido at python.org> wrote:
> If we're extending the analogy with thread-locals we should at least
> consider making each instantiation return a namespace rather than something
> holding a single value. We have
>
> log_state = threading.local()
> log_state.verbose = False
>
> def action(x):
>     if log_state.verbose:
>         print(x)
>
> def make_verbose():
>     log_state.verbose = True
>
> It would be nice if we could upgrade this to make it PEP 550-aware so that
> only the first line needs to change:
>
> log_state = sys.AsyncLocal("log state")
> # The rest is the same

You can mostly implement this on top of the current PEP 550. Something like:

import threading

_tombstone = object()
# guards first-time key creation in __setattr__
some_lock = threading.Lock()

class AsyncLocal:
    def __getattribute__(self, name):
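        # dunder names (__dict__, __class__, ...) are ordinary attributes, not keys
        if name.startswith("__") and name.endswith("__"):
            return object.__getattribute__(self, name)
        # if this raises AttributeError, we let it propagate
        key = object.__getattribute__(self, name)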
        value = key.get()
        if value is _tombstone:
            raise AttributeError(name)
        return value

    def __setattr__(self, name, value):
        try:
            key = object.__getattribute__(self, name)
        except AttributeError:
            with some_lock:
                # double-checked locking pattern
                try:
                    key = object.__getattribute__(self, name)
                except AttributeError:
                    key = new_context_key()
                    object.__setattr__(self, name, key)
        key.set(value)

    def __delattr__(self, name):
        # setattr() dispatches through the type and bypasses __getattribute__
        setattr(self, name, _tombstone)

    def __dir__(self):
        # filter out tombstoned values
        return [name for name in object.__dir__(self) if hasattr(self, name)]
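
For anyone who wants to try this out: here is a rough, runnable variant of
the same idea. It is only a sketch, and it makes one big assumption: it uses
contextvars.ContextVar (from the later PEP 567, Python 3.7+) as a stand-in
for the PEP 550 context key, so the propagation rules are not the ones PEP
550 proposes. The names _lock and make_verbose are just for illustration.

```python
import contextvars
import threading

_tombstone = object()
_lock = threading.Lock()


class AsyncLocal:
    def __getattribute__(self, name):
        # Dunder lookups (__dict__, __class__, ...) are not keys.
        if name.startswith("__") and name.endswith("__"):
            return object.__getattribute__(self, name)
        # If this raises AttributeError, we let it propagate.
        key = object.__getattribute__(self, name)
        # ASSUMPTION: ContextVar stands in for a PEP 550 context key.
        value = key.get(_tombstone)
        if value is _tombstone:
            raise AttributeError(name)
        return value

    def __setattr__(self, name, value):
        try:
            key = object.__getattribute__(self, name)
        except AttributeError:
            with _lock:
                # Double-checked locking: re-test once we hold the lock.
                try:
                    key = object.__getattribute__(self, name)
                except AttributeError:
                    key = contextvars.ContextVar(name)
                    object.__setattr__(self, name, key)
        key.set(value)

    def __delattr__(self, name):
        setattr(self, name, _tombstone)


log_state = AsyncLocal()
log_state.verbose = False


def make_verbose():
    log_state.verbose = True


# Run make_verbose in a copy of the current context; the change stays there.
ctx = contextvars.copy_context()
ctx.run(make_verbose)
inner = ctx.run(lambda: log_state.verbose)
outer = log_state.verbose
```

Here inner sees the value set inside the copied context, while outer still
sees the original one.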

Issues:

Minor problem: On threading.local you can use .__dict__ to get the
dict. That doesn't work here. But this could be done by returning a
mapping proxy type, or maybe it's better not to support at all -- I
don't think it's a big issue.

Major problem: An attribute setting/getting API doesn't give any way
to solve the save/restore problem [1]. PEP 550 v3 doesn't have a
solution to this yet either, but we know we can do it by adding some
methods to context-key. Supporting this in AsyncLocal is kinda
awkward, since you can't use methods on the object -- I guess you
could have some staticmethods, like
AsyncLocal.save_state(my_async_local, name) and
AsyncLocal.restore_state(my_async_local, name, value)? In any case
this kinda spoils the sense of like "oh it's just an object with
attributes, I already know how this works".
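
To make "adding some methods to context-key" concrete, here is one shape
such methods can take. This is not what PEP 550 specifies; it borrows the
token-based set()/reset() pair that the later contextvars module (PEP 567)
ended up with, purely as an illustration. The names precision and temporary
are made up.

```python
import contextvars
from contextlib import contextmanager

precision = contextvars.ContextVar("precision", default=28)


@contextmanager
def temporary(key, value):
    # set() returns a token that remembers the previous state,
    # including the "no value set" state.
    token = key.set(value)
    try:
        yield
    finally:
        # reset() restores exactly what the token recorded.
        key.reset(token)


with temporary(precision, 10):
    inside = precision.get()
after = precision.get()
```

Note that the save and restore steps are methods on the key itself, which
is exactly what an attribute-only interface has no natural place for.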

Major problem: There are two obvious implementations. The above uses a
separate ContextKey for each entry in the dict; the other way would be
to have a single ContextKey that holds a dict. They have subtly
different semantics. Suppose you have a generator and inside it you
assign to my_async_local.a but not to my_async_local.b, then yield,
and then the caller assigns to my_async_local.b. Is this visible
inside the generator? In the ContextKey-holds-an-attribute approach,
the answer is "yes": each AsyncLocal is a bag of independent
attributes. In the ContextKey-holds-a-dict approach, the answer is
"no": each AsyncLocal is a single container holding a single piece of
(complex) state. It isn't obvious to me which of these semantics is
preferable – maybe it is if you're Dutch :-). But there's a danger
that either option leaves a bunch of people confused.
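
The difference is easy to see in a toy model. The sketch below is not PEP
550; it assumes a simplified picture where the generator's context is a
dict stacked on top of the caller's dict, lookups search from the top down,
and writes go to the top. All names here (this ContextKey class,
run_scenario, etc.) are hypothetical.

```python
class ContextKey:
    """Toy key: values live in a stack of dicts, innermost frame last."""

    def get(self, stack, default=None):
        # Search from the top of the stack down.
        for frame in reversed(stack):
            if self in frame:
                return frame[self]
        return default

    def set(self, stack, value):
        # Writes always land in the top frame.
        stack[-1][self] = value


def run_scenario(assign, lookup):
    caller_frame, gen_frame = {}, {}
    caller_stack = [caller_frame]
    gen_stack = [caller_frame, gen_frame]
    assign(gen_stack, "a", 1)      # generator assigns .a, then yields
    assign(caller_stack, "b", 2)   # caller assigns .b
    return lookup(gen_stack, "b")  # generator resumes and reads .b


# Approach 1: one key per attribute.
keys = {"a": ContextKey(), "b": ContextKey()}


def assign_per_attr(stack, name, value):
    keys[name].set(stack, value)


def lookup_per_attr(stack, name):
    return keys[name].get(stack)


# Approach 2: one key holding a dict (copied before each change).
state = ContextKey()


def assign_single(stack, name, value):
    updated = dict(state.get(stack, {}))
    updated[name] = value
    state.set(stack, updated)


def lookup_single(stack, name):
    return state.get(stack, {}).get(name)


per_attr_result = run_scenario(assign_per_attr, lookup_per_attr)
single_dict_result = run_scenario(assign_single, lookup_single)
print(per_attr_result, single_dict_result)  # prints: 2 None
```

With one key per attribute the generator sees the caller's b; with a single
dict-valued key, the generator's own dict shadows the caller's entirely.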

(Tangent: in the ContextKey-holds-a-dict approach, currently you have
to copy the dict before mutating it every time, b/c PEP 550 currently
doesn't provide a way to tell whether the value returned by get() came
from the top of the stack, and thus is private to you and can be
mutated in place, or somewhere deeper, and thus is shared and
shouldn't be mutated. But we should fix that anyway, and anyway
copy-then-mutate is a viable approach.)
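
A small sketch of the copy-before-mutate pattern, again using
contextvars.ContextVar as a stand-in (an assumption; the PEP 550 key API
may differ). The names state and set_field are made up.

```python
import contextvars

state = contextvars.ContextVar("state", default=None)


def set_field(name, value):
    current = state.get() or {}
    # Copy first: the dict we got back may be shared with another context.
    updated = dict(current)
    updated[name] = value
    state.set(updated)


set_field("a", 1)
outer_before = state.get()

ctx = contextvars.copy_context()
ctx.run(set_field, "b", 2)
inner = ctx.run(state.get)
outer_after = state.get()
```

Because set_field never mutates the dict it read, the outer context's dict
is untouched by what happened inside ctx.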

Observation: I don't think there's any simpler way to implement
AsyncLocal other than to start with machinery like what PEP 550
already proposes, and then layer something like the above on top of
it. We could potentially hide the layers inside the interpreter and
only expose AsyncLocal, but I don't think it really simplifies the
implementation any.

Observation: I feel like many users of threading.local -- possibly the
majority -- only put a single attribute on each object anyway, so for
those users a raw ContextKey API is actually more natural and faster.
For example, looking through the core django repo, I see thread locals
in

- django.utils.timezone._active
- django.utils.translation.trans_real._active
- django.urls.base._prefixes
- django.urls.base._urlconfs
- django.core.cache._caches
- django.urls.resolvers.RegexURLResolver._local
- django.contrib.gis.geos.prototypes.threadsafe.thread_context
- django.contrib.gis.geos.prototypes.io.thread_context
- django.db.utils.ConnectionHandler._connections

Of these 9 thread-local objects, 7 of them have only a single
attribute; only the last 2 use multiple attributes. For the first 4,
that attribute is even called "value", which seems like a pretty clear
indication that the authors found the whole local-as-namespace thing a
nuisance to work around rather than something helpful.
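
To show the pattern I mean (this is not Django's actual code, just the
general shape, with made-up names), compare a thread-local that only ever
carries a "value" attribute with a single key. ContextVar from the later
contextvars module stands in for a raw ContextKey here.

```python
import contextvars
import threading

# threading.local style: a namespace that only ever carries one attribute.
_active_local = threading.local()


def activate_local(tz):
    _active_local.value = tz


def current_local():
    # getattr with a default, since the attribute may not exist yet.
    return getattr(_active_local, "value", None)


# Single-key style: the key itself is the value holder.
_active_key = contextvars.ContextVar("active", default=None)


def activate_key(tz):
    _active_key.set(tz)


def current_key():
    return _active_key.get()


activate_local("UTC")
activate_key("UTC")
```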

I also looked at asyncio; it has 2 threading.locals, and they each
contain 2 attributes. But the two attributes are always read/written
together; to me it would feel more natural to model this as a single
ContextKey holding a small dict or tuple instead of something like
AsyncLocal.

So tl;dr: I think PEP 550 should just focus on a single object per
key, and the subgroup of users who want to convert that to a more
threading.local-style interface can do that themselves as efficiently
as we could, once they've decided how they want to resolve the
semantic issues.

-n

[1] https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py

-- 
Nathaniel J. Smith -- https://vorpus.org