[Python-Dev] PEP 563: Postponed Evaluation of Annotations

Tue Nov 7 01:09:23 EST 2017

On 7 November 2017 at 09:20, Lukasz Langa <lukasz at langa.pl> wrote:
>
>
>> On Nov 5, 2017, at 11:28 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> On 6 November 2017 at 16:36, Lukasz Langa <lukasz at langa.pl> wrote:
>>
>> - compile annotations like a small nested class body (but returning
>> the expression result, rather than None)
>> - emit MAKE_THUNK instead of the expression's opcodes
>> - emit STORE_ANNOTATION as usual
>>
>
> Is the motivation behind creating thunks vs. reusing lambdas just the difference in handling class-level scope? If so, would it be possible to just modify lambdas to behave thunk-like there? It sounds like this would strictly broaden the functionality of lambdas, in other words, wouldn't create backwards incompatibility for existing code.
>
> Reusing lambdas (with extending them to support class-level scoping) would be a less scary endeavor than introducing a brand new language construct.

I want to say "yes", but it's more "sort of", and at least arguably
"no". While you'd be using building blocks that already exist inside
the compiler and eval loop, you'd be putting them together in a
slightly new way: a hybrid of the way lambdas work and the way class
bodies work.

For the code execution part, class creation currently uses
MAKE_FUNCTION (just like lambdas and def statements), and is able to
inject the namespace returned by __prepare__ as the code execution
namespace due to differences in the way the function's code object
gets compiled. These are *exactly* the semantics you'd want for
deferred annotations, but the way they're currently structured
internally is inconvenient for your use case.
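
As a rough illustration of the namespace injection described above, a
metaclass can return a pre-populated mapping from __prepare__, and the
class body then resolves names against it. The names Meta, Demo and
"injected" are made up for this sketch:

```python
class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The mapping returned here becomes the namespace in which the
        # class body is executed.
        return {"injected": 42}


class Demo(metaclass=Meta):
    # "injected" is not a global or a builtin; the class body finds it
    # in the namespace supplied by __prepare__.
    copy_of_injected = injected
```

A lambda or def body has no equivalent hook: its local variables are
not looked up in a caller-supplied mapping.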

Compilation details here:
https://github.com/python/cpython/blob/master/Python/compile.c#L1895
Code execution details here:
https://github.com/python/cpython/blob/master/Python/bltinmodule.c#L167

Class bodies get those semantics from the combination of using
COMPILER_SCOPE_CLASS at compilation time and the direct call to
PyEval_EvalCodeEx in __build_class__. That combination is why you
can't just treat annotations as regular lambda functions: lambdas are
compiled with CO_OPTIMIZED set, which means they won't read local
variable values from the provided locals namespace, which in turn
means you wouldn't be able to easily inject "vars(cls)" to handle the
method annotation case.
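
The flag difference can be inspected from Python. This is only a
sketch for poking at code objects; it assumes the class body's code
object is stored directly in the constants of the enclosing module
code, which is how CPython lays it out for a plain class statement:

```python
import inspect
import types

# A lambda is compiled as a function scope, so CO_OPTIMIZED is set.
# The lambda is never called, so "field" does not need to exist.
lambda_code = (lambda: field).__code__
lambda_optimized = bool(lambda_code.co_flags & inspect.CO_OPTIMIZED)

# Compile a module containing a class statement, then pull the class
# body's code object out of the module code's constants.
module_code = compile("class C:\n    field = 1\n", "<demo>", "exec")
class_body_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)
class_body_optimized = bool(
    class_body_code.co_flags & inspect.CO_OPTIMIZED
)
```

Here lambda_optimized comes out true and class_body_optimized false,
which is the distinction the paragraph above relies on.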

So that's where the idea of finally adding a "thunk" object came from:
it would be to a class body as a lambda expression is to a function
body, except that instead of relying on a custom call to
PyEval_EvalCodeEx in __build_class__ the way class bodies do, it would
instead define a suitable implementation of tp_call that accepted the
locals namespace to use as a parameter.
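
A pure Python approximation of that calling convention might look like
the sketch below. The Thunk class is hypothetical and is not the
proposed implementation: it leans on compile() and eval(), and unlike
a compiler-integrated thunk it cannot capture variables from enclosing
function scopes.

```python
class Thunk:
    """Deferred expression that accepts a locals namespace when called."""

    def __init__(self, expression, globalns):
        # "eval" mode code looks names up in the locals mapping first,
        # then globals, then builtins, much like a class body does.
        self._code = compile(expression, "<thunk>", "eval")
        self._globals = globalns

    def __call__(self, localns=None):
        if localns is None:
            localns = {}
        return eval(self._code, self._globals, localns)


deferred = Thunk("field + 1", {"field": 10})
from_globals = deferred()             # only the globals are consulted
from_locals = deferred({"field": 1})  # the supplied locals win
```

Passing vars(cls) as the argument is what would cover the method
annotation case.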

The nice part of this approach is that even though it would
technically be a new execution primitive, it's still one with
well-established name resolution semantics: the behaviour we already
use for class bodies.

> With my current understanding I still think stringification is both easier to implement and understand by end users.

No matter how you slice it, PEP 563 *is* defining a new delayed
execution primitive as part of the core language syntax. The question
is whether we define it as a fully compiler integrated primitive, with
clearly specified lexical name resolution semantics that align with
other existing constructs, or something bolted on to the side of the
language without integrating it properly, which we then have to live
with forever.

"This isn't visibly quoted, but it's a string anyway, so the compiler
won't check it for syntax errors" isn't easy to understand. Neither is
the complex set of rules you're proposing for what people will need to
do in order to actually evaluate those strings and turn them back into
runtime objects.

By contrast, "parameter: annotation" can be explained as "it's like
'lambda: expression', but instead of being a regular function with an
explicit parameter list, the annotation is a deferred expression that
accepts a locals namespace to use when called".

> The main usability win of thunks/lambdas is not very significant: evaluating them is as easy as calling them whereas strings require typing.get_type_hints(). I still think being able to access function-local state at time of definition is only theoretically useful.

Your current proposal means that this code will work:

    class C:
        field = 1
        def method(a: C.field):
            pass

But this will fail:

    def make_class():
        class C:
            field = 1
            def method(a: C.field):
                pass

Dropping the "C." prefix would make the single class case work, but
wouldn't help with the nested classes case:

    def make_class():
        class C:
            field = 1
            class D:
                def method(a: C.field):
                    pass

Confusingly, though, this would still work:

    def make_class():
        class C:
            field = 1
            class D:
                field2 = C.field
                def method(a: field2):
                    pass

All of that potential for future confusion around which lexical
references will and won't work for annotation expressions can be
avoided if we impose "Annotations will fully participate in the
regular lexical scoping rules at the point where they appear in the
code" as a design constraint on PEP 563.
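
The module-level versus function-local contrast can be approximated
today with explicitly quoted annotations and typing.get_type_hints().
This sketch deviates from the examples above in a few ways that are my
own assumptions: the field holds int rather than 1 so that the typing
module accepts it as a type, the methods take self, and the names Top
and local_cls are invented for the demonstration.

```python
import typing


class Top:
    field = int

    def method(self, a: "Top.field"):
        pass


def make_class():
    class C:
        field = int

        def method(self, a: "C.field"):
            pass

    return C


# Bound to a different name so that "C" is not a module global.
local_cls = make_class()

# "Top" is a module global, so the string can be evaluated later.
module_level_hints = typing.get_type_hints(Top.method)

# "C" only ever existed as a local inside make_class().
try:
    typing.get_type_hints(local_cls.method)
except NameError:
    local_lookup_failed = True
else:
    local_lookup_failed = False

# The caller has to supply the missing name by hand.
explicit_hints = typing.get_type_hints(
    local_cls.method, localns={"C": local_cls}
)
```

The last call is the kind of extra step that full lexical scoping for
annotations would make unnecessary.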

> What would be significant though is if thunk/lambdas helped fixing forward references in general. But I can't really see how that could work.

They'd only be able to help with forward references in the general
case if a dedicated thunk expression were added.

For example, combining the generator expression "parentheses are
required" constraint with PEP 312's "simple implicit lambda" syntax
proposal would give:

    T = TypeVar('T', bound=(:UserId))
    UserId = NewType('UserId', (:SomeType))
    Employee = NamedTuple('Employee', [('name', str), ('id', UserId)])

    Alias = Optional[(:SomeType)]
    AnotherAlias = Union[(:SomeType), (:OtherType)]

    cast((:SomeType), value)

    class C(Tuple[(:SomeType), (:OtherType)]): ...

However, rather than producing a zero-argument lambda (as PEP 312
proposed), this would instead produce a thunk object, which could
either be called with zero arguments (thus requiring all names used to
be resolvable as nonlocal, global, or builtin references), or else
with a local namespace to use (which would be checked first for all
variable names).
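
The zero-argument case can be roughly emulated with an ordinary lambda
as a stand-in for the hypothetical (:expr) syntax. This only covers
names that resolve as nonlocals, globals or builtins at call time; it
has no way to accept a local namespace:

```python
from typing import Optional

# SomeType is not defined yet, but the lambda body is not evaluated
# until the lambda is called.
alias = lambda: Optional[SomeType]


class SomeType:
    pass


# By now SomeType exists as a global, so the forward reference
# resolves.
resolved = alias()
```

A consumer such as TypeVar or NewType would still need to know to call
the deferred object, which is part of why this would be a separate
enhancement.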

The main advantage of such a syntax is that it would make it easy for
both humans and computers to distinguish the actual strings from the
lazily evaluated expressions. However, I'd also consider that out of
scope for PEP 563 - it's just a potential future enhancement that
pursuing a thunk-based *solution* to PEP 563 would enable, in a way
that a string-based solution won't.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia