Draft PEP for Improved Type Parameter Syntax
In last month's typing meetup, Sebastian presented some options for improving the syntax for type parameters. I subsequently posted some slides that explored a bunch of options. For reference, here's a link to those slides: https://docs.google.com/presentation/d/1aVHTaj8zGYAvM27uft1ktdthwEKO4KH4qaRv....

In the interest of continuing to make progress on this topic, I've drafted a PEP that places some stakes in the ground. Here's a link to the PEP: https://github.com/erictraut/peps/blob/typeparams/pep-9999.rst. I'm interested in feedback.

Since there appears to be broad interest in addressing this problem, perhaps we can use the next typing meetup to continue this discussion.

--
Eric Traut
Contributor to Pyright & Pylance
Microsoft
Thanks for writing this up! It's a great step forward. A few quick comments:

- It's a very ambitious proposal, which increases the risk that it gets bogged down due to some small aspect. Concretely, perhaps default values for TypeVars should be saved for another PEP; I know there's already a draft PEP for that feature floating around.
- The idea of adding a new lexical scope is clever but may lead to a rabbit hole of runtime subtleties, especially for classes nested in functions. An alternative approach could use an implicit `del` for names defined as TypeVars, similar to the way except blocks work.
- "A duplicate name generates a syntax error at runtime" -> should be "compile time"
I agree that default values could be saved for another PEP. I mainly wanted to convince myself that this proposal wouldn't preclude their addition in the future because of some grammar or runtime constraint.

I agree that adding a new lexical scope is fraught with problems. It mostly works, but there are enough problematic edge cases that I think it's not feasible.

I've played with the idea of using an implicit `del`, but that doesn't work either. The problem is that TypeVars are often referenced inside a generic function body or the body of methods defined within a class. That means the TypeVar must have a lifetime that extends beyond the evaluation of the `def` or `class` statement. It needs to be part of the closure for inner scopes.

I'm still optimistic that some solution exists here, but the scoping logic in the CPython compiler is quite baroque, and I'm still wrapping my head around it.

-Eric
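To make the lifetime problem concrete, here's a minimal sketch using today's TypeVar spelling (the `first` function is illustrative, not from the PEP):

```python
from typing import TypeVar, cast

T = TypeVar("T")

def first(items: list[T]) -> T:
    # `T` is loaded when the function *runs*, long after the `def`
    # statement has finished evaluating, so an implicit `del T` right
    # after the definition would make this call raise NameError.
    return cast(T, items[0])

print(first([1, 2, 3]))  # 1
```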
Hm, this is a big conundrum. The 'del' solution clearly doesn't work, and the same argument also means that just assigning typevars in the containing scope doesn't work either. And I think that the same argument *also* rules out the alternative (IIRC favored by Sebastian) of using a with-statement, since with-statements don't introduce scopes. In all cases, a runtime reference to a typevar from a function or method body could be overwritten by a later redefinition of the same type variable. (The current situation also has that problem, but it is solved by making typevars explicit globals.)

Maybe we need to adjust the scope rules so that we can introduce the extra scope holding the typevars without changing the semantics of assignment expressions (the walrus operator)? There's a precedent in PEP 572 (which introduced the walrus operator): a walrus in a comprehension never stores into the function scope used for the comprehension; it stores in the nearest containing "real" function scope. So we could change the rules so that a walrus in an annotation or a class definition argument still writes into the nearest containing explicit scope -- only the typevars are stored in the special scope introduced by the new syntax.

Eric, in your [section on this topic](https://github.com/erictraut/peps/blob/typeparams/pep-9999.rst#compiler-chan...) you write that this "breaks in certain cases involving generics defined within a `class` body." Do you have an example of such a breaking case that's not using a walrus?

--
--Guido van Rossum (python.org/~guido)
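For reference, the PEP 572 comprehension behavior mentioned above can be seen with a small runnable example:

```python
def f():
    # Per PEP 572, `last` is bound in f's scope, not in the implicit
    # function scope that the comprehension creates.
    squares = [last := x * x for x in range(5)]
    return last

print(f())  # 16
```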
Do you have an example of such a breaking case that's not using a walrus?
Yes, it breaks for generics declared within a class body if they reference other symbols declared within that class body. Here are examples of a generic method, generic class, and generic type alias declared within a class body.

```python
class A:
    # This succeeds because it doesn't reference any other symbols
    # within the class A scope.
    class B[T]: ...

    # This fails because B cannot be accessed in the expression "B[T]"
    # if evaluated within an inner scope.
    def method[T](self, b: B[T], c: T): ...

    # This fails because B cannot be accessed in the expression "B[T]"
    # if evaluated within an inner scope.
    class C[T](B[T]): ...

    # This fails because C cannot be accessed in the expression "C[T]"
    # if evaluated within an inner scope.
    type MyAlias[T] = C[T]
```

The problem here is that class scopes don't support cell variables, so there's no way for a variable declared within a class body to be part of an inner scope's closure. All such variables are accessible only within the class scope. For this reason, I don't think it will work to introduce a new scope.
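The class-scope limitation is observable in current Python without any new syntax; a minimal demonstration:

```python
class A:
    x = 1

    def method(self):
        # Class-body names are not part of the closure of nested
        # scopes, so this raises NameError at call time (unless a
        # global `x` happens to exist).
        return x

A().method()  # NameError: name 'x' is not defined
```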
Ah, I see. The fact that B is generic doesn't even matter -- this would also not work:

```python
class A:
    class B: ...
    def method[T](self, arg: B, arg2: T) -> T: ...
```

because this is effectively translated to something like this:

```python
class A:
    class B: ...
    def _helper():
        T = TypeVar("T")
        def method(self, arg: B, arg2: T) -> T: ...
        return method
    method = _helper()
```

and the current scoping rules work in such a way that B cannot be seen inside _helper(). Rewriting B as A.B doesn't work either, since A doesn't appear in the global scope until after the class body has executed.

The only solution I can think of would be to define a new kind of scope that has the desired semantics: variable references will first search in that scope (so T is found) and then in its parent scope (so B is found), even if the parent is a class. After that it will follow the regular scoping rules: if the whole thing lives inside a function, that function's scope is visible, but no further class scopes are visible, and of course at the end we have the global and builtin scopes.
--
--Guido van Rossum (python.org/~guido)
I've already explored the idea of defining a new kind of scope like you suggested. The problem is that at evaluation time (when the byte codes are being interpreted), names can be accessed only if they appear in the locals or globals. There's no mechanism for accessing a name in "my parent scope's locals". To do this, you'd need to capture it in a closure and add it to the inner scope's "locals". That's not possible for class scopes because they don't support cell variables.

We could add new byte codes that load from the parent scope, but that would be a pretty invasive change — one that probably has implications that I don't fully understand. It also wouldn't fully solve the problem because any approach that involves a new implicit scope breaks `eval` when applied to annotations for forward declarations, which means `typing.get_type_hints` would no longer work for a class of annotations. So I'm pretty convinced at this point that using a new scope won't work.

I also explored the idea of creating a `__type_params__` dictionary within the global scope. The global scope is accessible from all scopes via a LOAD_GLOBAL bytecode op. The compiler could assign a globally-unique key for each type parameter defined within the module. TypeVars could then be constructed and cached in the `__type_params__` dictionary using the associated key name. This approach works as long as the compiler is able to analyze all of the code within the module and assign unique key names to each type param. It unfortunately breaks for interactive sessions because the compiler is invoked once for each REPL input, and there's no state retained between compiler invocations. That means the compiler can't guarantee unique key names. The same is true for `eval` called on forward-declared annotation expressions. So I've abandoned that approach.

I have a few more ideas that I still need to explore.

-Eric
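For concreteness, a rough sketch of the abandoned `__type_params__` lowering; the cache layout, helper, and key names here are hypothetical stand-ins for what the compiler would have emitted:

```python
from typing import TypeVar

# Hypothetical module-level cache. The compiler would assign the key
# names, which is exactly what breaks down in the REPL and in `eval`,
# where no cross-invocation state exists to keep keys unique.
__type_params__ = {}

def _type_param(key: str, name: str) -> TypeVar:
    # Construct the TypeVar on first use, then reuse the cached one.
    if key not in __type_params__:
        __type_params__[key] = TypeVar(name)
    return __type_params__[key]

# Under this scheme, `def foo[T](x: T) -> T: ...` might lower to roughly:
def foo(x: _type_param("foo.T", "T")) -> _type_param("foo.T", "T"):
    ...
```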
Yeah, I see the problems with implementing the extra scope. Maybe you could "mangle" typevar names using the name of the class/def/type they belong to plus a random value (in case there are multiple definitions of the same name). So

```python
class C[T]:
    ivar: T
    def meth(self, a: T) -> T: ...
```

would become e.g.

```python
_C_T_123 = TypeVar("T")

class C(Generic[_C_T_123]):
    ivar: _C_T_123
    def meth(self, a: _C_T_123) -> _C_T_123: ...
```

We could make the random part cryptographically secure to avoid collisions. (The specific mangling is just an example; it could just as easily be T_C_123.)
-- --Guido (mobile)
Yeah, I explored the idea of name mangling, but I couldn't figure out how to make this work with `eval`, which is needed for `typing.get_type_hints()`.

```python
def foo[T](x: "T") -> "list[T]": ...

get_type_hints(foo)  # This would crash because T isn't available.
```

I'm thinking that we need to add a dict of all active type parameters and include this dict as an attribute (e.g. `__type_params__`) for a generic class, function, or type alias. Then `get_type_hints` could access `__type_params__` and merge the type parameter symbols into the locals dictionary before calling `eval`. This way, when it evaluates `list[T]`, it will be able to resolve the name `T` correctly.

```python
class Foo[T]:
    def bar[S](self, a: "T", b: "S") -> "S": ...

print(Foo.__type_params__)       # { "T": ~T }
print(Foo.bar.__type_params__)   # { "T": ~T, "S": ~S }
print(get_type_hints(Foo.bar))   # { "a": ~T, "b": ~S, "return": ~S }
```

What do you think of that?

I may have also come up with a way to revive the idea of using an extra scope. If the extra scope is introduced within a class body (the problematic case I discussed above), any symbols that are local to the class body but required by the inner scope could be passed in as explicit arguments. They don't need to be cell variables if we assume they are read-only within the inner scope.

```python
class Parent:
    class Child1[T]: ...

    class Child2[T](Child1[T]): ...
```

Would effectively be translated to:

```python
class Parent:
    def __temp1(T):
        class Child1: ...
        return Child1
    Child1 = __temp1(TypeVar("T"))

    def __temp2(T, Child1):
        class Child2(Child1): ...
        return Child2
    Child2 = __temp2(TypeVar("T"), Child1)  # Child1 is passed as an explicit argument
```

Thoughts?
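A minimal sketch of the `get_type_hints` merging step described above, assuming the proposed `__type_params__` attribute exists (the helper name is hypothetical):

```python
import typing

def get_hints_with_type_params(obj):
    # Merge the object's type parameters into the local namespace used
    # to resolve stringified annotations, so that evaluating a string
    # like "list[T]" can find T.
    localns = dict(getattr(obj, "__type_params__", {}))
    return typing.get_type_hints(obj, localns=localns)
```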
On Fri, Jun 24, 2022 at 12:56 AM Eric Traut <eric@traut.com> wrote:
Yeah, I explored the idea of name mangling, but I couldn't figure out how to make this work with `eval`, which is needed for `typing.get_type_hints()`.
```python
def foo[T](x: "T") -> "list[T]": ...

get_type_hints(foo)  # This would crash because T isn't available.
```
But do we need to support that? Assuming the SC chooses PEP 649 over 563 for 3.12, there would be no need to use string quotes for forward declarations. (OTOH, if 563 is selected, there would be strings everywhere and we would need to handle this.)
I'm thinking that we need to add a dict of all active type parameters and include this dict as an attribute (e.g. `__type_params__`) for a generic class, function, or type alias. Then `get_type_hints` could access `__type_params__` and merge the type parameter symbols into the locals dictionary before calling `eval`. This way, when it evaluates `list[T]`, it will be able to resolve the name `T` correctly.
```python
class Foo[T]:
    def bar[S](self, a: "T", b: "S") -> "S": ...

print(Foo.__type_params__)       # { "T": ~T }
print(Foo.bar.__type_params__)   # { "T": ~T, "S": ~S }
print(get_type_hints(Foo.bar))   # { "a": ~T, "b": ~S, "return": ~S }
```
What do you think of that?
It might work, but it doesn't feel elegant. Then again, I worry that we've over-constrained the problem and hence no solution will feel quite right.
I may have also come up with a way to revive the idea of using an extra scope. If the extra scope is introduced within a class body (the problematic case I discussed above), any symbols that are local to the class body but required by the inner scope could be passed in as explicit arguments. They don't need to be cell variables if we assume they are read-only within the inner scope.
```python
class Parent:
    class Child1[T]: ...

    class Child2[T](Child1[T]): ...
```
Would effectively be translated to:
```python
class Parent:
    def __temp1(T):
        class Child1: ...
        return Child1
    Child1 = __temp1(TypeVar("T"))

    def __temp2(T, Child1):
        class Child2(Child1): ...
        return Child2
    Child2 = __temp2(TypeVar("T"), Child1)  # Child1 is passed as an explicit argument
```
Thoughts?
That doesn't look bad, better than mangling the names. We'd basically have to analyze all type parameters *and all default values* looking for variables defined at the class level, and do this to them. It feels related to lambda lifting (https://en.wikipedia.org/wiki/Lambda_lifting).

--
--Guido van Rossum (python.org/~guido)
It occurred to me that if you are going to pursue this last strategy, you have to fix the __qualname__ attribute of the defined class/function/alias to strip the helper function. Whether to also strip it from type variables I am not sure.

—Guido
I played around more with the strategy of using an additional scope but passing in all referenced symbols as arguments ("lambda lifting"). This approach unfortunately adds _significant_ complexity to the compiler. It also doesn't address some edge cases such as the use of walrus operators in the inner scope. I'm also worried about the runtime overhead and the impact on compatibility with debuggers and runtime type checking libraries. For those reasons, I abandoned the approach.

The good news is that I think I've come up with a solution that addresses all of the requirements, doesn't add significant complexity to the compiler, and shouldn't have any compatibility impact on debuggers. The key is to abandon the requirement that every reference to a type variable "T" refer to the same object. Instead, I have the compiler generate code to construct a new "type var proxy" object each time "T" is referenced. Since type variables are typically referenced only in type annotations (which are evaluated once — or never if "from __future__ import annotations" is in effect) and in specialized base classes, I'm not concerned about the runtime overhead of allocating one of these objects each time.

This limits the compiler changes to the following:

1. The symtable.c module needs to track which type parameters are "active" as it recursively explores the AST. If it sees a local symbol name (a local variable, a parameter in a function scope, a class variable in a class scope, etc.) that conflicts with one of the active type parameters, it emits a syntax error.
2. When the compiler.c module is generating byte codes and comes across a name that was marked as a type parameter by the symtable.c module, it emits byte codes to construct a "typing.TypeParameter(name)" object.
3. When the compiler.c module emits byte codes for a class statement with new-style type parameters, it implicitly adds "Generic" to the base class list.

I have this all working in a fork of cpython if you want to check it out: https://github.com/erictraut/cpython/tree/type_param_syntax

I can go over the details in tomorrow's typing meetup discussion.

I've updated the draft PEP to reflect these changes. I've also made the following additional changes based on feedback:

1. I've scaled back the proposal and removed sections on "default type arguments", explicit specialization of generic functions, and recursive type aliases. These can be covered in other PEPs.
2. I've switched from an "extends" keyword to a simple colon when specifying the upper bound or the constraints for the type variable. This feels more Pythonic.

I've implemented provisional support for most of this in pyright (currently in a private branch) to prove out the type checking side of it. I can demo this in tomorrow's typing meetup.

--
Eric Traut
Contributor to Pyright & Pylance
Microsoft
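A minimal pure-Python sketch of the "type var proxy" idea; the `TypeParameter` class here is illustrative, since the real construction would happen in compiler-emitted byte codes:

```python
class TypeParameter:
    """Illustrative proxy: constructed anew at each reference to T."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"~{self.name}"

    def __eq__(self, other: object) -> bool:
        # Distinct proxy objects for the same type parameter still
        # compare equal by name (an assumption of this sketch).
        return isinstance(other, TypeParameter) and other.name == self.name

    def __hash__(self) -> int:
        return hash(self.name)

# Each reference to "T" would compile to code equivalent to:
t1 = TypeParameter("T")
t2 = TypeParameter("T")
assert t1 is not t2 and t1 == t2
```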
On Mon, Jun 27, 2022 at 11:51 PM Eric Traut <eric@traut.com> wrote:
The good news is that I think I've come up with a solution that addresses all of the requirements, doesn't add significant complexity to the compiler, and shouldn't have any compatibility impact on debuggers. The key is to abandon the requirement that every reference to a type variable "T" refer to the same object. Instead, I have the compiler generate code to construct a new "type var proxy" object each time "T" is referenced. Since type variables are typically referenced only in type annotations (which are evaluated once — or never if "from __future__ import annotations" is in effect) and in specialized base classes, I'm not concerned about the runtime overhead of allocating one of these objects each time.
I haven't thought through this carefully, but this seems to solve all the concerns with runtime semantics that have been raised. The main compromise seems to be that expressions that refer to type variables, such as "cast(list[T], x)", will be a bit slower? In my experience these are rare and the performance impact should be totally acceptable.

This adds a new kind of namespacing concept to Python, but this seems justified to me, since the alternative ways of achieving the desired semantics (of which there seems to be little doubt) are much more complicated.
I've updated the draft PEP to reflect these changes. I've also made the following additional changes based on feedback: 1. I've scaled back the proposal and removed sections on "default type arguments", explicit specialization of generic functions, and recursive type aliases. These can be covered in other PEPs. 2. I've switched from an "extends" keyword to a simple colon when specifying the upper bound or the constraints for the type variable. This feels more Pythonic.
I like these updates! I think that the PEP looks more streamlined and fits in better with the rest of the language now.

The PEP could perhaps be more convincing to people outside the typing community if there were more examples extracted from real-world projects. Also, what about adding statistics about how often type variables and generic classes and protocols are used?

Finally, have you asked for feedback from some more typical users of typing (that are not active on typing-sig and don't work on type checkers)? In particular, maybe it would be helpful to present the PEP to developers who use type annotations for runtime purposes, and to ask if they have concerns about the runtime semantics. The feedback could then be summarized in the PEP. I'm not sure if this is commonly done in PEPs, but it could be helpful here since a lot of developers still don't use type annotations regularly (or at all).

Jukka
[I'm responding to myself here...] On Tue, Jun 28, 2022 at 4:44 PM Jukka Lehtosalo <jlehtosalo@gmail.com> wrote:
... Finally, have you asked for feedback from some more typical users of typing (that are not active on typing-sig and don't work on type checkers)? In particular, maybe it would be helpful to present the PEP to developers who use type annotations for runtime purposes, and to ask if they have concerns about the runtime semantics. The feedback could then be summarized in the PEP. I'm not sure if this is commonly done in PEPs, but it could be helpful here since a lot of developers still don't use type annotations regularly (or at all).
I also wonder if this new syntax would appease some of those Python users who've complained about what they perceive as the "bolted-on" nature of Python type hinting. In quite a few online discussions (outside typing-sig and other spaces with a lot of typing users) I've seen people who are unhappy about this aspect, in particular. I'm not sure how we could easily determine if this is the case, though. Jukka
I agree with Jukka. Sadly I can't make it to the meeting, but Eric's latest proposal (generating a new object for each occurrence of `T`) makes sense to me.

As for the `extends` syntax, I had considered modifying the lexer to make '<:' a token, since that symbol seems to be used in this meaning in some of the academic literature and maybe even some languages. But otherwise, the "call" syntax would have my preference (e.g., T(bound=Base)), since it is extensible without further changes to the parser or compiler -- we just pass everything through to a `TypeVar()` call.
On second read, I'm very happy with the new explicit type alias syntax. It looks like it will make runtime introspection of type aliases easier. The current introspection makes accessing the original name difficult (useful for tools that need readable annotations + recursive types), and there are a number of weird edge cases with aliases and introspection. I'd be hopeful that the get_type_hints note here, https://docs.python.org/3.11/library/typing.html#typing.get_type_hints, on cross-module edge cases will just work with the new aliases.
From a runtime type inspection perspective, I'm wary of introducing scoping rules that would cause accessing a type variable outside that scope to be an error. As a simple example,

```python
class Foo(Generic[T]):
    value: T

type_hints = get_type_hints(Foo)
```

currently returns {"value": T}. Several libraries do want to access a type variable outside of the class. It can be useful for documentation generation, serialization code based on types, or runtime validation checks. I think T needs to not be deleted for this to work.

A more niche case I'd like to have some working equivalent of:

```python
class Foo(Generic[T]):
    value: T

Foo.__orig_bases__
```

currently gives Generic[T]. Some way of getting the type variables that a class is parameterized by would be nice to have. For this one I'm more OK if the exact code needs to change, given that __orig_bases__ is minimally documented, as long as the information stays inspectable.

With auto_variance, a utility function in the standard library/typing_inspect to identify which variance each type variable has at runtime would be nice to have. This is minor though; I think runtime variance usage is rather rare.
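For reference, a runnable version of the niche case above, showing one way the parameterization can be recovered at runtime today:

```python
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class Foo(Generic[T]):
    value: T

# __orig_bases__ preserves the parameterized base, and get_args()
# extracts the type variables from it.
print(Foo.__orig_bases__)               # (typing.Generic[~T],)
print(get_args(Foo.__orig_bases__[0]))  # (~T,)
```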
Is it worth mentioning in the PEP that future PEPs are free to introduce further uses of the `type` keyword? There was a smorgasbord of ideas in a typing-sig thread [1], like typing-only imports, that's perhaps worth explicitly leaving the door open to?

[1]: https://mail.python.org/archives/list/typing-sig@python.org/thread/LV22PX454...
I'd like to thank everyone who attended today's typing meetup and provided feedback on the proposal. I wasn't able to take notes during the discussion, but here are a few pieces of feedback that I jotted down after the meeting ended. (Pradeep or other attendees, please feel free to add anything that you think is material to the discussion.)

* There seemed to be a general consensus that square bracket notation was the best option for type parameter declaration.
* There also seemed to be a general consensus that a "punctuation" approach was best for the TypeVar "mini language" syntax. (Guido wasn't able to attend the meeting, but he mentioned above that he prefers the "function" approach for extensibility. The consensus was that the function approach was too verbose and harder to understand. The punctuation approach is admittedly less extensible, but we have some options since the bound expression is a general value expression, so it could support additional expression forms — including call expressions — without changing the grammar.)
* Sebastian suggested that there might be better alternatives to the colon token for specifying an upper bound or constraints. Potential replacements include "<" or "<:". (Note: I think I still prefer colon here, but that might be because of familiarity.)
* Several people objected to a compile-time check that prevents a locally-bound name from overlapping with a type parameter used in a "def", "class" or "type" statement within that same scope. I agree with that feedback, and I've subsequently updated the prototype cpython implementation and the PEP to address this feedback.

I mentioned during the meeting that I need help from authors of runtime type checking libraries to help me understand what techniques they currently rely on — and whether the proposal breaks any key assumptions. Tin (attrs & cattrs) and Jelle (pyanalyze) were present for the discussion, but we'll need to get feedback from others as well.

--
Eric Traut
Contributor to Pyright & Pylance
Microsoft
Thanks, Eric, for the talk and your work on the PEP! As usual, the recording and slides are available in the running Typing Meetup notes doc [1].

Adding to Eric's notes from the meetup:

+ Eric mentioned that he's explicitly not including syntax for generic Callables in this PEP. That is, we are not looking to have syntax for `Callable[T][[T], T]`, etc.
+ Sebastian: I think when Jukka first suggested the `type` keyword, he had some more examples where such a keyword would be useful.
+ The PEP suggests `T: int` as a replacement for `TypeVar("T", bound=int)`. One concern about the upper-bound syntax is that it might be confusing for users who see `x: int` as a variable annotation elsewhere. That syntax usually means that any value of a type compatible with `int` can be assigned to `x`, but that `x` will be seen as having type `int`. That is not the case for `T` with upper bound `int`, since it will be specialized to the specific type that was passed by the user. So, it might be worth having syntax such as `T <: int` or `T extends int`.
+ Regarding extensibility of the TypeVar "mini-language", Eric mentioned that, while it is harder to extend the punctuation syntax, we might get away by adding information within the type annotations that we use for the upper bound, etc.
+ One idea suggested by Kevin was that we allow keyword arguments when specializing types. For example, if we have `GenericParent[T, R]`, we might have `class Base(GenericParent[R, T=int]): ...`. Note that PEP 637 (Support for indexing with keyword arguments) [2] was rejected [3]. Though, the SC did say, "The strongest argument for the new syntax comes from the typing side of Python", so it might be worth trying again in the future (not in this PEP).

[1]: https://docs.google.com/document/d/17iqV7WWvB0IwA43EPlIqlUS6Xuvk08X3sEudAA-g...
[2]: https://peps.python.org/pep-0637
[3]: https://mail.python.org/archives/list/python-dev@python.org/thread/6TAQ2BEVS...
-- S Pradeep Kumar
On 29 Jun 2022 at 00:53, Eric Traut wrote:
* Sebastian suggested that there might be better alternatives to the colon token for specifying an upper bound or constraints. Potential replacements include "<" or "<:". (Note: I think I still prefer colon here, but that might be because of familiarity.)
While most other punctuation is familiar from other parts of Python, colons are only used as block starts and for type annotations so far. My suggestion was to use "T(int)" to signal a sub-type relationship, similar to how sub-class relationships are declared, but that might not be obvious either. As Python is usually a fairly verbose, keyword-based language, maybe it could also make sense to use a keyword for this.

Regarding extensibility: Apart from defaults (which we will need in the foreseeable future), what features do other languages have for type variables that we don't yet? Is it even likely that we will extend type variables in the future?

- Sebastian
On Wed, Jun 29, 2022 at 1:50 AM Sebastian Rittau <srittau@rittau.biz> wrote:
While most other punctuation is familiar from other parts of Python, colons are only used as block start and for type annotations so far. My suggestion was to use "T(int)" to signal a sub-type relationship, similar to how sub-class relationships are declared, but that might not be obvious either. As Python is usually a fairly verbose, keyword-based language, maybe it could also make sense to use a keyword for this.
Actually I like T(int) a lot, and it avoids giving new meanings to ':' or inventing new tokens. And if we declare the syntax as being a function call parameter list, we can get creative with future extensions like default=X. The only problem is that it would look like T(str, bytes) might supply a "type constraint" (like AnyStr) but we want T(int) to behave like TypeVar("T", bound=int). This is solvable though without adding new syntax.
Regarding extensibility: Apart from defaults (which we will need in a foreseeable future), what features do other languages have for type variables that we don't yet? Is it even likely that we will extend type variables in the future?
I dunno.

Regarding your other message, I agree that the runtime solution still feels hacky. I'm not sure what new fundamental idea we could introduce though -- maybe a new "configurable" scope that is visible for reads but invisible for writes? IIRC Eric rejected this after finding out how tricky it would be to implement (new bytecodes etc.), but maybe that's still the right solution.

--
--Guido van Rossum (python.org/~guido)
Actually I like T(int) a lot, and it avoids giving new meanings to ':'
I'm somewhat negative on `T(int)`. It looks to me like a function call, which isn't the right mental model in this case. It's giving new meaning to function call syntax, which is well established in Python. I guess one could argue that it's intended to look like a constructor call, but "constructing" type parameters is a weird mental model. No other programming language that supports generics forces users to think of "constructing" type parameters.

IMO, the `:` provides the appropriate mental model. A type parameter is similar to a function parameter, just a meta-level above. Like a function parameter, a type parameter can have a type that constrains its usage when arguments are provided. Function arguments are provided by function calls at runtime, whereas type arguments are provided through implicit or explicit specialization at type checking time. So I don't think this gives new meaning to the colon token. I think it's using the colon consistently with how it's already used elsewhere in the language.

-Eric
It's a bit different though in that the type parameter's bound is not the type of the variable. If we have something like `def f[T, U: int](x: T, y: U)`, it kind of looks like `U` is a variable of type int and `T` is unannotated.

What about using <: like in Scala?

```python
def f[T, U <: int](x: T, y: U): ...
```
It's a bit different though in that the type parameter's bound is not the type of the variable
I don't think it's different. When you annotate a function parameter with `Foo`, you are not saying that the type must be a `Foo` object at runtime. Rather, you're saying that the parameter is constrained to a type that is compatible with `Foo` — i.e. a subtype thereof. It's the same logic with the upper bound for a type parameter. When you say that it is bounded by `Foo`, you are not saying that it is type `Foo`, but rather that its type is constrained by `Foo` — i.e. it must be a subtype thereof.

I'm not completely opposed to using `<:` like in Scala, but I don't think most Python users will have ever seen (or heard of) Scala, so this token will seem very foreign to them. Also, I don't think it works well with constrained type parameters.

To inform this discussion, I've updated the draft PEP to include an "Appendix A: Survey of Type Parameter Syntax". Here's a direct link: https://github.com/erictraut/peps/blob/typeparams/pep-9999.rst#appendix-a-su...

I looked at C++, Java, C#, TypeScript, Scala, Swift, Rust, Kotlin, and Julia. There are four patterns that emerge for specifying constraints on a type parameter:

1. Java and TypeScript use the "extends" keyword
2. C# and Rust use the "where" keyword (and place the clause at the end of the declaration — something that wouldn't work well for Python's grammar)
3. Scala and Julia use the "<:" operator
4. Swift, Rust and Kotlin use a colon

If we think that we will want to support a lower bound at some point in the future, that would argue in favor of "<:" because ">:" is the logical counterpart. However, I don't think there's a strong need for specifying a lower bound, and most languages forgo this capability. For that reason, I still favor using a colon here.

Note that none of these languages use a function call or constructor-like syntax. Adopting that approach would make Python an outlier among all other popular languages.

-Eric
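For concreteness, here is what the colon spelling covers as drafted in the PEP (a sketch in the proposed syntax, not verbatim from the PEP):

```python
# Upper bound: T must be a subtype of int.
class Box[T: int]:
    ...

# Constraints: S must be exactly one of str or bytes.
def concat[S: (str, bytes)](a: S, b: S) -> S:
    return a + b
```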
On Fri, Jul 1, 2022 at 7:20 PM Eric Traut <eric@traut.com> wrote:
It's a bit different though in that the type parameter's bound is not the type of the variable
I don't think it's different. When you annotate a function parameter with `Foo`, you are not saying that the type must be a `Foo` object at runtime. Rather, you're saying that the parameter is constrained to a type that is compatible with `Foo` — i.e. a subtype thereof.
It's the same logic with the upper bound for a type parameter. When you say that it is bounded by `Foo`, you are not saying that it is type `Foo`, but rather that its type is constrained by `Foo` — i.e. it must be a subtype thereof.
I think Jelle is talking about the type of the thing before the colon in relation to the thing after the colon. When you say 'T: int' in a type parameter clause, that means roughly 'issubclass(T, int)'. But when you write 'a: int' in a parameter list, that means 'isinstance(a, int)'. However, I'm not convinced this is a big enough problem to reject ':' outright.
I'm not completely opposed to using `<:` like in Scala, but I don't think most Python users will have ever seen (or heard of) Scala, so this token will seem very foreign to them. Also, I don't think it works well with constrained type parameters.
I've seen this used in papers about static types as well, without introduction, so apparently it's a standard operator in that community ("Consider types *T* and *S* s.t. *T* <: *S* ... etc.). However, it's always taken me a bit of thinking about the context to derive that '<:' means "subclass" (i.e., "extends") and not "superclass".
To inform this discussion, I've updated the draft PEP to include an "Appendix A: Survey of Type Parameter Syntax". Here's a direct link: https://github.com/erictraut/peps/blob/typeparams/pep-9999.rst#appendix-a-su...
That link no longer works (probably because I merged the PR :-). Here's a working link: https://github.com/python/peps/blob/main/pep-0695.rst#appendix-a-survey-of-t...
I looked at C++, Java, C#, TypeScript, Scala, Swift, Rust, Kotlin, and Julia.
There are four patterns that emerge for specifying constraints on a type parameter:

1. Java and TypeScript use the "extends" keyword
2. C# and Rust use the "where" keyword (and place the clause at the end of the declaration — something that wouldn't work well for Python's grammar)
3. Scala and Julia use the "<:" operator
4. Swift, Rust and Kotlin use a colon
Rust uses "extends" *and* a colon? (I think I get it -- it uses the colon but you can also use a "where" clause.)
If we think that we will want to support a lower bound at some point in the future, that would argue in favor of "<:" because ">:" is the logical counterpart. However, I don't think there's a strong need for specifying a lower bound, and most languages forego this capability. For that reason, I still favor using a colon here.
The call-like syntax also easily accommodates this: just add a new keyword parameter 'lower_bound=...'.
Note that none of these languages use a function call or constructor-like syntax. Adopting that approach would make Python an outlier among all other popular languages.
Not that we care all that much about that (none of them use indentation either :-).

--
--Guido van Rossum (python.org/~guido)
I find that it's useful to play with proposed language features in a code editor to get a sense for how they feel. I just published version 1.1.257 of pyright, and it contains support for all of the functionality described in the draft PEP 695.

To use the new syntax in a ".py" file, you will need to create a "pyrightconfig.json" file in the root directory of your project and add `{ "pythonVersion": "3.12" }`. This isn't necessary if you are editing ".pyi" files.

-Eric
Rust uses "extends" *and* a colon? (I think I get it -- it uses the colon but you can also use a "where" clause.)
Yeah, in Rust

```rust
fn fun<T: Sized>(a: T) {}
```

is syntactic sugar over

```rust
fn fun<T>(a: T) where T: Sized {}
```

"where" clauses are actually incredibly powerful, but as Eric points out, they probably wouldn't work well for Python.

```python
class Example[T]  # () would go here??
    where T: int:
    ...

def foo[B](b: B) -> B where B: int: ...
```

Though now that I've written these examples, this actually doesn't look too terrible? Maybe I've just used Rust too much recently :)

Anyway, to the issue at hand, I expect people will be confused if we do T(int). For example, consider this declaration:

```python
class Bar[T(int)](Baz[T]): ...
```

That's a *lot* of brackets and parentheses in a small amount of space, and it makes it harder to find an unbalanced bracket or parenthesis.

I also don't particularly like <:. While it is used in type theory papers and Scala, most people that use types have read neither of those, and it will be unfamiliar. I think the utility of lower bounds is probably unlikely to be worth their introduction (though perhaps someone has a use case I haven't thought of!), so I feel that the colon syntax is the best candidate: it is familiar, and it is the easiest to read. That is of course a matter of opinion, but perhaps it would be good to do a poll about this?
I posted a survey on a Microsoft-internal Python forum to get feedback about the proposed syntax options from other Python developers who are not part of the typing community. I received 19 responses. As with any survey data, the results should be taken with a grain of salt.

Question 1: Do you use type annotations (sometimes called "type hints") in Python?

A (0%): No, I never use type annotations in Python
B (32%): I occasionally use type annotations in Python
C (68%): I frequently use type annotations in Python

Question 2: Have you used "generic types"?

A (0%): No, I don't know what that means
B (47%): I have used generics in other programming languages but not Python
C (53%): I have used generics (TypeVars) in Python

Question 3: To define a generic class in Python, which of the following syntax options would you prefer?

A (5%): using (K, V) class CustomDict(dict[K, V]): ...
B (42%): class CustomDict<K, V>(dict[K, V]): ...
C (42%): class CustomDict[K, V](dict[K, V]): ...
D (11%): Other (one said "any of the above" and another said "anything but A")

Question 4: To indicate that a type variable "T" must be compatible with a particular type (say, a "str"), which of the following syntax options would you prefer?

A (5%): class MyClass[T <: str]: ...
B (79%): class MyClass[T: str]: ...
C (5%): class MyClass[T(bound=str)]: ...
D (5%): class MyClass[T extends str]: ...
E (0%): class MyClass[T where T <: str]: ...
F (5%): Other ("any of the above")

Conclusions: Both angle and square brackets are OK with this audience. As we previously discussed, angle brackets have some major problems in Python, so square brackets make sense here. A simple colon for type parameter bounds is strongly preferred by this audience.

-Eric
On 29 Jun 2022 at 00:53, Eric Traut wrote:
I'd like to thank everyone who attended today's typing meetup and provided feedback on the proposal.
Also, apart from syntax bikeshedding, my main concern is the implementation side. During the presentation, Eric mentioned that he tried various approaches that all didn't work for various reasons. The solution he came up with works, but to me - not having intimate knowledge of how the Python interpreter and scoping in Python work - it sounded a bit "hacky" and as if it could introduce unforeseen conflicts with how people expect scoping in Python to work.

In the past, Python has often opted for a more generic approach in cases like these, one that often opened up new design space. The best example is probably the descriptor protocol, which goes far beyond only enabling @property. Maybe we could take a step back and think about a more general approach to the scoping problem. This isn't the most pragmatic approach, but it would probably be better for Python as a whole in the long term and might increase our chances of acceptance by both the SC and the Python community. (But again: I might have misunderstood the potential problems in Eric's approach.)

- Sebastian
This is a highly constrained problem, so I think our options are rather limited. I agree that the solution I've proposed feels somewhat inelegant, but compared to everything else I've explored, it's the simplest solution that addresses all of the requirements and constraints. I'm open to other ideas if someone has a better suggestion. Given my current understanding of the way the Python compiler works and the constraints and requirements that we've discussed, I don't see any better options.

Let's take a deep look at the implications of my proposed implementation and see if we can convince ourselves that they're acceptable in terms of future extensibility. The main problems I see with my proposed approach are:

1. If you "eval" a snippet of code that refers to a type parameter defined by an outer scope, that "eval" operation will fail unless you manually include symbols in the `locals` dictionary that are stand-ins for the active type parameters (see the sketch below). This may be problematic for runtime type checkers.
2. At runtime, you receive a different object each time you refer to a type parameter, because the implementation generates a proxy object rather than referring to a common object that represents the parameter. It's similar to what happens if you use the expression `[]` multiple times in your code: each use constructs a new list object.
3. I'm not sure how existing Python debuggers will handle type parameters. It depends on the introspection mechanism they currently use when evaluating expressions.

If you want to play with the current implementation, I have a fork of cpython available. Here's the link: https://github.com/erictraut/cpython/tree/type_param_syntax

- Eric
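A minimal sketch of problem 1, assuming a runtime type checker that evaluates stringified annotations with `eval()`; the helper `check` and the stand-in dict are illustrative, not part of the prototype:

```python
from typing import TypeVar

def check(annotation: str, localns: dict) -> object:
    # Roughly what a runtime type checker does with a stringified annotation.
    return eval(annotation, {}, localns)

try:
    check("list[T]", {})              # fails: T is not bound in any namespace
except NameError as e:
    print(e)                          # name 'T' is not defined

standins = {"T": TypeVar("T")}        # manual stand-in for the live parameter
print(check("list[T]", standins))     # works, but only with the manual stand-in
```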
I agree the problem of the scope of type variables is thorny. I worry that thinking about it too much in terms of "fewest changes to the compiler" might over-constrain the solution space, though.

I re-read some earlier messages in this thread. For a while Eric seemed in favor of putting typevars in a new scope, especially after hitting upon the idea of lambda lifting, but eventually soured on it for several reasons: compiler complexity, walrus operators in the inner scope, and worries about runtime overhead and debugger compatibility.

**Compiler complexity.** This is the least worrisome to me. If it's the right thing for the user, we should do it even if it's complex to implement. (This is often a tenet of Python these days, despite what you may read in the Zen of Python.) The notion that occurrences of T are translated to an expression that at runtime constructs a new instance of some dummy type doesn't seem easy to explain to users, and I worry about 3rd party tools that do runtime introspection of annotations (not just runtime type checkers).

**Walrus in the inner scope.** There are only two types of places where a walrus could occur: type annotations (for arguments and return value) and default values. I don't think a walrus in a type annotation would be useful (and I believe most static type checkers don't allow them, since annotations are required to follow a simpler syntax), so I assume this is about defaults. Example:

```
def foo[T](arg: T = (x := f())) -> T: ...
```

Using the lambda lifting approach we could compute the default in the outer scope and pass it into the helper:

```
def __foo(T, __arg):
    def foo(arg: T = __arg) -> T: ...
    return foo

foo = __foo(TypeVar("T"), (x := f()))
```

Now, this means we can't use a walrus in the type annotation, but I don't think that's a useful pattern. Certainly static type checkers don't allow it. It's likely that someone, somewhere is using a walrus in a type annotation that's only for consumption by some dynamic tool, but that (in my expectation highly uncommon) scenario could easily be fixed by moving the walrus out of the annotation into an assignment statement preceding the function definition:

```
# Old, would not work with lambda lifting:
def foo(arg: X := Y): ...

# New, also works in 3.11 and before:
TypeOfArg = (X := Y)
def foo(arg: TypeOfArg): ...
```

So I think it's fine to declare any use of a walrus in an annotation illegal. Or perhaps only when occurring in a class -- that's a similar constraint to the one PEP 572 imposes on using a walrus in a comprehension:

```
class C:
    a = [x := i for i in range(3)]
```

This gives the following error:

```
SyntaxError: assignment expression within a comprehension cannot be used in a class body
```

**Runtime overhead.** I presume this is a concern about the cost of the extra function definition and call used by lambda lifting. This would be the biggest concern for generic functions (class construction is already relatively slow). It would turn every (generic) function definition into two function definitions plus a call. A highly unscientific experiment using `timeit` tells me that on my computer the simplest definition costs 65 ns and the simplest call costs 35 ns. OTOH, calling `TypeVar("T")` costs 850 ns. So what are we even talking about? At best this concern is premature.

**Debugger compatibility.** I don't want to speculate about this, but I note that 3.11 broke compatibility for debuggers and we are addressing this (for the long term) by adding better APIs for debuggers to do what they want to do.
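For reference, a rough reconstruction of the `timeit` experiment described under **Runtime overhead** above; the exact numbers are machine- and version-dependent, and these snippets are an approximation rather than the actual commands used:

```python
import timeit

# Cost of the simplest function definition (total seconds for 1M loops).
print(timeit.timeit("def f(): pass"))
# Cost of the simplest call.
print(timeit.timeit("f()", setup="def f(): pass"))
# Cost of constructing a TypeVar.
print(timeit.timeit('TypeVar("T")', setup="from typing import TypeVar"))
```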
**Other stuff.** In a private message Eric mentioned to me that his current prototype generates bytecode to import e.g. TypeVar from the typing module. A production-quality implementation would have to reimplement TypeVar, TypeVarTuple, ParamSpec and Generic in C. That's a fair amount of work, but I'm confident that we can do it if the PEP is accepted. As long as it's a proof-of-concept prototype, I don't think it will matter.

**Final words.** Despite several hurdles, I still favor the introduction of an actual new scope holding the type variables belonging to a given generic class, function or type alias, over Eric's (indubitably clever) current hack.

--Guido

PS. Regarding the syntax to indicate subclass constraints, I am fine with ":".
**Compiler complexity.**
I agree that compiler complexity by itself shouldn't dissuade us from considering a particular solution. On the other hand, any time a solution starts to feel overly complex, my "engineering spidey sense" kicks in and tells me that I'm probably on the wrong path. And in this case, my spidey sense was tingling. :)

It turns out that complexity alone isn't the only problem with this approach. Lambda lifting works OK for global and function scopes. However, it has problems when used within a class scope. The problem is that symbols declared within a class scope are no longer visible within the lambda. Class-scoped symbols cannot be captured in an inner scope, so any annotation in a "def" statement, or any base class or keyword argument in a "class" statement, will generate a runtime error if it refers to a class-scoped symbol.

```python
class Foo:
    class A: ...
    class B[T](A): ...  # Error: A is not defined

    def foo[T](self, a: A): ...  # Error: A is not defined
```

We could introduce the idea of a "read-only captured variable" that doesn't require a cell variable. That addresses the problem, but it involves a significant change to the compiler and (depending on the implementation) also to the runtime structures. It also introduces other problems. Let's assume that we implement "read-only captured variables". Consider the following:

```python
class Outer:
    class A: ...

    class B[T](A):
        print(A)
```

Today, the `print` statement would fail because it references `A`, which is not visible inside the class `B` body. But if `A` becomes visible within the implied lambda, it will also be accessible to inner scopes like `B`. Even weirder is this case:

```python
def foo():
    A = 0

    class Outer:
        class A: ...

        class B[T](A):
            nonlocal A
            print(A)
```

Today, the `A` referenced by the print statement refers to the `A` in the outer scope `foo`. If we were to introduce an implied lambda, it would reference the `A` in the implied lambda, which happens to be the `A` in `Outer`. Definitely non-intuitive.

There is still another problem here, related to forward-referenced annotations. This solution depends on the compiler having visibility into which class-scoped symbols are referenced from within the lambda. Unfortunately, there are cases where the compiler doesn't have such visibility -- namely, with forward-referenced annotations. Consider the following, which works today.

```python
def outer():
    def inner[T]() -> "A":
        return 3
    A = int
```

When the forward-referenced annotation "A" is later evaluated through `get_type_hints()`, it will fail with lambda lifting whereas it succeeds today. We could decide that it is OK to break this use case, but I consider this yet another strike against the lambda lifting solution.
**Walrus in the inner scope.**
The issue here is not with default argument values (which can easily be evaluated in the outer scope) or annotations (which should never contain walrus operators, as you point out). The problem is with base classes and keyword argument expressions within a class declaration. Although it's unlikely that someone would use a walrus operator here, they are permitted today.

```python
class Foo(x := Base, metaclass=(y := Metaclass)): ...
```

We could disallow walrus operators in these situations when the new syntax is used. Probably not a big deal, but it is an odd special case that would need to be documented and justified to the SC. Yet another strike against this approach.
**Runtime overhead**
In addition to the runtime overhead that you've mentioned (which I agree is probably not significant enough to be concerned about), we have to also consider:

1. It will generate additional cell variables and captures. Accessing these variables requires double dereferencing in both the inner and outer scopes.
2. If we implement the "read-only captured variable" idea mentioned above, the cost of passing additional arguments to the lambda. That cost will depend on the number of class-scoped variables that are referenced within annotations (for "def" statements) and base classes plus keyword arguments (for "class" statements).

I could easily be convinced that these also don't represent enough runtime overhead to be concerned about, but I mention them for completeness.
3.11 broke compatibility for debuggers and we are addressing this
OK, that's good to know.
Despite several hurdles, I still favor the introduction of an actual new scope
Let me know what you think about my points above. If you're still in favor of adding a new scope, I'm game to give it a try and prototype it. -Eric
Jelle and I met yesterday, and he provided me with additional insights about how runtime type checkers use type information. From that conversation, I concluded that my proposed approach (the one that involves type parameter "proxies") probably wouldn't meet the needs of runtime type checkers. We did some additional brainstorming, and I spent some time today exploring alternatives. I think I have a design that meets all of the requirements.

Like the previous proposal, this new design doesn't require the use of "lambda lifting", which I still consider problematic for a number of reasons.

The new design involves two new fields in the "frame" object at runtime. These fields track which type parameters are "live" at runtime. Both of these fields can be NULL or refer to a tuple of TypeVar-like objects (TypeVar, TypeVarTuple, ParamSpec). The first field is called "f_outer_typeparams" and contains all of the type parameters defined by outer scopes. The second field is called "f_local_typeparams". It contains all of the outer type parameters plus any type parameters that are temporarily needed for a generic "class", "def" or "type" statement in the current scope.

This proposal introduces two new opcodes: EXTEND_TYPEPARAMS and LOAD_TYPEPARAM. The EXTEND_TYPEPARAMS op builds a new "f_local_typeparams" tuple from the "f_outer_typeparams" tuple and n new TypeVar-like objects that have been pushed onto the stack. The LOAD_TYPEPARAM op loads a single type parameter from "f_local_typeparams", referenced by numeric index. The compiler tracks the indices of all "live" type parameters, so it's able to emit the appropriate index as part of the opcode.

When a new scope is entered (e.g. during a function or lambda call or the execution of a comprehension), the "f_local_typeparams" tuple is copied into the "f_outer_typeparams" field of the next scope. Through this mechanism, all of the "live" type parameter objects are always available even in inner scopes.

Here's an example:

```python
# At this point, f_outer_typeparams and f_local_typeparams are NULL.

# The compiler emits code to construct two TypeVar objects for `A` and `B`, then
# emits an EXTEND_TYPEPARAMS(2) op. This builds a new f_local_typeparams tuple
# that contains (A, B). When evaluating the expression `dict[A, B]`, the reference
# to `A` generates a LOAD_TYPEPARAM(0) op, and the reference to `B` generates a
# LOAD_TYPEPARAM(1) op.
class Outer[A, B](dict[A, B]):
    # At this point, f_outer_typeparams and f_local_typeparams contain (A, B).

    # The compiler emits code to construct two TypeVar objects for `C` and `D`, then
    # emits an EXTEND_TYPEPARAMS(2) op. This builds a new f_local_typeparams tuple
    # that contains (A, B, C, D). When evaluating the expression `dict[B, D]`, the
    # reference to `B` generates a LOAD_TYPEPARAM(1) op, and the reference to `D`
    # generates a LOAD_TYPEPARAM(3) op.
    class Inner[C, D](dict[B, D]): ...

    # At this point, f_outer_typeparams and f_local_typeparams contain (A, B).

    # The compiler emits code to construct one TypeVar object for `X`, then
    # emits an EXTEND_TYPEPARAMS(1) op. This builds a new f_local_typeparams tuple
    # that contains (A, B, X). When evaluating the type annotations for parameters
    # `a`, `b` and `x`, the appropriate LOAD_TYPEPARAM ops are generated.
    def method[X](self, a: A, b: B, x: X): ...
```
```python
print(Outer.__parameters__)          # (~A, ~B)
print(Outer.Inner.__parameters__)    # (~C, ~D)
print(Outer.method.__annotations__)  # {'a': ~A, 'b': ~B, 'x': ~X}
```
The f_outer_typeparams value is also stored in the function object, in a new internal field called func_typeparams. This allows f_outer_typeparams to be restored to the current scope when resuming a coroutine, executing an async function, calling a lambda, etc.

Other than the addition of two new fields to the frame object, one new field in the function object, and two new opcodes, this proposal adds no significant complexity to the compiler or the runtime. Performance impact should be negligible. It works well with the PEP 649 proposal for deferred evaluation of annotations, and it works well with today's (non-deferred) annotation evaluation.

One small complexity is with forward-referenced annotations and the use of `get_type_hints()`, which will need to populate the `locals` dictionary with the "live" type parameters before calling `eval()` on the annotation string. The live type parameters are available on function, class, and type alias objects via a new `__typeparams__` attribute.
```python
print(Outer.__typeparams__)         # (~A, ~B)
print(Outer.Inner.__typeparams__)   # (~A, ~B, ~C, ~D)
print(Outer.method.__typeparams__)  # (~A, ~B, ~X)
```
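A minimal sketch of how a tool like `get_type_hints()` could use the proposed `__typeparams__` attribute to resolve a stringified annotation; `resolve_annotation` is an illustrative helper, not part of the prototype:

```python
def resolve_annotation(obj, annotation: str) -> object:
    # Map each live type parameter's name to the parameter object itself,
    # then evaluate the stringified annotation against that namespace.
    localns = {tp.__name__: tp for tp in getattr(obj, "__typeparams__", ())}
    globalns = getattr(obj, "__globals__", {})
    return eval(annotation, globalns, localns)
```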
Let me know if you see any holes or have concerns with this proposal. If you're interested in looking at the CPython implementation, check out this branch: https://github.com/erictraut/cpython/commits/type_param_syntax2.

Assuming that this design is amenable to the typing community, my next step is to post an update to the PEP that describes the design. At that point, I think we can notify the python-dev community that the PEP is ready for their review.

-Eric
I am still on vacation, so this is a preliminary reaction. I like this better than the previous version, since it almost feels like a scope to me. I'm not sure how much I like the implementation -- two extra object pointers per frame object that are rarely used seems a lot (though the cost of managing these on frame entry/exit may well be the more significant cost). But presumably another approach might offer the same semantics -- it almost feels like we could use the existing cell mechanism somehow. I'm not sure I understand why the type params need to be restored when a lambda is called (probably because your mechanism is different from cells :-). Maybe I'll figure it out after studying your implementation some more.
--Guido van Rossum (python.org/~guido)
Looking at the implementation, I think this can be significantly simplified (avoiding a lot of the pitfalls that Guido mentioned) by considering the following:

- You don't really care about the run-time call chain (which could go up or down your syntactic tree of function/class defs), and that's what the frames represent. You really care about the static structure. You're working around that by pulling f_outer_typeparams from the function object's func_typeparams each time you start a new frame... but in the end that means you don't need to keep that in the frame. Each time you use frame->f_outer_typeparams, you could actually be using frame->f_func->func_typeparams, so the frame field is no longer needed.
- The f_local_typeparams is kind of volatile. You always create it for a function definition, use it a few opcodes later, and then its content is no longer relevant. I'm pretty sure this could end up somewhere in the interpreter stack (and it goes away after your MAKE_FUNCTION, or the creation of the class).

You shouldn't need cells for this, and I think it can be done with minor changes to your existing implementation.

In fact, if you want to go the extra mile: given that the typevar structure of scopes is known statically (it's the one in the source, independent of execution) and typevars are immutable, you could create typevars at compile time and store them in the code objects rather than the functions. Then you could have a LOAD_TYPEVAR opcode that works very much like LOAD_CONST, pulling the typevar. There are some complexities to this (especially in changes to code objects and their deserialization): if you want to keep the identity of typevars from outer scopes to inner scopes, you need to pass the containing code block's typevars when building (or unmarshalling!) code objects, and you need to define a way to serialize bounds at compile time in a way that marshal can support (perhaps stringify them, or add some sort of function to evaluate them later, PEP 649 style).

That would be super fast: `import my_mod` would ensure you have all your typevars created, you'd have exactly 1 typevar per typevar definition in your source, and the compiler doesn't need to do much more than you're already doing. The only runtime cost of looking up a typevar is going from the frame to the code object to the typevar tuple, and indexing.

Hope this helps!

D.
Ah, this explains why Eric added the new field to the function object. It sounds great if it could *only* be a new, optional field on function objects. The function is already accessible from the frame (`f_func`). Constructing a function object already takes a bunch of optional things; I think there are a few bits free still (check out MAKE_FUNCTION). The tuple of typevars can be constructed on top of the stack and popped off it by MAKE_FUNCTION if the appropriate bit is set.

While it would be great if it could be done, I don't think the typevars can be like a const on a code object -- some typevars have an upper bound which is a reference to something that might be computed at runtime (e.g. a Union, or an imported class), so this wouldn't work. (Happy to learn if there is a way; it would make the problem of generating code to import typing mostly go away, or at least not have a cost in extra bytecode.)
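As a quick way to see the MAKE_FUNCTION flag bits in action, one can disassemble a small module; the exact bytecode varies across CPython versions, so treat this as illustrative only:

```python
import dis

# Compiling a def with a default value shows MAKE_FUNCTION consuming an
# optional stack item gated by a flag bit (0x01 = positional defaults).
dis.dis(compile("def f(a=1): pass", "<demo>", "exec"))
```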
--Guido van Rossum (python.org/~guido)
Thanks for the tips, Daniel. Those were really helpful.

As Guido said, I don't think we can construct type variables at compile time because we need to evaluate bound expressions at runtime. I incorporated your other suggestions, and this allowed me to eliminate the two pointers in the frame object! I still require a new pointer in the function object, but it's optional and will be NULL most of the time.

My latest implementation can be found here: https://github.com/erictraut/cpython/tree/type_param_syntax2

This implementation makes use of a local variable named "__type_variables__". This name will show up if you call `locals()`, which is an unfortunate leakage of an implementation detail. I looked for ways to make this variable completely anonymous, but I didn't see a way of doing this without significant changes to the compiler. Of course, we could allocate a dedicated pointer within the frame like I was doing previously, but I was trying to avoid that.

-Eric
Yes, my suggestion was to "stringify" bounds (PEP 563 style) to make them serializable. I understand that this limits introspection slightly, but do we think that the live class information will actually be used for bounds of typevars? Currently it is legal to set a bound as a string anyway, and bounds have a lot of restrictions (they cannot use typevars themselves, and I don't think they can be special forms), so most of the time evaluating them with a global namespace (for a function f, it's f.__globals__) will give you the right answer if you really want a class object.

So yes, my suggestion introduces a limitation, but is that limitation relevant in any real scenario?

D.
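A minimal sketch of the stringified-bound idea, assuming bounds are stored as strings at compile time; `LazyBound` and `resolve` are hypothetical names for illustration, not part of any prototype:

```python
class LazyBound:
    """A hypothetical stand-in for a TypeVar bound stored as a string."""

    def __init__(self, expr: str, globalns: dict):
        self.expr = expr          # e.g. "str", stringified at compile time
        self.globalns = globalns  # typically f.__globals__ of the defining function

    def resolve(self) -> object:
        # Evaluate lazily, PEP 563 style, when introspection actually needs it.
        return eval(self.expr, self.globalns, {})
```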
Bounds can contain unions, at least. If marshalled bounds always became strings, that would be problematic in the possible future where PEP 563 is eventually deprecated and PEP 649 takes its place. Now, unions (at least) are picklable, so we could probably serialize them using marshal as well (by extending that protocol slightly) and make things work that way.

But I think we may be prematurely optimizing here, and I'd rather focus on Eric's current version, where I'd like to get rid of the `__type_variables__` local variable, ideally. At the same time, I think we can put that off until the PEP has been accepted. An extra local with a dunder name is not a bad price to pay for this feature.
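A quick check of the serialization constraint being discussed (behavior on recent CPython versions; the printed message is approximate):

```python
import marshal
import pickle

u = int | str                  # a union, which is a legal TypeVar bound
pickle.dumps(u)                # works: unions are picklable
try:
    marshal.dumps(u)           # marshal supports only a fixed set of types
except ValueError as e:
    print("marshal rejects unions:", e)
```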
--Guido van Rossum (python.org/~guido)
I really like this PEP, but I'm curious whether there will be any way to subscript a function at runtime to bind its type parameters to something, and whether there would be any way to extract the subscripted functions at runtime. I'm not entirely sure what the API for extracting them would look like, but since runtime type checking is becoming a big thing, I think it might be worthwhile considering.
The first draft of the PEP included support for explicit specialization of generic functions. (It also included support for default type parameter arguments, which is something you have proposed in the past as well.) The feedback I received was that this additional functionality should be moved into other PEPs. This PEP is already relatively complex, both in terms of the changes to CPython and the work it will take for type checkers to support it. Adding more "bells and whistles" makes it less likely that it will be accepted.

If you feel strongly about explicit specialization of generic functions, you could draft another PEP to cover that. I can even provide you with the earlier draft of this PEP if you want to copy and paste the relevant sections. The only complexity is how to handle overloads in this case.

-Eric
Right, that makes sense, I might give specialisation a go then if this gets accepted. Thanks for your continued work on this.
Even without this PEP, specializing functions makes sense. It's a straightforward extension of PEP 585, which added list[int] etc. (But type checkers would have to be modified to support it too, so it needs a -- simple -- PEP.)
--Guido (mobile)
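For illustration, a hypothetical sketch of what explicit specialization of a generic function might look like if subscripting were extended to functions, analogous to list[int] from PEP 585; this is not part of PEP 695 or any accepted PEP, and it does not work at runtime today:

```python
def first[T](items: list[T]) -> T:
    return items[0]

# Hypothetical runtime specialization. As of today this raises
# TypeError: 'function' object is not subscriptable.
first_int = first[int]
print(first_int([1, 2, 3]))  # a runtime checker could now enforce list[int]
```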
Regarding: https://github.com/python/peps/issues/2724

I believe there are many advantages to explicit variance annotations, specifically using the keywords `in` and `out`:

- Improved readability/understandability.
- Reduced chance of accidentally breaking type compatibility.
- Greater discoverability regarding generic variance.

The only language that I know of that infers generic variance is TypeScript, and they state that the `in`/`out` keywords can improve readability(1). All languages that I know of that support declaration-site variance require explicit variance annotations (almost all use `in` and `out`). I think it would be strange and unusual for Python to differ from this consistent pattern.

1: https://devblogs.microsoft.com/typescript/announcing-typescript-4-7-beta/#:~....
I strongly disagree. TypeScript only recently added support for `in` and `out`, and the TypeScript team did so very reluctantly. They added it to accommodate some rare cases where variance inference took too long. When `in` and `out` are used in TypeScript, the compiler doesn't validate the variance because it assumes that the developer added these keywords only out of necessity, for performance reasons. Python's type system doesn't allow for such complex types, so I am not concerned about the performance of variance inference.

Forcing users to understand variance is unnecessary when it's something that can be determined by the type checker. As a maintainer of a Python type checker, I regularly field questions from developers who are confused by the notion of variance. We should not force all developers to understand this concept when it's something that can largely be ignored. Forcing users to explicitly declare the variance does not improve readability or understandability, IMO. It just adds unnecessary complexity and cognitive burden.

-Eric
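For readers unfamiliar with the inference being discussed, here is a small example under PEP 695's proposed inference rules as I understand them; `Box` is illustrative only:

```python
class Box[T]:
    def __init__(self, item: T) -> None:
        self._item = item  # private attribute: doesn't constrain variance

    def get(self) -> T:    # T appears only in output position...
        return self._item

# ...so a type checker can infer that Box is covariant in T: a Box[bool]
# is usable where a Box[int] is expected, with no `in`/`out` annotation.
```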
They added it to accommodate some rare cases where variance inference took too long.
while that may be true, they admit in that blog post that "it can be useful for a reader to explicitly see how a type parameter is used at a glance. For much more complex types, it can be difficult to tell whether a type is meant to be read, written, or both."
When `in` and `out` are used in TypeScript, the compiler doesn't validate the variance
yes, it does:

```ts
interface Foo<in T> {
    // error: Type 'Foo<super-T>' is not assignable to type 'Foo<sub-T>'
    // as implied by variance annotation.
    a: T
}
```

in my experience, languages that hide variance behind implicit inference just made it so much more confusing for me to learn. every other developer i've spoken to about this feels the same way. it also seems like a downgrade to not allow explicit variance annotations when the old `TypeVar` syntax allows it.
I just read PEP 695, and most of the things there seem like really nice improvements. There is one thing, though, that just feels wrong to me, and that is using the ** syntax for ParamSpec, as in the example:

```python
from typing import Callable

def func[**P, R](cb: Callable[P, R], *args: P.args, **kwargs: P.kwargs) -> R: ...
```

The main reason is that in all other places in Python, ** is always used in combination with kwargs or dicts. And a ParamSpec isn't just kwargs; it's args and kwargs. This means that ** has very different semantics in this PEP than in other places in Python, which will be confusing.

Another reason that I do not like ** for ParamSpec is that it might bite us in the future if there is a desire to continue building on PEP 692. That PEP introduces the possibility of using a TypedDict to annotate kwargs as follows:

```python
class Movie(TypedDict):
    name: str
    year: int

def foo(**kwargs: **Movie) -> None: ...
```

So for me a natural extension of PEP 692 would be to also make it possible to make foo generic in the above example, using something like:

```python
K = TypeVarDict("K")

def foo(**kwargs: **K) -> None: ...
```

And once one has this, it would only be natural to shorten it in the spirit of PEP 695 to something like:

```python
def foo[**K](**kwargs: **K) -> None: ...
```

This syntax conflicts with the ParamSpec way of doing it:

```python
def foo[**P](**kwargs: P.kwargs) -> None: ...
```

The most natural way for me to use Callable together with PEP 692 and PEP 695 would be:

```python
from typing import Callable

def func[*A, **K, R](cb: Callable[A, K, R], *args: *A, **kwargs: **K) -> R: ...
```

which has the added benefit of providing a cleaner solution to the problem in PEP 612 than using ParamSpec.
The `**` syntax for a `ParamSpec` comes from PEP 677, which was widely supported in the typing community but ultimately rejected by the steering council.

I don't think your proposed `TypeVarDict` would work as a replacement for `ParamSpec`. A `ParamSpec` captures much more than positional versus keyword parameters. It captures which parameters can be passed both ways (positionally or by keyword), which parameters have defaults, and internal flags about the captured call, such as whether it's an unbound instance or class method.

I'm skeptical that we would ever pursue a `TypeVarDict` concept in the type system, so I'm not so worried about using "**" to mean `ParamSpec` in this context.

-Eric
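As a small illustration of what a `ParamSpec` preserves, using the PEP's proposed syntax (`log_call` and `fetch` are made-up names):

```python
from typing import Callable

def log_call[**P, R](cb: Callable[P, R]) -> Callable[P, R]:
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {cb.__name__}")
        return cb(*args, **kwargs)
    return wrapper

@log_call
def fetch(url: str, /, *, timeout: float = 1.0) -> bytes: ...

# A type checker still knows fetch's full shape: url is positional-only and
# timeout is keyword-only with a default -- more than *args/**kwargs alone encode.
```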
Dear Typing Community!

I just wanted to share with you an idea about type alias syntax, an alternative to the one currently in PEP 695. The PEP says
We propose to introduce a new statement for declaring type aliases. Similar to class and def statements, a type statement defines a scope for type parameters.
But the proposed syntax looks rather like C code (`int x = 5`) and feels a bit strange in Python:

```python
# A non-generic type alias
type IntOrStr = int | str

# A generic type alias
type ListOrSet[T] = list[T] | set[T]
```

So my humble proposition is to introduce a truly pythonic syntax:

```python
# A non-generic type alias
type IntOrStr:
    int
    str

# A generic type alias
type ListOrSet[T]:
    list[T]
    set[T]
```

While it takes more lines, it has the following advantages:

1. More pythonic. No new C-flavoured syntax like `key_word name = value`, but a well-known style, like class, def, with, match, case etc.
2. No need to use `\`. Type names happen to be very long in real-life projects. Pythonic syntax allows us to avoid line continuation characters.
3. No need for `|`. While it is elegant and cute, the new line can take its role gracefully.
4. More readable diffs when someone adds or removes types. For example, when we allow `ListOrSet` to be None:

```python
type ListOrSet[T]:
    list[T]
    set[T]
    None
```

The diff will show accurately what was added, without a need to check which parts of a long line were modified. Readability counts. ;)

5. Clear scope of T. It naturally plays with this requirement:
Type parameters declared as part of a generic type alias are valid only when evaluating the right-hand side of the type alias.
6. Open to future extensions, in the spirit of making types first-class citizens. While this last argument may be a bit vague, it feels that my pythonic type alias syntax will leave more room for possible syntax enhancements if, a few years from now, we discover such a need.

In case you have already discussed such a pythonic syntax but for some reason rejected it, it may be worth adding it to the Rejected Ideas section.

It's my first time on the Python mailing list, so if you find my proposition somehow inappropriate or flawed, please accept my apologies for wasting your time.

Have a great Halloween weekend!

- Maciej
On Sat, 29 Oct 2022 at 16:14, Maciej M (<maciej.mikulski.jr@gmail.com>) wrote:
But the proposed syntax looks rather like C code (`int x = 5`) and feels a bit strange in Python:
It reminds me more of Haskell than C. C type aliases look like `typedef int str;`.
So my humble proposition is to introduce a truly pythonic syntax:
```python
# A non-generic type alias
type IntOrStr:
    int
    str
```
Doesn't seem that Pythonic to me. In every other block (except match), the lines inside the block are statements, not types.
3. No need for `|`. While it is elegant and cute, the new line can take its role gracefully.
Not all type aliases are unions. It's odd to me to privilege unions in this way.
It reminds me more of Haskell than C. C type aliases look like `typedef int str;`.
I meant the general syntax `<keyword> <identifier> = <value>`, which is absent in Python.
Not all type aliases are unions. It's odd to me to privilege unions in this way.
Of course! But `Union` is quite often (at least in my humble experience) the outermost one, especially with `Optional` becoming `| None` with PEP 604. Anyway, thanks for the feedback.
participants (16)

- Anton Agestam
- Daniel Moisset
- detachhead@gmail.com
- Eric Traut
- Ethan Smith
- Guido van Rossum
- Guido van Rossum
- James H-B
- Jelle Zijlstra
- Jukka Lehtosalo
- Maarten Derickx
- Maciej M
- Mehdi2277
- pippy022@gmail.com
- S Pradeep Kumar
- Sebastian Rittau