Proposal: Type evaluation functions
I recently implemented a new feature in pyanalyze that unlocks many things that aren't possible with the current type system and provides a simpler alternative for most use cases of overloads. The idea is that users can specify a special function, marked with the `@evaluated` decorator and written in a (very) restricted subset of Python, to indicate how a type checker should evaluate a function call. For example, this is an implementation of a simplified version of the `open()` builtin:

```
@evaluated
def open(mode: str):
    if is_of_type(mode, Literal["r", "w"]):
        return TextIO
    elif is_of_type(mode, Literal["rb", "wb"]):
        return BinaryIO
    else:
        return IO[Any]
```

Compared to overloads, this is shorter (the real `open()` takes lots of other parameters, which have to be repeated with every overload), easier to read (you don't have to stare at a series of overloads to spot the differences), and gives the stub author more precise control over how the code is interpreted (overload resolution is underspecified).

I wrote a detailed spec at https://github.com/quora/pyanalyze/blob/master/docs/type_evaluation.md, but here I'll just briefly cover a few more use cases.

1. Customized errors

The specification includes a `show_error()` intrinsic that allows users to emit a specific error message. For example, we may want to warn about the deprecated `U` mode:

```
@evaluated
def open(mode: str):
    if is_of_type(mode, Literal["rU"]):
        show_error("The U open mode is deprecated", argument=mode)
    ...
```

2. Overlapping types

Some type checkers implement custom checks for cases where two types that do not overlap are compared. Type evaluation functions provide a way to do this in the stub itself:

```
@evaluated
def safe_contains(elt: T1, container: Container[T2]) -> bool:
    if not is_of_type(elt, T2) and not is_of_type(container, Container[T1]):
        show_error("Element cannot be a member of container")

lst: List[int]
safe_contains("x", lst)       # error
safe_contains(True, lst)      # ok (bool is a subclass of int)
safe_contains(object(), lst)  # ok (List[int] is a subclass of Container[object])
```

3. Detecting invalid combinations of arguments

`open()` takes an `encoding` argument that must be None if the file is opened in binary mode. But the current typeshed overloads (https://github.com/python/typeshed/blob/f42de016b8443bea012d06d02c88d41e0bd3...) do not catch this case, because `open(..., "rb", encoding="x")` matches the fallback overload with `mode: str`. A full implementation of `open()` is in the pyanalyze test suite: https://github.com/quora/pyanalyze/blob/234625d88aa4c0b2f9ffc76cc9a0ec7b3f0d... (a rough sketch also follows at the end of this message).

I'd love to hear any feedback: Is this worth pursuing as a PEP? Are there more use cases that could be covered? Should the spec be changed?
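To make use case 3 concrete, here is a minimal sketch of how an evaluation function could reject an encoding in binary mode. This is an editorial illustration, not code from the spec or the linked test suite; in particular, using `is_of_type(encoding, None)` to detect an omitted or None argument is an assumption (the spec's exact intrinsics may differ).

```
from typing import Any, BinaryIO, IO, Literal, Optional, TextIO

# @evaluated, is_of_type, and show_error are the intrinsics from the
# proposal above; a greatly simplified open() signature is assumed here.
@evaluated
def open(mode: str, encoding: Optional[str] = None):
    if is_of_type(mode, Literal["rb", "wb"]):
        # Binary modes are incompatible with an explicit encoding.
        if not is_of_type(encoding, None):
            show_error(
                "binary mode doesn't take an encoding argument",
                argument=encoding,
            )
        return BinaryIO
    elif is_of_type(mode, Literal["r", "w"]):
        return TextIO
    else:
        return IO[Any]
```

With this definition, `open("rb", encoding="utf-8")` would produce the custom error, while `open("rb")` and `open("r", encoding="utf-8")` would type-check cleanly.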
I’ve also played with the idea of providing an imperative way to specify type information, but I abandoned the idea after concluding that the complexity wasn’t justified. I’m not aware of any other languages that use an imperative approach for static type analysis. There are very good reasons to stick with declarative approaches for static type information.

You said that your proposal unlocks a bunch of things that aren’t possible today. Everything I see in your examples can already be done through overloads, constrained TypeVars, and the addition of a standardized `@deprecated` decorator (which has been discussed separately). I admit that it might make complex overloads easier to read and maintain, but complex overloads are uncommon, and they’re mostly restricted to typeshed stubs, whose authors and contributors are some of the most experienced static typing experts in the Python community (yourself included :) ).

This proposal would introduce a significant amount of complexity and a completely different, parallel way of specifying type information. It looks like you’ve given it significant thought and done a good job adding constraints (e.g. limiting the supported operators and control flow constructs), but it’s still quite complex. Plus, there are many additional details that would need to be worked out that are not anticipated by your current draft proposal.

I think this level of complexity would be justified only if it could solve some significant problems that cannot be solved today using existing patterns and declarative approaches. I don’t currently see that it does. For those reasons, I’m pretty negative on the idea unless it can be shown to solve more pressing problems that cannot be solved today.

-Eric

Eric Traut
Contributor to Pyright & Pylance
Microsoft
Thanks for your feedback!
On Thu, Jan 13, 2022 at 2:23 PM Eric Traut wrote:
I’ve also played with the idea of providing an imperative way to specify type information, but I abandoned the idea after concluding that the complexity wasn’t justified. I’m not aware of any other languages that use an imperative approach for static type analysis. There are very good reasons to stick with declarative approaches for static type information.
You said that your proposal unlocks a bunch of things that aren’t possible today. Everything I see in your examples can already be done through overloads, constrained TypeVars, and the addition of a standardized `@deprecated` decorator (which has been discussed separately). I admit that it might make complex overloads easier to read and maintain, but complex overloads are uncommon, and they’re mostly restricted to typeshed stubs whose authors/contributors are some of the most experienced static typing experts in the Python community (yourself included :) ).
The three numbered examples in my original message cannot be expressed in the current type system as far as I am aware, even with the addition of a `@deprecated` decorator. `@evaluated` can also be used to cover some use cases of the proposed Not and Intersection types (though admittedly not in a very clean fashion):

```
@evaluated
def non_str_sequence(s: Sequence[str]):
    if isinstance(s, str):
        show_error("s argument must be a non-string sequence", argument=s)

non_str_sequence(["x"])  # ok
non_str_sequence("x")    # error

T1 = TypeVar("T1")
T2 = TypeVar("T2")

@evaluated
def require_intersection(obj: object, left: Type[T1], right: Type[T2]):
    if not is_of_type(obj, left):
        show_error("Argument must satisfy left type")
    if not is_of_type(obj, right):
        show_error("Argument must satisfy right type")

class A: pass
class B: pass
class C(A, B): pass

require_intersection(A(), A, B)  # error
require_intersection(B(), A, B)  # error
require_intersection(C(), A, B)  # ok
```
This proposal would introduce a significant amount of complexity and a completely different, parallel way of specifying type information. It looks like you’ve given it significant thought and done a good job adding constraints (e.g. limiting the supported operators and control flow constructs), but it’s still quite complex. Plus, there are many additional details that would need to be worked out that are not anticipated by your current draft proposal.
What are these additional details? I'm sure there are things I missed, but I tried to specify thoroughly how the proposed feature works. If there are specific things I missed, let me know and I will try to address them.
I think this level of complexity would be justified only if it could solve some significant problems that cannot be solved today using existing patterns and declarative approaches. I don’t currently see it. For those reasons, I’m pretty negative on the idea unless it can be shown to solve more pressing problems that cannot be solved today.
-Eric
Eric Traut
Contributor to Pyright & Pylance
Microsoft
For each problem you're attempting to solve with this approach, I'd want to first consider an alternative declarative approach — one that conforms to existing patterns and norms within the Python static type checking language. Only once I had convinced myself that there's no practical way to make a declarative approach work would I start to seriously consider an imperative approach, which differs from everything else we've seen in static type information in a pretty radical way.

For what it's worth, I was exploring an imperative approach as a way to eliminate the need for mypy plugins (or other type-checker-specific plugins) for libraries like django, sqlalchemy, etc. I ended up convincing myself that we could use existing declarative mechanisms (like overloads) to define various "transforms". The proposed `dataclass_transform` is one idea that came out of that thought experiment.
On Thu, Jan 13, 2022 at 2:50 PM Eric Traut wrote:
For each problem you're attempting to solve with this approach, I'd want to first consider an alternative declarative approach — one that conforms to existing patterns and norms within the Python static type checking language. Only once I had convinced myself that there's no practical way to make a declarative approach work would I start to seriously consider an imperative approach, which differs from everything else we've seen in static type information in a pretty radical way.
For what it's worth, I was exploring an imperative approach as a way to eliminate the need for mypy plugins (or other type-checker-specific plugins) for libraries like django, sqlalchemy, etc. I ended up convincing myself that we could use existing declarative mechanisms (like overloads) to define various "transforms". The proposed `dataclass_transform` is one idea that came out of that thought experiment.
+1 on trying to stick to declarative solutions here. I notice that Jelle's examples are pretty close to declarative, though. Assuming that `is_of_type()` is a primitive operation (similar to `isinstance()` but allowing arbitrary types for the 2nd arg), e.g. example (1) could become something like

```
def open(
    mode: str
) -> (
    TextIO if is_of_type(mode, Literal["r", "w"])
    else BinaryIO if is_of_type(mode, Literal["rb", "wb"])
    else IO[Any]
): ...
```

and that seems close enough to declarative to me (we can just declare it to be :-).

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
On Thu, Jan 13, 2022 at 2:50 PM Eric Traut wrote:
For each problem you're attempting to solve with this approach, I'd want to first consider an alternative declarative approach — one that conforms to existing patterns and norms within the Python static type checking language. Only once I had convinced myself that there's no practical way to make a declarative approach work would I start to seriously consider an imperative approach, which differs from everything else we've seen in static type information in a pretty radical way.
I won't quibble about what exactly we can call "imperative" or "declarative", but I'll note that the proposal only allows a highly restricted subset of Python: just if statements, and/or/not booleans, return statements, and `show_error()` calls for producing errors. We could express the same things with more traditional syntax: something like `-> If[IsOfType["mode", Literal["rb"]], BinaryIO, TextIO]`, or Guido's suggestion with `if`/`else`. But that would be much harder to write and read.

We could create more focused solutions for many of the individual use cases I listed, but to me the power of `@evaluated` isn't so much in the individual examples as in the flexibility it unlocks. If we were to add `Deprecated[]`, we might support deprecating a single parameter completely, but `@evaluated` makes it possible to deprecate passing an argument positionally, to deprecate a single type in a union, and to customize the error message (see the sketch below). I'm sure people will come up with more creative use cases that I haven't thought of yet.
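Here is a minimal sketch of those deprecation use cases in one evaluation function. This is an editorial illustration, not from the spec; in particular, `is_positional()` should be treated as an assumed intrinsic (the spec's exact predicate names may differ).

```
from typing import Union

@evaluated
def process(data: Union[str, bytes]) -> None:
    # Deprecate passing the argument positionally; is_positional() is an
    # assumed intrinsic meaning "was this argument passed by position?".
    if is_positional(data):
        show_error("passing data positionally is deprecated; use data=...", argument=data)
    # Deprecate a single member of the union, with a customized message.
    if is_of_type(data, bytes):
        show_error("passing bytes is deprecated; pass str instead", argument=data)
```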
For what it's worth, I was exploring an imperative approach as a way to eliminate the need for mypy plugins (or other type-checker-specific plugins) for libraries like django, sqlalchemy, etc. I ended up convincing myself that we could use existing declarative mechanisms (like overloads) to define various "transforms". The proposed `dataclass_transform` is one idea that came out of that thought experiment.
I don't dispute that an imperative approach allows for more flexibility, but flexibility is a double-edged sword. It opens a Pandora's box, and I don't think you're appreciating the complexity it adds, even with the restrictions you've placed on it. You're creating a parallel universe to the existing Python static typing that largely overlaps the existing functionality but introduces new capabilities (and a large set of new problems to consider).

If we were to adopt this proposal, we'd need to determine how it composes with every other type-related construct. The type system in Python is already so full of exceptions and special cases that composing it with such a radically different approach is going to be fraught with problems. Jia already raised the concern about bidirectional type inference. That's just the tip of the iceberg. What about forward references (quoted annotations)? What is the impact on the TypeVar constraint solver? What does this mean for libraries that perform runtime type validation?

Providing custom error messages is somewhat appealing, but it also raises challenges. For example, it eliminates the ability for type checkers to localize error messages.

I think the idea is clever, but I'm still very negative on it overall. There seems to be a lot of desire for a standardized way to mark functions, methods, and parameters as deprecated. Let's put our energy behind solving that problem. I think we can address it without deviating significantly from the existing type mechanisms.
This proposal reminds me of the presentation given by Elias Ellison from the PyTorch team in one of the tensor typing meetups (https://mail.python.org/archives/list/typing-sig@python.org/message/MF3TSMGI...). The basic idea there was also to let users imperatively specify input/output types for tensor-manipulating operations by writing a function in (a subset of) Python directly. Granted, the input/output types in that system are tensor shapes, not arbitrary Python types. But I think the underlying principles were similar.

I can definitely see the charm of this kind of approach in the particular area of tensor typing:

- Certain established tensor-related APIs (e.g. broadcasting) have complicated type-level semantics that are really hard to model using pre-existing typing facilities. The Pyre team has done a lot of work looking into how we could support those, and we ended up having to resort to implementing tensor-specific type operators (e.g. `Product`, `Divide`, `Broadcast`, `Length`, `Compose`, etc.) within Pyre itself. Allowing folks to provide custom type-level semantics, on the other hand, seems more desirable from an extensibility perspective than hard-coding the meaning of all tensor APIs within the type checker.
- The imperative nature of the specification seems more approachable to data engineers, who are more familiar with untyped Python and less familiar with how typing works. I suspect it's going to be faster for data engineers to write a small Python function to describe how the tensor shapes change in their functions than to express the same idea by nesting type variables within various tensor-specific type operators.

That being said, for general-purpose usage outside of numerical applications, I am a bit more hesitant. Yes, this feature brings a lot of flexibility to the type system -- you can express pretty much any other typing feature and even more with it. But at the same time, it also introduces a lot of complexity: you are essentially requiring every type checker to implement a translator from your DSL to its internal constraint solver, and to augment the constraint solver to handle arbitrary forms of boolean constraints.

Besides, it would make type inference a lot trickier. One reason why Pyre adopts a constraint-solving approach for type checking is to be able to do bidirectional inference: for a given function, not only should the type checker deduce the return type from the argument types, it is also important to be able to deduce argument types from return types. For example:

```
def foo(x: List[T]) -> T: ...

x = []
y: int = foo(x)
reveal_type(x)  # Should reveal List[int]
```

Supporting `@evaluated` would greatly complicate this task -- imagine that in the above example, the return type of `foo` is defined as a complicated function that depends on many conditionals over positional/keyword input types and Python versions. Implementing that kind of backward reasoning is not entirely impossible, but it would require a highly nontrivial amount of engineering investment. Granted, a similar problem exists today with `@overload`, but introducing `@evaluated` would make the problem a lot worse.

So my main concern with the proposal is not a lack of expressiveness. Quite the opposite: my concern is that it brings so much expressiveness that properly implementing it would quickly introduce an unjustifiable amount of complexity, and potentially a lot of runtime cost if people start to abuse the system.
If we want the existing type system to be extended, I'd feel more comfortable doing it in a more gradual way, e.g. by introducing things like intersection types or negation types individually, leaving enough room for design debate and implementation catch-up.

- Jia
On Thu, Jan 13, 2022 at 4:00 PM Jia Chen via Typing-sig (<typing-sig@python.org>) wrote:
This proposal reminds me of the presentation given by Elias Ellison from the PyTorch team in one of the tensor typing meetups (https://mail.python.org/archives/list/typing-sig@python.org/message/MF3TSMGI...). The basic idea there was also to let users imperatively specify input/output types for tensor-manipulating operations by writing a function in (a subset of) Python directly. Granted, the input/output types in that system are tensor shapes, not arbitrary Python types. But I think the underlying principles were similar.
I can definitely see the charm of this kind of approach in the particular area of tensor typing:

- Certain established tensor-related APIs (e.g. broadcasting) have complicated type-level semantics that are really hard to model using pre-existing typing facilities. The Pyre team has done a lot of work looking into how we could support those, and we ended up having to resort to implementing tensor-specific type operators (e.g. `Product`, `Divide`, `Broadcast`, `Length`, `Compose`, etc.) within Pyre itself. Allowing folks to provide custom type-level semantics, on the other hand, seems more desirable from an extensibility perspective than hard-coding the meaning of all tensor APIs within the type checker.
- The imperative nature of the specification seems more approachable to data engineers, who are more familiar with untyped Python and less familiar with how typing works. I suspect it's going to be faster for data engineers to write a small Python function to describe how the tensor shapes change in their functions than to express the same idea by nesting type variables within various tensor-specific type operators.
That being said, for general-purpose usage outside of numerical applications, I am a bit more hesitant. Yes, this feature brings a lot of flexibility to the type system -- you can express pretty much any other typing feature and even more with it. But at the same time, it also introduces a lot of complexity: you are essentially requiring every type checker to implement a translator from your DSL to its internal constraint solver, and to augment the constraint solver to handle arbitrary forms of boolean constraints.
Besides, it would make type inference a lot trickier. One reason why Pyre adopts a constraint-solving approach for type checking is to be able to do bidirectional inference: for a given function, not only should the type checker deduce the return type from the argument types, it is also important to be able to deduce argument types from return types. For example:
```
def foo(x: List[T]) -> T: ...

x = []
y: int = foo(x)
reveal_type(x)  # Should reveal List[int]
```
Supporting `@evaluated` would greatly complicate this task -- imagine that in the above example, the return type of `foo` is defined as a complicated function that depends on many conditionals over positional/keyword input types and Python versions. Implementing that kind of backward reasoning is not entirely impossible, but it would require a highly nontrivial amount of engineering investment. Granted, a similar problem exists today with `@overload`, but introducing `@evaluated` would make the problem a lot worse.
So my main concern with the proposal is not a lack of expressiveness. Quite the opposite: my concern is that it brings so much expressiveness that properly implementing it would quickly introduce an unjustifiable amount of complexity, and potentially a lot of runtime cost if people start to abuse the system.

If we want the existing type system to be extended, I'd feel more comfortable doing it in a more gradual way, e.g. by introducing things like intersection types or negation types individually, leaving enough room for design debate and implementation catch-up.
Thanks, that's a useful piece of feedback. Do you think this could be alleviated by adding some restrictions on evaluated functions? For example, we could perhaps prohibit TypeVars in the return type of evaluated functions.
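Concretely, such a restriction might distinguish cases like the following. This is a hypothetical sketch, not from the spec; in particular, the syntax for returning a type variable from an evaluated body is an assumption.

```
from typing import List, TypeVar

T = TypeVar("T")

# Would be prohibited under the suggested restriction: the result depends
# on T, so a checker would have to infer T from the expected return type
# by reasoning backwards through the conditionals in the body.
@evaluated
def foo(x: List[T]):
    if is_of_type(x, List[int]):
        return int
    return T

# Still allowed: every branch returns a concrete type, so no backward
# solving through the evaluated body is ever needed.
@evaluated
def bar(x: List[T]):
    if is_of_type(x, List[int]):
        return int
    return object
```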
- Jia