Semantics for type checking (was: What should a good type checker do?)
On Friday, while replying to a post on python-dev about PEP 526
(variable annotations), I ended up mentioning things that I think a
good type checker should do, which seem to differ from the general
understanding. The discussion should continue here, though, because
the aim of PEP 526 is not to define type-checking semantics.
I'll recap the thoughts in this post, but here are links to the
original posts, for reference:
[1] https://mail.python.org/pipermail/python-dev/2016-September/146069.html
[2] https://mail.python.org/pipermail/python-dev/2016-September/146090.html
Quoting [1], with added explanations in between:
On Fri, Sep 2, 2016 at 1:10 PM, Koos Zevenhoven
The more you infer types, the less you check them. It's up to the programmers to choose the amount of annotation.
[...]
I'm also a little worried about not being able to reannotate a name.
IOW, this should be allowed: x : int = 1 foo(x) x : float = 0.5 bar(x) It turns out that disallowing this was not the PEP's intention, because type-checking semantics is up to the type checker. This has now been clarified in the PEP. Good. [...]
def eggs(cond:bool): x: Optional[List[int]] if cond: x = None # Now legal else: x: = [] spam(x)
The above is an example from Mark Shannon's email, and below is my response:
A good checker should be able to infer that x is a union type at the point that it's passed to spam, even without the type annotation. For example:
def eggs(cond:bool): if cond: x = 1 else: x = 1.5 spam(x) # a good type checker infers that x is of type Union[int, float]
Here, there are two inferred types for x, but in different branches of the if statement. So x, which is dynamically typed at runtime, becomes a Union at "static-check time".
Or with annotations:
def eggs(cond:bool): if cond: x : int = foo() # foo may not have a return type hint else: x : float = bar() # bar may not have a return type hint spam(x) # a good type checker infers that x is of type Union[int, float]
Two explicit annotations, and again neither of them "wins". The
inferred type after the if statement is therefore a Union.
Below, I'm quoting [2], my response to an email that disagreed with me
about the above being something "a good type checker" would do:
On Fri, Sep 2, 2016 at 6:49 PM, Koos Zevenhoven
A good checker should be able to infer that x is a union type at the point that it's passed to spam, even without the type annotation. For example:
def eggs(cond:bool): if cond: x = 1 else: x = 1.5 spam(x) # a good type checker infers that x is of type Union[int, float]
Oh I really hope not. I wouldn't call that a *good* type checker. I would call that a type checker that is overly permissive.
I guess it's perfectly fine if we disagree about type checking ideals, and I can imagine the justification for you thinking that way. There can also be different type checkers, and which can have different modes.
But assume (a) that the above function is perfectly working code, and spam(...) accepts Union[int, float]. Why would I want the type checker to complain?
Then again, (b) instead of that being working code, it might be an error and spam only takes float. No problem, the type checker will catch that.
In case of (b), to get the behavior you want (but in my hypothetical semantics), this could be annotated as
def eggs(cond:bool): x : float if cond: x = 1 # type checker says error else: x = 1.5 spam(x)
So here the programmer thinks the type of x should be more constrained than what spam(...) accepts.
The reasoning here is that x is explicitly annotated without an assignment, so the inferred types subsequently assigned to it (int and float) must be compatible with the annotation. Therefore "the good type checker" would say assigning 1 is an error.
Or you might have something like this
def eggs(cond:bool): if cond: x = 1 else: x = 1.5 # type checker has inferred x to be Union[int, float] x : float # type checker finds an error spam(x)
Here, the same error is found, but at a different location.
Here, x is explicitly annotated as float while its value remains a Union[int, float], as inferred by "the good type checker". Therefore this is an error. One could consider this a type assertation (at static-check time). "The good type checker" should also be able to tell the programmer that it's "x = 1" that made the subsequent "type assertion" fail. In addition to being type *hints*, the annotations can also be considered as "check points". -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
Below I respond to Chris Angelico's post in the python-dev thread.
On Sat, Sep 3, 2016 at 2:01 AM, Chris Angelico
def eggs(cond:bool): if cond: x = 1 else: x = 1.5 spam(x) # a good type checker infers that x is of type Union[int, float]
I wonder if it would be different if you wrote that as a single expression:
x = 1 if cond else 1.5
x = sum([1] + [0.5] * cond)
What should type inference decide x is in these cases? Assume an arbitrarily smart type checker that can implement your ideal; it's equally plausible to pretend that the type checker can recognize an if/else block (or even if/elif/else tree of arbitrary length) as a single "assignment" operation. IMO both of these examples - and by extension, the if/else of the original - should be assigning a Union type.
In the first case it would indeed again be Union[int, float] (the code is basically equivalent after all). In the second case, the checker would infer List[Union[int, float]]. No magic involved, assuming that list.__add__ is annotated precisely enough.
Lots of Python code assumes that smallish integers [1] are entirely compatible with floats. Is Python 4 going to have to deal with the int/float distinction the way Python 3 did for bytes/text, or are they fundamentally compatible concepts? Is the "small integer" like the "ASCII byte/character" as a kind of hybrid beast that people treat as simultaneously two types? (Personally, I don't think it's anything like bytes/text. But I'm open to argument.)
Forcing people to write 1.0 just to be compatible with 1.5 will cause a lot of annoyance. I leave it to you to decide whether there's a fundamental difference that needs to be acknowledged, or just subtleties of representational limitations to be ignored until they become a problem.
Agreed. However, I think it completely depends on the context, which types are "compatible" in the way that 1 and 1.5 often are. Numbers are just a very typical example. But it could as well be Animal and Person. It's just a matter of whether the code base is written in a way that it's ok to pass on either type.
ChrisA
[1] And by "smallish" I mean less than 2**53. Big enough for a lot of purposes. Bigger (by definition) than JavaScript's integers, which cap out at 2**32.
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Sun, Sep 4, 2016 at 7:44 AM, Koos Zevenhoven
I wonder if it would be different if you wrote that as a single expression:
x = 1 if cond else 1.5
x = sum([1] + [0.5] * cond)
What should type inference decide x is in these cases? Assume an arbitrarily smart type checker that can implement your ideal; it's equally plausible to pretend that the type checker can recognize an if/else block (or even if/elif/else tree of arbitrary length) as a single "assignment" operation. IMO both of these examples - and by extension, the if/else of the original - should be assigning a Union type.
In the first case it would indeed again be Union[int, float] (the code is basically equivalent after all).
In the second case, the checker would infer List[Union[int, float]]. No magic involved, assuming that list.__add__ is annotated precisely enough.
I think you missed the sum() call there. Most likely, the sum of List[Union[anything]] would be Union[anything], so these would all be the same (which was my original point). ChrisA
On Sun, Sep 4, 2016 at 2:22 AM, Chris Angelico
On Sun, Sep 4, 2016 at 7:44 AM, Koos Zevenhoven
wrote: I wonder if it would be different if you wrote that as a single expression:
x = 1 if cond else 1.5
x = sum([1] + [0.5] * cond)
What should type inference decide x is in these cases? Assume an arbitrarily smart type checker that can implement your ideal; it's equally plausible to pretend that the type checker can recognize an if/else block (or even if/elif/else tree of arbitrary length) as a single "assignment" operation. IMO both of these examples - and by extension, the if/else of the original - should be assigning a Union type.
In the first case it would indeed again be Union[int, float] (the code is basically equivalent after all).
In the second case, the checker would infer List[Union[int, float]]. No magic involved, assuming that list.__add__ is annotated precisely enough.
I think you missed the sum() call there. Most likely, the sum of List[Union[anything]] would be Union[anything], so these would all be the same (which was my original point).
Oh, you're right, I missed the sum. Suitable @overloads in the stubs for sum should lead to that being inferred as Union[int, float]. But if the annotations are not that precise, that might become just List (or List[Any]), and the programmer may want to annotate x or improve the stubs. -- Koos
ChrisA _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Sun, Sep 4, 2016 at 2:33 AM, Koos Zevenhoven
On Sun, Sep 4, 2016 at 2:22 AM, Chris Angelico
wrote: On Sun, Sep 4, 2016 at 7:44 AM, Koos Zevenhoven
wrote: I wonder if it would be different if you wrote that as a single expression:
x = 1 if cond else 1.5
x = sum([1] + [0.5] * cond)
What should type inference decide x is in these cases? Assume an arbitrarily smart type checker that can implement your ideal; it's equally plausible to pretend that the type checker can recognize an if/else block (or even if/elif/else tree of arbitrary length) as a single "assignment" operation. IMO both of these examples - and by extension, the if/else of the original - should be assigning a Union type.
In the first case it would indeed again be Union[int, float] (the code is basically equivalent after all).
In the second case, the checker would infer List[Union[int, float]]. No magic involved, assuming that list.__add__ is annotated precisely enough.
I think you missed the sum() call there. Most likely, the sum of List[Union[anything]] would be Union[anything], so these would all be the same (which was my original point).
Oh, you're right, I missed the sum.
Suitable @overloads in the stubs for sum should lead to that being inferred as Union[int, float]. But if the annotations are not that precise, that might become just List (or List[Any]), and the programmer may want to annotate x or improve the stubs.
Let me try explaining this once more. If list.__add__ is imprecisely annotated, one may end up with just List or list. But the annotation is not that complicated. In the stubs you might have T1 = TypeVar('T1') T2 = TypeVar('T2') def __add__(self : List[T1], other : List[T2]) -> List[Union[T1,T2]]: ... And sum is also implemented in c, so no inference there (unless stubs can provide a reference Python implementation in the future, but I'm not sure I want to propose that). For sum, there could be an overload to handle this case: @overload def sum(iterable : Iterable[Union[int, float]], start : Union[int, float] = 0) -> Union[int, float]: ... In general the following is not strictly true: đef sum(iterable : Iterable[T], start: T) -> T: ... because sum uses __add__ which can return anything. However, the counterexamples are probably very rare. -- Koos
-- Koos
ChrisA _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Sun, Sep 04, 2016 at 12:26:58AM +0300, Koos Zevenhoven wrote:
On Friday, while replying to a post on python-dev about PEP 526 (variable annotations), I ended up mentioning things that I think a good type checker should do, which seem to differ from the general understanding. The discussion should continue here, though, because the aim of PEP 526 is not to define type-checking semantics.
Why should it continue here? The semantics of a type-checker is not part of Python the language. This is "python-ideas", not "third-party- application-ideas". If this thread belongs anywhere, it is probably the code quality mailing list. It is very likely that different linters and type-checkers will take different approaches. Some may infer more, some may infer less, and that's a good thing. Before arguing what a type-checker should or shouldn't do, perhaps we ought to look at what MyPy, PyLint, PyChecker, PyFlakes and Jedi already do. (Apologies if I have missed anyone's favourite linter/checker.) We should not try to specify the behaviour of the type-checker. Let the type-checkers compete in the wider Python eco-system and converge to best practice as found by experience, not because we declare by fiat that Thou Shalt Do It This Way. -- Steve
On Sun, Sep 4, 2016 at 4:20 AM, Steven D'Aprano
On Sun, Sep 04, 2016 at 12:26:58AM +0300, Koos Zevenhoven wrote:
On Friday, while replying to a post on python-dev about PEP 526 (variable annotations), I ended up mentioning things that I think a good type checker should do, which seem to differ from the general understanding. The discussion should continue here, though, because the aim of PEP 526 is not to define type-checking semantics.
Why should it continue here? The semantics of a type-checker is not part of Python the language. This is "python-ideas", not "third-party- application-ideas". If this thread belongs anywhere, it is probably the code quality mailing list.
I don't think you can 100% separate type checking syntax and semantics. We should try to imagine desired future semantics as well as possible before making decisions about syntax.
It is very likely that different linters and type-checkers will take different approaches. Some may infer more, some may infer less, and that's a good thing.
I agree that's likely, but I'm not sure that is ultimately a good thing. Type-checking semantics affect the way you write code, so at least you cannot fully separate writing code from the kind of type checking you use. What should large code bases do? What if you use different libraries that have been annotated for different semantics?
Before arguing what a type-checker should or shouldn't do, perhaps we ought to look at what MyPy, PyLint, PyChecker, PyFlakes and Jedi already do. (Apologies if I have missed anyone's favourite linter/checker.)
Sure, arguments in any direction would be interesting.
We should not try to specify the behaviour of the type-checker. Let the type-checkers compete in the wider Python eco-system and converge to best practice as found by experience, not because we declare by fiat that Thou Shalt Do It This Way.
I'm not looking to fix type-checker behavior now. I do hope that convergence is not going to take forever. -- Koos
-- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
participants (3)
-
Chris Angelico
-
Koos Zevenhoven
-
Steven D'Aprano