Type narrowing beyond a single method

Hello,

I have a problem with code like the one below:

```python
class Foo:
    def __init__(self) -> None:
        self.x: int | None = None

    def setup(self):
        self.x = 42

    def add1(self):
        self.setup()
        assert self.x is not None, "expected setup to make self.x an integer."
        self.x += 1  # Operator "+=" not supported for types "int | None" and "int"

    def add2(self):
        self.setup()
        assert self.x is not None, "expected setup to make self.x an integer."
        self.x += 2  # Operator "+=" not supported for types "int | None" and "int"
```

In the actual code there are more than two methods that need these asserts to avoid the type error caused by the initial value of None. A trivial DRY refactoring is to move the assert into the `setup` method. However, the pyright type checker is not impressed by this refactoring; apparently its type narrowing only works for an assert directly in the same code block.

Does any other type checker support this? Would anybody else find such a feature useful, or is there a workaround I'm missing? If such implied type narrowing were to work, the assert would become redundant altogether, since the assignment in `setup` implies the same type guarantee as the assert.

P.S. You might rightly suggest that the design of class `Foo` is at fault and that it should just do complete initialization. The problem is that I'm trying to add types to an existing library, and the effort of changing that design plus the resulting user migrations would be too much pain. I'm also hopeful that such a type-checking feature would be useful outside of badly designed software.

Best Regards,
--
Ilya Kamen
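For concreteness, here is a minimal sketch of the DRY refactoring described above, with the shared assert moved into `setup()`; pyright still reports the error in the callers because its narrowing of `self.x` does not carry across the method boundary:

```python
class Foo:
    def __init__(self) -> None:
        self.x: int | None = None

    def setup(self) -> None:
        self.x = 42
        # The shared assert now lives here...
        assert self.x is not None, "expected setup to make self.x an integer."

    def add1(self) -> None:
        self.setup()
        # ...but pyright does not propagate that narrowing into this method,
        # so the same diagnostic is still reported here:
        self.x += 1  # Operator "+=" not supported for types "int | None" and "int"
```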

There have been recent discussions about extending PEP 647 to support "type asserts", whereby a caller can pass an expression to a function and rely on that function to verify the type or raise an exception if the type does not match. See https://github.com/python/typing/discussions/1013#discussioncomment-1966238 for details. A type assert doesn't quite accomplish what you're looking for, though, because your code pattern doesn't pass `self.x` as an argument to `setup()`, so a type checker would have no idea that `setup()` is guaranteeing to always assign a non-None value to the instance variable `x`.

I don't think that a type checker can safely assume type narrowing across functions. Function execution order can't be determined statically, and method overrides are possible (unless the class is `@final`). For example, if `setup()` were to call another method, a static type checker wouldn't know whether that method (which is potentially overridden by a subclass) modifies instance variable `x` in some way (perhaps assigning it back to `None`).

As you pointed out, using complete initialization is a better approach here. You can do that a couple of different ways:

```python
class Foo:
    # Define the instance variable in the class but don't initialize it.
    # This eliminates the need to declare it as Optional.
    x: int

    def __init__(self) -> None:
        pass

    def setup(self):
        self.x = 42
```

```python
class Foo:
    # Initialize the value to a non-None value in the constructor.
    # This also eliminates the need to declare the type as Optional.
    def __init__(self) -> None:
        self.x: int = 0

    def setup(self):
        self.x = 42
```

Another approach I use is to wrap the instance variable in a property. A common assert can then be placed in the property getter. It's a little more verbose, but perhaps not when compared to the verbosity of many repeated asserts.

```python
class Foo:
    def __init__(self) -> None:
        self._x: int | None = None

    def setup(self):
        self.x = 42

    @property
    def x(self):
        assert self._x is not None
        return self._x

    @x.setter
    def x(self, val: int):
        self._x = val
```

-Eric

--
Eric Traut
Contributor to Pyright and Pylance
Microsoft
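(As a hedged aside, not part of the reply above: the closest thing available today to such a "type assert" is an ordinary generic helper that raises on None and returns the narrowed value. It only replaces the repeated asserts, though; each caller still has to mention `self.x` explicitly, which is exactly the limitation described above. The helper name below is illustrative, not an existing typing API.)

```python
from typing import TypeVar

T = TypeVar("T")


def not_none(value: T | None) -> T:
    """Return value unchanged, raising if it is None (illustrative helper)."""
    if value is None:
        raise AssertionError("value is unexpectedly None")
    return value


class Foo:
    def __init__(self) -> None:
        self.x: int | None = None

    def setup(self) -> None:
        self.x = 42

    def add1(self) -> None:
        self.setup()
        # The narrowing comes from the helper's return type, not from setup(),
        # so self.x still has to be passed explicitly at every use site.
        self.x = not_none(self.x) + 1
```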

Thanks for the design examples that avoid this problem today. I think I like the property/getter solution best; the other two have bad behavior if the guarantee of calling `setup` is broken.
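(To spell out that "bad behavior", here is a rough sketch, assuming each `Foo` variant from the previous message in turn, of what happens at runtime when the guarantee is broken and `setup()` is never called:)

```python
# First variant: bare `x: int` class-level declaration, never assigned.
foo = Foo()
foo.x + 1  # raises AttributeError: 'Foo' object has no attribute 'x'

# Second variant: `self.x: int = 0` in __init__.
foo = Foo()
foo.x + 1  # no error at all; silently computes with the placeholder value 0

# Property variant: the common assert lives in the getter.
foo = Foo()
foo.x + 1  # raises AssertionError from the getter, a loud and early failure
```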
I didn't get why function execution order can't be determined statically. At least for the purpose of analyzing a given method, it should be clear which other methods that method calls. Regarding overrides, this would mean that all overrides must be known before analysis is done, and the weakest narrowing applies.

Best Regards,
--
Ilya Kamen

> I didn't get why function execution order can't be determined statically.
Rice's theorem states that all non-trivial semantic properties of programs are undecidable. Consider this very simple code snippet:

```python
if arbitrary_computation():
    foo()
    bar()
else:
    bar()
    foo()
```

Without knowing whether `arbitrary_computation()` returns true or not, a static analyzer won't be able to determine the ordering between the two function calls `foo()` and `bar()`. But a static analyzer can never be expected to figure out when `arbitrary_computation()` returns true -- if you know how to do that in general, you've solved the halting problem. And this is just the simplest snippet; it doesn't even involve loops, recursive calls, overrides, or other more advanced language features.
> Regarding overrides, this would mean that all overrides must be known before analysis is done, and the weakest narrowing applies.
It is true that a static analyzer may be able to apply some kind of over-approximation here to make the problem decidable, e.g. "if function a() may call function b(), then the analysis should conservatively assume that b() will be called". But that kind of over-approximation suffers from two big problems:

- It requires the analysis to be inter-procedural, i.e. information obtained from one function must be propagated globally to all of its callers. This, in turn, means we must implement some sort of transitive closure / global fixpoint computation, whose asymptotic complexity would be at least cubic with respect to the total number of functions. From my own experience, computation this expensive tends to be really, really hard to scale, especially to the million-line codebases that many of the type checker maintainers on this mailing list need to support.
- Due to the global nature of the analysis, a small change in one part of the codebase could significantly alter the analysis result of another, seemingly irrelevant part of the codebase via several levels of transitive propagation. This makes it hard to localize the root cause when an issue is detected: a human being has to manually follow the chain of propagations back, often only to realize that the reported issue is a false positive because one level in the middle was too conservative. I don't think that is a good experience from the end user's perspective -- at least not good enough to justify the extraordinary cost of the analysis.
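(A hypothetical illustration of the override problem mentioned in both replies: even when `setup()` itself assigns a non-None value, a subclass can undo that through a hook that `setup()` calls, so "`setup()` was called" cannot soundly be taken to imply "`self.x` is not None" for every instance typed as `Foo`.)

```python
class Foo:
    def __init__(self) -> None:
        self.x: int | None = None

    def setup(self) -> None:
        self.x = 42
        self.post_setup()  # hook for subclasses

    def post_setup(self) -> None:
        pass


class ResettingFoo(Foo):
    def post_setup(self) -> None:
        # A subclass may legitimately put x back to None, so a checker that
        # narrowed self.x after setup() would be unsound for this instance.
        self.x = None
```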

Participants (3):
- Eric Traut
- Ilya Kamenshchikov
- Jia Chen