A few thoughts here (and apologies for the long post). First, I think we should try to avoid conflating _when_ to apply type narrowing with _how_ to apply type narrowing (i.e. the resulting type when narrowing is applied). We may want to standardize aspects of both the _when_ and the _how_, but it will help the discussion if we keep these two issues separate as much as possible.
I will first focus on _when_ type narrowing should be applied.
It appears that we have one primary point of contention — whether narrowing should be applied for all assignments. Some have argued that narrowing should be skipped if the assignment statement includes a variable type annotation.
In short, should these two cases be treated in a consistent manner?
```python # (1) a: T = x
# (2) a: T a = x ```
We (the maintainers of pyright) have a strong conviction that type narrowing should not be tied in any way with the location of type declarations. The two code samples above should be treated in a consistent manner, and narrowing should be applied in both cases.
Here’s the thought process behind this position. I’m hopeful that most of these points (at least through point 5) will be uncontentious and will form a solid basis for our point of view.
**Point 1**: A type annotation provided for a variable declares that the variable should be assigned only values whose types are compatible with the declared type. Any attempt to assign a value whose type is not compatible with a variable’s declared type should be flagged as an error by a type checker.
**Point 2**: A symbol can have only one declared type. A symbol can’t be declared as a function and in the same scope redeclared as a variable of type “str”. An instance variable cannot be declared as an `int` in one method but a `str` in another. A local variable cannot be declared as an `int` within an “if” clause and declared as a `str` within the “else” clause. Any attempt to redeclare a symbol in an incompatible manner should be flagged as an error.
**Point 3**: PEP 526 does not (nor should it) dictate that a variable’s type declaration must be provided upon “first use” of that variable. In fact, there’s no clear definition of “first use” in Python. An instance variable (e.g. “self.foo”) might be assigned a value in two different methods, and the order in which these methods are called at runtime cannot be known statically. A variable’s type declaration can therefore appear anywhere, in any order, and it must apply to all uses of that variable.
```python # These are semantically equivalent. class C1: def f1(self): self.var: Optional[int] = None def f2(self): self.var = 4
class C1: def f1(self): self.var = None def f2(self): self.var: Optional[int] = 4 ```
**Point 4**: Absent any additional information about a variable whose type has been declared, a type checker should assume that it contains a value with the declared type.
```python def foo(a: Optional[int]): reveal_type(a) # “Optional[int]” at this point
class C1: def __init__(self): self.var: Optional[str] = None
def f1(self): reveal_type(self.var) # “Optional[str]” at this point ```
**Point 5**: Type narrowing is a process by which a type checker can use additional contextual knowledge within a control flow graph to understand more detailed information about a variable’s contents _at a particular point in the code flow_. Knowledge of narrower types is beneficial because it allows the type checker to perform more accurate check, minimize false positives, and provide more accurate completion suggestions for developers. Additional knowledge about a variable’s type that can be discerned from the control flow graph should not be discarded or ignored!
**Point 6**: Type narrowing should be independent of (and not influenced by) the location of a variable’s type declaration. As we noted above, the location of a variable’s declaration is incidental, and the type declaration is independent of code flow considerations.
Any other interpretation results in inconsistent behaviors, an inconsistent mental model for developers — especially those who are new to type annotations, and hard-to-explain edge cases. Alternate interpretations also result in false-positive errors because they ignore valuable information derived from narrowing.
Some have argued that applying type narrowing in an inconsistent manner can be a useful technique for handling cases where the assigned variable is subsequently used to infer an invariant type argument for a generic type.
```python a: Union[int, str] = 1 # Type is a is narrowed to int b = [a] # Inferred type of b is List[int] c: List[Union[int, str]] = b # Error ```
In this example, `b` does not have a declared type, so its type needs to be inferred. There are cases (like this) where a type cannot be inferred in an unambiguous manner, and the normal heuristics for type inference can produce an answer that is inconsistent with the programmer’s intent. The appropriate fix for removing this ambiguity is to introduce a type declaration and eliminate the need for type inference. In this case, the ambiguity is with the inference of `b`, so a type declaration should be added as follows:
```python a: Union[int, str] = 1 b: List[Union[int, str]] = [a] c = b ```
Solving the inference ambiguity for `b` by introducing an inconsistent type narrowing rule for `a` is not the right solution!
Finally, it’s worth noting that there are many different conditions where type narrowing can be applied, involving a variety of techniques. For example, pyright currently applies type narrowing in the following cases:
1. Assignments 2. Positive (“if”) and negative (“else”) conditional code flow constructs where the conditional expression contains: * A combination of “and”, “or” or “not” boolean operators * “is None” or “is not None” tests * “type(X) is Y” or “type(X) is not Y” tests * “X.Y == <literal>” or “X.Y != <literal>” tests where Y is used to discriminate among subtypes * “X is Y” or “X is not Y” enum tests * “X in Y” tests * isinstance() tests * issubclass() tests * callable() tests * Truthy or Falsy tests
Standardizing this list is going to be challenging because different type checkers implement different techniques, and this list will undoubtedly continue to grow over time as new innovations are introduced. It may still be worth trying to standardize some aspects of assignment-based narrowing, but this will invariably leave behavioral differences between type checkers. We should collectively decide on whether the benefits of standardizing these behaviors outweigh the downsides.
Now, I’ll focus on _how_ type narrowing is applied, specifically for assignments.
We seem to have reached some level of agreement about the following: 1. Narrowing should eliminate subtypes within a union where possible 2. Narrowing should replace base classes and protocols with derived classes and derived protocols where possible 3. Narrowing should not be applied if the declared type is “Any”
Guido identified a bug in pyright where rule #2 was not being applied when the RHS of the assignment was a call to a constructor. This is a regression that I introduced recently. I’m working on a fix as well as a more comprehensive test suite that should prevent similar regressions in the future.
As I started to fill out this test suite, I ran across additional cases that are not covered by the list above. The most notable case involves a type declaration that includes a generic type (or a generic protocol) where one or more type arguments is specified as “Any”. For example:
```python def func(p: List[int]): a: List[Any] = p reveal_type(a) # ? ```
If we agree that “Any” is meant as a way to disable further type checking, then it probably follows that the `reveal_type` in the above example should indicate that `a` is type `List[Any]` rather than `List[int]`. Do others agree?
If so, how far do we take this? Should it apply to derived classes and protocols? I think the answer is probably yes, but I’m not fully convicted.
Should this nest arbitrarily or apply only to the first level of type arguments? I’m not sure.
```python def func(p: List[int]): a: Iterable[Any] = p reveal_type(a) # List[Any]? ```
```python def func(p: List[List[int]]): a: Iterable[List[Any]] = p reveal_type(a) # List[List[Any]]? List[List[int]]? ```
There are also interesting cases involving unions that include Any. I think these are straightforward if we apply the rules consistently.
```python def func(p: List[int]): a: Union[Any, Iterable[int]] = p reveal_type(a) # Union[Any, List[int]]? ```
```python def func(p: Union[Any, List[int]]): a: Iterable[int] = p reveal_type(a) # Union[Any, List[int]]? ```
Based on this exploration, I propose that we add the following fourth rule: 4. If a declared type contains an “Any” type argument, the corresponding type argument(s) in the narrowed type must also be “Any”
Here’s a partial test suite that I’ve started to compile based on these four rules. It builds on Guido’s examples using an `Animal` base class. Do these test cases look correct?
```python from typing import Any, Dict, Generic, Iterable, List, Literal, Protocol, TypeVar, Union
_T1 = TypeVar("_T1") _T2 = TypeVar("_T2") _T3 = TypeVar("_T3")
class Animal(Generic[_T1, _T2]): pass
class Bear(Animal[_T3, int]): pass
class Donkey(Animal[int, int], Generic[_T3]): pass
class Flyer(Protocol[_T1]): def get_wingspan(self, p1: _T1) -> float: raise NotImplemented
class CaveDweller(Generic[_T1]): pass
class Bat(Animal[int, int], CaveDweller[int]): def get_wingspan(self, p1: int) -> float: raise NotImplemented
def s1(): b: Bear[str] = Bear() a: Animal[str, int] = b t: Literal["Bear[str]"] = reveal_type(a)
def s2(): a: Animal[str, int] = Bear() t: Literal["Bear[str]"] = reveal_type(a)
def s3(): a: Animal[str, int] = Bear() t: Literal["Bear[str]"] = reveal_type(a)
def s4(): a: Bear[Any] = Bear[int]() t: Literal["Bear[Any]"] = reveal_type(a)
def s5(): a: Animal[Any, Any] = Bear[int]() t: Literal["Bear[Any]"] = reveal_type(a)
def s6(): a: Union[Bat, Bear[str]] = Bear() t: Literal["Bear[str]"] = reveal_type(a)
def s7(p: Union[Bat, Bear[int]]): a: Animal[int, int] = p t: Literal["Bat | Bear[int]"] = reveal_type(a)
def s8(): a: Animal[int, int] = Bear[int]() t: Literal["Bear[int]"] = reveal_type(a)
def s9(p: Dict[str, str]): a: Dict[str, Any] = p t: Literal["Dict[str, Any]"] = reveal_type(a)
def s10(p: List[str]): a: Iterable[Any] = p t1: Literal["List[Any]"] = reveal_type(a) b: Iterable[str] =  t2: Literal["List[str]"] = reveal_type(b) c: Iterable[str] = list() t3: Literal["list[str]"] = reveal_type(c)
def s11(): a: Animal[Any, Any] = Donkey[int]() t: Literal["Donkey[int]"] = reveal_type(a)
def s12(p: Bear[_T1]): a: Animal[Any, int] = p t: Literal["Bear[Any]"] = reveal_type(a)
def s13(p: Bat): a: Flyer[int] = p t: Literal["Bat"] = reveal_type(a)
def s14(p: Bat): a: CaveDweller[int] = p t: Literal["Bat"] = reveal_type(a) ```
--- Eric Traut Microsoft