
I would like to continue the discussion we had about this in yesterday's meetup. (Thanks again to Pradeep for offering the opportunity to present the case in the meetup, and big thanks to Steven for clarifying the arguments and presenting the case. As a newcomer, I feel I am facing a very open, kind and supportive community.)

0. The problem diagnosis presented as a premise for the suggested changes was that TypedDicts, the way they are currently defined, do not treat values in a transitive way:

```python
class Parent(TypedDict):
    x: int

class Child(Parent):
    y: int

a: Parent = {"x": 0, "y": 0}  # forbidden, due to PEP-589
b: Child = {"x": 0, "y": 0}  # OK
a: Parent = b  # OK, because upcasting is allowed
```

Each of the last three lines makes sense in isolation. Combined, they reveal an oddity: even though `a` has an identical value in line 7 and in line 9, the type `Parent` may be assigned to it in line 9, but not in line 7. One could call this "intransitive", "incoherent" or even "inconsistent" behavior of the type-checking system. The name does not matter too much. The point is that there is a tension between allowing subtyping (effectively allowing extra fields on a type-compliant value) and forbidding extra fields on literals.

1. There was unresolved dissent about whether the lack of transitivity should be considered problematic. In particular, the idea was expressed that consistency is not a value per se. It was also expressed that the confusion caused by this oddity is subjective and part of a learning curve. If we don't believe this is a problem, any further discussion of taking action would be moot. I would like to try to establish consensus that this should be considered a problem. This is the argument:

It is common to think of types as descriptions of sets of values. There are other definitions, but even PEP-483 ("The Theory of Type Hints") conceptualizes types as sets of values in order to explain the typing system.
It is also common language in PEPs to speak of "the type of a certain value". With a non-transitive type this concept is broken: a type then no longer corresponds to a set of values in an unambiguous way. Whether a literal value complies with the type is no longer a question of the value and the type alone.

The notion of "type" can still be defined in a consistent and well-defined way, for instance as the definition of a set of statements. It is understandable that writers of type checkers might conceptualize it this way. But even though this is a possible and well-defined notion of types, it is far less intuitive for users of the type checkers. It is true that what one finds confusing depends on perspective and experience, and that every confusion could be overcome by individual learning. But requiring users to invest more effort into understanding the typing system has a price.

Even more fundamentally, the benefit of the typing system for users is that they can rely on the type as a guarantee that certain operations on a certain variable are safe (and lead to predictable results). For that use case it is OK if the typing system is not able to tell whether a value complies with the type (and resorts to `Any`). But what we see here is different: the typing system gives contradictory information about whether the value complies with the type, depending on the context. Note that it does not say: "This value might comply with this TypedDict or not, I can't tell." That would be OK. Instead, on a sunny Monday before lunch the system claims: "Yes, this value complies with the type", and on a rainy Tuesday after teatime it states: "No, it doesn't". If types do not correspond to sets of values unambiguously, the benefit of the type checker is broken for the user.

2. Another main question was about actual use cases. Often, the code does not care about extra fields. Developers will often want to silently ignore extra fields.
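To make this concrete, here is a minimal runtime sketch reusing the `Parent` and `Child` classes from point 0. The helper `read_x` is hypothetical and only stands in for code that ignores extra keys. It shows that the upcast value carries the extra key, is equal to the literal that would be rejected under a direct `Parent` annotation, and is handled without trouble by code typed against `Parent`.

```python
from typing import TypedDict


class Parent(TypedDict):
    x: int


class Child(Parent):
    y: int


def read_x(value: Parent) -> int:
    # Hypothetical helper: it only touches the key declared on Parent
    # and silently ignores whatever else the dict carries.
    return value["x"]


b: Child = {"x": 0, "y": 0}
a: Parent = b  # accepted by type checkers: Child is a subtype of Parent

# At runtime both names refer to the same plain dict, which is equal to the
# literal a type checker rejects when it is annotated as Parent directly.
assert a == {"x": 0, "y": 0}
assert read_x(a) == 0
```
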
The use case is writing a test that ignoring extra fields is a safe strategy and that the code is robust against extra keys. This use case is not limited to the boundaries of the code (validating incoming data). Because values with extra fields can still comply through a subtype, this can happen anywhere in the code:

```python
from inspect import get_annotations  # Python 3.10+
from typing import TypedDict


class Data(TypedDict):
    x: int
    y: int


def sum_data(data: Data):
    # Keep only the keys declared on Data, dropping any extra keys.
    trimmed = {key: data[key] for key in get_annotations(Data)}
    return sum(trimmed.values())


def test_sum_data():
    test_data: Data = {"x": 0, "y": 0, "z": 1}  # type checkers report an error here
    assert sum_data(test_data) == 0
```

`data` being typed `Data` does not effectively prevent extra keys from being present on `data`. Writing a test that `sum_data` is robust against additional keys should be straightforward, but it is not.

3. There were detailed concerns about certain ways to change the behavior of TypedDicts. I think it makes sense to postpone these questions until we reach consensus, first, on whether this is a real problem and, second, on whether there are valid use cases where this problem shows up.

On 13.03.2022 21:14, Jelle Zijlstra wrote:
On Sun, 13 Mar 2022 at 12:51, <j.scholbach@posteo.de> wrote:
I have started a draft for a PEP for this. It is my first draft of a PEP. Any feedback and input on it is very much appreciated: https://github.com/jonathan-scholbach/peps/blob/main/pep-9999.rst [1] I wrote the draft before I had read your message, Eric Traut. I answer here and point to the draft where it makes sense.
A) Typing of extra fields: I have taken this into consideration, with some detailed reasoning in the draft. The bottom line is that I find it hard to come up with a use case for it. The problems you outline for the inheritance of the `__extra__` attribute (if it holds the value type constraint) could also be seen as an argument for keeping it simple (just a boolean flag). But if you could name a good use case for value type constraints on the extra fields, that would improve my understanding of the implications a lot.
I don't find this feature very compelling if there is no way to specify the type of the extra fields. TypedDicts in general already allow extra keys to exist (because they support structural subtyping), so if we can't say what the type of the extra keys is, we really don't gain much from this new feature.
Your proposed PEP doesn't say much about what specific operations are allowed on an extra=True TypedDict but not a regular TypedDict.
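As a minimal sketch of the structural subtyping mentioned above: under the consistency rules of PEP 589, a value of one TypedDict type is accepted where another is expected if it declares all of that type's keys with compatible value types, even without inheritance. The names `Point2D`, `Point3D` and `manhattan` are hypothetical and chosen only for illustration.

```python
from typing import TypedDict


class Point2D(TypedDict):
    x: int
    y: int


class Point3D(TypedDict):  # deliberately does not inherit from Point2D
    x: int
    y: int
    z: int


def manhattan(p: Point2D) -> int:
    # Only the keys declared on Point2D are used here.
    return abs(p["x"]) + abs(p["y"])


p3: Point3D = {"x": 1, "y": 2, "z": 3}

# Point3D is structurally compatible with Point2D, so this call is accepted,
# although the dict carries the extra key "z" at runtime.
result = manhattan(p3)
```

The extra key is still present on the value inside `manhattan`; the annotation just says nothing about it or about its type.
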
B) Inheritance behavior of `extra` (I think that name is better than `extra_fields`; the argument for this is in the draft, too): The idea is to conceptualize `extra` in close analogy to `total` (the reasoning is in the draft). `total` has already shown that inheritance can lead to slight inconsistencies (https://bugs.python.org/issue38834 [2]). I agree there should be an `__extra__` dunder attribute on the TypedDict, which just behaves like a normal attribute under inheritance. But I still think it would be nicer syntax to have `extra` as a parameter of the constructor instead of writing the dunder attribute in the dictionary definition. In particular, it is relevant that
```python
class A(TypedDict):
    foo: str
    __extra__: str
```
is ambiguous: it could mean that a key `"__extra__"` with value type `str` would be enforced on the dictionary. But I agree with you that it is important to handle the fact that
```python
class A(TypedDict, extra=True):
    x: int

class B(A, extra=False):
    pass
```
leads to an inconsistency. The problem with this is that the class hierarchy would not be aligned with the type hierarchy any more. That is unexpected and a smell. For `total` this problem does not occur, as in something like
```python
class A(TypedDict, total=True):
    a: int

class B(A, total=False):
    pass

B.__required_keys__  # frozenset({'a'})
```
the child class's `total` specification only affects the keys that are added on the child class. The solution for `extra` would be to allow the inheriting class to change the value of `extra` only from False to True. I think that makes sense. I actually think analogous behavior for `total` would also make sense, because I consider
```python
class A(TypedDict, total=True):
    a: int
    b: str

class B(A, total=False):
    a: int

B.__required_keys__  # frozenset({'a', 'b'})
```
a gotcha. But this is probably out of scope here.
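For reference, here is a small sketch of how the per-class effect of `total` can be inspected at runtime through the `__total__`, `__required_keys__` and `__optional_keys__` attributes that `typing.TypedDict` classes expose. The key `c` is a hypothetical addition that is not part of the examples above. The redeclaration case is deliberately left out, on the assumption that its result may depend on the Python version.

```python
from typing import TypedDict


class A(TypedDict, total=True):
    a: int


class B(A, total=False):
    c: str  # hypothetical key, declared under total=False


# total=False on B only applies to the keys B itself declares:
print(B.__total__)          # the flag passed to B
print(B.__required_keys__)  # still contains the key inherited from A
print(B.__optional_keys__)  # contains the key declared on B
```
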
C) Generics: This is a good point. I think it makes sense to discuss this once A) is settled. I just don't know when a point can be considered "settled" :) I am sure this particular question is not settled yet (far from it), but any feedback on how these discussions are conducted here is very much appreciated.
Links:
------
[1] https://github.com/jonathan-scholbach/peps/blob/main/pep-9999.rst
[2] https://bugs.python.org/issue38834
[3] https://mail.python.org/mailman3/lists/typing-sig.python.org/