[Typing-sig] Re: Any way to express that a dict is expected to have a key(s) but don't care about other keys?

April 7, 2022

I would like to continue the discussion we had about this in yesterday's 
meetup. (Thanks again to Pradeep for offering the opportunity to present 
the case in the meetup, and big thanks to Steven for clarifying the 
arguments and presenting the case. As a newcomer, I feel I am facing a 
very open, kind and supportive community.)

0. The problem diagnosis presented as a premise for suggested changes 
was that TypedDicts, the way they are currently defined, do not treat 
values in a transitive way:

```python
class Parent(TypedDict):
     x: int

class Child(Parent):
     y: int

a: Parent = {"x": 0, "y": 0}  # forbidden, due to PEP-589
b: Child = {"x": 0, "y": 0}  # OK, because there is
a: Parent = b  # upcasting is allowed
```

Each of the last three lines makes sense in isolation. Combined, they 
reveal an oddity: even though `a` has the identical value in line 7 and 
line 9, assigning the type `Parent` to it is allowed in line 9, but not 
in line 7. One could call this "intransitive", "incoherent" or even 
"inconsistent" behavior of the type-checking system. The name does not 
matter too much. The point is that there is a tension between allowing 
subtyping (effectively allowing extra fields on a type-compliant value) 
and forbidding extra fields on literals.
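
To make the tension concrete, here is a small runnable sketch of the same situation with distinct variable names. The names `literal`, `child` and `upcast` are chosen for illustration only, and the `# type: ignore` comment is an assumption about how one would silence the checker; it has no effect at runtime.

```python
from typing import TypedDict

class Parent(TypedDict):
    x: int

class Child(Parent):
    y: int

# Rejected by checkers that follow PEP 589 (extra key "y" in a literal);
# the ignore comment only silences the checker.
literal: Parent = {"x": 0, "y": 0}  # type: ignore

child: Child = {"x": 0, "y": 0}
upcast: Parent = child  # accepted: Child is a structural subtype of Parent

# At runtime both are plain dicts holding the same value.
same_value = literal == upcast
same_type = type(literal) is type(upcast) is dict
```

Both flags come out true, so the only difference between the two `Parent` values is the path by which the checker saw them.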

1. There was unresolved disagreement about whether the lack of 
transitivity should be considered problematic. In particular, the idea 
was expressed that consistency is not a value per se. It was also 
expressed that the confusion caused by this oddity is subjective and 
part of a learning curve. If we don't believe this is a problem, any 
further discussion of taking action would be moot. I would like to try 
to establish consensus that this should be considered a problem. This 
is the argument:

It is common to think of types as descriptions of sets of values. There 
are other definitions, but even PEP-483 ("The Theory of Type Hints") 
conceptualizes types as sets of values in order to explain the typing 
system. It is also common language in PEPs to speak of "the type of a 
certain value". With a non-transitive type, this concept breaks down: 
a type no longer corresponds to a set of values in an unambiguous way. 
Whether a literal value complies with the type is then not a question 
of the value and the type alone. The notion of "type" can still be 
defined in a consistent way, for instance as the definition of a set 
of statements. It is understandable that writers of type checkers 
might conceptualize it this way. But even though this is a possible 
and well-defined notion of types, it is far less intuitive for users 
of type checkers. It is true that what one finds confusing depends on 
perspective and experience, and that every confusion can be overcome 
by individual learning. But requiring users to invest more effort into 
understanding the typing system has a price.
But even more fundamentally, the benefit of the typing system for the 
user is that they can use the type as a guarantee that certain 
operations on a certain variable are safe (and lead to predictable 
results). For that use case it is OK if the typing system cannot tell 
whether a value complies with the type (and resorts to `Any`). But 
what we see here is different: the typing system gives contradictory 
information about whether the value complies with the type, depending 
on the context. Note that it does not say: "This value might comply 
with this TypedDict or not, I can't tell." That would be OK. Instead, 
on a sunny Monday before lunch the system claims: "Yes, this value 
complies with the type", and on a rainy Tuesday after teatime it 
states: "No, it doesn't". If types do not correspond to sets of values 
unambiguously, the benefit of the type checker is broken for the user.
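
One way to read the "set of values" view is as a membership test that depends on the value and the type alone. The following is only a sketch under that reading: `complies` is a hypothetical helper, not part of `typing`, and it handles only plain classes such as `int` or `str` as field types.

```python
from typing import TypedDict, get_type_hints

class Parent(TypedDict):
    x: int

def complies(value: object, td: type) -> bool:
    """Hypothetical membership test: is `value` in the set described by `td`?

    Extra keys are deliberately ignored, as structural subtyping does.
    """
    if not isinstance(value, dict):
        return False
    # Every required key must be present.
    if any(key not in value for key in td.__required_keys__):
        return False
    # Every declared key that is present must hold a value of the declared type.
    hints = get_type_hints(td)
    return all(isinstance(value[k], t) for k, t in hints.items() if k in value)

with_extra_key = complies({"x": 0, "y": 0}, Parent)
missing_key = complies({"y": 0}, Parent)
```

Under this reading the answer for `{"x": 0, "y": 0}` is the same no matter where the value came from, which is the property the literal rule gives up.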

2. Another main question was about actual use cases. Often, the code 
does not care about extra fields, and developers will want to silently 
ignore them. The use case is writing a test which shows that ignoring 
them is a safe strategy, i.e. that the code is robust against extra 
keys. This use case is not limited to the boundaries of the code 
(validating incoming data). As values with extra fields can still 
comply through a subtype, this can happen anywhere in the code:

```python
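from inspect import get_annotations  # Python 3.10+
from typing import TypedDict

class Data(TypedDict):
    x: int
    y: int

def sum_data(data: Data):
    # keep only the keys declared on Data, ignoring any extra keys
    trimmed = {key: data[key] for key in get_annotations(Data)}
    return sum(trimmed.values())

def test_sum_data():
    # the next line throws a typing error (extra key "z")
    test_data: Data = {"x": 0, "y": 0, "z": 1}
    assert sum_data(test_data) == 0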
```

Typing `data` as `Data` does not effectively prevent extra keys from 
being present on `data`. Writing a test that `sum_data` is robust 
against additional keys should be straightforward, but it is not.
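
For comparison, a workaround that is available today is to route the test value through a subtype. This is only a sketch: `DataWithExtra` is a hypothetical helper type that exists solely for the test, and `Data.__annotations__` is used here instead of `get_annotations` so that the snippet does not depend on Python 3.10.

```python
from typing import TypedDict

class Data(TypedDict):
    x: int
    y: int

class DataWithExtra(Data):  # hypothetical helper type, only for the test
    z: int

def sum_data(data: Data) -> int:
    # keep only the keys declared on Data, ignoring any extra keys
    trimmed = {key: data[key] for key in Data.__annotations__}
    return sum(trimmed.values())

def test_sum_data_ignores_extra_keys() -> None:
    extended: DataWithExtra = {"x": 0, "y": 0, "z": 1}
    test_data: Data = extended  # accepted: upcast to the parent type
    assert sum_data(test_data) == 0
```

The detour through an extra class and an extra assignment is what makes the test less straightforward than the literal version.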

3. There were detailed concerns about certain ways to change the 
behavior of TypedDicts. I think it makes sense to postpone these 
questions until we reach consensus, first, that this is a real problem 
and, second, that there are valid use cases where this problem 
arises.

On 13.03.2022 at 21:14, Jelle Zijlstra wrote:
...
On Sun, 13 Mar 2022 at 12:51, <j.scholbach@posteo.de> wrote:
...
I have started a draft for a PEP for this. It is my first draft of a
PEP. Any feedback and input on it is very much appreciated:
https://github.com/jonathan-scholbach/peps/blob/main/pep-9999.rst [1]
I wrote the draft without knowing of your message, Eric
Traut. I answer here, and point to the draft where it makes sense.
A) Typing of extra fields: I have taken this into consideration
with some detailed reasoning in the draft. The bottom line is that I
find it hard to come up with a use case for that. The problems you
outline for the inheritance of the __extra__ attribute (if it holds
the value type constraint) could also be seen as an argument for
keeping it simple (just a boolean flag). But if you
could name a good use case for value type constraints on the extra
fields, that would increase my understanding of the implications of
this a lot.
I don't find this feature very compelling if there is no way to
specify the type of the extra fields. TypedDicts in general already
allow extra keys to exist (because they support structural subtyping),
so if we can't say what the type of the extra keys is, we really don't
gain much from this new feature.
Your proposed PEP doesn't say much about what specific operations are
allowed on an extra=True TypedDict but not a regular TypedDict.
...
B) Inheritance behavior of `extra` (I think that name is better
than `extra_fields`; the argument for this is in the draft, too):
The idea is to conceptualize `extra` in close analogy to `total`
(find the reasoning in the draft). `total` already showed the
problem that inheritance could lead to slight inconsistency
(https://bugs.python.org/issue38834 [2]). I agree there should be an
`__extra__` dunder attribute on the TypedDict, which just behaves
like a normal attribute under inheritance. But I still think it
would be nicer syntax to have `extra` as a parameter of the
constructor instead of writing the dunder parameter in the
dictionary definition. In particular, it is relevant that
```python
class A(TypedDict):
    foo: str
    __extra__: str
```
is ambiguous: it could mean that a key `"__extra__"` with value
type `str` would be enforced on the dictionary.
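
The ambiguity can be observed at runtime with the current `typing` module, where such an annotation simply declares a regular key. A small sketch; the example value is made up for illustration.

```python
from typing import TypedDict

class A(TypedDict):
    foo: str
    __extra__: str  # today this declares an ordinary key named "__extra__"

# Runtime introspection lists it next to "foo" as a required key.
required = A.__required_keys__

value: A = {"foo": "bar", "__extra__": "just another string value"}
```

Giving the same spelling a special meaning would therefore change the meaning of definitions that are valid today.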
But I agree with you that it is important to handle the fact that
```python
class A(TypedDict, extra=True):
    x: int
class B(A, extra=False):
    pass
```
leads to an inconsistency. The problem with this is that the class
hierarchy would not be aligned with the type hierarchy any more.
That is unexpected and a smell. For `total` this problem does not
occur, as in something like
```python
class A(TypedDict, total=True):
    a: int
class B(A, total=False):
    pass
B.__required_keys__  # frozenset({'a'})
```
the child class's `total` specification affects only the keys
that are added on the child class. The solution for `extra` would be
to allow the inheriting class only to flip the value of `extra` from
False to True when changing it. I think that makes sense. I
actually think analogous behavior of `total` would also make sense,
because I consider
```python
class A(TypedDict, total=True):
    a: int
    b: str
class B(A, total=False):
    a: int
B.__required_keys__  # frozenset({'a', 'b'})
```
a gotcha. But this is probably out of scope here.
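
The rule that `total` applies only to keys declared in the class body where it is set can be checked at runtime. A minimal sketch: the child class adding its own key `b` is an illustration that is not taken from the thread, and the redeclaration case is deliberately left out.

```python
from typing import TypedDict

class A(TypedDict, total=True):
    a: int

class B(A, total=False):
    b: str  # declared here, so total=False applies to this key only

required = B.__required_keys__
optional = B.__optional_keys__
```

Here `a` stays required because it was declared under `total=True` on `A`, while `b` becomes optional.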
C) Generics: This is a good point. I think it makes sense to
discuss this once A) is settled. I just don't know when a point
can be considered "settled" :) I am sure this particular question
is not settled yet (far from it), but any feedback on how
these discussions are conducted here is very much appreciated.
Links:
------
[1] https://github.com/jonathan-scholbach/peps/blob/main/pep-9999.rst
[2] https://bugs.python.org/issue38834

j.scholbach＠posteo.de