Mailman 3 Literals in dynamic type checking - Typing-sig

Literals in dynamic type checking

Paul Bryan

18 Sep 2021

7 a.m.

Background

typing.Literal was defined with static type checking in mind. During static type checking, Literal requires the value being checked to be a literal value, matching one of the Literal value(s) specified. In a static type checking context, this behaviour is both intuitive and obvious.

Problem

During dynamic type checking, Literal can be specified in a type hint, but there is no obvious way to determine at runtime whether the value being checked is sourced from a Python literal or elsewhere. For example, it could have been loaded from JSON. Furthermore, it is not necessarily desirable to make such a determination at runtime.

Pydantic currently validates a value against a Literal type hint without regard to whether it is a Python literal. It appears to simply check for type and value equality against the defined Literal values in the type hint. (This is how I originally assumed Literal would behave when I first encountered it.) I believe this makes Pydantic’s use of Literal incompatible with static type checking tools like mypy.

Ideas

1. Codify Pydantic’s current behavior without changing static type checking behavior (in PEP 586 or elsewhere)

This would lead to many situations where code would pass type checking at runtime, but fail static type checking due to constraints on literal values.

2. Relax Literal’s “literalness” static type checking behavior

I believe this would conflict with developers’ objectives of keeping the value “safe” by requiring it be literally supplied, not allowing it to be deserialized or built dynamically. (Discussed previously on this list.)

3. Just use Enum

I would argue Enum is unnecessarily verbose, requiring the definition of a class, separate members, subclassing and dunders (e.g. __str__) to handle string values. It requires unnecessary naming of values, and requires avoiding “invalid” names. (3.11’s StrEnum could mitigate some of this?)

4. Define a new “relaxed” Literal type hint

Define a new type hint, paralleling Literal’s syntax, relaxing the requirement that the value be an actual literal. Candidate names like Figure[...], Value[...], enum[...]. It would likely not be included in stdlib, so would likely be implemented by dynamic type checking libraries.

What are your thoughts? Thanks in advance for your consideration.
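For readers following along, the runtime check described in the Problem section (type and value equality against the Literal's arguments) can be sketched roughly as below. This is only an illustration under stated assumptions: `matches_literal` and the `Mode` alias are hypothetical names, and nothing here is taken from Pydantic's actual implementation.

```python
import json
from typing import Any, Literal, get_args, get_origin

# Hypothetical alias used only for this illustration.
Mode = Literal["r", "w", "r+", "w+"]


def matches_literal(value: Any, hint: Any) -> bool:
    """Return True if value has the same type and value as one Literal argument."""
    if get_origin(hint) is not Literal:
        raise TypeError(f"{hint!r} is not a Literal type hint")
    # Compare exact types as well as values, so that True does not match Literal[1].
    return any(type(value) is type(arg) and value == arg for arg in get_args(hint))


# A value deserialized from JSON is checked the same way as a Python literal.
from_json = json.loads('"w"')
ok_literal = matches_literal("r", Mode)
ok_json = matches_literal(from_json, Mode)
```

The check cannot tell where the value came from, which is the point raised above: at runtime only the value and its type are available.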

Guido van Rossum

18 Sep

2:34 p.m.

Hi Paul,

I cannot follow your proposal (or complaint?) without an example.

Here's how I suppose it could work at runtime.

    def g(a: Literal[1, 2, 3]): ...

    def f(a: Literal[1, 2]):
        g(a)

    x: Literal[1] = 1  # Could be a value read from a JSON file

    f(x)    # Passes both static and runtime checks
    f(x+1)  # Passes at runtime, fails static check
    f(x+2)  # Fails both checks

Do you agree with these outcomes? (I don't have Pydantic handy, so I don't actually know what it does, but based on your description it should check the value of x, x+1 and x+2 against Literal[1, 2].) If not, what would you want? If yes, can you show an example of what you are talking about?

--Guido

--
--Guido van Rossum (python.org/~guido)
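The three runtime outcomes in Guido's example can be reproduced with a small call-time checker. The sketch below is an assumption-laden illustration, not how Pydantic or any other library works: `enforce_literals` is a hypothetical decorator, and the functions return their argument only so the result can be inspected.

```python
import functools
import inspect
from typing import Literal, get_args, get_origin, get_type_hints


def enforce_literals(func):
    """Hypothetical decorator: check Literal-annotated arguments on every call."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is Literal:
                allowed = get_args(hint)
                # Same rule as described in the thread: exact type and equal value.
                if not any(type(value) is type(a) and value == a for a in allowed):
                    raise TypeError(f"{name}={value!r} is not one of {allowed!r}")
        return func(*args, **kwargs)

    return wrapper


@enforce_literals
def g(a: Literal[1, 2, 3]) -> int:
    return a


@enforce_literals
def f(a: Literal[1, 2]) -> int:
    return g(a)


x = 1  # could equally have been read from a JSON file
```

With this in place, `f(x)` and `f(x + 1)` run without error, while `f(x + 2)` raises `TypeError`, matching the runtime column of the example. The static verdict on `f(x + 1)` is a separate question that this code cannot show.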

Paul Bryan

5:24 p.m.

Yes, I agree with the outcomes.

What seems problematic to me about this is that static and dynamic type checking become mutually exclusive; the f(x+1) example.

I'm not conversant enough with static type checking practices to know: how should one pass a value to a Literal when the value being passed is not static? For example, if mode were added to open(..., mode: Literal["r", "w", "r+", "w+", ...]) and mode were dynamically generated, how would one express it so as to pass muster with mypy?

Carl Meyer

19 Sep

3:08 a.m.

Hi Paul,

On Sat, Sep 18, 2021 at 11:24 AM Paul Bryan <pbryan@anode.ca> wrote:

> What seems problematic to me about this is that static and dynamic type checking become mutually exclusive; the f(x+1) example.

I'm not sure "mutually exclusive" is a good description of the situation, since I think in practice there is pretty much always _some_ way to write the code that makes both happy. But it is undoubtedly true (and inherent to the nature of static analysis) that static type checks must sometimes reject code that would pass runtime type checks. In a sound type system the reverse _should_ not be true (runtime type checks should not fail on code that passes static type checks), but Python's type system is not sound (a type system can't be both sound and gradual without adding runtime checks; the Any type is unsound), so it is also possible in Python for code to pass static type checks and still fail runtime type checks. The existence of both of these possibilities is embedded in the nature of Python typing and is not specifically related to Literal types.

> I'm not conversant enough with static type checking practices to know: how should one pass a value to a Literal when the value being passed is not static? For example, if mode were added to open(..., mode: Literal["r", "w", "r+", "w+", ...]) and mode were dynamically generated, how would one express it so as to pass muster with mypy?

The only reason typing open's mode argument with a Literal _might_ be considered is that it is so unlikely anyone would want to pass a dynamic value for it, since the code that uses the resulting file object is so likely to be itself dependent on the specific mode. But if there were some realistic use case for this, the options are to either add runtime conditionals that allow your static type checker to narrow its statically known type to a Literal type (e.g. something along the lines of `if type(x) is str and x in ["r", "w", "r+", "w+"]:`), or if your static type checker doesn't have advanced enough narrowing to do this, use a cast.

> def f(x: Literal["A", "B", "C", 1, 2, 3]) -> None:
>     if isinstance(x, int):
>         g(cast(Literal[1, 2, 3], x))
>
> def g(x: Literal[1, 2, 3]) -> None: ...

This is a case that static type checkers should be able to handle; if they know x is a `Literal["A", "B", "C", 1, 2, 3]`, and then they know that x is also an int, they can narrow its type to `Literal[1, 2, 3]`, and no cast should ideally be needed here. (I haven't surveyed mypy, Pyre, and pyright to find out if they actually do handle this specific case, though Eric's message suggests that pyright at least probably does.)

Carl
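The two options Carl mentions for a dynamically built mode string (a runtime conditional, with a cast as the fallback) can be combined in one helper. This is a minimal sketch under stated assumptions: `Mode` and `as_mode` are hypothetical names, and whether a given checker would narrow the membership test without the cast is not something this example establishes.

```python
from typing import Literal, cast, get_args

# Hypothetical alias; the real open() signature is not annotated this way here.
Mode = Literal["r", "w", "r+", "w+"]
_VALID_MODES = get_args(Mode)


def as_mode(raw: str) -> Mode:
    """Validate a dynamically built string and hand it back typed as Mode."""
    # The exact-type check rules out str subclasses with unusual __eq__ behaviour.
    if type(raw) is str and raw in _VALID_MODES:
        # The cast tells the static checker what the runtime test just proved.
        return cast(Mode, raw)
    raise ValueError(f"unsupported mode: {raw!r}")


# A mode assembled at runtime rather than written as a literal.
dynamic_mode = as_mode("r" + "+")
```

Keeping the runtime test next to the cast means the cast never asserts something that was not actually checked.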

Paul Bryan

5:39 a.m.

Got it, thanks.

Carl Meyer

18 Sep

3:13 p.m.

Hi Paul,

I don't think it is true that the Literal type is intended to require as part of the semantics of the type that the value must originate from a Python literal. There is no such implication in PEP 586; in fact it says merely in the Core Semantics section that "if we define some variable foo to have type Literal[3], we are declaring that foo must be exactly equal to 3 and no other value."

It is true that a Literal type will generally only be statically known when it originates from a Python literal expression. But this is just a specific case of the general limitations of static analysis: that statically-known types are a wider approximation of the actual runtime types. And it is not necessarily always true, for example it would be perfectly valid for a static type checker to narrow the static type of `x` within the body of `if type(x) is int and x == 1:` to `Literal[1]`, no matter the origin of `x`. (I'm not saying any current type checkers bother to do exactly this, since it's an edge case that requires the type identity check to rule out the possibility of oddly-behaving subclasses of int, but in principle there would be nothing wrong with it if they did.)

So I think your original conception of the semantics of Literal, and pydantic's current validation, are both entirely correct, and there is no problem to be solved here.

Carl
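Carl's parenthetical about oddly-behaving subclasses of int is easier to see with a concrete case. The sketch below uses hypothetical names (`AlwaysEqual`, `is_literal_one`) and only illustrates why an equality test alone is not enough to conclude that a value is `1`.

```python
class AlwaysEqual(int):
    """An int subclass whose equality test always succeeds (deliberately odd)."""

    def __eq__(self, other: object) -> bool:
        return True

    # Defining __eq__ disables inherited hashing, so restore it explicitly.
    __hash__ = int.__hash__


def is_literal_one(x: object) -> bool:
    """Runtime analogue of the `type(x) is int and x == 1` condition."""
    return type(x) is int and x == 1


odd = AlwaysEqual(7)
equality_alone = (odd == 1)          # the subclass claims to equal 1
with_type_check = is_literal_one(odd)  # the type identity check rejects it
```

The same type identity check also rejects `True`, which compares equal to `1` because `bool` is a subclass of `int`.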

Carl Meyer

3:27 p.m.

On Sat, Sep 18, 2021 at 9:13 AM Carl Meyer <carl@oddbird.net> wrote:

> It is true that a Literal type will generally only be statically known when it originates from a Python literal expression. But this is just a specific case of the general limitations of static analysis: that statically-known types are a wider approximation of the actual runtime types.

To clarify this point: there are many other situations where the same symptom you observe (static type checking may fail where runtime checking would pass) can occur without the involvement of Literal types. Consider this example:

```
def f(x: object) -> None:
    g(x)

def g(x: int) -> None:
    ...

f(1)
```

This code must fail static type checking because the call to g(x) is invalid since it passes an `object` where an `int` is required. The statically known type of `x` within `f` is only `object`, since call-sites of `f` will be allowed to pass any object. And yet in the specific case of the call `f(1)`, where the argument happens to be an integer, runtime type checking will be fine.

This is exactly analogous to the case you are describing, if we had instead `g(x: Literal[1]) -> None` and `f(x: int) -> None` -- this is statically a type error, because the static type check must consider that `x` could take on any value that is part of the type `int`, even though any given specific runtime call might check out fine, if `x` happens to in fact be `1` for that call.

Carl
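One way to make the `object`/`int` example acceptable to both kinds of checking is to add a runtime conditional that narrows the static type before the call. This is a minimal sketch, not part of Carl's message; the return values are added only so the behaviour can be observed.

```python
from typing import Optional


def g(x: int) -> int:
    return x + 1


def f(x: object) -> Optional[int]:
    # isinstance narrows the statically known type of x from object to int,
    # so the call to g is valid inside this branch.
    if isinstance(x, int):
        return g(x)
    # Anything else is handled without ever reaching g.
    return None


result_int = f(1)
result_other = f("not an int")
```

The conditional carries the same information to the static checker and to the running program, which is why the two no longer disagree about this call.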

Paul Bryan

5:35 p.m.

Thanks for the clarifications. So casting is the solution to avoid this problem with static type checkers?

Switching back to Literals...

    def f(x: Literal["A", "B", "C", 1, 2, 3]) -> None:
        if isinstance(x, int):
            g(cast(Literal[1, 2, 3], x))
        else:
            ...

    def g(x: Literal[1, 2, 3]) -> None: ...

Guido van Rossum

6:01 p.m.

Wait, your problem is that the static checkers aren't smart enough? I thought that you were complaining about the dynamic checkers. I think it's unavoidable that they produce different results (see Carl Meyer's explanation). I'm okay with that.

In the long run, static type checkers should probably become smarter. E.g. after

    x: Literal[1]
    y = x+1

we could infer that y's type is Literal[2]. (Maybe pyright already does this?)

Eric Traut

6:13 p.m.

I agree with Carl. Literal types do not need to originate from literal value expressions. They can also come from type narrowing operations and other sources (including casts).

Here's a variant of your example that doesn't involve a cast.

```python
from typing import Literal

def f(x: Literal[1, 2, 3]) -> None:
    if x == 1 or x == 2:
        reveal_type(x)  # Literal[1, 2]
        g(x)
    else:
        reveal_type(x)  # Literal[3]
        print('cannot call g!')

def g(x: Literal[1, 2]) -> None: ...
```

Type narrowing can also generate literal types when they are not explicitly annotated as literals.

```python
def f(x: bool) -> None:
    if x:
        return
    reveal_type(x)  # pyright reveals: Literal[False]
```

```python
from enum import Enum

class Color(Enum):
    Red = 0
    Green = 1
    Blue = 2

def f(x: Color) -> None:
    if x is Color.Red or x is Color.Green:
        return
    reveal_type(x)  # pyright and mypy reveal Literal[Color.Blue]
```

Don't think of `Literal` as meaning "this value is guaranteed to have originated from a literal expression". Think of it instead as a narrower (more specific) type within the type system. `Literal[1]` is a subtype of `int`. `Literal[False]` is a subtype of `bool`.

I don't see any problem with dynamic (runtime) enforcement of literal types. If a symbol's type is annotated as `Literal[1]`, it should contain a value of `1` at runtime.

-Eric

--
Eric Traut
Contributor to pyright & pylance
Microsoft
