Further questions on interaction of LiteralString with Generics
Hi typing people!

We have been implementing LiteralString (PEP 675) support in PyCharm and even though the PEP is written wonderfully and most of the proposal seems straightforward to implement, I just can't wrap my head around how this feature should work with generics without breaking existing code. There is a brief section about it in the PEP but it doesn't cover my concerns, so I decided to bring it here. Sorry if it has been discussed already, I couldn't find anything relevant in the typing-sig archives.

The main question is "how far" inferred LiteralStrings should be propagated through substitution of type parameters during the inference. Consider the following example:

    from typing import LiteralString

    def expects_literal_string(s: LiteralString) -> None:
        pass

    s: str = ...
    xs = ['foo', 'bar']  # inferred type: list[LiteralString]
    expects_literal_string(''.join(xs))  # ok!

It's appealing to infer the type list[LiteralString] for xs here right away to make the result of ''.join(xs) be LiteralString as well (there is an overload for str.join in Typeshed). However, doing so will trigger a typing error for the following unassuming code:

    xs.append(s)  # type error: expected LiteralString, got str

Of course, the same reasoning applies to user-defined generics, which don't have their own literal syntax:

    from typing import Generic, TypeVar

    T = TypeVar('T')

    class Box(Generic[T]):
        def __init__(self, x: T) -> None:
            self.value = x

        def set(self, x: T) -> None:
            self.value = x

    box = Box('foo')
    box.set(s)  # type error: expected LiteralString, got str

One approach (though very limiting) might be to infer list[LiteralString] only in cases where a collection literal is used directly, not assigned to a name, e.g.

    expects_literal_string(''.join(['foo', 'bar']))  # ok!
but even then it might cause some problems, such as here, where a type containing LiteralString "over-constrains" possible values for other parameters sharing the same TypeVar:

    def couple(first: T, second: T) -> tuple[T, T]:
        return first, second

    couple(['foo'], [s])  # type error: expecting list[LiteralString], got list[str]

I would argue that even if the first argument was explicitly annotated here (for instance to pass it to str.join), it would still be a surprise for a user that an existing call to couple now leads to a type checker error.

    xs: list[LiteralString] = ['foo']
    expects_literal_string(''.join(xs))
    couple(xs, [s])  # type error: expecting list[LiteralString], got list[str]

I see that Mypy support of LiteralString is still in progress. I'm curious how other type checkers (pyright, Pyre) handle such cases gracefully. I guess there are a number of different strategies to allow gradually introducing LiteralString into an existing type-hinted code base. For instance:

- Do you upcast LiteralString to str, "erasing" it, whenever it's captured in a type parameter?
- Do you require all generic types possibly containing LiteralString values to be explicitly type hinted?

Thanks for sharing the knowledge!

--
Mikhail Golubev
Software Developer
JetBrains
http://www.jetbrains.com
The drive to develop
Hi Mikhail. I presume that PyCharm already has support for Literal, which was introduced in PEP 586. You can think of LiteralString as "the union of all possible literal strings". Unlike bool and enum literals, you can't feasibly enumerate all possible string literals, so this is conceptual rather than real. It's still a useful concept from the perspective of type checking because you can treat LiteralString just like any other union of string literals. To answer all of your questions above, you can just ask yourself "how do we handle Literal['a', 'b'] in this situation?". The answer should be the same for LiteralString.

In pyright, the default inference rules for list expressions always widen literals to their non-literal counterparts. To use your example `xs = ['foo', 'bar']`, pyright infers the type of `xs` as `list[str]`. For more details, refer to https://microsoft.github.io/pyright/#/type-inference?id=literals. Mypy likewise infers `list[str]` in this case.

These default inference rules can be overridden through the use of bidirectional type inference (also referred to as an "expected type" or a "type context"). For example, `xs: list[LiteralString] = ['foo', 'bar']` overrides the default type inference rules for list expressions. For more details, refer to https://microsoft.github.io/pyright/#/type-inference?id=bidirectional-type-i....

You also asked about constraint solving for type variables. Pyright's constraint solver does not produce literals unless they are required to meet the constraints. Normally, the solved type is widened to its non-literal counterpart. For example, if you have a function `def func(a: T) -> T: ...`, the expression `func('hi')` evaluates to type `str`, not `Literal['hi']` or `LiteralString`. However, if an additional constraint is present that requires a literal type solution, it will oblige. For example, `x: LiteralString = func("hi")` is fine, and so is `x: Literal['hi', 'bye'] = func("hi")`. Obviously, `x: Literal['bye'] = func("hi")` will produce an error.

I'll note that mypy and pyright differ somewhat when it comes to the handling of literals in their constraint solvers. You can read about those differences here: https://microsoft.github.io/pyright/#/mypy-comparison?id=constraint-solver-l....

Apologies if that was more detail than you were looking for.

My basic advice for LiteralString is to simply treat it the same as you would any union of Literal strings.

--
Eric Traut
Author of pyright
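The cases described in this message can be collected into one file and run through a type checker. This is only a minimal sketch: the types in the comments are the ones stated above for pyright, not independently captured checker output, and the `sys.version_info` guard is an assumption added so the file also executes on Python versions older than 3.11.

```python
import sys
from typing import Literal, TypeVar

if sys.version_info >= (3, 11):
    from typing import LiteralString
else:
    # Assumption: runtime stand-in so the file still executes on older Pythons.
    LiteralString = str

T = TypeVar("T")


def func(a: T) -> T:
    """Generic identity function, as in the message above."""
    return a


# Default inference: literals in a list expression are widened.
xs = ["foo", "bar"]  # described above as list[str]

# Bidirectional inference: the declared type acts as the expected type.
ys: list[LiteralString] = ["foo", "bar"]

# Constraint solving: widened unless the context requires a literal type.
a = func("hi")  # described above as str
b: LiteralString = func("hi")  # accepted
c: Literal["hi", "bye"] = func("hi")  # accepted
# d: Literal["bye"] = func("hi")  # would be reported as an error
```

At runtime all of these assignments behave identically; the differences exist only in what the type checker infers.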
As Eric suggests, there should be nothing special about `LiteralString`.

The PEP doesn't say it so clearly, but I think it's just the case that every `Literal[expr]` type where expr has type `str` is a subtype of `LiteralString`, and `LiteralString` is a subtype of `str`. Everything else is a consequence.

The issue with the list literals is a general issue with collection literals, not about LiteralString specifically. A list literal where the elements all have type T can actually have type `list[S]` for all types `S` s.t. T is a consistent subtype of S. So `['foo', 'bar']` can have type `list[Literal['foo'] | Literal['bar']]` and also `list[LiteralString]` and `list[str]` and `list[str | bytes]` and `list[Iterable[Any]]` and `list[object]`, etc. That is, ['foo', 'bar'] might be a list of objects that just happens to currently contain only strings.

But once you determine a type for that value, it cannot change. If you determine it's a list[object] then someone might put an int or something in it and you can't later say that it's actually a list of literal strings. And if you determine that it's a list[LiteralString] then that necessarily prevents it being used as a list[object].

Eric mentions bidirectional type inference, and that's something that you could try. If you do it for literals as you suggest, it will be "easy". If you do it for variables like `xs = ['foo']; ... xs ...` then you will have to find some way to consider all the contexts that `xs` occurs in.

Another idea that seems similar to bidirectional inference (I think) is to treat the type of ['foo', 'bar'] as involving an unknown type that is bounded by a subtype: list[Unknown > Literal['foo'] | Literal['bar']]. Then when you generate consistent subtype constraints for your example of couple(['foo'], [s]) you would have constraints:

    list[Unknown > Literal['foo']] < T
    list[str] < T

And because `list` is invariant then you have a consistency ("equality") constraint that (Unknown > Literal['foo']) = str which you can solve by str.

On Sat, Jun 17, 2023 at 5:47 PM Eric Traut <eric@traut.com> wrote:
> [...]

_______________________________________________
Typing-sig mailing list -- typing-sig@python.org
To unsubscribe send an email to typing-sig-leave@python.org
https://mail.python.org/mailman3/lists/typing-sig.python.org/
Member address: kmillikin@google.com
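The bounded-unknown idea from Kevin's message can be illustrated with a toy solver. This is a rough sketch under stated assumptions, not how any real type checker is implemented: types are plain strings, the only subtype chain modelled is Literal['...'] < LiteralString < str < object, and the names `Unknown`, `is_subtype`, and `unify_invariant` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Toy chain (assumption): Literal['...'] < LiteralString < str < object
_RANK = {"LiteralString": 1, "str": 2, "object": 3}


def is_subtype(sub: str, sup: str) -> bool:
    """Subtyping over the toy chain; types are represented as strings."""
    if sub == sup:
        return True
    if sub.startswith("Literal["):
        # Every string literal type sits below all three named types.
        return sup in _RANK
    return sub in _RANK and sup in _RANK and _RANK[sub] < _RANK[sup]


@dataclass(frozen=True)
class Unknown:
    """An element type not fixed yet, known only to be above `lower`."""
    lower: str


ElemType = Union[str, Unknown]


def unify_invariant(left: ElemType, right: ElemType) -> Optional[str]:
    """Solve list[left] == list[right]; None stands for a type error."""
    if isinstance(left, Unknown) and isinstance(right, Unknown):
        # Out of scope for this sketch: a real solver would join the bounds.
        return None
    if isinstance(right, Unknown):
        left, right = right, left
    if isinstance(left, Unknown):
        assert isinstance(right, str)
        # The unknown may be solved by any type above its lower bound.
        return right if is_subtype(left.lower, right) else None
    # Both sides already fixed: invariance demands an exact match.
    return left if left == right else None


# couple(['foo'], [s]): list[Unknown > Literal['foo']] meets list[str]
print(unify_invariant(Unknown("Literal['foo']"), "str"))  # str
# xs: list[LiteralString] is already fixed, so it cannot meet list[str]
print(unify_invariant("LiteralString", "str"))  # None
```

The second call mirrors the explicitly annotated `xs: list[LiteralString]` case from the original question: once the element type is fixed, invariance leaves the solver no room to widen it.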
Hi Eric, Kevin

Thanks a lot for the insights. And, honestly, the more details, the better. I'm glad you both went for a lengthier answer :)

Thanks to your answers I figured that the problem to begin with is that our current support of Literal is quite limited in many areas. We narrow the type of immediate literal string arguments from str to Literal['string_content'] at call sites where a Literal is expected (and also do the same for the standard collection literals), having a crude version of the bidirectional inference for arguments, but it doesn't go any further than that. For example, Eric's example with returning a Literal['hi'] from a generic identity function wouldn't work properly in PyCharm. But because in practice the majority of Literal usages are in function parameter types and it's seldom expected as a return value type, we got away with this naive approach.

LiteralString, on the other hand, is designed to be propagated through "safe" functions and str methods, so something more involved such as the following

    from collections import deque
    from typing import LiteralString

    def expects_deque_of_literal_string(xs: deque[LiteralString]):
        pass

    foo_string = 'foo'
    expects_deque_of_literal_string(deque([foo_string.upper()]))

which is correctly typed in pyright, would require revising our constraint solving internals. But at least it's clear where to start now thanks to your hints.

P.S. BTW pyright's documentation is just brilliant, kudos to Eric for maintaining it.

On Mon, Jun 19, 2023 at 10:55 AM Kevin Millikin via Typing-sig <typing-sig@python.org> wrote:
> [...]
--
Mikhail Golubev
Software Developer
JetBrains
http://www.jetbrains.com
The drive to develop
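The propagation through "safe" str methods mentioned in this message can be pictured as a simple rule: the result stays LiteralString only when the receiver and every string argument are LiteralString. The following is a toy model of that rule, not PyCharm's or pyright's implementation; `PRESERVING` is a hand-picked subset chosen for illustration, and `method_result_type` is a hypothetical helper.

```python
from typing import Sequence

LITERAL = "LiteralString"
STR = "str"

# Assumption: a small illustrative subset of literal-preserving str methods.
PRESERVING = {"upper", "lower", "strip", "join", "format", "replace"}


def method_result_type(method: str, receiver: str,
                       args: Sequence[str] = ()) -> str:
    """Return the toy result type of receiver.method(*args).

    For join, pass the element type of the iterable as the single argument.
    """
    all_literal = receiver == LITERAL and all(a == LITERAL for a in args)
    if method in PRESERVING and all_literal:
        return LITERAL
    # Anything else falls back to plain str.
    return STR


# foo_string = 'foo'; foo_string.upper() keeps the literal property
print(method_result_type("upper", LITERAL))  # LiteralString
# ''.join(xs) with xs: list[str] loses it
print(method_result_type("join", LITERAL, [STR]))  # str
```

In the deque example above, the first rule is what lets `deque([foo_string.upper()])` be accepted where `deque[LiteralString]` is expected.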
participants (3)
- Eric Traut
- Kevin Millikin
- Mikhail Golubev