are unions ordered?

Ran into an interesting user issue when unifying Union[T, List[T]] against List[str]. The binding of T depends on whether you express the type as Union[T, List[T]] or Union[List[T], T] since str matches List[str]. We've been doing the simplest thing and matching against the types in the order they appear in the union until the first one matches, but now I'm wondering if unions are conceptually unordered and we should use a smarter algorithm. martin

On Fri, Jul 08, 2022 at 05:33:46PM -0700, Martin DeMello via Typing-sig wrote:
Ran into an interesting user issue when unifying Union[T, List[T]] against List[str]. The binding of T depends on whether you express the type as Union[T, List[T]] or Union[List[T], T] since str matches List[str]. We've been doing the simplest thing and matching against the types in the order they appear in the union until the first one matches, but now I'm wondering if unions are conceptually unordered and we should use a smarter algorithm.
Hi Martin, Sorry if I am missing something here, since I lack context (who is "we" in your post?) but the documentation is clear that argument union is ignored: Union[S, T] = Union[T, S] https://docs.python.org/3/library/typing.html#typing.Union I don't understand what you mean by str matching List[str]. mypy does not do that, Given example.py: ``` from typing import List def f(arg: List[str]) -> None: pass def g(arg: str) -> None: pass f("hello") g(["a", "b"]) ``` mypy reports both errors correctly: ``` example.py:10: error: Argument 1 to "f" has incompatible type "str"; expected "List[str]" example.py:11: error: Argument 1 to "g" has incompatible type "List[str]"; expected "str" Found 2 errors in 1 file (checked 1 source file) ``` -- Steve

I'm not familiar with any documentation (in PEP 484 or elsewhere) that explicitly states that unions in the Python type system are unordered, but I have made that assumption in Pyright. I think mypy makes a similar assumption. In type theory, it makes sense that unions would be order-independent and fully commutative — i.e. `X | Y` should be equivalent to `Y | X`. There are a few places where this comes up. One is when determining whether two types are compatible, especially when a union is used for the specialization of an invariant type parameter. I think we can agree that the following is type safe and should be allowed. ```python def func(a: list[str | int]): b: list[int | str] = a ``` Another place where union ordering becomes important is in the constraint solver. Martin, I'm guessing this is the case that you're referring to? The way I solve the union ordering problem in the constraint solver in Pyright is to attempt type matches for each subtype within the union and "score" each successful match. The score represents the "simplicity" of the solution. I choose the one with the simplest solution. For example: ```python from typing import TypeVar T = TypeVar("T") def func1(a: T | list[T]) -> T: ... def func2(a: list[T] | T) -> T: ... def test(val: list[int]): reveal_type(func1(val)) # int reveal_type(func2(val)) # int ``` In this example, the constraint solver attempts to match `list[int]` against both `list[T]` and `T`. Both of these are valid solutions given the constraints, but the simplest solution is `T = int` (versus `T = list[int]`). The "simplicity calculation" is a heuristic, and I've needed to tune it over time to achieve the best results, but it seems to work well and produce deterministic results that do not depend on union ordering — unless two solutions happen to result in exactly the same simplicity score. Interestingly, mypy emits false positive errors for the above example. -Eric -- Eric Traut Contributor to Pyright & Pylance Microsoft

Regarding Eric's example, consider: ``` def func1(a: T | list[T]) -> T: if a and isinstance(a, list): return a[0] return a ``` That seems like a valid, correctly typed function, so I'm surprised that a type checker would infer that the return type when called with a list[int] would be int, when it may return the empty list. Naively I would have assumed an inference of `list[INT] | int` (in either order), since that's all that can be proved from the types + context. Danny On Sat, 9 Jul 2022, 08:39 Eric Traut, <eric@traut.com> wrote:
I'm not familiar with any documentation (in PEP 484 or elsewhere) that explicitly states that unions in the Python type system are unordered, but I have made that assumption in Pyright. I think mypy makes a similar assumption.
In type theory, it makes sense that unions would be order-independent and fully commutative — i.e. `X | Y` should be equivalent to `Y | X`.
There are a few places where this comes up. One is when determining whether two types are compatible, especially when a union is used for the specialization of an invariant type parameter. I think we can agree that the following is type safe and should be allowed.
```python def func(a: list[str | int]): b: list[int | str] = a ```
Another place where union ordering becomes important is in the constraint solver. Martin, I'm guessing this is the case that you're referring to?
The way I solve the union ordering problem in the constraint solver in Pyright is to attempt type matches for each subtype within the union and "score" each successful match. The score represents the "simplicity" of the solution. I choose the one with the simplest solution. For example:
```python from typing import TypeVar
T = TypeVar("T")
def func1(a: T | list[T]) -> T: ... def func2(a: list[T] | T) -> T: ...
def test(val: list[int]): reveal_type(func1(val)) # int reveal_type(func2(val)) # int ```
In this example, the constraint solver attempts to match `list[int]` against both `list[T]` and `T`. Both of these are valid solutions given the constraints, but the simplest solution is `T = int` (versus `T = list[int]`). The "simplicity calculation" is a heuristic, and I've needed to tune it over time to achieve the best results, but it seems to work well and produce deterministic results that do not depend on union ordering — unless two solutions happen to result in exactly the same simplicity score.
Interestingly, mypy emits false positive errors for the above example.
-Eric
-- Eric Traut Contributor to Pyright & Pylance Microsoft _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: diabolo.dan@gmail.com

A case I think is worth consideration: the runtime use of type hints for serialization/deserialization of values. If deserializing a value that's annotated as `list[str | int]`, if UnionType argument order is applied, every deserialized list element would resolve to a `str` type. This is in fact how a library I maintain currently performs deserialization. To date, for cases like this I've recommended the order be switched so as to avoid every element falling into the catch-all `str` deserialization. On Sat, 2022-07-09 at 07:39 +0000, Eric Traut wrote:
I'm not familiar with any documentation (in PEP 484 or elsewhere) that explicitly states that unions in the Python type system are unordered, but I have made that assumption in Pyright. I think mypy makes a similar assumption.
In type theory, it makes sense that unions would be order-independent and fully commutative — i.e. `X | Y` should be equivalent to `Y | X`.
There are a few places where this comes up. One is when determining whether two types are compatible, especially when a union is used for the specialization of an invariant type parameter. I think we can agree that the following is type safe and should be allowed.
```python def func(a: list[str | int]): b: list[int | str] = a ```
Another place where union ordering becomes important is in the constraint solver. Martin, I'm guessing this is the case that you're referring to?
The way I solve the union ordering problem in the constraint solver in Pyright is to attempt type matches for each subtype within the union and "score" each successful match. The score represents the "simplicity" of the solution. I choose the one with the simplest solution. For example:
```python from typing import TypeVar
T = TypeVar("T")
def func1(a: T | list[T]) -> T: ... def func2(a: list[T] | T) -> T: ...
def test(val: list[int]): reveal_type(func1(val)) # int reveal_type(func2(val)) # int ```
In this example, the constraint solver attempts to match `list[int]` against both `list[T]` and `T`. Both of these are valid solutions given the constraints, but the simplest solution is `T = int` (versus `T = list[int]`). The "simplicity calculation" is a heuristic, and I've needed to tune it over time to achieve the best results, but it seems to work well and produce deterministic results that do not depend on union ordering — unless two solutions happen to result in exactly the same simplicity score.
Interestingly, mypy emits false positive errors for the above example.
-Eric
-- Eric Traut Contributor to Pyright & Pylance Microsoft _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: pbryan@anode.ca

Hi All, FWIW, pydantic considers unions to be "somewhat ordered" - in other words we try hard to identify the best union choice to validate against regardless of order, but in the absence of an alternative we have to eventually try union choices in the order they're defined in the union. Example of what this means: Example 1 Union[str, int]: - input "x" -> output "x" - input 123 -> output 123 - input "123" -> output "123" Here, even if we're in "lax mode" and "123" could be legitimately coerced to the number 123 we can infer the best option, in all these cases Union[str, int] would be the same as Union[int, str]. Example 2 Union[int, bool]: - Input True -> output "True" - input 1 -> output 1 - input "1" -> output 1 Union[bool, int]: - Input True -> output "True" - input 1 -> output 1 - input "1" -> output True Note in the last case the string "1" can be coerced to both the int 1 and the bool True, thus the output differs depending on the order of the union choices. While you might (probably will) disagree and disapprove of the "lax mode" coercion. I hope you'll agree that given we have it, there's no option but to try choices in the order they're defined. AFAIK, in strict mode, Union behaviour is independent of order, but I could be wrong. (Note some of these cases refer to the as yet unreleased Pydantic V2, and pydantic-core. Pydantic V1 has some unfortunate rough edges). Samuel -- Samuel Colvin On Sat, 9 Jul 2022 at 18:03, Paul Bryan <pbryan@anode.ca> wrote:
A case I think is worth consideration: the runtime use of type hints for serialization/deserialization of values.
If deserializing a value that's annotated as `list[str | int]`, if UnionType argument order is applied, every deserialized list element would resolve to a `str` type. This is in fact how a library I maintain currently performs deserialization. To date, for cases like this I've recommended the order be switched so as to avoid every element falling into the catch-all `str` deserialization.
On Sat, 2022-07-09 at 07:39 +0000, Eric Traut wrote:
I'm not familiar with any documentation (in PEP 484 or elsewhere) that explicitly states that unions in the Python type system are unordered, but I have made that assumption in Pyright. I think mypy makes a similar assumption.
In type theory, it makes sense that unions would be order-independent and fully commutative — i.e. `X | Y` should be equivalent to `Y | X`.
There are a few places where this comes up. One is when determining whether two types are compatible, especially when a union is used for the specialization of an invariant type parameter. I think we can agree that the following is type safe and should be allowed.
```python def func(a: list[str | int]): b: list[int | str] = a ```
Another place where union ordering becomes important is in the constraint solver. Martin, I'm guessing this is the case that you're referring to?
The way I solve the union ordering problem in the constraint solver in Pyright is to attempt type matches for each subtype within the union and "score" each successful match. The score represents the "simplicity" of the solution. I choose the one with the simplest solution. For example:
```python from typing import TypeVar
T = TypeVar("T")
def func1(a: T | list[T]) -> T: ... def func2(a: list[T] | T) -> T: ...
def test(val: list[int]): reveal_type(func1(val)) # int reveal_type(func2(val)) # int ```
In this example, the constraint solver attempts to match `list[int]` against both `list[T]` and `T`. Both of these are valid solutions given the constraints, but the simplest solution is `T = int` (versus `T = list[int]`). The "simplicity calculation" is a heuristic, and I've needed to tune it over time to achieve the best results, but it seems to work well and produce deterministic results that do not depend on union ordering — unless two solutions happen to result in exactly the same simplicity score.
Interestingly, mypy emits false positive errors for the above example.
-Eric
-- Eric Traut Contributor to Pyright & Pylance Microsoft _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: pbryan@anode.ca
_______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: samcolvin@gmail.com

Also working on runtime serialization/deserialization library I treat unions as unordered and either require it to be unambiguous which type is right or include a marker in serialization. My serialization format is json and as an example @configurable class Foo: x: int | str serializes to {"x": 1} or {"x": "1"} where type is distinguishable either way. If instead you have @configurable class Foo: x: list[int] | tuple[int, ...] I serialize roughly to {"type": "list[int]", "value": [1]}. Deserializing similarly requires type to be given whenever it is ambiguous which to pick. For most types it's distinguishable with serialization rules I use so it's somewhat rare to need "type". The one place where I do commonly need it is for polymorphic serialization/deserialization.

Sorry, my message was a bit unclear. Eric has it right, I was talking about pytype's constraint solver, where unifying T | List[T] against List[str] gives us T = List[str] and List[T] = List[List[str]], but just because we considered the T first. I was wondering whether the other typecheckers also just considered the options in the order they were given, but I like Eric's description of a simplicity score. martin On Fri, Jul 8, 2022 at 7:47 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 08, 2022 at 05:33:46PM -0700, Martin DeMello via Typing-sig wrote:
Ran into an interesting user issue when unifying Union[T, List[T]] against List[str]. The binding of T depends on whether you express the type as Union[T, List[T]] or Union[List[T], T] since str matches List[str]. We've been doing the simplest thing and matching against the types in the order they appear in the union until the first one matches, but now I'm wondering if unions are conceptually unordered and we should use a smarter algorithm.
Hi Martin,
Sorry if I am missing something here, since I lack context (who is "we" in your post?) but the documentation is clear that argument union is ignored:
Union[S, T] = Union[T, S]
https://docs.python.org/3/library/typing.html#typing.Union
I don't understand what you mean by str matching List[str]. mypy does not do that, Given example.py:
``` from typing import List
def f(arg: List[str]) -> None: pass
def g(arg: str) -> None: pass
f("hello") g(["a", "b"]) ``` mypy reports both errors correctly:
``` example.py:10: error: Argument 1 to "f" has incompatible type "str"; expected "List[str]" example.py:11: error: Argument 1 to "g" has incompatible type "List[str]"; expected "str" Found 2 errors in 1 file (checked 1 source file) ```
-- Steve _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: mdemello@google.com

Pyre treats unions as unordered, and I agree with Eric that it makes sense that unions should be order-independent and fully commutative, at least in static type checkers. Regarding the other issue of ambiguity, I believe currently Pyre tends to prefer the "simple" solution as well, where simplicity is measured with heuristics. But my general view on this is that in theory, both `T = List[str]` and `T = List[List[str]]` should be valid solutions in the aforementioned example. From type checking perspective, It does not really matter which solution the type checker "prefers", unless there exist downstream operations that put additional constraints on `T` (e.g. having `x: T = ...; y: str = x[0][0]` would rule out `List[str]`). - Jia
participants (8)
-
Daniel Cooper
-
Eric Traut
-
Jia Chen
-
Martin DeMello
-
Mehdi Drissi
-
Paul Bryan
-
Samuel Colvin
-
Steven D'Aprano