[Python-ideas] Re: Inspired by Scala, a new syntax for Union type

30 Aug 2019

Andrew Barnert wrote:
On Aug 29, 2019, at 16:03, Dominik Vilsmeier dominik.vilsmeier@gmx.de wrote:
...
...
I never really understood the importance of
Optional. Often it can be left out altogether and in other cases I find
Union[T, None] more expressive (explicit) than Optional[T] (+
the latter saves only 3 chars).
Especially for people not familiar with typing, the meaning of Optional is
not obvious at first sight. Union[T, None] on the other hand is pretty clear.
Also in other cases, where the default (fallback) is different from None,
you'd have to use Union anyway. For example a function that normally returns
an object of type T but in some circumstances it cannot and then it returns
the reason as a str, i.e. -> Union[T, str];
Optional won't help here.
But this should be very rare.
Most functions that can return a fallback value return a fallback value of the expected
return type. For example, a get(key, default) method will return the default param, and
the caller should pass in a default value of the type they’re expecting to look up. So,
this shouldn’t be get(key: KeyType, default: T) -> Union[ValueType, T], it should be
get(key: KeyType, default: ValueType) -> ValueType. Or maybe get(key: KeyType, default:
Optional[ValueType]=None) -> Optional[ValueType].
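A minimal sketch of the two signatures described above, using a plain `Dict[str, int]` as a stand-in mapping; the function names `get_or_default` and `get_optional` and the sample data are hypothetical:

```python
from typing import Dict, Optional


def get_or_default(mapping: Dict[str, int], key: str, default: int) -> int:
    # The default has the same type as the stored values, so the
    # return type is simply the value type: no Union is needed.
    return mapping.get(key, default)


def get_optional(mapping: Dict[str, int], key: str,
                 default: Optional[int] = None) -> Optional[int]:
    # Without an explicit default, a missing key yields None, hence Optional.
    return mapping.get(key, default)


ports = {"http": 80, "https": 443}
print(get_or_default(ports, "ftp", 21))  # 21
print(get_optional(ports, "ftp"))        # None
```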
Most functions that want to explain why they failed do so by raising an exception, not
by returning a string.
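To illustrate the contrast, here is a hypothetical `parse_port` written both ways: returning the failure reason as a `str` (which forces a `Union[int, str]` return type) versus raising an exception (which keeps the return type plain). The names and messages are made up for the example:

```python
from typing import Union


def parse_port_union(text: str) -> Union[int, str]:
    # The failure reason is returned as a string; callers must check the type.
    if not text.isdigit():
        return f"not a number: {text!r}"
    return int(text)


def parse_port(text: str) -> int:
    # Pythonic style: failure is signalled with an exception instead.
    if not text.isdigit():
        raise ValueError(f"not a number: {text!r}")
    return int(text)


result = parse_port_union("http")
if isinstance(result, str):
    print("failed:", result)  # failed: not a number: 'http'

try:
    parse_port("http")
except ValueError as exc:
    print("failed:", exc)     # failed: not a number: 'http'
```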
And what other cases are there?
Well, I actually made this up, so I can't think of any other real cases either :-)
...
Of course you could be trying to add type checking to some weird legacy codebase that
doesn’t do things Pythonically, so you have to use Union returns. But that’s specific to
that one weird codebase.
Meanwhile, Optional return values are common all over Python.
Also, Python’s typing system is a lot easier to grasp if you’re familiar with an
established modern-typed language (Swift, Scala, Haskell, F#, etc.), and they also use
Optional[T] (or optional<T> or Maybe t or some other spelling of the same idea) all
over the place, so often that many of them have added shortcuts like T? to make it easier to
write and less intrusive to read.
I don't have experience in any of these languages (basically I'm self-taught Python), so I learned it mostly from the docs (also `Optional`). That doesn't necessarily imply understanding the importance of the concept, but I acknowledge that `Optional[T]` is much easier to read than `Union[T, None]`; the former has less visual overhead and it reads more like "natural" language, so once you combine this with the fact that functions return `None` when they don't hit a `return` statement (or the convention of explicitly putting `return None` at the end), the meaning of `Optional[T]` becomes more clear.
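A small sketch of that reading: `first_even` and `log` are hypothetical functions, the first returning `None` explicitly when nothing matches and the second having no `return` at all. The `typing` module itself treats `Optional[int]` and `Union[int, None]` as the same type:

```python
from typing import List, Optional, Union


def first_even(numbers: List[int]) -> Optional[int]:
    # Returns the first even number, or None if there is none.
    for n in numbers:
        if n % 2 == 0:
            return n
    return None  # the explicit "return None at the end" convention


def log(message: str) -> None:
    print(message)  # no return statement: the call evaluates to None


# Optional[T] is shorthand: typing treats both spellings as equal.
print(Optional[int] == Union[int, None])  # True
print(first_even([1, 3, 4]))              # 4
print(first_even([1, 3, 5]))              # None
```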
...
I think there may be a gap in the docs. They make perfect sense to someone with
experience in one of those languages, but a team that has nobody with that experience
might be a little lost. There’s a mile-high overview, a theory paper, and then basically
just reference docs that expect you to already know all the key concepts that you don’t
already know. Maybe that’s something that an outsider who’s trying to learn from the docs
plus trial and error could help improve?
...
Scanning through the docs and PEP I can't find
strongly motivating examples for Optional (over Union[T, None]).
E.g. in the following:
    def lookup(self, name: str) -> Optional[Node]:
        nodes = self.get(name)
        if nodes:
            return nodes[-1]
        return None
I would rather write Union[Node, None] because that's much more explicit
about what happens.
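For reference, a self-contained version of the `lookup` method above. The `Node` and `Scope` classes and the `bind`/`get` helpers are hypothetical scaffolding, since the original snippet only shows the method itself:

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, value: str) -> None:
        self.value = value


class Scope:
    # Hypothetical container: each name maps to a stack of Node bindings.
    def __init__(self) -> None:
        self._bindings: Dict[str, List[Node]] = {}

    def bind(self, name: str, node: Node) -> None:
        self._bindings.setdefault(name, []).append(node)

    def get(self, name: str) -> List[Node]:
        return self._bindings.get(name, [])

    def lookup(self, name: str) -> Optional[Node]:
        # Same body as the quoted example: newest binding, or None.
        nodes = self.get(name)
        if nodes:
            return nodes[-1]
        return None


scope = Scope()
scope.bind("x", Node("outer"))
scope.bind("x", Node("inner"))
print(scope.lookup("x").value)  # inner
print(scope.lookup("y"))        # None
```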
Then introducing ~T in place of Optional[T] just further
obfuscates the meaning of the code:
def lookup(self, name: str) -> ~Node:
The ~ is easy to be missed (at least by human readers) and the meaning not
obvious.
That’s kind of funny, because I had to read your Union[Node, None] a couple times
before I realized you hadn’t written Union[Node, Node]. :)
I had a similar thought when writing this, so I get the point. I'm not arguing against `Optional`; I just think it's less self-explanatory than `Union[T, None]` when you see it for the first time and aren't familiar with the concept in general. But that doesn't mean you shouldn't familiarize yourself with it :-)
...
I do dislike ~ for other reasons (but I already mentioned them, Guido isn’t convinced,
so… fine, I don’t hate it that much). But I don’t think ~ is easy to miss. It’s not like a
period or backtick that can be mistaken for grit on your screen; it’s more visible than
things like - that everyone expects to be able to pick out.
As I mentioned in my other reply to Guido, patterns like -341 are easily recognizable as a negative number (i.e. you won't miss the `-`) because our brains are accustomed to seeing them. ~Noun, on the other hand, is not something you're likely to encounter in everyday language, and thus it is an unfamiliar pattern. Noun?, by contrast, is easily recognizable. Regarding the meaning, `T?` should be pretty clear (read as "maybe T", i.e. maybe you hit a return statement with `T` and if not it's going to be `None` by default); for `~`, however, I'm not aware of any meaning in natural language. I did a bit of internet search for symbols representing "optional" but I couldn't find any (e.g. none of the icon websites I tried gave satisfying results, or any results at all). Also the guys over at ux.stackexchange seem to agree that the only way to mark something optional is to write "optional" (https://ux.stackexchange.com/q/102930, https://ux.stackexchange.com/q/9684). Python code reads very naturally, but I'm not convinced `~` would add to that; it's rather a step away.
Personally, that's not important to me; I'm more of the "look up the docs and learn from there" style rather than relying on my intuition about the meaning of something. But from other discussions on this list I had the impression that Python wants to keep possible confusion to a minimum, especially for newcomers (I remember the discussion about `while ... except`, where the main argument against it was that this syntax could easily be confused). With `~` there probably won't be confusion in that sense, but someone reading it for the first time will definitely need to look it up (which is fine i.m.o.).
...
...
For Union, on the other hand, it would be
more helpful to have a shorter syntax. int | str seems pretty clear, but what
prevents tuples (int, str) from being interpreted as unions by type checkers?
This doesn't require any changes to the built-in types and it is aligned with the already
existing syntax for checking multiple types with isinstance or
issubclass: isinstance(x, (int, str)). Having used this a couple
of times, whenever I see a tuple of types I immediately think of them as "or"
options.
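The existing runtime convention referred to here, where a tuple of types means "any of these", can be seen with `isinstance`, `issubclass`, and `except` clauses (the values below are arbitrary examples):

```python
# isinstance and issubclass read a tuple of types as "any of these".
print(isinstance("hello", (int, str)))  # True
print(isinstance(3.5, (int, str)))      # False
print(issubclass(bool, (int, str)))     # True: bool is a subclass of int

try:
    int("not a number")
except (ValueError, TypeError) as exc:
    # The except clause uses the same "tuple means or" convention.
    print(type(exc).__name__)           # ValueError
```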
The biggest problem with tuple is that in every other language with a similar
type system, (int, str) means Tuple[int, str].
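To make the clash concrete: in Python's own `typing` module, `Tuple[int, str]` already means a pair ("an int *and* a str"), while `Union[int, str]` means a single value that is "an int *or* a str". The functions below are hypothetical:

```python
from typing import Tuple, Union


def parse_entry(line: str) -> Tuple[int, str]:
    # Tuple[int, str]: a pair whose first item is an int and second a str.
    number, name = line.split(":", 1)
    return int(number), name


def normalize(value: Union[int, str]) -> str:
    # Union[int, str]: one value that is either an int or a str.
    return str(value)


print(parse_entry("42:answer"))          # (42, 'answer')
print(normalize(7), normalize("seven"))  # 7 seven
```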
I think {int, str}, which someone proposed in one of the earlier discussions, is nice.
What else would a set of types mean (unless you’re doing mathematical type theory rather
than programming language typing)? But it’s unfortunate that things like isinstance and
except take a tuple of types (and it has to be a tuple, not any other kind of iterable),
so a set might be just as confusing for hardcore Python types as a tuple would be for
polyglots.
The possible confusion with `Tuple[x, y]` is a strong counter-argument, but as you mention, `{int, str}` doesn't have this particular problem. The unfortunate part about `isinstance` is that it takes _only_ a tuple and not any kind of collection of types.
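A quick check of that restriction; `accepts_as_classinfo` is a hypothetical helper that simply reports whether `isinstance` accepts a given second argument:

```python
def accepts_as_classinfo(candidate) -> bool:
    # True if isinstance() accepts `candidate` as its second argument.
    try:
        isinstance(1, candidate)
    except TypeError:
        return False
    return True


print(accepts_as_classinfo((int, str)))  # True
print(accepts_as_classinfo({int, str}))  # False: a set is rejected
print(accepts_as_classinfo([int, str]))  # False: so is a list
```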
...
If the compatibility issue isn’t a big deal (and I trust Guido that it isn’t), I think
int | str is the best option. It’s an operator that means union, it’s used for sum/union
types in other languages, it makes perfect sense if you read it as “int or str”… I can’t
imagine anyone being confused or put off by it.
I also like the `int | str` syntax, and I can't imagine that it will cause any kind of confusion. One difference between `int | str` and `{int, str}`, however, is that correctly interpreting the ` | ` syntax likely requires context, while the `{ }` syntax clearly defers the interpretation to whatever context it is used in. For example:

    def foo(x: int | str): ...
    def foo(x: {int, str}): ...

Here it's pretty clear that both versions indicate multiple type options.

However when reading something like:

    x = int | str

it's not immediately clear what this means and what `x` actually is (or represents). Probably `x` is used in type annotations later on, but someone reading this statement (and maybe being unfamiliar with typing) could also assume one of the following:

1. Some kind of type chain with fallbacks, so that you can do `x(2.0) == 2` with a fallback on `str`: `x('foo') == 'foo'`.
2. (shell style) Some kind of compound type, piping the output from `int` to `str`: `x(2.3) == str(int(2.3)) == '2'`.

Only if you're familiar with `__or__`'ing types, or you see this in a typing context, does it become clear that this means a type union.

On the other hand, `{int, str}` is just a collection of types, nothing more, with no further meaning attached to it. Whatever meaning is eventually assigned to such a type collection is deferred to the context that uses it, e.g. type annotations or usage with `isinstance`. The same goes for `x = (int, str)`: this can be used as `isinstance(foo, x)` or `type_chain(foo, x)` or `type_pipe(foo, x)`. Here it's the functions that give meaning to the type collection, not the collection itself.
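A sketch of the hypothetical `type_chain` and `type_pipe` helpers named above; they do not exist in any library, and their fallback and pipe semantics are assumed from the two readings listed earlier. The same tuple `x` takes on three different meanings depending on which function consumes it:

```python
from typing import Any, Tuple


def type_chain(value: Any, candidates: Tuple[type, ...]) -> Any:
    # Hypothetical: try each type in order, falling back on failure.
    for t in candidates:
        try:
            return t(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"no type in {candidates!r} accepts {value!r}")


def type_pipe(value: Any, candidates: Tuple[type, ...]) -> Any:
    # Hypothetical: feed each type's output into the next, shell style.
    for t in candidates:
        value = t(value)
    return value


x = (int, str)
print(isinstance("foo", x))  # True
print(type_chain(2.0, x))    # 2
print(type_chain("foo", x))  # foo
print(type_pipe(2.3, x))     # 2
```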