[Python-ideas] "Sum" Type hinting [was: Type hinting for path-related functions]
Stephen J. Turnbull
stephen at xemacs.org
Sat May 14 01:28:51 EDT 2016
Executive summary:
Inference of str vs bytes (in particular, which component of a "sum"
type in general) is not always possible, and it's not obvious how
beneficial it is. Proposal: have typecheckers recognize "suspicious"
unions, and complain if they are not "converted" to a component type
before passing to a polymorphic function.
Koos Zevenhoven writes:
> We'll need to address this, unless we want the type checker to not
> know whether os.path.* etc. return str or bytes and to carry around
> Union[str, bytes]. In theory, it would be possible to infer whether it
> is str or bytes, as described.
I'm -0.5 = "more trouble than it's worth". Rationale:
*Sometimes* it would be possible. OTOH, modules like os.path *will*
end up carrying around Union[str, bytes], precisely because they will
be called in both contexts, and that's not predictable at the time of
type-checking os.path.
So what you really want is a Sum type constructor, where Sum is in the
sense of category theory's sum functor. That is, not only is the
result Type a (disjoint) union, but you're also guaranteed that each
operation respects the component type.
I could argue that this is why Guido was right to remove the
restriction that os.fspath return str. The __fspath__ method is part
of the "sum category os.path"[1], which is a collection of operations
on bytes and on str that parallel each other because they are
equivalent representations of filesystem paths. os.fspath therefore
is also part of this category. If you want to take the result out of
this category and make it str, use os.fsdecode (which is *not* part of
this category *because* the codomain is str, not Sum[bytes,str]).
Note that Union is still a useful type if we have Sum. For example,
the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) ->
Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does,
although in Python it doesn't). So the "sum" requirement that the map
respect "original" type is a language design question, not a
restriction we should apply for the fun of it.
Now, how would the type checker treat __fspath__? Borrowing a type
from Ethan, it needs to observe that there was bytes object in the
main function that got passed to the Antipathy Sum[str,bytes] -> Path
constructor. It would then tag that Path p as a bytes-y Path, and
further tag the value of os.fspath(p) as bytes. So we would need to
add the concept of a ComponentType-y object of Type SumType to the
typing module. That sounds complicated to me, and typically not all
that useful: people living at the level of bytes will be swimming in
bytes, ditto str-people. Polymorphism "just works" for them. People
working at the interface will be using functions like os.fsdecode that
ensure type in the nature of things. They have to be more careful,
but use of a Union[str, bytes] as an argument to foo(b: bytes) will
already be flagged.
I think TRT here is to provide a way to tell the type checker that
certain Unions should be converted as soon as possible, and not allow
passing them even to polymorphic functions. So:
def foo(d: DirEntry) -> Int:
s = os.fspath(d)
with open(s) as f: # Flag use of s as Union[str, bytes].
# open() doesn't care, but the type
# checker knows that such Unions
# should only be passed to conversion
# functions.
return len(f)
while
def foo(d: DirEntry) -> Int:
s = os.fspath(d)
s = os.fsdecode(s) # Converted to str immediately, clear since
# fsdecode Type is "SuspiciousUnion -> str".
with open(s) as f:
return len(f)
is OK. This is on the "catch bugs early" principle, and respects the
generally accepted principle that in "most" applications, encoded
bytes should be decoded to str "at the boundary". Of course this
would be optional (eg, it would be off in a standalone check of
os.path for type sanity), and not all Unions are subject to this
treatment (eg, it would make no sense for numbers).
On the other hand, *inside* a function that is respectfully
polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes]
would tell the typechecker to ensure that in foo's implementation, if
bytes go in, bytes come out, if str goes in, str comes out. That
might be very useful (but I don't have a strong opinion).
A nit: what would len(s: Sum[str, bytes]) -> Int mean? In this case
it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm
not sure it's all that obvious in other possible cases.
Finally, in many cases Sum types just are not going to be useful. You
might want to alias Number to Sum[Float, Int] because (in Python)
Float + Float -> Float and Int + Int -> Int, but then you have two
problems. First, you can't do inference on mixed-type arithmetic, Sum
doesn't permit that.[2] Second, you can't do inference on Int / Int.
Footnotes:
[1] The quotes mean "I think I could make this precise but it would
be boring. Bear with me, and don't be distracted by the fact that
lots of stuff in this 'category' aren't in the os.path *module* -- eg,
open() and __fspath__ itself." I hope that the categorical language
will be a useful metaphor for those who understand category theory,
but really the whole thing is a hand-wavy path to "what implementation
would look like".
[2] This is also a handwavy statement. Technically, the arithmetic
we're talking about is binary: Tuple[Sum, Sum] -> Sum and it could
do anything with mixed types, as Sum can only provide restrictions on
unary: Sum -> Sum operation. So a more accurate way to put the point
is that it's not really obvious what properties Tuple[Sum, Sum] should
have, especially in the context of a function whose value is of the
same Sum Type. And of course the problem with Int/Int is real -- it's
hard to imagine how that could be handled by a general rule about
Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type,
the value should be of that type! But we decided otherwise when
designing Python 3.
More information about the Python-ideas
mailing list