[Python-ideas] "Sum" Type hinting [was: Type hinting for path-related functions]

Guido van Rossum guido at python.org
Sat May 14 13:27:48 EDT 2016

I'm confused by the terminology in this discussion. PEP 484 has type
variables with constraints, and the typing module predefines AnyStr =
TypeVar('AnyStr', str, bytes). This is then used to define various
polymorphic functions, e.g. in os.path:

def abspath(path: AnyStr) -> AnyStr: ...

This means that the type checker will know that the type of abspath(b'.')
is bytes, the type of abspath('.') is str.

There's also e.g.

def relpath(path: AnyStr, start: AnyStr = ...) -> AnyStr: ...

which means that path and start must be either both bytes or both str.

For reference, this code lives in typeshed:

Is that at all useful?

On Fri, May 13, 2016 at 10:28 PM, Stephen J. Turnbull <stephen at xemacs.org>

> Executive summary:
> Inference of str vs bytes (in particular, which component of a "sum"
> type in general) is not always possible, and it's not obvious how
> beneficial it is.  Proposal: have typecheckers recognize "suspicious"
> unions, and complain if they are not "converted" to a component type
> before passing to a polymorphic function.
> Koos Zevenhoven writes:
>  > We'll need to address this, unless we want the type checker to not
>  > know whether os.path.* etc. return str or bytes and to carry around
>  > Union[str, bytes]. In theory, it would be possible to infer whether it
>  > is str or bytes, as described.
> I'm -0.5 = "more trouble than it's worth".  Rationale:
> *Sometimes* it would be possible.  OTOH, modules like os.path *will*
> end up carrying around Union[str, bytes], precisely because they will
> be called in both contexts, and that's not predictable at the time of
> type-checking os.path.
> So what you really want is a Sum type constructor, where Sum is in the
> sense of category theory's sum functor.  That is, not only is the
> result Type a (disjoint) union, but you're also guaranteed that each
> operation respects the component type.
> I could argue that this is why Guido was right to remove the
> restriction that os.fspath return str.  The __fspath__ method is part
> of the "sum category os.path"[1], which is a collection of operations
> on bytes and on str that parallel each other because they are
> equivalent representations of filesystem paths.  os.fspath therefore
> is also part of this category.  If you want to take the result out of
> this category and make it str, use os.fsdecode (which is *not* part of
> this category *because* the codomain is str, not Sum[bytes,str]).
> Note that Union is still a useful type if we have Sum.  For example,
> the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) ->
> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does,
> although in Python it doesn't).  So the "sum" requirement that the map
> respect "original" type is a language design question, not a
> restriction we should apply for the fun of it.
> Now, how would the type checker treat __fspath__?  Borrowing a type
> from Ethan, it needs to observe that there was bytes object in the
> main function that got passed to the Antipathy Sum[str,bytes] -> Path
> constructor.  It would then tag that Path p as a bytes-y Path, and
> further tag the value of os.fspath(p) as bytes.  So we would need to
> add the concept of a ComponentType-y object of Type SumType to the
> typing module.  That sounds complicated to me, and typically not all
> that useful: people living at the level of bytes will be swimming in
> bytes, ditto str-people.  Polymorphism "just works" for them.  People
> working at the interface will be using functions like os.fsdecode that
> ensure type in the nature of things.  They have to be more careful,
> but use of a Union[str, bytes] as an argument to foo(b: bytes) will
> already be flagged.
> I think TRT here is to provide a way to tell the type checker that
> certain Unions should be converted as soon as possible, and not allow
> passing them even to polymorphic functions.  So:
>     def foo(d: DirEntry) -> Int:
>         s = os.fspath(d)
>         with open(s) as f:      # Flag use of s as Union[str, bytes].
>                                 # open() doesn't care, but the type
>                                 # checker knows that such Unions
>                                 # should only be passed to conversion
>                                 # functions.
>             return len(f)
> while
>     def foo(d: DirEntry) -> Int:
>         s = os.fspath(d)
>         s = os.fsdecode(s)      # Converted to str immediately, clear since
>                                 # fsdecode Type is "SuspiciousUnion ->
> str".
>         with open(s) as f:
>             return len(f)
> is OK.  This is on the "catch bugs early" principle, and respects the
> generally accepted principle that in "most" applications, encoded
> bytes should be decoded to str "at the boundary".  Of course this
> would be optional (eg, it would be off in a standalone check of
> os.path for type sanity), and not all Unions are subject to this
> treatment (eg, it would make no sense for numbers).
> On the other hand, *inside* a function that is respectfully
> polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes]
> would tell the typechecker to ensure that in foo's implementation, if
> bytes go in, bytes come out, if str goes in, str comes out.  That
> might be very useful (but I don't have a strong opinion).
> A nit: what would len(s: Sum[str, bytes]) -> Int mean?  In this case
> it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm
> not sure it's all that obvious in other possible cases.
> Finally, in many cases Sum types just are not going to be useful.  You
> might want to alias Number to Sum[Float, Int] because (in Python)
> Float + Float -> Float and Int + Int -> Int, but then you have two
> problems.  First, you can't do inference on mixed-type arithmetic, Sum
> doesn't permit that.[2]  Second, you can't do inference on Int / Int.
> Footnotes:
> [1]  The quotes mean "I think I could make this precise but it would
> be boring.  Bear with me, and don't be distracted by the fact that
> lots of stuff in this 'category' aren't in the os.path *module* -- eg,
> open() and __fspath__ itself."  I hope that the categorical language
> will be a useful metaphor for those who understand category theory,
> but really the whole thing is a hand-wavy path to "what implementation
> would look like".
> [2]  This is also a handwavy statement.  Technically, the arithmetic
> we're talking about is binary: Tuple[Sum, Sum] -> Sum and it could
> do anything with mixed types, as Sum can only provide restrictions on
> unary: Sum -> Sum operation.  So a more accurate way to put the point
> is that it's not really obvious what properties Tuple[Sum, Sum] should
> have, especially in the context of a function whose value is of the
> same Sum Type.  And of course the problem with Int/Int is real -- it's
> hard to imagine how that could be handled by a general rule about
> Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type,
> the value should be of that type!  But we decided otherwise when
> designing Python 3.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20160514/6cf5ea9a/attachment.html>

More information about the Python-ideas mailing list