[Python-ideas] "Sum" Type hinting [was: Type hinting for path-related functions]
Guido van Rossum
guido at python.org
Sat May 14 13:27:48 EDT 2016
I'm confused by the terminology in this discussion. PEP 484 has type
variables with constraints, and the typing module predefines AnyStr =
TypeVar('AnyStr', str, bytes). This is then used to define various
polymorphic functions, e.g. in os.path:
def abspath(path: AnyStr) -> AnyStr: ...
This means that the type checker will know that the type of abspath(b'.')
is bytes, the type of abspath('.') is str.
There's also e.g.
def relpath(path: AnyStr, start: AnyStr = ...) -> AnyStr: ...
which means that path and start must be either both bytes or both str.
For reference, this code lives in typeshed:
https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi
Is that at all useful?
On Fri, May 13, 2016 at 10:28 PM, Stephen J. Turnbull <stephen at xemacs.org>
wrote:
> Executive summary:
>
> Inference of str vs bytes (in particular, which component of a "sum"
> type in general) is not always possible, and it's not obvious how
> beneficial it is. Proposal: have typecheckers recognize "suspicious"
> unions, and complain if they are not "converted" to a component type
> before passing to a polymorphic function.
>
> Koos Zevenhoven writes:
>
> > We'll need to address this, unless we want the type checker to not
> > know whether os.path.* etc. return str or bytes and to carry around
> > Union[str, bytes]. In theory, it would be possible to infer whether it
> > is str or bytes, as described.
>
> I'm -0.5 = "more trouble than it's worth". Rationale:
>
> *Sometimes* it would be possible. OTOH, modules like os.path *will*
> end up carrying around Union[str, bytes], precisely because they will
> be called in both contexts, and that's not predictable at the time of
> type-checking os.path.
>
> So what you really want is a Sum type constructor, where Sum is in the
> sense of category theory's sum functor. That is, not only is the
> result Type a (disjoint) union, but you're also guaranteed that each
> operation respects the component type.
>
> I could argue that this is why Guido was right to remove the
> restriction that os.fspath return str. The __fspath__ method is part
> of the "sum category os.path"[1], which is a collection of operations
> on bytes and on str that parallel each other because they are
> equivalent representations of filesystem paths. os.fspath therefore
> is also part of this category. If you want to take the result out of
> this category and make it str, use os.fsdecode (which is *not* part of
> this category *because* the codomain is str, not Sum[bytes,str]).
>
> Note that Union is still a useful type if we have Sum. For example,
> the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) ->
> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does,
> although in Python it doesn't). So the "sum" requirement that the map
> respect "original" type is a language design question, not a
> restriction we should apply for the fun of it.
>
> Now, how would the type checker treat __fspath__? Borrowing a type
> from Ethan, it needs to observe that there was bytes object in the
> main function that got passed to the Antipathy Sum[str,bytes] -> Path
> constructor. It would then tag that Path p as a bytes-y Path, and
> further tag the value of os.fspath(p) as bytes. So we would need to
> add the concept of a ComponentType-y object of Type SumType to the
> typing module. That sounds complicated to me, and typically not all
> that useful: people living at the level of bytes will be swimming in
> bytes, ditto str-people. Polymorphism "just works" for them. People
> working at the interface will be using functions like os.fsdecode that
> ensure type in the nature of things. They have to be more careful,
> but use of a Union[str, bytes] as an argument to foo(b: bytes) will
> already be flagged.
>
> I think TRT here is to provide a way to tell the type checker that
> certain Unions should be converted as soon as possible, and not allow
> passing them even to polymorphic functions. So:
>
> def foo(d: DirEntry) -> Int:
> s = os.fspath(d)
> with open(s) as f: # Flag use of s as Union[str, bytes].
> # open() doesn't care, but the type
> # checker knows that such Unions
> # should only be passed to conversion
> # functions.
> return len(f)
>
> while
>
> def foo(d: DirEntry) -> Int:
> s = os.fspath(d)
> s = os.fsdecode(s) # Converted to str immediately, clear since
> # fsdecode Type is "SuspiciousUnion ->
> str".
> with open(s) as f:
> return len(f)
>
> is OK. This is on the "catch bugs early" principle, and respects the
> generally accepted principle that in "most" applications, encoded
> bytes should be decoded to str "at the boundary". Of course this
> would be optional (eg, it would be off in a standalone check of
> os.path for type sanity), and not all Unions are subject to this
> treatment (eg, it would make no sense for numbers).
>
> On the other hand, *inside* a function that is respectfully
> polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes]
> would tell the typechecker to ensure that in foo's implementation, if
> bytes go in, bytes come out, if str goes in, str comes out. That
> might be very useful (but I don't have a strong opinion).
>
> A nit: what would len(s: Sum[str, bytes]) -> Int mean? In this case
> it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm
> not sure it's all that obvious in other possible cases.
>
> Finally, in many cases Sum types just are not going to be useful. You
> might want to alias Number to Sum[Float, Int] because (in Python)
> Float + Float -> Float and Int + Int -> Int, but then you have two
> problems. First, you can't do inference on mixed-type arithmetic, Sum
> doesn't permit that.[2] Second, you can't do inference on Int / Int.
>
>
> Footnotes:
> [1] The quotes mean "I think I could make this precise but it would
> be boring. Bear with me, and don't be distracted by the fact that
> lots of stuff in this 'category' aren't in the os.path *module* -- eg,
> open() and __fspath__ itself." I hope that the categorical language
> will be a useful metaphor for those who understand category theory,
> but really the whole thing is a hand-wavy path to "what implementation
> would look like".
>
> [2] This is also a handwavy statement. Technically, the arithmetic
> we're talking about is binary: Tuple[Sum, Sum] -> Sum and it could
> do anything with mixed types, as Sum can only provide restrictions on
> unary: Sum -> Sum operation. So a more accurate way to put the point
> is that it's not really obvious what properties Tuple[Sum, Sum] should
> have, especially in the context of a function whose value is of the
> same Sum Type. And of course the problem with Int/Int is real -- it's
> hard to imagine how that could be handled by a general rule about
> Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type,
> the value should be of that type! But we decided otherwise when
> designing Python 3.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20160514/6cf5ea9a/attachment.html>
More information about the Python-ideas
mailing list