[Python-ideas] "Sum" Type hinting [was: Type hinting for path-related functions]

Koos Zevenhoven k7hoven at gmail.com
Sun May 15 06:02:29 EDT 2016

On Sat, May 14, 2016 at 8:28 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Executive summary:
> Inference of str vs bytes (in particular, which component of a "sum"
> type in general) is not always possible, and it's not obvious how
> beneficial it is.  Proposal: have typecheckers recognize "suspicious"
> unions, and complain if they are not "converted" to a component type
> before passing to a polymorphic function.

Maybe you mean to propose that the concept of sum types should be
introduced in typing/mypy. I don't think we want that, because
annotating types with them would probably be clumsy, and there is too
much overlap with things that are already available (unless of course
some of that gets thrown away in favor of Sum).

> Koos Zevenhoven writes:
>  > We'll need to address this, unless we want the type checker to not
>  > know whether os.path.* etc. return str or bytes and to carry around
>  > Union[str, bytes]. In theory, it would be possible to infer whether it
>  > is str or bytes, as described.
> I'm -0.5 = "more trouble than it's worth".  Rationale:

I could imagine that being the conclusion at this point. I just feel
bad for reducing the precision in the "type-hintability" of things
like os.path.* as a consequence of the new fspath PEP, which I'm
otherwise already quite happy with. That's the problem I'm trying to
solve. I'm perfectly fine with it if the conclusion is that it's not
worth it.

I just don't want to say I broke the type hinting for os.path.* for
people that want mypy to catch their errors of mixing bytes and str
paths in the wrong way, or errors of mixing str/bytes-polymorphic path
code with str concatenation etc. in the wrong way. But if I'm told
Unions are sufficient, I'll be happy with that, because I'm not a
significant user of type hints anyway. So please, someone tell me

In my OP in the original thread, I describe how the os.path functions
could presently be hinted (I guess someone may have already done this,
but I did not check). In the future, It may be preferable to tell the
type checker that passing a PurePath into os.path functions results in
str being returned, as I also explain using a parametrizable type.

> *Sometimes* it would be possible.  OTOH, modules like os.path *will*
> end up carrying around Union[str, bytes], precisely because they will
> be called in both contexts, and that's not predictable at the time of
> type-checking os.path.

My proposal is aimed at making it possible to *always* have mypy know the types.

> So what you really want is a Sum type constructor, where Sum is in the
> sense of category theory's sum functor.  That is, not only is the
> result Type a (disjoint) union, but you're also guaranteed that each
> operation respects the component type.

(Side note: I don't know why you capitalize T in "Type" here, it does
not seem to make sense in the context that Guido has just been
describing as Type. Is this confusion an argument against calling Type

My point was to parametrize the type (hint) for path objects [pretty
much exactly (like) a generic], which roughly means that the type
checker can be told how one disjoint union as the argument type gets
turned into another disjoint union as the return type.

I suppose one might write this in English as "In a typical os.path.*
function such as os.path.dirname, Union[PathABC[str], str] and
Union[PathABC[bytes], bytes] get turned into str and bytes,

Mypy/PEP483-4 approaches this problem by using Generics and TypeVars.
Parametrizing the path type by the underlying type such as str in
PathABC[str] would allow using a TypeVar to indicate "respectively".

However, there is also @overload to indicate this.

> I could argue that this is why Guido was right to remove the
> restriction that os.fspath return str.  The __fspath__ method is part
> of the "sum category os.path"[1], which is a collection of operations
> on bytes and on str that parallel each other because they are
> equivalent representations of filesystem paths.  os.fspath therefore
> is also part of this category.  If you want to take the result out of
> this category and make it str, use os.fsdecode (which is *not* part of
> this category *because* the codomain is str, not Sum[bytes,str]).

Indeed, as mentioned in my previous post, you seem to be describing
exactly my earlier arguments, but using a different (or additional)
set of terminology.

> Note that Union is still a useful type if we have Sum.  For example,
> the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) ->
> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does,
> although in Python it doesn't).  So the "sum" requirement that the map
> respect "original" type is a language design question, not a
> restriction we should apply for the fun of it.

Yes, Unions will always be needed.

> Now, how would the type checker treat __fspath__?  Borrowing a type
> from Ethan, it needs to observe that there was bytes object in the
> main function that got passed to the Antipathy Sum[str,bytes] -> Path
> constructor.  It would then tag that Path p as a bytes-y Path, and
> further tag the value of os.fspath(p) as bytes.  So we would need to
> add the concept of a ComponentType-y object of Type SumType to the
> typing module.  That sounds complicated to me, and typically not all
> that useful: people living at the level of bytes will be swimming in
> bytes, ditto str-people.  Polymorphism "just works" for them.  People
> working at the interface will be using functions like os.fsdecode that
> ensure type in the nature of things.  They have to be more careful,
> but use of a Union[str, bytes] as an argument to foo(b: bytes) will
> already be flagged.

Indeed, as mentioned, there are already approaches for this.

> I think TRT here is to provide a way to tell the type checker that
> certain Unions should be converted as soon as possible, and not allow
> passing them even to polymorphic functions.  So:
>     def foo(d: DirEntry) -> Int:
>         s = os.fspath(d)
>         with open(s) as f:      # Flag use of s as Union[str, bytes].
>                                 # open() doesn't care, but the type
>                                 # checker knows that such Unions
>                                 # should only be passed to conversion
>                                 # functions.
>             return len(f)
> while
>     def foo(d: DirEntry) -> Int:
>         s = os.fspath(d)
>         s = os.fsdecode(s)      # Converted to str immediately, clear since
>                                 # fsdecode Type is "SuspiciousUnion -> str".
>         with open(s) as f:
>             return len(f)

There is nothing suspicious about that Union there. "Union[str, bytes]
-> str" is a perfectly good type hint for os.fsdecode. You can't get
any more precise than that.

Some nitpicking: Your function foo should probably be hinted to work
with any path, not just DirEntry, besides, there seems to be no right
way to currently import DirEntry, you won't actually be able to use
that as a static type hint.

I'm not sure this example has anything to do with what you (or I) are
proposing, except that os.fsdecode and os.fsencode indeed make sure
that the output type is no longer a Union. So maybe you are just
calling Unions in general suspicious, because without a context you
can't know exactly which runtime type it should be.

Luckily our PEP will turn your example into:

def foo(d: TheTypeHintForAnyPath) -> int:
    with open(d) as f:
        return len(f)

> is OK.  This is on the "catch bugs early" principle, and respects the
> generally accepted principle that in "most" applications, encoded
> bytes should be decoded to str "at the boundary".  Of course this
> would be optional (eg, it would be off in a standalone check of
> os.path for type sanity), and not all Unions are subject to this
> treatment (eg, it would make no sense for numbers).

One definitely should not use os.fsdecode or os.fsencode just to help
the type checker. I hope that is not what you meant.

> On the other hand, *inside* a function that is respectfully
> polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes]
> would tell the typechecker to ensure that in foo's implementation, if
> bytes go in, bytes come out, if str goes in, str comes out.  That
> might be very useful (but I don't have a strong opinion).

Indeed, this is almost the same as TypeVar or an overload.

> A nit: what would len(s: Sum[str, bytes]) -> Int mean?  In this case
> it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm
> not sure it's all that obvious in other possible cases.

Also here.

- Koos

> Finally, in many cases Sum types just are not going to be useful.  You
> might want to alias Number to Sum[Float, Int] because (in Python)
> Float + Float -> Float and Int + Int -> Int, but then you have two
> problems.  First, you can't do inference on mixed-type arithmetic, Sum
> doesn't permit that.[2]  Second, you can't do inference on Int / Int.
> Footnotes:
> [1]  The quotes mean "I think I could make this precise but it would
> be boring.  Bear with me, and don't be distracted by the fact that
> lots of stuff in this 'category' aren't in the os.path *module* -- eg,
> open() and __fspath__ itself."  I hope that the categorical language
> will be a useful metaphor for those who understand category theory,
> but really the whole thing is a hand-wavy path to "what implementation
> would look like".
> [2]  This is also a handwavy statement.  Technically, the arithmetic
> we're talking about is binary: Tuple[Sum, Sum] -> Sum and it could
> do anything with mixed types, as Sum can only provide restrictions on
> unary: Sum -> Sum operation.  So a more accurate way to put the point
> is that it's not really obvious what properties Tuple[Sum, Sum] should
> have, especially in the context of a function whose value is of the
> same Sum Type.  And of course the problem with Int/Int is real -- it's
> hard to imagine how that could be handled by a general rule about
> Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type,
> the value should be of that type!  But we decided otherwise when
> designing Python 3.

More information about the Python-ideas mailing list