On Wed, Aug 13, 2014 at 8:41 PM, Guido van Rossum <guido@python.org> wrote:

On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:

This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information.

In most cases it really doesn't matter though -- some types are better left concrete, especially strings and numbers. If you read the mypy docs you'll find that there are generic types, so that it's possible to define a function as taking an Iterable[T] and returning a Mapping[T, int]. What's not currently possible is expressing additional constraints on T such as that it must be a String. When I last talked to Jukka he explained that he was going to add something for that too (@Jukka: structured types?).
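
As a rough illustration of the generic signature mentioned above (an Iterable[T] in, a Mapping[T, int] out), here is a minimal sketch. It uses the modern `typing` module, which postdates this thread, and `count_items` is a hypothetical name chosen for the example, not something from mypy or the stdlib.

```python
from typing import Dict, Iterable, Mapping, TypeVar

# T is unconstrained: nothing here says it must be a string type.
T = TypeVar("T")


def count_items(items: Iterable[T]) -> Mapping[T, int]:
    """Count occurrences of each item; keys keep the element type T."""
    counts: Dict[T, int] = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts
```

A checker can infer Mapping[str, int] for a list of str and Mapping[bytes, int] for a list of bytes, but the signature alone cannot require that T be string-like, which is the missing constraint discussed above.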

I wrote another message where I touched on this. Mypy is likely to support something like this in the future, but I doubt it's usually worth the complexity. If a type signature is very general, at some point it describes the implementation in sufficient detail that you can't modify the code without changing the type. For example, we could plausibly allow anything that just supports split(), but if we change the implementation to use something other than split(), the signature would have to change. If we use more specific types (such as str), we leave ourselves the freedom to modify the implementation within the bounds of the str interface. Standard library functions often only accept concrete str objects, so the moment you start using an abstract string type you lose access to much of the stdlib.

And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.

On this point one of us must be confused. Let's assume it's me. :-) Mypy has a few different IO types that can express the difference between text and binary files. I think there's some work that needs to be done (and of course the built-in open() function has a terribly ambiguous return type :-( ), but it should be possible to say that a text file is an Iterable[str] and a binary file is an Iterable[bytes]. So together with the structured (?) types it should be possible to specify the signature of word_count() just as you want it. However, in most cases it's overkill, and you wouldn't want to do that for most code.

See my other message where I show that you can do this right now, except for the problem with open().
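
One way the "same string type in, same string type out" signature might be spelled is sketched below. It assumes the constrained type variable `AnyStr` from the modern `typing` module (not necessarily the spelling mypy used at the time), and the body of `word_count` is a guess at the example under discussion. `io.StringIO` and `io.BytesIO` stand in for text and binary files so that the ambiguity of open() is sidestepped.

```python
import io
from typing import AnyStr, Dict, Iterable


def word_count(lines: Iterable[AnyStr]) -> Dict[AnyStr, int]:
    """Count whitespace-separated words; keys have the same type as the lines."""
    counts: Dict[AnyStr, int] = {}
    for line in lines:
        # str.split() and bytes.split() both split on whitespace by default.
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# A text stream iterates as str lines, a binary stream as bytes lines.
text_counts = word_count(io.StringIO("a b a\nb c\n"))
binary_counts = word_count(io.BytesIO(b"a b a\nb c\n"))
```

Here a checker should see text_counts as Dict[str, int] and binary_counts as Dict[bytes, int], so the key type is not lost when the input is known to be text.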

Also, it probably wouldn't work for more realistic examples -- as soon as you replace the split() method call with something that takes punctuation into account, you're probably going to write it in a way that works only for text strings anyway, and very few people will want or need to write the polymorphic version. (But if they do, mypy has a handy @overload decorator that they can use. :-)
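
For the polymorphic case, a sketch using `typing.overload` as it exists in today's standard library follows; mypy's @overload at the time of this thread may have looked different. The regular expression is an assumed stand-in for "something that takes punctuation into account", and the function body is hypothetical.

```python
import re
from typing import Dict, Iterable, overload


@overload
def word_count(lines: Iterable[str]) -> Dict[str, int]: ...
@overload
def word_count(lines: Iterable[bytes]) -> Dict[bytes, int]: ...


def word_count(lines):
    """Count words, ignoring punctuation, for either str or bytes lines."""
    counts = {}
    for line in lines:
        # The pattern must match the type of the line being searched.
        pattern = r"\w+" if isinstance(line, str) else rb"\w+"
        for word in re.findall(pattern, line):
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The two overload stubs only inform the type checker; at runtime a single implementation handles both cases by picking a str or bytes pattern.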

Anyway, I agree it would be good to make sure that some of these more advanced things can actually be spelled before we freeze our commitment to a specific syntax, but let's not assume that just because you can't spell every possible generic use case it's no good.

It's always easy to come up with interesting corner cases where a type system would break down, but luckily, these are often almost non-existent in the wild :-) I've learned that examples should be motivated by patterns in existing, 'real' code, as otherwise you'll waste your time on things that happen maybe once in a million lines (or maybe only in code that *you* write).

Jukka

--
--Guido van Rossum (python.org/~guido)