
On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert < abarnert@yahoo.com.dmarc.invalid> wrote:
On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <guido@python.org> wrote:
def word_count(input: List[str]) -> Dict[str, int]: result = {} #type: Dict[str, int] for line in input: for word in line.split(): result[word] = result.get(word, 0) + 1 return result
I just realized why this bothers me.
This function really, really ought to be taking an Iterable[String] (except that we don't have a String ABC). If you hadn't statically typed it, it would work just fine with, say, a text file—or, for that matter, a binary file. By restricting it to List[str], you've made it a lot less usable, for no visible benefit.
Heh. :-) I had wanted to write an additional paragraph explaining that it's easy to change this to use typing.Iterable instead of typing.List, but I forgot to add that.
And, while this is less serious, I don't think it should be guaranteeing that the result is a Dict rather than just some kind of Mapping. If you want to change the implementation tomorrow to return some kind of proxy or a tree-based sorted mapping, you can't do so without breaking all the code that uses your function.
Yeah, there's a typing.Mapping for that.
And if even Guido, in the motivating example for this feature, is needlessly restricting the usability and future flexibility of a function, I suspect it may be a much bigger problem in practice.
Well, so it was actually semi-intentional. :-)
This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information.
In most cases it really doesn't matter though -- some types are better left concrete, especially strings and numbers. If you read the mypy docs you'll find that there are generic types, so that it's possible to define a function as taking an Iterable[T] and returning a Mapping[T, int]. What's not currently possible is expressing additional constraints on T such as that it must be a String. When I last talked to Jukka he explained that he was going to add something for that too (@Jukka: structured types?).
And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.
On this point one of us must be confused. Let's assume it's me. :-) Mypy has a few different IO types that can express the difference between text and binary files. I think there's some work that needs to be done (and of course the built-in open() function has a terribly ambiguous return type :-( ), but it should be possible to say that a text file is an Interable[str] and a binary file is an Iterable[bytes]. So together with the structured (?) types it should be possible to specify the signature of word_count() just as you want it. However, in most cases it's overkill, and you wouldn't want to do that for most code. Also, it probably wouldn't work for more realistic examples -- as soon as you replace the split() method call with something that takes punctuation into account, you're probably going to write it in a way that works only for text strings anyway, and very few people will want or need to write the polymorphic version. (But if they do, mypy has a handy @overload decorator that they can use. :-) Anyway, I agree it would be good to make sure that some of these more advanced things can actually be spelled before we freeze our commitment to a specific syntax, but let's not assume that just because you can't spell every possible generic use case it's no good. -- --Guido van Rossum (python.org/~guido)