On Thu, Aug 14, 2014 at 9:12 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Guido van Rossum wrote on 14.08.2014 at 07:24:
>> On Wed, Aug 13, 2014 at 9:06 PM, Jukka Lehtosalo wrote:
>>> You could use AnyStr to make the example work with bytes as well:
>>>
>>>   def word_count(input: Iterable[AnyStr]) -> Dict[AnyStr, int]:
>>>       result = {}  # type: Dict[AnyStr, int]
>>>
>>>       for line in input:
>>>           for word in line.split():
>>>               result[word] = result.get(word, 0) + 1
>>>       return result
>>>
>>> Again, if this is just a simple utility function that you use once or
>>> twice, I see no reason to spend a lot of effort in coming up with the most
>>> general signature. Types are an abstraction and they can't express
>>> everything precisely -- there will always be a lot of cases where you can't
>>> express the most general type. However, I think that relatively simple
>>> types work well enough most of the time, and give the most bang for the
>>> buck.
>>
>> I heartily agree. But just for the type theorists amongst us, if I really
>> wanted to write the most general type, how would I express that the AnyStr
>> in the return type matches the one in the argument? (I think pytypedecl
>> would use something like T <= AnyStr.)
>
> That's how Cython's "fused types" (generics) work, at least. They go by
> name: same name of the type, same type. Otherwise, use alias names, which
> make the types independent from each other.
>
> http://docs.cython.org/src/userguide/fusedtypes.html
>
> While it's a matter of definition what way to go here (same type or not),
> practice has shown that it's clearly the right decision to make identical
> types the default.

I don't understand those docs at all, but I do think I understand the rule "same name, same type" and I think I like it. Let me be clear -- in this example:

def word_count(input: Iterable[AnyStr]) -> Mapping[AnyStr, int]:
    ...

the implication would be that if the input is Iterable[bytes] the output is Mapping[bytes, int] while if the input is Iterable[str] the output is Mapping[str, int]. Have I got that right? I hope so, because I think it is a nice simplifying rule that covers a lot of cases in practice. (Note: AnyStr is a predefined type in mypy that means "str or bytes".)
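
A minimal sketch of the "same name, same type" reading, reusing Jukka's function body and assuming the typing module that ships with current Python. The names str_counts and bytes_counts are only illustrative; whether a given checker enforces exactly this pairing is what the question above is about.

```python
from typing import AnyStr, Dict, Iterable


def word_count(input: Iterable[AnyStr]) -> Dict[AnyStr, int]:
    # The same AnyStr appears in the argument and the return type, so
    # under the "same name, same type" rule str input maps to str keys
    # and bytes input maps to bytes keys.
    result = {}  # type: Dict[AnyStr, int]
    for line in input:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


# Iterable[str] in, Dict[str, int] out.
str_counts = word_count(["a b a", "b c"])

# Iterable[bytes] in, Dict[bytes, int] out.
bytes_counts = word_count([b"a b a", b"b c"])
```

At runtime the two calls simply produce dicts keyed by whichever string type went in; the annotation is what tells a checker that the two occurrences of AnyStr move together.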

BTW there are a lot of messy things to consider around bytes, and IIUC mypy currently doesn't really cover them. Often when you write code that accepts a bytes instance, in practice it will accept anything that supports the buffer protocol (e.g. bytearray and memoryview). Except when you are going to use it as a dict key; then bytearray won't work, because it is mutable and therefore unhashable. And if you say that you are returning bytes, you probably shouldn't be returning a memoryview or bytearray. I don't expect that any type system we can come up with will be quite precise enough to cover all the cases, so we probably shouldn't lose too much sleep over this.
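
To make the bytes messiness concrete, here is a small sketch using only builtins. The variable names (data, copies, counts, bytearray_ok) are just for illustration.

```python
data = b"spam"

# bytes, bytearray and memoryview all support the buffer protocol, so
# code written "for bytes" often accepts all three. Here bytes(...)
# copies each buffer into a new bytes object.
copies = [bytes(buf) for buf in (data, bytearray(data), memoryview(data))]

# As a dict key the picture changes: bytes is hashable, bytearray is not.
counts = {}
counts[data] = 1

try:
    counts[bytearray(data)] = 1
    bytearray_ok = True
except TypeError:
    # bytearray is mutable and does not define a hash.
    bytearray_ok = False
```

So a signature that says "bytes" is too narrow for the first half and about right for the second, which is the kind of imprecision I mean.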

--
--Guido van Rossum (python.org/~guido)