[Python-ideas] Type Hinting Kick-off

Eugene Toder eltoder at gmail.com
Thu Dec 25 22:49:38 CET 2014


On Wed, Dec 24, 2014 at 11:16 PM, Guido van Rossum <guido at python.org> wrote:
> No problem. :-) I apologize for reformatting the text I am quoting from you,
> it looked as if it was sent through two different line clipping functions.
Oh yes, sorry for that.

>> b) Write the type of an overloaded function:
>>
>> @overload
>> def foo(x: str) -> str: ...
>> @overload
>> def foo(x: bytes) -> bytes: ...
>>
>> foo # type: Intersection[Callable[[str], str], Callable[[bytes], bytes]]
>
> The static checker can figure that out for itself, but that doesn't mean we
> necessarily need a way to spell it.
It can be useful to be able to write the type, even if the type checker can
infer it in some cases. For example, to annotate the type of a C function that
has overloading. The type checker may also need to print it when reporting an
error. Finally, it is needed to declare the type of an argument that is an
overloaded function, although that last case is admittedly rare.
If the Intersection type is implemented for the first reason this should "just
work" as well.
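
A minimal sketch of one way to spell such a type, assuming the typing module
as it exists in later Python versions (3.8+), which postdates this thread.
There is no Intersection there; a Protocol with an overloaded __call__ plays
that role. The names StrOrBytesEcho and apply_twice are hypothetical.

```python
from typing import Protocol, overload


class StrOrBytesEcho(Protocol):
    """Hypothetical name for the type of a callable with two overloads."""

    @overload
    def __call__(self, x: str) -> str: ...
    @overload
    def __call__(self, x: bytes) -> bytes: ...


@overload
def foo(x: str) -> str: ...
@overload
def foo(x: bytes) -> bytes: ...
def foo(x):
    # Single runtime implementation behind both overloads.
    return x.upper()


def apply_twice(f: StrOrBytesEcho, x: str) -> str:
    # An argument declared as an overloaded function (the rare case above).
    return f(f(x))
```

This only covers callables; it is not a general intersection type.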

> I just copied this from mypy (where it is called typevar()). I guess in that
> example one would use an *unconstrained* type variable. The use case for the
> behavior I described is AnyStr -- if I have a function like this I don't
> want the type checker to assume the more precise type:
>
> def space_for(s: AnyStr) -> AnyStr:
>     if isinstance(s, str): return ' '
>     else: return b' '
>
> If someone defined a class MyStr(str), we don't want the type checker to
> think that space_for(MyStr(...)) returns a MyStr instance, and it would be
> impossible for the function to even create an instance of the proper
> subclass of str (it can get the class object, but it can't know the
> constructor signature).
>
> [snip]
>
> OTOH one of the ideas on the table is to add keyword options to Var(), which
> might make it possible to have type variables with different semantics.
> There are other use cases, some of which are discussed in the tracker:
> https://github.com/ambv/typehinting/issues/18
Rather https://github.com/ambv/typehinting/issues/2? Yes, keyword arguments
will make the syntax easier to understand.

I thought more about this, and I think I understand what you are after. The
syntax confused me somewhat. I also recalled that type variables may need
lower bounds in addition to upper bounds. Is AnyStr the main use case of this
feature? If that's the case, there are other ways to achieve the same effect
with more general features.

First, some motivational examples. Say we have a class for an immutable list:

class ImmutableList(Generic[X]):
    def prepend(self, item: X) -> ImmutableList[X]: ...

The type of prepend is actually too restrictive: we should be able to add
items whose type is a superclass of X and get a list of that more general type:

Y = Var('Y') >= X   # must be a superclass of X

class ImmutableList(Generic[X]):
    def prepend(self, item: Y) -> ImmutableList[Y]: ...

Alternative syntax for Y, based on the Java keyword:

Y = Var('Y').super(X)

This will be handy to give better types to some methods of tuple and
frozenset.
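
As far as I know, the typing module that eventually shipped has no lower
bounds on type variables. A rough approximation, sketched below under that
assumption, is to give prepend its own type variable and return a list of
Union[X, Y]. The tuple-backed storage and the to_tuple helper are only there
to make the example runnable.

```python
from typing import Generic, Tuple, TypeVar, Union

X = TypeVar("X")
Y = TypeVar("Y")


class ImmutableList(Generic[X]):
    def __init__(self, items: Tuple[X, ...] = ()) -> None:
        self._items = items

    def prepend(self, item: Y) -> "ImmutableList[Union[X, Y]]":
        # The result type widens to cover both the old elements and the
        # new item; the original list is left untouched.
        return ImmutableList((item,) + self._items)

    def to_tuple(self) -> Tuple[X, ...]:
        return self._items


ints = ImmutableList((1, 2))
mixed = ints.prepend("a")   # a checker would see ImmutableList[Union[int, str]]
```

This is weaker than a true lower bound: the result is a union of the two
types rather than their common superclass.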

Next, let's try to write a type for copy.copy. There are many details that can
be changed, but here's a sketch. Naturally, it should be

def copy(obj: X) -> X: ...

But copy doesn't work for all types, so there must be some constraints on X:

X = Var('X')
X <= Copyable[X] # must be a subclass of Copyable[X]

(Alternative syntax: X.extends(Copyable[X]); this also shows why constraints
are not listed in the constructor.) Copyable is a protocol:

@protocol
class Copyable(Generic[X]):
    def __copy__(self: X) -> X: ...

And for some built-in types:

Copyable.register(int)
Copyable.register(str)
...

This approach can be used to type functions that special-case built-in types,
and rely on some known methods for everything else.
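
A runnable sketch of the same idea using abc from the standard library. It
is only an approximation of the proposal: the structural part is done with
__subclasshook__, built-in types are registered explicitly, and copy_value
and Point are hypothetical names.

```python
from abc import ABC


class Copyable(ABC):
    """Conforms structurally (defines __copy__) or by explicit registration."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Copyable and any("__copy__" in B.__dict__ for B in C.__mro__):
            return True
        return NotImplemented  # fall back to the registry


# Built-in immutable types have no __copy__, so register them explicitly.
Copyable.register(int)
Copyable.register(str)


def copy_value(obj):
    if not isinstance(obj, Copyable):
        raise TypeError(f"{type(obj).__name__} is not Copyable")
    copier = getattr(type(obj), "__copy__", None)
    if copier is None:
        return obj          # registered immutable built-in: special-cased
    return copier(obj)      # everything else relies on the known method


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __copy__(self):
        return Point(self.x, self.y)
```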

In my example with XEmployee the function could either return its argument,
or make a copy -- the Employee class can require all its subclasses to
implement some copying protocol (e.g. a copy() method). In fact, since the
longest() function from your document always returns one of its arguments,
its type can be written as:

X = Var('X') <= Union[str, bytes]

def longest(a: X, b: X) -> X: ...

that is, it doesn't need to restrict the return type to str or bytes :-)
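
In the typing module as it later shipped, an upper bound is spelled with the
bound= keyword of TypeVar. A minimal sketch under that assumption; MyStr is a
hypothetical subclass used to show that the subclass is preserved.

```python
from typing import TypeVar, Union

X = TypeVar("X", bound=Union[str, bytes])


def longest(a: X, b: X) -> X:
    # Always returns one of its arguments, so the precise type is kept.
    return a if len(a) >= len(b) else b


class MyStr(str):
    pass


result = longest(MyStr("abc"), MyStr("a"))
```

One caveat, as far as I understand the checkers: with a Union bound, a call
mixing str and bytes may be accepted by solving X as the Union itself, which
the restricted AnyStr form is meant to reject.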

Finally, the feature from your document:

AnyStr = Var('AnyStr').restrictTo(str, bytes) # the only possible values

However, this can be achieved by adding more useful features to protocols:

# explicit_protocol is a non-structural protocol: only explicitly registered
# types are considered conforming. This is very close to type classes.
# Alternatively, protocols with no methods can always be explicit.
@explicit_protocol
class StringLike(Generic[X]):
    # This type can be referenced like a class-level attribute.
    # The name "type" is not special in any way.
    type = X

StringLike.register(str)
StringLike.register(bytes)

AnyStr = Var('AnyStr')
AnyStr <= StringLike[AnyStr]
AnyStrRet = StringLike[AnyStr].type

def space_for(x: AnyStr) -> AnyStrRet: ...

There are many details that can be tweaked, but it is quite powerful, and
solves the simpler problem as well.
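
The explicit_protocol decorator above is proposed syntax, not an existing
API. A runtime-only sketch of the idea, with a plain registry standing in for
the protocol and a hypothetical type_of lookup standing in for the
associated "type" member:

```python
class StringLike:
    """Hypothetical explicit protocol: only registered types conform."""

    _registry = {}

    @classmethod
    def register(cls, t):
        # The associated type is simply the registered class itself.
        cls._registry[t] = t
        return t

    @classmethod
    def type_of(cls, obj):
        # Walk the MRO so that subclasses map to their registered base.
        for base in type(obj).__mro__:
            if base in cls._registry:
                return cls._registry[base]
        raise TypeError(f"{type(obj).__name__} is not StringLike")


StringLike.register(str)
StringLike.register(bytes)


def space_for(s):
    ret_type = StringLike.type_of(s)
    return " " if ret_type is str else b" "


class MyStr(str):
    pass
```

Here space_for(MyStr(...)) yields a plain str, matching the AnyStrRet
return type rather than the argument's subclass.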

> I strongly disagree with this. Python's predecessor, ABC, used a number of
> non-standard terms for common programming language concepts, for similar
> reasons. But the net effect was just that it looked weird to anyone familiar
> with other languages, and for the users who were a completely blank slate,
> well, "HOW-TO" was just as much jargon that they had to learn as
> "procedure". Also, the Python users who will most likely need to learn about
> this stuff are most likely library developers.
Very good point. I should clarify that I am not suggesting changing the
terminology. I'm only talking about the syntax. Most languages that have
union types seem to use the t1|t2 syntax for them. AFAIU this syntax was
rejected to avoid changes in CPython. This is a shame, because it is
widespread and reads really well:

    def foo(x: Some|Another): ...

Also, Type|None is so short and clear that there's no need for the special
Optional[] shorthand. Given we won't use |, I think

    def foo(x: AnyOf[Some, Another]): ...

reads better than

    def foo(x: Union[Some, Another]): ...

but this may be getting into bikeshedding territory :-)

> This was proposed as the primary notation during the previous round of
> discussions here. You are right that if we propose to "fix up" type
> annotations that appear together with a default value we should also be able
> in principle to change these shortcuts into the proper generic type objects.
> Yet I am hesitant to adopt the suggestion -- people may already be using
> e.g. dictionaries as annotations for some other purpose, and there is the
> question you bring up whether we should promote these to concrete or
> abstract collection types.
I have some experience using this notation internally, and it worked quite
well. To be specific, I am not suggesting that Python automatically convert
these annotations to proper generic types. This should be done internally in
the type checker. If we want other tools to understand this syntax, we can
expose functions typing.isTypeAnnotation(obj) and
typing.canonicalTypeAnnotation(obj). With this approach, I don't believe this
use of lists and dicts adds any more problems for the existing uses of
annotations.
The decision of whether to use concrete or abstract types is likely not a
hard one. Given my experience, I'd use concrete types, because they are so
common. But this does depend on the bigger context of how annotations are
expected to be used.
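
A minimal sketch of what such a canonicalization helper might do.
typing.canonicalTypeAnnotation is only a proposed name, not an existing
function; the snake_case helper below is hypothetical and maps the shortcuts
to the concrete List and Dict types, as suggested above.

```python
from typing import Dict, List


def canonical_type_annotation(obj):
    """Hypothetical helper: expand [t] and {k: v} shortcuts recursively."""
    if isinstance(obj, list) and len(obj) == 1:
        return List[canonical_type_annotation(obj[0])]
    if isinstance(obj, dict) and len(obj) == 1:
        ((key, value),) = obj.items()
        return Dict[canonical_type_annotation(key),
                    canonical_type_annotation(value)]
    # Anything else is assumed to be a proper type already.
    return obj


def word_counts(words: [str]) -> {str: int}:
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts
```

The function's __annotations__ still hold the raw list and dict objects;
only a tool that calls the helper sees the expanded forms.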

> Also, I should note that, while I mentioned it as a possibility, I am
> hesitant to endorse the shortcut of "arg: t1 = None" as a shorthand for
> "arg: Union[t1, None] = None" because it's unclear whether runtime
> introspection of the __annotations__ object should return t1 or the inferred
> Union object.
FWIW, I'm -0.5 on this, but if this is implemented, I think __annotations__
should return t1, and typing.canonicalTypeAnnotation should accept an optional
second argument for the default value.
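
A sketch of that split, assuming the behaviour described above:
__annotations__ keeps the annotation exactly as written, and a separate
helper (hypothetical name effective_annotations) folds in the None default.

```python
import inspect
from typing import Optional


def effective_annotations(func):
    """Hypothetical helper: treat 'arg: t = None' as Optional[t]."""
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            continue
        annotation = param.annotation
        if param.default is None:
            annotation = Optional[annotation]
        result[name] = annotation
    return result


def greet(name: str, title: str = None) -> str:
    return f"{title} {name}" if title else name
```

Runtime introspection of greet.__annotations__ still reports str for title;
only the helper reports Optional[str].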

> Agreed this is an area that needs more thought. In mypy you can actually
> write the entire annotation in string quotes -- mypy has to be able to parse
> type expressions anyway (in fact it has to be able to parse all of Python
> :-). I do think that the example you present feels rather obscure.
Is the intention to keep this -- i.e. require the type checker to potentially
parse strings as Python code? Complex expressions in strings feel a bit like
a hack :-)
Code of this kind comes up regularly in containers, like the ImmutableList
above. Some standard types have methods like this as well:

class Set(Generic[X]):
    def union(self, other: Set[X]) -> Set[X]: ...
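
A runnable sketch of the string form for such a method. At the point where
union() is defined the class name is not bound yet, so the self-reference is
written as a string. ImmutableSet is a hypothetical stand-in for the Set
class above.

```python
from typing import Generic, Iterable, TypeVar

X = TypeVar("X")


class ImmutableSet(Generic[X]):
    def __init__(self, items: Iterable[X] = ()) -> None:
        self._items = frozenset(items)

    def union(self, other: "ImmutableSet[X]") -> "ImmutableSet[X]":
        # The annotations are stored as plain strings until a tool
        # evaluates them.
        return ImmutableSet(self._items | other._items)

    def items(self) -> frozenset:
        return self._items
```

In the typing module as it later shipped, typing.get_type_hints() can
evaluate such strings, so tools other than the checker do not need their own
parser.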

> Yeah, it does look quite handy, if the ambiguity with forward references can
> be resolved. Also it's no big deal to have to declare a type variable -- you
> can reuse them for all subsequent function definitions, and you usually
> don't need more than two or three.
Agreed. Having to declare either a forward reference or a type variable seems
to add very little burden, and both are possibly rare enough. I'm not sure
which one should get a shorter syntax.

While on the subject: what are the scoping rules for type variables? I hope
they are lexically scoped: the names used in the enclosing class or function
are considered bound to those values, rather than fresh variables that shadow
them. I used this fact in the examples above. E.g. union() above accepts only
sets with the same element type, not sets of arbitrary elements, and in

def foo(x: X) -> X:
    def bar(y: X) -> X: return y
    return bar(x)

X in bar() must be the same type as in foo().
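
The class case can be sketched the same way. In the code below, written
against the typing module as it later shipped, X in replace() is meant to be
the X that parameterizes Box, not a fresh variable. I believe current
checkers read it this way, but treat that as an assumption. Box is a
hypothetical example class.

```python
from typing import Generic, TypeVar

X = TypeVar("X")


class Box(Generic[X]):
    def __init__(self, item: X) -> None:
        self.item = item

    def replace(self, other: "Box[X]") -> "Box[X]":
        # X is bound by the enclosing class: other must hold the same
        # element type as self.
        return Box(other.item)


def foo(x: X) -> X:
    def bar(y: X) -> X:
        # X is bound by the enclosing function foo().
        return y
    return bar(x)
```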


Eugene

