[Python-ideas] Type Hinting Kick-off

Guido van Rossum guido at python.org
Fri Dec 26 04:41:49 CET 2014


On Thu, Dec 25, 2014 at 1:49 PM, Eugene Toder <eltoder at gmail.com> wrote:

> On Wed, Dec 24, 2014 at 11:16 PM, Guido van Rossum <guido at python.org>
> wrote:
> [Eugene]
> >> b) Write the type of an overloaded function:
> >>
> >> @overload
> >> def foo(x: str) -> str: ...
> >> @overload
> >> def foo(x: bytes) -> bytes: ...
> >>
> >> foo # type: Intersection[Callable[[str], str], Callable[[bytes], bytes]]
> >
> > The static checker can figure that out for itself, but that doesn't mean
> > we necessarily need a way to spell it.
> It can be useful to be able to write the type, even if the type checker can
> infer it in some cases. For example, to annotate the type of a C function
> that has overloading.
>

mypy solves that using @overload in a stub file. That's often more precise.
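
As an illustration only, here is a minimal sketch of the @overload spelling,
assuming the typing module as it later shipped with Python. In a stub file
only the two decorated signatures would appear; the undecorated
implementation and its body are invented here so that the example runs.

```python
from typing import Union, overload


@overload
def foo(x: str) -> str: ...
@overload
def foo(x: bytes) -> bytes: ...


def foo(x: Union[str, bytes]) -> Union[str, bytes]:
    # Single runtime implementation (hypothetical body). The two @overload
    # signatures above exist only for the type checker, which can then map
    # a str argument to a str result and a bytes argument to a bytes result.
    return x.upper()


text_result = foo("abc")
bytes_result = foo(b"abc")
```

No Intersection type has to be written anywhere; the pair of signatures
carries the same information.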


> Also, the type checker may need to print it when reporting an
> error. Also, to declare the type of an argument as an overloaded function.
> The last one is admittedly rare.
> If the Intersection type is implemented for the first reason this should
> "just work" as well.
>

Hm, looks like the case for Intersection is still pretty weak. Anyway, we
can always add stuff later. But whatever we add in 3.5 we cannot easily
take back.


> > I just copied this from mypy (where it is called typevar()). I guess in
> > that example one would use an *unconstrained* type variable. The use case
> > for the behavior I described is AnyStr -- if I have a function like this
> > I don't want the type checker to assume the more precise type:
> >
> > def space_for(s: AnyStr) -> AnyStr:
> >     if isinstance(s, str): return ' '
> >     else: return b' '
> >
> > If someone defined a class MyStr(str), we don't want the type checker to
> > think that space_for(MyStr(...)) returns a MyStr instance, and it would
> > be impossible for the function to even create an instance of the proper
> > subclass of str (it can get the class object, but it can't know the
> > constructor signature).
> >
> > [snip]
> >
> > OTOH one of the ideas on the table is to add keyword options to Var(),
> > which might make it possible to have type variables with different
> > semantics. There are other use cases, some of which are discussed in the
> > tracker: https://github.com/ambv/typehinting/issues/18
> Rather https://github.com/ambv/typehinting/issues/2? Yes, keyword
> arguments will make the syntax easier to understand.
>

Yes, that's the issue I meant.


> I thought more about this, and I think I understand what you are after. The
> syntax confused me somewhat. I also recalled that type variables may need
> lower bounds in addition to upper bounds. Is AnyStr the main use case of
> this feature? If that's the case, there are other ways to achieve the same
> effect with more general features.
>

I don't know if this is the main use case (we should ask Jukka when he's
back from vacation). I'm hesitant to propose more general features without
at least one implementation. Perhaps you could try to see how easily those
more general features could be implemented in mypy?
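
For reference, a minimal sketch of the AnyStr behavior under discussion,
assuming the TypeVar spelling that the typing module eventually adopted
(the thread calls it Var()). MyStr is the hypothetical subclass from the
quoted example.

```python
from typing import TypeVar

# Constrained type variable: a checker may solve it only to exactly str or
# exactly bytes, never to a subclass such as MyStr.
AnyStr = TypeVar("AnyStr", str, bytes)


def space_for(s: AnyStr) -> AnyStr:
    if isinstance(s, str):
        return " "
    return b" "


class MyStr(str):
    pass


result = space_for(MyStr("abc"))
# At run time the function can only hand back a plain str, which is why the
# declared return type should not be narrowed to MyStr.
result_type = type(result)
```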


> First, some motivational examples. Say we have a class for an immutable
> list:
>
> class ImmutableList(Generic[X]):
>     def prepend(self, item: X) -> ImmutableList[X]: ...
>
> The type of prepend is actually too restrictive: we should be able to add
> items whose type is a superclass of X and get a list of that more general
> type:
>
> Y = Var('Y') >= X   # must be a superclass of X
>
> class ImmutableList(Generic[X]):
>     def prepend(self, item: Y) -> ImmutableList[Y]: ...
>
> Alternative syntax for Y, based on the Java keyword:
>
> Y = Var('Y').super(X)
>

Neither syntax is acceptable to me, but let's assume we can do this with
some other syntax. Your example still feels like it was carefully
constructed to prove your point -- it would make sense in a language where
everything is type-checked and types are the basis for everything, and
users are eager to push the type system to its limits. But I'm carefully
trying to avoid moving Python in that direction.


> This will be handy to give better types to some methods of tuple and
> frozenset.
>

I assume you're talking about the case where e.g. I have a frozenset of
Managers and I use '+' to add an Employee; we then know that the result is
a frozenset of Employees. But if we assume covariance, that frozenset of
Managers is also a frozenset of Employees, so (assuming we have a way to
indicate covariance) the type-checker should be able to figure this out. Or
are you perhaps trying to come up with a way to spell covariance? (The
issue #2 above has tons of discussion about that, although I don't think it
comes to a clear conclusion.)
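
A small sketch of the Manager/Employee case, assuming typing.FrozenSet is
treated as covariant in its element type. The class and function names are
made up for illustration, and note that frozenset spells this operation '|'
rather than '+'.

```python
from typing import FrozenSet


class Employee:
    def __init__(self, name: str) -> None:
        self.name = name


class Manager(Employee):
    pass


def add_member(team: FrozenSet[Employee], person: Employee) -> FrozenSet[Employee]:
    # Under covariance a FrozenSet[Manager] is acceptable wherever a
    # FrozenSet[Employee] is expected, so no lower-bounded type variable
    # is needed to describe the result.
    return team | {person}


managers = frozenset({Manager("Ann"), Manager("Bob")})
staff = add_member(managers, Employee("Cid"))
```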


> Next, let's try to write a type for copy.copy.
>

Eek. That sounds like a bad idea -- copy.copy() uses introspection and I
don't think there's much hope to be able to spell its type. (Also I usually
consider the use of copy.copy() a code smell. Perhaps there's a connection.
:-)


> There are many details that can
> be changed, but here's a sketch. Naturally, it should be
>
> def copy(obj: X) -> X: ...
>
> But copy doesn't work for all types, so there must be some constraints on
> X:
>
> X = Var('X')
> X <= Copyable[X] # must be a subclass of Copyable[X]
>
> (Alternative syntax: X.extends(Copyable[X]); this also shows why
> constraints are not listed in the constructor.) Copyable is a protocol:
>
> @protocol
> class Copyable(Generic[X]):
>     def __copy__(self: X) -> X: ...
>
> And for some built-in types:
>
> Copyable.register(int)
> Copyable.register(str)
> ...
>
> This approach can be used to type functions that special-case built-in
> types, and rely on some known methods for everything else.
>

Sorry, I'm not sold on this. I also worry that the register() calls are
hard to track for a type checker -- but that's minor (I actually don't know
if this would be a problem for mypy). I just don't see the point in trying
to create a type system powerful enough to describe copy.copy().


> In my example with XEmployee the function could either return its argument,
> or make a copy -- the Employee class can require all its subclasses to
> implement some copying protocol (e.g. a copy() method).
>

That sounds like an artificial requirement on the implementation designed
to help the type checker. I'm inclined to draw the line well before that
point. (Otherwise Raymond Hettinger would throw a fit. :-)


> In fact, since the
> longest() function from your document always returns one of its arguments,
>

But that was just the shortest way to write such an example. The realistic
examples (e.g. URL parsing or construction) aren't that simple.


> its type can be written as:
>
> X = Var('X') <= Union[str, bytes]
>
> def longest(a: X, b: X) -> X: ...
>
> that is, it doesn't need to restrict the return type to str or bytes :-)
>

Now you're just wasting my time. :-)


> Finally, the feature from your document:
>
> AnyStr = Var('AnyStr').restrictTo(str, bytes) # the only possible values
>
> However, this can be achieved by adding more useful features to protocols:
>
> # explicit_protocol is a non-structural protocol: only explicitly
> # registered types are considered conforming. This is very close to
> # type classes.
> # Alternatively, protocols with no methods can always be explicit.
> @explicit_protocol
> class StringLike(Generic[X]):
>     # This type can be referenced like a class-level attribute.
>     # The name "type" is not special in any way.
>     type = X
>
> StringLike.register(str)
> StringLike.register(bytes)
>
> AnyStr = Var('AnyStr')
> AnyStr <= StringLike[AnyStr]
> AnyStrRet = StringLike[AnyStr].type
>
> def space_for(x: AnyStr) -> AnyStrRet: ...
>
> There are many details that can be tweaked, but it is quite powerful, and
> solves the simpler problem as well.
>

I'm afraid you've lost me. But (as you may have noticed) I'm not really the
one you should be convincing -- if you can convince Jukka to (let you) add
something like this to mypy you may have a better case. Even so, I want to
limit the complexity of what we add to Python 3.5 -- TBH basic generic
types are already pushing the limits. I would much rather be asked to add
more stuff to 3.6 than to find out that we've added so much to 3.5 that
people can't follow along. Peter Norvig mentioned that the subtleties of
co/contra-variance of generic types in Java were too complex for his
daughter, and also reminded me that Josh Bloch has said somewhere that he
believed they made it too complex.


> > I strongly disagree with this. Python's predecessor, ABC, used a number
> > of non-standard terms for common programming language concepts, for
> > similar reasons. But the net effect was just that it looked weird to
> > anyone familiar with other languages, and for the users who were a
> > completely blank slate, well, "HOW-TO" was just as much jargon that they
> > had to learn as "procedure". Also, the Python users who will most likely
> > need to learn about this stuff are most likely library developers.
> Very good point. I should clarify that I don't suggest changing the
> terminology. I'm only talking about the syntax. The majority of the
> languages that use union types seem to use t1|t2 syntax for it. AFAIU this
> syntax was rejected to avoid changes in CPython. This is a shame, because
> it is widespread and reads really well:
>
>     def foo(x: Some|Another): ...
>

Yes, but we're not going to change it, and it will be fine.


> Also, Type|None is so short and clear that there's no need for the special
> Optional[] shorthand. Given we won't use |, I think
>
>     def foo(x: AnyOf[Some, Another]): ...
>
> reads better than
>
>     def foo(x: Union[Some, Another]): ...
>
> but this may be getting into the bikeshedding territory :-)
>

Right. :-)


> > This was proposed as the primary notation during the previous round of
> > discussions here. You are right that if we propose to "fix up" type
> > annotations that appear together with a default value we should also be
> > able in principle to change these shortcuts into the proper generic type
> > objects. Yet I am hesitant to adopt the suggestion -- people may already
> > be using e.g. dictionaries as annotations for some other purpose, and
> > there is the question you bring up whether we should promote these to
> > concrete or abstract collection types.
> I have some experience using this notation internally, and it worked quite
> well. To be specific, I do not suggest for Python to automatically convert
> these annotations to proper generic types. This should be done internally
> in the type checker. If we want other tools to understand this syntax, we
> can expose functions typing.isTypeAnnotation(obj) and
> typing.canonicalTypeAnnotation(obj). With this approach, I don't believe
> this use of lists and dicts adds any more problems for the existing uses of
> annotations.
>

But I can see a serious downside as well. There will likely be multiple
tools that have to be able to read the type hinting annotations, e.g. IDEs
may want to use the type hints (possibly from stub files) for code
completion purposes. Also someone might want to write a decorator that
extracts the annotations and asserts that arguments match at run time. The
more handy shorthands we invent, the more complex all such tools will have
to be.
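
To make the cost concrete, here is a rough sketch of such a decorator,
assuming annotations are plain classes. The names check_args and repeat are
hypothetical. Every additional shorthand (lists, dicts, implicit unions)
would need its own branch in the loop.

```python
import functools
import inspect
from typing import get_type_hints


def check_args(func):
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only ordinary classes are checked; anything else, such as
            # Union[...] or List[int], is silently skipped in this sketch.
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_args
def repeat(text: str, times: int) -> str:
    return text * times


ok = repeat("ab", 2)
```

Calling repeat("ab", "2") raises TypeError from the wrapper before the
function body runs.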


> The decision of whether to use concrete or abstract types is likely not a
> hard one. Given my experience, I'd use concrete types, because they are so
> common. But this does depend on the bigger context of how annotations are
> expected to be used.
>

That's how I'm leaning as well.


> > Also, I should note that, while I mentioned it as a possibility, I am
> > hesitant to endorse the shortcut of "arg: t1 = None" as a shorthand for
> > "arg: Union[t1, None] = None" because it's unclear whether runtime
> > introspection of the __annotations__ object should return t1 or the
> > inferred Union object.
> FWIW, I'm -0.5 on this, but if this is implemented, I think __annotations__
> should return t1, and typing.canonicalTypeAnnotation accept an optional
> second argument for the default value.
>

You may just have killed the idea. Let's keep it simpler.
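
A short sketch of the explicit spelling next to the shorthand, assuming the
typing module as it later shipped. The function names are invented. It only
shows what __annotations__ stores, not how any particular checker treats
the shorthand.

```python
from typing import Optional, Union


def explicit(host: str, port: Optional[int] = None) -> str:
    return host if port is None else "%s:%d" % (host, port)


def shorthand(host: str, port: int = None) -> str:
    return host if port is None else "%s:%d" % (host, port)


# Optional[t1] is just another way to write Union[t1, None].
same_type = Optional[int] == Union[int, None]

# __annotations__ stores exactly what was written, so the two functions
# look different to any tool that introspects them at run time.
explicit_port = explicit.__annotations__["port"]
shorthand_port = shorthand.__annotations__["port"]
```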

> > Agreed this is an area that needs more thought. In mypy you can actually
> > write the entire annotation in string quotes -- mypy has to be able to
> > parse type expressions anyway (in fact it has to be able to parse all of
> > Python :-). I do think that the example you present feels rather obscure.
> Is the intention to keep this -- i.e. require the type checker to
> potentially parse strings as Python code? Complex expressions in strings
> feel a bit like a hack :-)
>

I know. :-)


> Code of this kind comes up regularly in containers, like the ImmutableList
> above. Some standard types have methods like this as well:
>
> class Set(Generic[X]):
>     def union(self, other: Set[X]) -> Set[X]: ...
>

How complex does it really have to be? Perhaps Name[Name, Name, ...] is the
only form (besides a plain Name) that we really need? Anything more complex
can probably be reduced using type aliases. Then again my earlier argument
is clearly for keeping things simple, and perhaps an explicit forward
declaration is simpler. The run-time representation would still be somewhat
problematic. I'll try to remember to report back once I have tried to
implement this.
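
A rough sketch of the string-quoted form Name[Name] and one way to resolve
it at run time, using typing.get_type_hints from the typing module as it
later shipped. ImmutableSet is a hypothetical class. The explicit localns
argument is only there to keep the sketch self-contained.

```python
from typing import Generic, TypeVar, get_type_hints

X = TypeVar("X")


class ImmutableSet(Generic[X]):
    def __init__(self, items=()):
        self.items = frozenset(items)

    # The class name is not bound yet while the class body executes, so the
    # annotation is written as a string and evaluated later.
    def union(self, other: "ImmutableSet[X]") -> "ImmutableSet[X]":
        return ImmutableSet(self.items | other.items)


# The raw annotation is still the string that was written.
raw = ImmutableSet.union.__annotations__["other"]

# Resolving it afterwards yields the proper generic type object.
resolved = get_type_hints(
    ImmutableSet.union,
    localns={"ImmutableSet": ImmutableSet, "X": X},
)["other"]

merged = ImmutableSet([1, 2]).union(ImmutableSet([2, 3]))
```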


> > Yeah, it does look quite handy, if the ambiguity with forward references
> > can be resolved. Also it's no big deal to have to declare a type variable
> > -- you can reuse them for all subsequent function definitions, and you
> > usually don't need more than two or three.
> Agreed. Having to declare a forward reference or a type variable both seem
> to add very little burden, and are possibly rare enough. I'm not sure which
> one should get a shorter syntax.
>

I don't think it's quite a toss-up. A type variable is a special feature.
But a forward reference is not much different from a backward reference --
you could easily imagine a language (e.g. C++ :-) where forward references
don't require special syntax. The rule that 'X' means the same as X but is
evaluated later is pretty simple, whereas the rule that 'X' introduces a
type variable is pretty complex. So even if we *didn't* use string quotes
for forward references I still wouldn't want to use that syntax for type
variables.
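
For comparison, a minimal sketch of the explicit declaration, assuming the
TypeVar spelling that the typing module later adopted. The helper functions
first and last are invented. The variable is declared once and reused by
every later definition.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # declared once, as an ordinary module-level name


def first(items: Sequence[T]) -> T:
    return items[0]


def last(items: Sequence[T]) -> T:
    # Reuses the same declaration; T is solved independently per call.
    return items[-1]


head = first([1, 2, 3])
tail = last("abc")
```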


> While on the subject: what are the scoping rules for type variables? I hope
> they are lexically scoped: the names used in the enclosing class or
> function are considered bound to those values, rather than fresh variables
> that shadow them. I used this fact in the examples above. E.g. union()
> above accepts only the sets with the same elements, not with any elements,
> and in
>
> def foo(x: X) -> X:
>     def bar(y: X) -> X: return y
>     return bar(x)
>
> X in bar() must be the same type as in foo().
>

Why don't you install mypy and check for yourself? (I expect it's as you
desire, but while I have mypy installed, I'm on vacation and my family is
asking for my attention.)

-- 
--Guido van Rossum (python.org/~guido)