[Python-ideas] Proposal: Use mypy syntax for function annotations
Terry Reedy
tjreedy at udel.edu
Thu Aug 14 04:27:25 CEST 2014
Guido, as requested, I read your whole post before replying. Please do
the same. This response is both critical and supportive.
On 8/13/2014 3:44 PM, Guido van Rossum wrote:
> Yesterday afternoon I had an inspiring conversation with Bob Ippolito
> (man of many trades, author of simplejson) and Jukka Lehtosalo (author
> of mypy: http://mypy-lang.org/).
My main concern with static typing is that it tends to be
anti-duck-typing, while I consider duck-typing to be a major *feature*
of Python. The example in the page above is "def fib(n: int):". Fib
should get a count (non-negative integer) value, but it need not be an
int, and 'half' the ints do not qualify. Reading the tutorial, I could
not tell if it supports numbers.Number (which should approximate the
domain from above.)
Now consider an extended version (after Lucas).
def fib(n, a, b):
    i = 0
    while i <= n:
        print(i,a)
        i += 1
        a, b = b, a+b
The only requirement of a, b is that they be addable. Any numbers should
be allowed, as in fib(10, 1, 1+1j), but so should fib(5, '0', '1').
Addable would be approximated from below by Union(Number, str).
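A minimal sketch of that approximation, assuming the bracket spelling Union[Number, str] that the typing module uses rather than the call spelling above. The names Addable and fib_terms are hypothetical, and the function collects the values instead of printing them so that the results can be inspected.

```python
from numbers import Number
from typing import List, Union

# Hypothetical alias: "addable", approximated from below as described above.
Addable = Union[Number, str]


def fib_terms(n: int, a: Addable, b: Addable) -> List[Addable]:
    """Collect the values of a that the loop above would print."""
    terms = []
    i = 0
    while i <= n:
        terms.append(a)
        i += 1
        a, b = b, a + b
    return terms


ints = fib_terms(3, 0, 1)
mixed = fib_terms(4, 1, 1 + 1j)
strings = fib_terms(5, '0', '1')
```

Here strings is ['0', '1', '01', '101', '01101', '10101101']: the annotations do not change what runs, they only describe what a checker should accept.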
> Bob gave a talk at EuroPython about
> what Python can learn from Haskell (and other languages); yesterday he
> gave the same talk at Dropbox. The talk is online
> (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad
> strokes comes down to three suggestions:
>
> (a) Python should adopt mypy's syntax for function annotations
-+ Syntax with no meaning is a bit strange. On the other hand, syntax
not bound to semantics, or at least not bound to just one meaning is
quite pythonic. '+' has two standard meanings, plus custom meanings
embodied in .__add__ methods.
+ The current semantics of annotations is that they are added to
function objects as .__annotations__ (for whatever use) *and* used as
part of inspect.signature and included in help(ob) responses. In other
words, annotations are already used in the stdlib.
>>> def f(i:int) -> float: pass
>>> from inspect import signature as sig
>>> str(sig(f))
'(i:int) -> float'
>>> help(f)
Help on function f in module __main__:
f(i:int) -> float
Idle calltips include them also. An appropriately flexible standardized
notation would enhance this usage and many others.
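As a small sketch of the two access paths mentioned above, the same annotations can be read either from the function object or through inspect.signature. The variable names are only for illustration, and the string form of a signature may differ between Python versions, so the sketch compares objects rather than text.

```python
from inspect import signature


def f(i: int) -> float:
    pass


# Path 1: the raw mapping stored on the function object.
annotations = f.__annotations__

# Path 2: the structured view that help() and calltips build on.
sig = signature(f)
param_annotation = sig.parameters['i'].annotation
return_annotation = sig.return_annotation
```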
+-+ I see the point of "The goal is to make it possible to add type
checking annotations to 3rd party modules (and even to the stdlib) while
allowing unaltered execution of the program by the (unmodified) Python
3.5 interpreter." On the other hand, "pip install mypytyping" is not a
huge burden. On the third hand, inclusion in the stdlib allows use in the stdlib.
> (b) Python's use of mutabe [mutable] containers by default is wrong
The premise of this is partly wrong and partly obsolete. As far as I can
remember, Python *syntax* only uses tuples, not lists: "except (ex1,
ex2):", "s % (val1, val2)", etc. The use of lists as the common format
for data interchange between functions has largely been replaced by
iterators. This fact makes Python code much more generic, and
anti-generic static typing more wrong.
In remaining cases, 'wrong' is as much a philosophical opinion as a fact.
> (c) Python should adopt some kind of Abstract Data Types
I would have to look at the talk to know what Bob means.
> Proposals (b) and (c) don't feel particularly actionable (if you
> disagree please start a new thread, I'd be happy to discuss these
> further if there's interest) but proposal (a) feels right to me.
> So what is mypy? It is a static type checker for Python written by
> Jukka for his Ph.D. thesis. The basic idea is that you add type
> annotations to your program using some custom syntax, and when running
> your program using the mypy interpreter, type errors will be found
> during compilation (i.e., before the program starts running).
>
> The clever thing here is that the custom syntax is actually valid Python
> 3, using (mostly) function annotations: your annotated program will
> still run with the regular Python 3 interpreter. In the latter case
> there will be no type checking, and no runtime overhead, except to
> evaluate the function annotations (which are evaluated at function
> definition time but don't have any effect when the function is called).
>
> In fact, it is probably more useful to think of mypy as a heavy-duty
> linter than as a compiler or interpreter; leave the type checking to
> mypy, and the execution to Python. It is easy to integrate mypy into a
> continuous integration setup, for example.
>
> To read up on mypy's annotation syntax, please see the mypy-lang.org
> <http://mypy-lang.org> website.
I did not see a 'reference' page, but the tutorial comes pretty close.
http://mypy-lang.org/tutorial.html
Beyond that, typing.py would be definitive,
https://github.com/JukkaL/mypy/blob/master/lib-typing/3.2/typing.py
> Here's just one complete example, to give a flavor:
> from typing import List, Dict
>
> def word_count(input: List[str]) -> Dict[str, int]:
The input annotation should be Iterable[str], which mypy does have.
>     result = {} #type: Dict[str, int]
>     for line in input:
>         for word in line.split():
>             result[word] = result.get(word, 0) + 1
>     return result
The information that input is an Iterable[str] can be used either within
the definition of word_count or at places where word_count is called. A
type aware checker, either in the editor or compiler, could check that
the only uses of 'input' within the function are as input to functions
declared to accept an Iterable or in for statements.
Checking that the input to word_count is specifically Iterable[str] as
opposed to any other Iterable may not be possible. But I think what can
be done, including enhancing help information, might be worth it.
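A sketch of the looser signature, assuming Iterable and Dict are imported from the typing module. The parameter is renamed to lines here only to avoid shadowing the builtin input; that rename is my choice, not part of the original example.

```python
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    result = {}  # type: Dict[str, int]
    # The only use of the parameter is in a for statement,
    # so any iterable of strings is enough.
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


from_list = word_count(["a b", "b"])
from_tuple = word_count(("x y x",))
from_generator = word_count(s.lower() for s in ["A a", "B"])
```

A list, a tuple and a generator all work, which is the point of not requiring List[str].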
For instance, the parameter to s.join is named 'iterable'. Something
more specific, either 'iterable_of_strings' or 'strings: Iterable[str]'
would be more helpful. Indeed, there have been people posting on
python-list who thought that 'iterable' means any iterable and that .join
would call str() on each object. I think there are other cases where a
parameter is given a bland under-informative type name instead of a
context-specific semantic name just because there was no type annotation
available. There are places where the opposite problem occurs, too
specific instead of too general, where iterable parameters are still
called 'list'.
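To make the .join case concrete, here is a sketch with two hypothetical wrappers: join_strings carries the more specific annotation, and join_as_text does the explicit str() conversion that some posters assumed .join performs.

```python
from typing import Iterable


def join_strings(sep: str, strings: Iterable[str]) -> str:
    # str.join requires every item to already be a str.
    return sep.join(strings)


def join_as_text(sep: str, items: Iterable[object]) -> str:
    # The behaviour some posters expected: convert each item explicitly.
    return sep.join(str(item) for item in items)


joined = join_strings("-", ["a", "b"])
converted = join_as_text(", ", [1, 2, 3])

try:
    join_strings(", ", [1, 2, 3])  # not an iterable of strings
    raised = False
except TypeError:
    raised = True
```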
> Note that the #type: comment is part of the mypy syntax; mypy uses
> comments to declare types in situations where no syntax is available --
> although this particular line could also be written as follows:
>
> result = Dict[str, int]()
>
> Either way the entire function is syntactically valid Python 3, and a
> suitable implementation of typing.py (containing class definitions for
> List and Dict, for example) can be written to make the program run
> correctly. One is provided as part of the mypy project.
>
> I should add that many of mypy's syntactic choices aren't actually new.
> The basis of many of its ideas go back at least a decade: I blogged
> about this topic in 2004
> (http://www.artima.com/weblogs/viewpost.jsp?thread=85551 -- see also the
> two followup posts linked from the top there).
>
> I'll emphasize once more that mypy's type checking happens in a separate
> pass: no type checking happens at run time (other than what the
> interpreter already does, like raising TypeError on expressions like 1+"1").
>
> There's a lot to this proposal, but I think it's possible to get a PEP
> written, accepted and implemented in time for Python 3.5, if people are
> supportive. I'll go briefly over some of the action items.
>
> *(1) A change of direction for function annotations*
>
> PEP 3107 <http://legacy.python.org/dev/peps/pep-3107/>, which introduced
> function annotations, is intentionally non-committal about how function
> annotations should be used. It lists a number of use cases, including
> but not limited to type checking. It also mentions some rejected
> proposals that would have standardized either a syntax for indicating
> types and/or a way for multiple frameworks to attach different
> annotations to the same function. AFAIK in practice there is little use
> of function annotations in mainstream code, and I propose a conscious
> change of course here by stating that annotations should be used to
> indicate types and to propose a standard notation for them.
There are many uses for type information and I think Python should
remain neutral among them.
> (We may have to have some backwards compatibility provision to avoid
> breaking code that currently uses annotations for some other purpose.
> Fortunately the only issue, at least initially, will be that when
> running mypy to type check such code it will produce complaints about
> the annotations; it will not affect how such code is executed by the
> Python interpreter. Nevertheless, it would be good to deprecate such
> alternative uses of annotations.)
I can imagine that people who have used annotations might feel a bit
betrayed by deprecation of a new-in-py3 feature. But I do not think it
necessary to do so. Tools that work with mypy annotations, including
mypy itself, should only assume mypy typing if typing is imported. No
'import typing', no 'Warning: annotation does not follow typing rules.'
If 'typing' were a package with a 'mypy' module, the door would be
left open to other 'blessed' typing modules.
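One way a tool could apply that rule is to look for the import before interpreting any annotations. This is only a sketch using the ast module; imports_typing is a hypothetical helper, not something mypy provides.

```python
import ast


def imports_typing(source: str) -> bool:
    """Return True if the module source imports typing in any form."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # import typing / import typing as t
            if any(alias.name.split('.')[0] == 'typing' for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # from typing import List (absolute imports only)
            if node.level == 0 and node.module and node.module.split('.')[0] == 'typing':
                return True
    return False


typed = imports_typing("from typing import List\ndef f(x: List[int]): pass\n")
untyped = imports_typing("def f(x: 'any old note'): pass\n")
```

A checker following this rule would stay silent about the second module, whatever its annotations contain.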
> *(2) A specification for what to add to Python 3.5*
>
> There needs to be at least a rough consensus on the syntax for
> annotations, and the syntax must cover a large enough set of use cases
> to be useful. Mypy is still under development, and some of its features
> are still evolving (e.g. unions were only added a few weeks ago). It
> would be possible to argue endlessly about details of the notation, e.g.
> whether to use 'list' or 'List', what either of those means (is a
> duck-typed list-like type acceptable?) or how to declare and use type
> variables, and what to do with functions that have no annotations at all
> (mypy currently skips those completely).
>
> I am proposing that we adopt whatever mypy uses here, keeping discussion
> of the details (mostly) out of the PEP. The goal is to make it possible
> to add type checking annotations to 3rd party modules (and even to the
> stdlib) while allowing unaltered execution of the program by the
> (unmodified) Python 3.5 interpreter. The actual type checker will not be
> integrated with the Python interpreter, and it will not be checked into
> the CPython repository. The only thing that needs to be added to the
> stdlib is a copy of mypy's typing.py module. This module defines several
> dozen new classes (and a few decorators and other helpers) that can be
> used in expressing argument types. If you want to type-check your code
> you have to download and install mypy and run it separately.
>
> The curious thing here is that while standardizing a syntax for type
> annotations, we technically still won't be adopting standard rules for
> type checking.
Fine with me, as that is not the only use. And even for type checking,
there is the choice between accept unless clearly wrong, versus reject
unless clearly right.
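The two policies differ only in how they treat what the checker cannot decide. A toy sketch, with a hypothetical three-valued Verdict that is not taken from mypy:

```python
from enum import Enum


class Verdict(Enum):
    RIGHT = "clearly right"
    WRONG = "clearly wrong"
    UNKNOWN = "cannot tell"


def accept_unless_clearly_wrong(verdict: Verdict) -> bool:
    # Lenient: undecidable cases pass.
    return verdict is not Verdict.WRONG


def reject_unless_clearly_right(verdict: Verdict) -> bool:
    # Strict: undecidable cases fail.
    return verdict is Verdict.RIGHT


lenient = [accept_unless_clearly_wrong(v) for v in Verdict]
strict = [reject_unless_clearly_right(v) for v in Verdict]
```

Duck-typed code tends to land in the undecidable case, so the choice of policy determines how much of it a checker complains about.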
> This is intentional. First of all, fully specifying all
> the type checking rules would make for a really long and boring PEP (a
> much better specification would probably be the mypy source code).
> Second, I think it's fine if the type checking algorithm evolves over
> time, or if variations emerge.
As in the choice between accept unless clearly wrong, versus reject
unless clearly right.
> The worst that can happen is that you
> consider your code correct but mypy disagrees; your code will still run.
>
> That said, I don't want to /completely/ leave out any specification. I
> want the contents of the typing.py module to be specified in the PEP, so
> that it can be used with confidence. But whether mypy will complain
> about your particular form of duck typing doesn't have to be specified
> by the PEP. Perhaps as mypy evolves it will take options to tell it how
> to handle certain edge cases. Forks of mypy (or entirely different
> implementations of type checking based on the same annotation syntax)
> are also a possibility. Maybe in the distant future a version of Python
> will take a different stance, once we have more experience with how this
> works out in practice, but for Python 3.5 I want to restrict the scope
> of the upheaval.
As usual, we should review the code before acceptance. It is not clear
to me how much of the tutorial is implemented, as it says "Some of these
features might never see the light of day." ???
> *Appendix -- Why Add Type Annotations?*
> The argument between proponents of static typing and dynamic typing has
> been going on for many decades. Neither side is all wrong or all right.
> Python has traditionally fallen in the camp of extremely dynamic typing,
> and this has worked well for most users, but there are definitely some
> areas where adding type annotations would help.
The answer to why on the mypy page is 'easier to find bugs', 'easier
maintenance'. I find this under-convincing as sufficient justification
in itself. I don't think there are many bugs on the tracker due to
calling functions with the wrong type of object. Logic errors, ignored
corner cases, and system idiosyncrasies are much more of a problem.
Your broader list is more convincing.
> - Editors (IDEs) can benefit from type annotations; they can call out
> obvious mistakes (like misspelled method names or inapplicable
> operations) and suggest possible method names. Anyone who has used
> IntelliJ or Xcode will recognize how powerful these features are, and
> type annotations will make such features more useful when editing Python
> source code.
>
> - Linters are an important tool for teams developing software. A linter
> doesn't replace a unittest, but can find certain types of errors better
> or quicker. The kind of type checking offered by mypy works much like a
> linter, and has similar benefits; but it can find problems that are
> beyond the capabilities of most linters.
Currently, Python linters do not have standard type annotations to work
with. I suspect that programs other than mypy would use them if available.
> - Type annotations are useful for the human reader as well! Take the
> above word_count() example. How long would it have taken you to figure
> out the types of the argument and return value without annotations?
Under a minute, including the fact that the annotation was overly
restrictive. But then I already know that only a mutation method can
require a list.
> Currently most people put the types in their docstrings; developing a
> standard notation for type annotations will reduce the amount of
> documentation that needs to be written, and running the type checker
> might find bugs in the documentation, too. Once a standard type
> annotation syntax is introduced, it should be simple to add support for
> this notation to documentation generators like Sphinx.
>
> - Refactoring. Bob's talk has a convincing example of how type
> annotations help in (manually) refactoring code. I also expect that
> certain automatic refactorings will benefit from type annotations --
> imagine a tool like 2to3 (but used for some other transformation)
> augmented by type annotations, so it will know whether e.g. x.keys() is
> referring to the keys of a dictionary or not.
>
> - Optimizers. I believe this is actually the least important
> application, certainly initially. Optimizers like PyPy or Pyston
> <https://github.com/dropbox/pyston> wouldn't be able to fully trust the
> type annotations, and they are better off using their current strategy
> of optimizing code based on the types actually observed at run time. But
> it's certainly feasible to imagine a future optimizer also taking type
> annotations into account.
--
Terry Jan Reedy