[Python-ideas] Optional static typing -- the crossroads

Fri Aug 15 01:56:39 CEST 2014

I have read pretty much the entire thread up and down, and I don't think I
can keep up with responding to every individual piece of feedback. (Also, a
lot of responses cancel each other out. :-)

I think there are three broad categories of questions to think about next.

(A) Do we even need this?

(B) What syntax to use?

(C) Does/should it support <feature X>?

Taking these in turn:

(A) Do we even need a standard for optional static typing?

Many people have shown either support for the idea, or pointed to some
other system that addresses the same issue. On the other hand, several
people have claimed that they don't need it, or that they worry it will
make Python less useful for them. (However, many of the detractors seem to
have their own alternative proposal. :-)

In the end I don't think we can ever know for sure -- but my intuition
tells me that as long as we keep it optional, there is a real demand. In
any case, if we don't start building something we'll never know whether
it'll be useful, so I am going to take a leap of faith and continue to
promote this idea.

I am going to make one additional assumption: the main use cases will be
linting, IDEs, and doc generation. These all have one thing in common: it
should be possible to run a program even though it fails to type check.
Also, adding types to a program should not hinder its performance (nor will
it help :-).
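
A minimal sketch of what "runs even though it fails to type check" means
in practice. The function name `double` is made up for illustration; the
point is only that the interpreter stores annotations and does not enforce
them, so a call that a linter would presumably flag still executes:

```python
def double(x: int) -> int:
    # The annotations are hints for tools; the interpreter does not
    # check the argument against them at call time.
    return x * 2

# A type checker would be expected to complain about this call,
# but the program still runs and simply repeats the string.
result = double("ab")
print(result)
```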

(B) What syntax should a standard system for optional static typing use?

There are many interesting questions here, but at the highest level there
are a few choices that constrain the rest of the discussion, and I'd like
to start with these. I see three or four "families" of approaches, and I
think the first order of business is to pick a family.

(1) The mypy family. (http://mypy-lang.org/) This is characterized by its
use of PEP 3107 function annotations and the constraint that its syntax
must be valid (current) Python syntax that can be evaluated without errors
at function definition time. However, mypy also supports collecting
annotations in separate "stub" files; this is how it handles annotations
for the stdlib and C extensions. When mypy annotations occur inline (not in
a stub file) they are used to type check the body of the annotated function
and also serve as input for type checking its callers.
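
To make the PEP 3107 constraint concrete, here is a small sketch (the
function `greet` is a hypothetical example, not taken from mypy). The
annotations are ordinary Python expressions, and their values end up in the
function's `__annotations__` dict, where any tool can read them:

```python
def greet(name: str, times: int = 1) -> str:
    return ", ".join(["Hello " + name] * times)

# The annotation expressions must be valid Python; their values are
# available on the function object as a plain dict.
print(greet.__annotations__)
```

Nothing here depends on mypy itself; the same dict is what any linter or
IDE working at run time would have to start from.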

(2) The pytypedecl family. (https://github.com/google/pytypedecl) This is a
custom syntax that can only be used in separate stub files. Because it is
not constrained by Python's current syntax, its syntax is slightly more
elegant than mypy's.

(3) The PyCharm family.
(http://www.jetbrains.com/pycharm/webhelp/using-docstrings-to-specify-types.html)
This is a custom syntax that lives entirely in docstrings. There is also a
way to use stub files with this. (In fact, every viable approach has to
support some form of stub files, if only to describe signatures for C
extensions.)

(I suppose we could add a 4th family that puts everything in comments, but
I don't think anyone is seriously working on such a thing, and I don't see
any benefits.)

There's also a variant of (1) that Łukasz Langa would like to see -- use
the syntactic position of function annotations but using a custom syntax
(e.g. one similar to the pytypedecl syntax) that isn't evaluated at
function-definition time. This would have to use "from __future__ import
<something>" for backward compatibility. I'm skeptical about this though;
it is only slightly more elegant than mypy, and it would open the
floodgates of unconstrained language design.

So how to choose? I've read passionate attacks and defenses of each
approach. I've got a feeling that the three projects aren't all that
different in maturity (all are well beyond the toy stage, none are quite
ready for prime time). In terms of specific type system features (e.g.
forward references, generic types, duck typing) I expect they are all
acceptable, and all probably need some work (and there's no reason to
assume that work can't be done). All support stubs so you can specify
signatures for code you can't edit (whether C extension, stdlib or just
opaque 3rd party code).

To me there is no doubt that (1) is the most Pythonic approach. When we
discussed PEP 3107 (function annotations) it was always my goal that these
would eventually be used for type annotations. There was no consensus at
the time on what the rules for type checking should be, but their syntactic
position was never in doubt. So we decided to introduce "annotations" in
Python 3 in the hope that 3rd party experiments would eventually produce
something satisfactory. Mypy is one such experiment. One of the important
lessons I draw from mypy is that type annotations are most useful to
linters, and should (normally) not be used to enforce types at run time.
They are also not useful for code generation. None of that was obvious when
we were discussing PEP 3107!

I don't buy the argument that PEP 3107 promises that annotations are
completely free of inherent semantics. It promises compatibility, and I
take that very seriously, but I think it is reasonable to eventually
deprecate other uses of annotations -- there aren't enough significant
other uses for them to warrant crippling type annotations forever. In the
meantime, we won't be breaking existing use of annotations -- but they may
confuse a type checker, whether a stand-alone linter like mypy or built
into an IDE like PyCharm, and that may serve as an encouragement to look
for a different solution.
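
As an illustration of an "other use" of annotations, consider this
hypothetical function, which uses them as free-form documentation of
units. It keeps working exactly as before, but a type checker that expects
every annotation to name a type would most likely not know what to make of
the strings:

```python
def travel_time(distance: "metres", speed: "metres per second") -> "seconds":
    # The annotations are documentation strings, not types.
    return distance / speed

print(travel_time(10, 2))
print(travel_time.__annotations__["return"])
```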

Most of the thornier issues brought up against mypy wouldn't go away if we
adopted another approach: whether to use concrete or abstract types, the
use of type variables, how to define type equivalence, the relationship
between a list of ints and a list of objects, how to spell "something that
implements the buffer interface", what to do about JSON, binary vs. text
I/O and the signature of open(), how to check code that uses isinstance(),
how to shut up the type checker when you know better... The list goes on.
There will be methods whose type signature can't be spelled (yet). There
will be code distributed with too narrowly defined types. Some programmers
will uglify their code to please the type checker.
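
To pick just one item from that list, here is a sketch of why "a list of
ints" vs. "a list of objects" is thorny regardless of syntax. The function
names are invented for illustration, and the intended types are given only
in comments so as not to presuppose any particular spelling:

```python
def add_marker(items):
    # Imagine this parameter is declared as "list of objects".
    items.append("marker")

def total(numbers):
    # Imagine this parameter is declared as "list of ints".
    return sum(numbers)

ints = [1, 2, 3]
add_marker(ints)  # accepted if a list of ints counts as a list of objects

try:
    total(ints)
except TypeError as exc:
    # The list is no longer a list of ints, so summing it fails.
    print("run-time failure:", exc)
```

Any type system has to decide whether the call to add_marker() is allowed,
whichever family its syntax comes from.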

There are questions about what to do for older versions of Python. I find
mypy's story here actually pretty good -- the mypy codec may be a hack, but
so is any other approach. Only the __future__ approach really loses out
here, because you can't add a new __future__ import to an old version.

So there you have it. I am picking the mypy family and I hope we can start
focusing on specific improvements to mypy. I also hope that somebody will
write converters from pytypedecl and PyCharm stubs into mypy stubs, so that
we can reuse the work already put into stub definitions for those two
systems. And of course I hope that PyCharm and pytypedecl will adopt mypy's
syntax (initially in addition to their native syntax, eventually as their
sole syntax).

PS. I realize I didn't discuss question (C) much. That's intentional -- we
can now start discussing specific mypy features in separate threads (or in
this one :-).

-- 
--Guido van Rossum (python.org/~guido)