[Python-Dev] static typing of input arguments in signatures (was: Language Summit notes)

Sun Apr 13 06:48:16 CEST 2014

Stefan Behnel, 12.04.2014 19:11:
> Guido van Rossum, 10.04.2014 03:08:
>> - Jukka Lehtosalo gave a talk and answered questions about mypy, his design
>> and implementation of pragmatic type annotations (no new syntax required,
>> uses Python 3 function annotations).
> 
> FWIW, signature type annotations aren't enough for a static compiler like
> Cython, which also benefits from local and global variable declarations,
> static functions, etc. However, we initially discussed this feature in the
> project some five years ago or so and never actually implemented it, so I
> finally decided to add support for it to Cython. There already was a way to
> provide static Cython/C type declarations in pure Python code, also for
> function arguments, but it's nice to have a way that is also naturally
> runtime inspectable in the signature.
> 
> It essentially looks like this now:
> 
>     def func(plain_python_type: dict,
>              named_python_type: 'list',
>              explicit_python_type: {'type': dict},
>              explicit_named_python_type: {'type': 'tuple'},
>              explicit_c_type: {'ctype': 'int'}):
>         ...

It may not be obvious to everyone, so I should add a comment on why this
wasn't considered important enough to implement during the last five years
(and that hasn't changed much now). One thing I learned in the Cython
project is that it's often a bad idea to statically type input arguments at
all. The reason is that it violates the principle of being liberal in what
you accept as input (but strict in what you produce as output).

There are cases where this is ok, e.g. using the C type "double" is
perfectly ok most of the time, because it's not (really) range restricting
and all numeric-ish Python types can happily coerce to it. Using integer
types is ok only if their restricted range really is enough, but can lead
to the Py2.5 problem of having too small integer types if things grow
larger. Similarly, using self-defined (extension) types to type input
arguments is ok because at the native level, the code will almost certainly
depend on their internal implementation details, so accepting anything else
here would be useless and wrong (CPython does the same for "self" type
checking, for example).
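
To illustrate the difference between the two C number cases in plain Python
(not Cython), here is a rough sketch that uses the struct module as a
stand-in for a C "double" and a 32 bit C "int". The helper names
as_c_double() and as_c_int32() are made up for this example.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def as_c_double(value):
    # Hypothetical helper: coerce any numeric-ish object to a C double.
    # float() accepts int, Fraction, Decimal and anything with __float__.
    return struct.unpack('<d', struct.pack('<d', float(value)))[0]

def as_c_int32(value):
    # Hypothetical helper: squeeze a Python int into a 32 bit signed C int.
    # struct raises struct.error if the value does not fit the range.
    return struct.unpack('<i', struct.pack('<i', value))[0]

print(as_c_double(3), as_c_double(Fraction(1, 2)), as_c_double(Decimal('1.5')))
print(as_c_int32(2**31 - 1))
try:
    as_c_int32(2**31)
except struct.error as exc:
    print("too large for a 32 bit int:", exc)
```

The double version takes whatever the caller passes, as long as it is
number-like, while the integer version starts failing as soon as the values
outgrow the chosen C type.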

However, what I've often seen people do is to write something like the
above example, i.e. they use explicit Python types for input arguments.
This leads to an overly narrow API that rejects lots of otherwise
acceptable input types. What people usually want as input type is something
like "iterable", or "mapping". What many users end up writing instead is
"list" or "dict". The reason is that in Python code, input really *is* a
list or a dict most of the time, and at the C level (to which Cython
translates), list and dict have (limited) performance advantages, whereas
"iterable" and "mapping" do not. So using something like the ABC types
Iterable or Mapping for static typing is almost completely useless, except
for documentation purposes (and maybe a bit earlier type checking, in some
cases).
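
A minimal plain Python sketch of the effect, with the type check written
out by hand to mimic what a statically typed "list" argument does. The
function names total_strict() and total_liberal() are invented for this
example.

```python
def total_strict(values: list):
    # Mimics an argument statically typed as "list":
    # everything that is not a list (or subtype) is rejected up front.
    if not isinstance(values, list):
        raise TypeError("expected list, got %s" % type(values).__name__)
    return sum(values)

def total_liberal(values):
    # No declared type: any iterable of numbers works, and non-iterable
    # input fails with a TypeError at the point where it is iterated.
    return sum(values)

print(total_liberal([1, 2, 3]))
print(total_liberal((1, 2, 3)))
print(total_liberal(x for x in range(4)))
try:
    total_strict((1, 2, 3))
except TypeError as exc:
    print("rejected:", exc)
```

The strict version refuses the tuple and would refuse the generator as
well, although the function body handles both just fine.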

The point where this is extremely visible is string/bytes input. I've seen
many people actually use "str" or "bytes" to type their input. In Python 2,
"str" is a very bad idea, because it excludes "unicode". In Py3, that's
less of a problem, because all reasonable string input really is "str" (or,
more rarely, a subtype). In both Py2 and Py3, however, statically typing
input as "bytes" is a *very* bad idea, because it excludes everything
bytes-ish: bytearray, memoryview, buffers. Many users don't see this need
at first. While using buffers instead is easy enough in Cython, getting it
right for a specific use case and properly using it to interface with
external native libraries is less straightforward and requires more
thought than just writing "I want bytes".
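
As a rough Python 3 illustration (plain Python, not the Cython buffer
syntax), a function can accept everything bytes-ish by going through
memoryview() instead of checking for "bytes". The checksum() function is a
made-up example.

```python
def checksum(data):
    # memoryview() accepts any object that supports the buffer protocol
    # (bytes, bytearray, memoryview, array.array, ...) and raises a
    # TypeError for everything else, e.g. for str.
    view = memoryview(data).cast('B')   # look at the data as plain bytes
    return sum(view) % 256

print(checksum(b'abc'))
print(checksum(bytearray(b'abc')))
print(checksum(memoryview(b'abc')))
try:
    checksum('abc')
except TypeError as exc:
    print("rejected:", exc)
```

Note that this only covers the simple case of a contiguous buffer. Passing
the data on to an external native library needs more care than this.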

So, what I've learned from seven years of Cython is that static typing in
signatures is actually less interesting than you might think at first
sight. It might be ok for documentation purposes, although its verbosity
makes that also a bit questionable. But for actual input checking, there's
substantially more than just the (generic) type, even at the C level, so
users who properly understand the problem don't use static argument typing
in many cases and instead write their own input validation, conversion and
normalisation code. Or they just let the code raise an exception if the input
doesn't work the way it's being used. Errors don't always have to be raised
at signature evaluation time.
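
A small sketch of what such hand-written validation might look like in
plain Python. The set_port() function and its value range are assumptions
made up for this example. The point is only that the check covers more than
the generic type.

```python
import operator

def set_port(port):
    # Conversion: accept int and int-like objects (anything with
    # __index__), but reject float and str with a TypeError.
    port = operator.index(port)
    # Validation: the generic type "int" says nothing about the range.
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port

print(set_port(8080))
try:
    set_port(70000)
except ValueError as exc:
    print("rejected:", exc)
try:
    set_port("8080")
except TypeError as exc:
    print("rejected:", exc)
```

A static "int" declaration in the signature would have let 70000 pass,
while the explicit code rejects it with a meaningful error.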

Stefan