[Python-3000] Type annotations: annotating generators

Sun May 21 01:00:18 CEST 2006

Tony Lownds wrote:
>
> Paul Boddie wrote:
> >
> > What's the general opinion on systems which attempt to infer and
> > predict inappropriate type usage?

[...]

> > Couldn't such systems be a better aid to program reliability?
> > Would "optional" type declarations be relevant to the operation
> > of such systems?
> 
> I've been hacking on such a system, called "t_types". It is in
> pre-release form right now. It actually deduces type usage using bytecode
> simulation, before run-time.

I've been working on something similar, which is partly why I asked the 
question. See here:

http://www.python.org/pypi/analysis

However, it's not that similar: it traverses the AST (from the compiler
module) for want of a representation that doesn't drop useful information
from the program. A more efficient approach might be a virtual machine which
operates on types and constraints rather than on actual data. I suspect that
the PyPy "flow object space" might do something like this, but finding
concise confirmation of such suspicions in the PyPy documentation is often a
challenge.
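To make the idea of a machine that operates on types rather than data a
little more concrete, here is a minimal sketch of abstract interpretation
over an AST. It uses the standard ast module (the compiler module mentioned
above was removed in Python 3), and the names infer, analyse and ADDABLE are
hypothetical; this is not how the analysis package itself works. Each
expression evaluates to a set of type names, with "object" standing in for
"anything".

```python
import ast

# Types for which "x + x" yields the same type (a deliberately tiny table).
ADDABLE = {"int", "float", "str", "list"}


def infer(node, env):
    """Return the set of type names an expression may evaluate to.

    env maps variable names to sets of type names; "object" stands in
    for "anything" (an unknown or unsupported expression).
    """
    if isinstance(node, ast.Constant):
        return {type(node.value).__name__}
    if isinstance(node, ast.Name):
        return env.get(node.id, {"object"})
    if isinstance(node, ast.List):
        return {"list"}
    if isinstance(node, ast.IfExp):
        # Either branch may run, so the result is the union of both.
        return infer(node.body, env) | infer(node.orelse, env)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        result = set()
        for left in infer(node.left, env):
            for right in infer(node.right, env):
                if left == right and left in ADDABLE:
                    result.add(left)
                elif {left, right} == {"int", "float"}:
                    result.add("float")
                else:
                    result.add("object")  # unknown, or possibly a TypeError
        return result
    return {"object"}


def analyse(source):
    """Abstractly 'run' straight-line assignments, tracking types only."""
    env = {}
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            env[stmt.targets[0].id] = infer(stmt.value, env)
    return env


if __name__ == "__main__":
    env = analyse("a = 1\nb = a + 2.5\nc = 'x' if b else 3")
    print(env["b"], sorted(env["c"]))  # {'float'} ['int', 'str']
```

The conditional expression shows the essential difference from running the
code: both branches are "taken", and their types are joined.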

There are other systems which take similar approaches: Shedskin, Starkiller, 
Pylint (not the Logilab one), along with various other experiments. It makes 
me think that a common framework, possibly involving some of the Logilab 
projects (as previously suggested to me by one of their developers), may 
provide a good opportunity for consolidation here.

> For t_types, starting the simulation with types more specific than  
> "anything" is important for reasonable results. In general I think
> optional type declarations are relevant to such systems, whether a
> special syntax is adopted or decorators are used.

Is there some kind of unstated consensus on whole-program analysis? Collin 
Winter expressed some reservations in a private communication, but I'd be 
interested to hear why people seem to disregard that approach so lightly. I 
don't doubt that accurate type inference is a hard problem - Mark Dufour's 
thesis provides some level of confirmation of that - but I'm inclined to 
believe that there's a wide open space between today's Python and ubiquitous 
type declarations that could hold a more appropriate solution.

[...]

> from t_types import t
> 
> @t.signature(t.int|t.None, returns=t.int)
> def test43(foo=None):
>    if None is 1:
>      # should be dead code
>      return ''
>    if 1 is None:
>      # should be dead code
>      return ''
>    if foo is None:
>      return 1
>    else:
>      # foo should have type of t.int here
>      return foo
> 
> @t.signature(returns=t.list[t.int])
> def test26():
>    x = []
>    x.append(1)  # x[0] = 1 would raise IndexError on an empty list
>    return x

Stripping the import and decorators from this and running it through the
analysis tools doesn't identify the dead code, mostly because I haven't
implemented optimisations for the identity operators in the way envisaged
above (although this is done for isinstance calls). It does, however, tell
you what the return types are, provided there are some calls to the above
functions. Whilst that is almost like declaring the types, it is only
necessary to provide usage of a function at the top of a "call hierarchy":
you wouldn't write an explicit call to test26 if such invocations already
existed elsewhere in the program.
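The identity-operator handling discussed here could look roughly like the
following: a hypothetical narrow function that splits a type environment on
an if test, so that foo is known to be an int in the else branch of test43,
and the branches guarded by "None is 1" and "1 is None" are reported as
dead. This is a sketch under simplifying assumptions (only identity
comparisons involving None are understood), not the behaviour of t_types or
of the analysis package.

```python
import ast

NONE = "NoneType"


def narrow(test, env):
    """Split a type environment on an `if` test.

    Returns (env_if_true, env_if_false); either is None when that branch
    can never run (dead code). env maps names to sets of type names, with
    "object" meaning "anything".
    """
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], (ast.Is, ast.IsNot))):
        return env, env  # test not understood: both branches possible
    left, right = test.left, test.comparators[0]
    negate = isinstance(test.ops[0], ast.IsNot)
    if isinstance(left, ast.Constant) and left.value is None:
        left, right = right, left  # normalise `None is x` to `x is None`
    if not (isinstance(right, ast.Constant) and right.value is None):
        return env, env  # only identity with None is handled
    if isinstance(left, ast.Constant):
        # Both sides are constants, so the outcome is known statically.
        yes, no = (env, None) if left.value is None else (None, env)
    elif isinstance(left, ast.Name):
        types = env.get(left.id, {"object"})
        could_be_none = NONE in types or "object" in types
        other_types = types - {NONE}
        yes = {**env, left.id: {NONE}} if could_be_none else None
        no = {**env, left.id: other_types} if other_types else None
    else:
        return env, env
    return (no, yes) if negate else (yes, no)


if __name__ == "__main__":
    test = ast.parse("foo is None", mode="eval").body
    print(narrow(test, {"foo": {"int", "NoneType"}}))
    # ({'foo': {'NoneType'}}, {'foo': {'int'}})
```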

Accurate prediction of the contents of things like lists can be hard, though.
Rather than declaring such things everywhere, one thing I mentioned in my
communications with Collin Winter was the possibility of "semantic" marking
of such types: instead of stating list[int] (or, to take a more illustrative
example, list[Element|Text]), you refer to an ElementList (possibly defined
by subclassing list) and then have the inferencer work out that only Element
and Text objects are ever stored in such objects. Shedskin manages to find
such things out all by itself, but I'm not certain that it can always do so.
Having some level of abstract type annotation is not as nice as having the
system work everything out by itself, but it is certainly better than the
maintenance nightmare of going round adjusting type declarations after every
minor type-related modification to a program.
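A minimal sketch of the ElementList idea, assuming the inferencer only needs
to look at append calls: the hypothetical element_types function below finds
variables bound to ElementList() and records which classes are constructed
and appended to them. It is flow- and scope-insensitive, so it is far cruder
than a real inferencer, but it shows how the element types can be derived
rather than declared.

```python
import ast


def element_types(source, container="ElementList"):
    """Find which classes are ever appended to instances of `container`.

    A variable counts as a container if it is ever assigned `container()`;
    `var.append(Cls(...))` then records Cls. Appending anything other than
    a direct constructor call records "object" (i.e. "anything").
    """
    tree = ast.parse(source)
    containers = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id == container):
            containers.update(t.id for t in node.targets
                              if isinstance(t, ast.Name))
    found = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "append"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in containers and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Call) and isinstance(arg.func, ast.Name):
                found.add(arg.func.id)
            else:
                found.add("object")
    return found


if __name__ == "__main__":
    program = (
        "class Element: pass\n"
        "class Text: pass\n"
        "class ElementList(list): pass\n"
        "def build():\n"
        "    items = ElementList()\n"
        "    items.append(Element())\n"
        "    items.append(Text())\n"
        "    other = []\n"
        "    other.append(42)\n"
        "    return items\n"
    )
    print(sorted(element_types(program)))  # ['Element', 'Text']
```

The plain list bound to other is ignored, since only ElementList carries the
"semantic" marking.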

Paul