[Python-3000] Type annotations: annotating generators
Paul Boddie
paul at boddie.org.uk
Sun May 21 01:00:18 CEST 2006
Tony Lownds wrote:
>
> Paul Boddie wrote:
> >
> > What's the general opinion on systems which attempt to infer and
> > predict inappropriate type usage?
[...]
> > Couldn't such systems be a better aid to program reliability?
> > Would "optional" type declarations be relevant to the operation
> > of such systems?
>
> I've been hacking on such a system, called "t_types". It is in pre-
> release form right now. It actually deduces type usage using bytecode
> simulation, before run-time.
I've been working on something similar, which is partly why I asked the
question. See here:
http://www.python.org/pypi/analysis
However, it's not that similar: it traverses the AST (from the compiler
module) for want of a representation that doesn't drop useful information
from the program. For a more efficient approach, I can imagine a virtual
machine which operates on types and constraints as opposed to actual data,
and I imagine that the PyPy "flow object space" might do something like this,
but finding some kind of concise confirmation of such suspicions in the PyPy
documentation is often a challenge.
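
To make the "types instead of data" idea concrete, here is a minimal sketch of an evaluator that walks an AST and computes sets of type names rather than values. It is not how the analysis package, t_types or PyPy work. It assumes the modern standard-library ast module (Python 3.8 or later) in place of the compiler module, and the names infer_expr, infer_assignments and SELF_ADDING are hypothetical.

```python
import ast

# Types for which `x + y` yields the same type when both operands share it.
# (bool is deliberately absent: True + True is an int.)
SELF_ADDING = {"int", "float", "str", "list"}


def infer_expr(node, env):
    """Return the set of type names an expression might evaluate to."""
    if isinstance(node, ast.Constant):
        return {type(node.value).__name__}
    if isinstance(node, ast.List):
        return {"list"}
    if isinstance(node, ast.Name):
        # Unknown names start out as "anything".
        return set(env.get(node.id, {"anything"}))
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = infer_expr(node.left, env)
        right = infer_expr(node.right, env)
        if left == right and len(left) == 1 and left <= SELF_ADDING:
            return left
    # Everything this sketch does not model is treated as "anything".
    return {"anything"}


def infer_assignments(source):
    """Map each module-level name to the set of types assigned to it."""
    env = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            value_types = infer_expr(stmt.value, env)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    # Flow-insensitive: accumulate every type ever assigned.
                    env.setdefault(target.id, set()).update(value_types)
    return env


env = infer_assignments("a = 1\nb = a + 2\nc = 'x'\nc = None\nd = unknown\n")
```

Here "anything" plays the role Tony describes: a starting point that only becomes useful once it is replaced by something more specific, whether by inference or by declaration.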
There are other systems which take similar approaches: Shedskin, Starkiller,
Pylint (not the Logilab one), along with various other experiments. It makes
me think that a common framework, possibly involving some of the Logilab
projects (as previously suggested to me by one of their developers), may
provide a good opportunity for consolidation here.
> For t_types, starting the simulation with types more specific than
> "anything" is important for reasonable results. In general I think
> optional type declarations are relevant to such systems, whether a
> special syntax is adopted or decorators are used.
Is there some kind of unstated consensus on whole-program analysis? Collin
Winter expressed some reservations in a private communication, but I'd be
interested to hear why people seem to disregard that approach so lightly. I
don't doubt that accurate type inference is a hard problem - Mark Dufour's
thesis provides some level of confirmation of that - but I'm inclined to
believe that there's a wide open space between today's Python and ubiquitous
type declarations that could hold a more appropriate solution.
[...]
> from t_types import t
>
> @t.signature(t.int|t.None, returns=t.int)
> def test43(foo=None):
>     if None is 1:
>         # should be dead code
>         return ''
>     if 1 is None:
>         # should be dead code
>         return ''
>     if foo is None:
>         return 1
>     else:
>         # foo should have type of t.int here
>         return foo
>
> @t.signature(returns=t.list[t.int])
> def test26():
>     x = []
>     x[0] = 1
>     return x
Stripping the import and decorators from this and running it through the
analysis tools doesn't identify dead code, mostly because I haven't
implemented optimisations for the identity operators in the way envisaged
above (although it is done for isinstance calls), but it does tell you what
the return types are, provided there are some calls to the above functions.
Whilst that is almost like declaring the types, it is only necessary to
provide usage of a function at the top of a "call hierarchy" - you wouldn't
write an explicit call to test26 if such invocations already existed in other
places.
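
As a rough illustration of the identity-operator case in test43, the following sketch flags "if" tests that compare two literal constants of different types with "is". Such tests can never succeed, since objects of different types are never the same object. It assumes the modern ast module rather than the compiler module, and it is not taken from either tool; the function name always_false_identity_tests is hypothetical.

```python
import ast

SAMPLE = """\
def test43(foo=None):
    if None is 1:
        return ''
    if 1 is None:
        return ''
    if foo is None:
        return 1
    else:
        return foo
"""


def always_false_identity_tests(source):
    """Return line numbers of `if` tests such as `None is 1`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (isinstance(test, ast.Compare)
                and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Is)
                and isinstance(test.left, ast.Constant)
                and isinstance(test.comparators[0], ast.Constant)
                # Constants of different types can never be identical.
                and type(test.left.value)
                is not type(test.comparators[0].value)):
            lines.append(node.lineno)
    return sorted(lines)


dead = always_false_identity_tests(SAMPLE)  # [2, 4]
```

The "foo is None" test is left alone because foo is a name, not a constant; deciding that branch needs the kind of type information discussed above.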
Accurate prediction of the contents of things like lists can be hard, though.
Instead of declaring such things everywhere, one thing I mentioned in my
communications with Collin Winter was the possibility of "semantic" marking
of such types: instead of stating list[int], or to take a more illustrative
example, instead of stating list[Element|Text], you refer to an ElementList
(possibly by subclassing list) and then have the inferencer work out that
only Element and Text objects are ever stored in such objects. Shedskin
manages to find such things out all by itself, but I'm not certain that it
can always do so. Having some level of abstract type annotation, whilst not
as nice as having the system work everything out by itself, is still
certainly better than the maintenance nightmare of going round adjusting
type declarations after a minor type-related modification to a program.
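
A toy version of the ElementList idea might look like the sketch below. The list subclass carries no element types at all; a small scan then collects the class names that are appended to instances of it. This is only an assumption about how such an inferencer could start out, not a description of Shedskin or the analysis package. It uses the modern ast module, only looks at direct append calls on simple names, and the function name element_types is hypothetical.

```python
import ast

SAMPLE = """\
class Element: pass
class Text: pass
class ElementList(list): pass

children = ElementList()
children.append(Element())
children.append(Text())
children.append(Element())
"""


def element_types(source, marker="ElementList"):
    """Collect class names appended to variables created via marker()."""
    tree = ast.parse(source)
    marked = set()  # names bound to marker() instances
    stored = set()  # class names seen in append() calls on those names

    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id == marker):
            marked.update(t.id for t in node.targets
                          if isinstance(t, ast.Name))

    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "append"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in marked
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Call)
                and isinstance(node.args[0].func, ast.Name)):
            stored.add(node.args[0].func.id)
    return stored


contents = element_types(SAMPLE)
```

If a later change starts storing a third kind of object in an ElementList, nothing has to be edited at the places where the name ElementList is used; only the inferred set changes.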
Paul