[Types-sig] STICK - static type and interface checking for python

scott scott@chronis.pobox.com
Fri, 28 Jan 2000 19:06:45 -0500


With much help from the discussions on this list and a colleague, I've
managed to come up with a type checking system for python.  It's not
done, by any means, but it certainly does a lot.  It's the result of a
few weeks of obsessively banging my head against the problem until
very recently, when it became apparent that there is atleast one
system which, as a whole, is quite feasible and provides for a lot of
the functionality we've all wanted -- polymorphism, safety, and a
framework for flexibility.

Instead of rushing to complete the system, I figured it best to get
this guy out there, hopefully as a means to further our discussions
until we come up with a more coherent picture of what we want.  With
any luck, we'll be able to hammer it into what we want instead of me
hammering it into what I want.  

Even without such luck, I really think it's worth a good look, because
it is largely functional, and most of all because it approaches a
single system much more so than the grab bags of mini specifications
we've been considering to date.  

It is available at
	
	ftp://chronis.pobox.com/pub/python/stick-20000128.tgz

I'm attaching the NOTES file from the project to try to peak some of
your curiousities into actually downloading and playing with stick.


Who, Me?'ly
scott
--------------------------------------------------------------------------
				 Type Safety

STICK derives its type safety from a few guiding principles.  First, any time
any operation happens in the python compile tree that involves information
loss about types, STICK assumes that all possible information that might be
lost is in fact lost.  I just call this being conservative.  Second, stick has
strict assignment checking.  When assigning a type T to a type T1, if the
union of T and T1 is not the same as T itself, the assignment is illegal.
Third, the types T and T1 are always derived from the set of all types that
the corresponding values may ever have throughout the life of the body of
checked code.  This last quality is assured partially from each of the first
two guiding principles.   

STICK does *not* make any significant attempt to do flow analysis.  The
closest it comes to flow analysis is looking at modules at the point in the
parse tree where the modules are imported to attempt to find a few extra
potential name errors.  I don't believe there's any need for stick to mark
particular pieces of code with type information that is not reliable accross
the entire body of code.  In fact, I believe that if a type system were to
rely on doing this marking of code segments in python, it would have no choice
but to be either unmanageable or full of thousands of type-safety holes.  


While I do not have a mathematical proof of type safety, and I'm not one who
is prone to doing such work, my study of type checking and understanding of
python lead me to believe STICK is type safe in the sense that there are no
surprising holes in the system. I could well be wrong, so if anyone can show
where I am wrong it would be greatly appreciated.  I am fully aware of the
affects of eval, exec, __import__ and the possibility of mucking with code
objects and/or byte code.  These things will never be type checkable, but I
think the system would be type safe if it were able to consider all of these
uses illegal.  Making it do that seems trivial when compared to the task of
designing and implementing this system, though.


				   Keywords

This type checking implementation is keyword sparse.  It introduces only
two new keywords, 'datatype' and 'decl'.  Along with these keywords, there
are a number of operators that are put to use, such as '->' and '!'.  These
operators could sometimes be replaced by other keywords for the sake of
clarity, but I chose to use them since introducing key words is so frowned
upon in python.  Whether or not these things are operators or keywords is 
irrelevant to this project.  This project is a prototype, and replacing such
symbols is trivial.  What is important is clarity of semantics.  See section
on Grammar below. 


			     Runtime implications

Since this system is polymorphic, it requires alterations to pythons runtime
environment.  Specifically, the runtime evironment must be able to get the
type of just about any arbitrary python value for the pupose of evaluating
deconstructors (see the file samples/or_test.py for more info on that).

Since it is questionable whether including runtime checks that reinforce the
compile time checks will cause potentially serious performance problems, and
since the compile time checks seem to be full proof in the absense of exec,
eval, __import__ and family, I recomend *not* having any runtime changes to
python except that which is necessary to implement the deconstructors.  Also,
those particular problem functions and statements can be checked for and
warned about.  Reduplicating compile checks at run time is atleast largely if
not completely redundant.


				   Grammar

The grammar for STICK python is in the file 'Grammar'.  The relationship
between grammars and  semantics is interesting.  One can always come up with a
different grammar that supports identical semantics.  One can also come up
with a different grammar that affects the way the system works.

I am fully aware of the fact that the Grammar that STICK presents will not be
exactly what the python community is looking for.  It is my own creation and
has my own idea of 'Pythonic' written all over it.  But everyone has a
different idea of what's more Pythonic and what's not.   Because of this, I
think STICK should welcome any changes to it's grammar that seems to satisfy
more people than myself, and should be subject to final say from Guido himself
-- that is, if STICK turns out to be viable for python in the first place.
Changes to the grammar which do not alter STICK's semantics I consider
trivial.  All other suggested changes would probably fair well only if they
take into consideration the semantics of the system, and most crucially, how
type safety is achieved in the system.  

Any suggested alterations to the grammar which imply a loss of type safety in
any way should be weighed heavily with the benefits such a change may bring
out.  Type safety is important, not only as a means to early code checking,
but also as a means to optimizers that can produce reliable code and
documentation that is uniform, thorough and correct.


				 Polymorphism

As I understand the term polymorphism, there are a few specific meanings which
all revolve around the idea that a value can be of a set of types rather than
a specific type.  STICK provides for polymorphism insofar as it supports a
means of dealing with arbitrary OR'd types.  Additionally, STICK provides a
way of using something akin to what has been called parameterized types, but
that I find more useful and intuitive less cluttery.  I am unaware of the
technical term for this, but it looks like this:

#
# function can take any type and will return lists of that type.  The
# particular type that the function takes can vary accross different function
# calls.  STICK currently supports this.
#
decl f: def(~A) -> [~A]

Additionally, STICK will attempt to add a similar such feature to it's notion
of interfaces.  But that comes later.  It is likely, but I'm not sure exactly
what form it would be available in.


			     Disambiguating Types

When you have a value that could be one of two fairly distinct types, you
can't do much with that value.  So there is a need to disambiguate the types.
This has been discussed at length on the types-sig.  The general consensus has
thus far been divided between using a type-checker-override operator (Greg
Stein's !) and thinking of alternative ways of making grammatical constructs
from other languages more pythonic.  STICK uses a refreshing approach that has
none of the drawbacks of either of these systems.  I don't know if any other
languages use something like this, but I got the idea as a sort of scaled down
typecase and a sortof inverted use of constructors (a la ML), all with a bit
of OOP interface.  I call it 'deconstructing' types for lack of a better term.

There was a discussion on the types-sig where guido said that he
believed in 'structural' type checking, or atleast he thought he did
;) This placed in contrast to using constructors.   STICK changes this
contrast to an inversion: instead of using constructors, use
deconstructors when you want to split up sets of types.  This way, the
type creation is structural, as it seems natural for python.

Most importantly, and you don't have to do *any* flow analysis to guarantee
type safety, which in the face of python's dynamism seems like a necessary
starting point.

The way STICK disambiguates is by making datatype objects that have OR'd types
have methods which are callable to get the appropriate type or a default.  For
more info, see samples/or_test.py.  This precludes the necessity of a
grammatical construct, and wraps up all the things that a user would have to
supply to a builtin method inside datatype objects.

				     
				     None

None always requires special attention.  In STICK, None has it's own
distinct type.  Many other systems have a None parallell that is
implicitly in the set of every type: sort of defined as the empty set.
This is worth consideration for STICK.  This idea does prevent the
type checker from catching confusion between None and any other type,
but it also has it's conveniences: less need to use the type engine,
making the interface to deconstructors even more easy by making the
second argument optional without the fear of raising an exception at
run time.

				 Declarations

One thing STICK is good at is limiting the need for declarations.  Every
assignable value acts as if it were declared with whatever type is first
assigned.   This means that 'x = 1' is equivalent to 'decl x: int; x = 1', so
long as that is x's first usage.  It also means that x = [ 1, 2, 3, "foo"][0]
is the same as declaring x to be of type 'int | string'.  Limiting
declarations to variables which are instantiated with some value that is a
subset of the total values for that variable makes life easy.


				  Interfaces

Interfaces aren't done in stick yet.  The idea I have for them is to make
interfaces distinct from classes or modules, and to allow inheritance that
will create subtypes.  Correspondingly, there can be a way to declare that
instances of a class or a module conforms to an interface.  With the very
strict base of plain type checking, this shouldn't be so hard.  You can guess
the semantics from reading this and looking at the grammar.

				   Feedback

All feedback is most welcome, and should probably be directed to the
types-sig (types-sig@python.org).  I ask only one favor, which is to
try to keep the posts short and addressing particular issues, even if
the particular issue is with STICK overall.  It's hard for everyone on
the sig to read huge proposals and rants.  And before Tim Peters calls
STICK half-assed, I'll go ahead an say it.  It's certainly half assed
in terms of some coding style and some organizational issues and the
degree of completion is <= 0.1-assed.  But it is still the result of a
huge amount of work, research, and primarily thought, on the part of
everyone who has contributed to the types-sig goals of recent,
including one particular Josh Marcus. Where it exceeds compared to
other proposals is that the system as a whole seems relatively safe
and flexible and coherent, so you can build on it well.  Where it
lacks is coordination and feedback and contribution from everyone out
there.  Hopefully this presentation will change that.

Scott Cotton
IC Group, Inc
scott@foobox.com