[Python-3000] Type Expressions

Tue Apr 18 11:19:37 CEST 2006

Hi,

There is a massive amount of discussion about adding type checking to Python going back a long way, however most discussions gloss over exactly what type expressions are exactly. I'm working on this question with the goal of putting up a proposal of some kind to this list. 

So far I have some conclusions/opinions which I'd like to share and get feedback and help on. Here's some points which I want to get out there:

(a) A lot of work on types does not consider composite types. A composite type is something like:  [int, int, str], ie a type consisting of atomic types held together by collections. As a programmer I want to be able to formally define composite types. I am parsing a CSV file I'd like to be able to declare the file format as a type.  These types are very important in the following contexts: 
  (1) reading input and validating it. (XML anyone?)
  (2) unit testing - did my code create the right output?
  (3) documenting the types my code uses for human readers in a standard way

(b) Much of the academic papers I have read on types are focussed on compile-time static type checking. This is understandable, but Python is a dynamic language and an object can change it's structure at run-time so programs will always outwit the compiler. So IMHO Python needs to have a _dynamic_ type expression system. This also gives Python programmers access to type expressions, they should not be the sole preserve of the compiler writers.

(b) I work with some heavy-hitting java developers who don't write a line of code without unit tests. Lots of unit tests. Even developers using a statically typed language rely on unit tests for program correctness. I trust Python programmers are the same. So any discussion of type systems needs to consider what benefits to unit test writing. IMHO dynamic type expressions with dynamic type checking predicates are very useful in unit tests. They can help answer the question "is my program creating the right structures?" Even when those structures are untyped collections.

(c) Simple composite type expressions with dynamic predicates are in my view more useful and universal than complex Interface or Adaption schemes. I would prefer to have a flexible and expressive type expression facility than Interfaces. 

(e) Unlike a compiled type system, a dynamic type system does have access to values. So we can define types where values are given. e.g. the type ('href', str) is the set of all 2-tuples where the first element is 'href'. Compiled static type systems don't normally allow this. 

(d) Working code. So far I have been coding some simple examples of type expressions with some predicates. I am (ab)using the existing Python syntax to get type expressions. Here are some examples:

	(int, int) # a tuple with two ints

	[int, etc]  # a list of ints
			# (etc = zero or more of the previous element)

	{'b':[12, 34], 'c':list, 'a':long} # dict 

	Union(int, float)

	function(a=int, b=float, returns=float) 

	pyobject(xcoord=int, ycoord=int) # an instance

	These are all true:

	isSubType(list, [42])
	isSubType((int, etc), (int, int, int))
	isSubType(Union(int, float, str), Union(float, int))
	isSubType({'a':Number, 'b':object}, {'b':[12, 34], 'c':list, 'a':long})

(e) Python needs a succinct type expression syntax! Really.

(f) Predicates on types: "isInstance(T1, x)". Given a Python object, can x be used where type T1 is expected? And isSubType(T1, T2) - can T2 be used where a T1 is expected? This is a _structural_ subtyping predicate. Given point (e), isSubType() and isInstance() look like the same thing. If a value is also a valid type expression there's no distinction. For now my implementation of isSubType() accepts instances too.

(f) A dynamic type expression system also allows programmers to _generate_ objects from type expressions. I have a function generateRandom(T) which returns a random instance of type T. In a loop this is a fun way to break code. 

(g) Parametric types are simple function calls: lambda T: {'stack': [T], 'pop': function(returns=T) , 'push': function(newtop=T)}. Type names are variable bindings, just like values.

(h) Interfaces are just rather complex type expressions.

In summary a type system should be _dynamic_ and not static.  It's reflection API should be trivial to use. It should help programmers to answer these questions: Is this data I just received in the right structure? Did my program just output something in the right structure? What structure am I expected to pass to this library? My view is that a dynamic type system used in unit testing will find more defects than a static type system. And it will be fun.