Python Type System: An idea for unification

John Max Skaller skaller at
Fri Aug 27 22:21:41 EDT 1999

While implementing Viper, I've been playing with the python type
system. Some of the results seem interesting enough to discuss,
and may be useful for C -&- J Python.

One of the things that is often requested for python is that
builtin types be class instances. Viper does _not_ do this:
AFAICS it is theoretically impossible (and  less useful than
the technique described below).

Here's what I do instead. Every object has an associated
type object. The function 'type(x)' gets the type object
associated with an object x. 

When a lookup on an object fails, lookup is done
on the type object of the object, before finally
issuing an AttributeError.

If the special lookup on the type object succeeds
and the result is a callable object, it is bound
to the original object to make a bound method,
as if the type object were a class, and the lookup
were for an instance attribute.

Now, for the builtin Viper objects, there is a 
python file '' which is loaded at startup
by the viperi interpreter. This file contains
the definitions of the type objects for the builtin types,
it is an ordinary python module!

What do the definitions look like? Well, we need the
type objects to support the alternate lookup mechanism,
so the obvious thing to do is use classes. Here is an

	class PyIntType:
		def succ(x): return x + 1
		def pred(x): return x - 1

To use this we try out the Viper code:

	x = 1
	print x . succ()

and it prints "2". The object x is NOT a class instance,
but it's type is a class, 


and the class methods of that class can be called on
any int object. If you say:

	import types
	print types.IntType

you will get

	"<class py_types.PyIntType>"

Viper has no special Type type: if you say

	print types.TypeType

you will get

	"<class py_types.PyClassType>"

which is just an ordinary class defining the type
of classes. The potential infinite regress looking up
types is halted whenever the type of an object is itself.

It is important to note that for a PyInstance -- the kind of
object created by a class like:

	class X: pass
	x = X()

the _class_ of x is X but the _type_ of x is
py_types.PyInstanceType = types.InstanceType.

Several advantages of this scheme are immediately apparent:

	1) the type system is unified
	2) the type of an object is a USER accessible object

To see what (2) means: you can modify the methods existing
types support by simply changing the file ''.
Indeed, you can add new methods to a type by simply
adding them to the class at run time! [In fact,
you can change the type of an object at run time -- 
a risky technique, but possibly useful]

But even better, you can define new types, and you
can do it IN PYTHON (well, in Viper anyhow :-)
You do not need to write any C code to add a new type to the type system.
(Although this will still be necessary to interface external C functions)

Indeed -- and this is what most people want I think -- you can
use INHERITANCE to derive new types from old ones
(provided the old type is a class, of course).

Now, in Viper, the type of an object can be a class,
but it doesn't _have_ to be. It can be _any_ object,
even the object itself. Some care is needed that
if a lookup is done, it will terminate: it is safe if there
are no circularities, or the only cycle is of length 1.

What use are objects other than classes as types?
I have NOT tried this yet, but I will explain the idea
anyhow. Consider the type:


representing a list of integers: unlike the usual list type,
we want a list that can _only_ contain integers.
Now, I could add this type to Viper. Indeed, with enough
support from the system, you could do it too, in python (see above).

However, the general notion of 'list of X' for some type X
could be implemented instead. Here's the idea:

	class ListOf:
		def __init__(self,etype):
			self.etype = etype
		def __call__(self): 
			obj = new_object(self) = []
			return obj
		# ...

Here, we use a class ListOf as a _generic_ type for all
lists of the same kind of type, and we use an _instance_
of that class for the type: the argument to the __init__ method
specifies the type:

	ListOfInt = ListOf(IntType)

will construct the type object for a list of ints.  And here is an actual
list of ints:

	x = ListOfInt()

Here, the __call__ method defined for the generic type ListOf
is called, because ListOfInt is an instance of the class ListOf.
The 'new_object' function creates a new object, and sets the type
of the object to its argument, which is an instance of ListOf
for which the 'etype' parameter is IntType.

When an attribute lookup is done on x, it will proceed to the
type object. The type object is ListOfInt, so any attributes
of that object will be searched. Such a search automatically
looks in the class ListOf, if it fails in the instance dictionary,
because it is the class of ListOfInt. Therefore, methods
of 'ListOf' become methods of ListOfInt and thus
methods of a list of integers.

We need to check where the binding occurs. In the
lookup on ListOf, a function found there is callable,
and so is bound to the instance ListOfInt.
But we want such a method to be bound the instance x
as well. So a generic method in ListOf needs TWO object

	1) the first argument binds to ListOfInt
	2) the second argument binds to x

Lets consider an 'append' method:

	class ListOf:
		# ...
		def append(erep, obj, elem):
			if erep.etype is not type(elem):
				raise "Attempt to add element of wrong type to List"

As you can see, we have not only added a user defined 'type'
to the type system by a new technique (other than using a class instance),
but we have added a whole family of them. We have added a new kind
of type: a generic type. 


Why am I shouting? Why is this important? Obviously, it is
useful to have control of Python IN Python: Python is not
only easier to write than C (or Java), it is also more portable.

But there is another reason. Viper is intended to be compiled.
Defining the type system in 'native code', whether it be C,
Java, or Ocaml, or whatever, makes compilation much harder:
all the details of the native code interpreter implementation
have to be preserved in the generated code. It is difficult
to optimse the code, because it is likely to be complex.

But if the type system is written in Python, then the compiler
doesn't need any special work to compile the type system,
it can concentrate on optimising the core language,
and treat the type system as a pure python library component.
Consequently, user defined 'native types' can be quite efficient,
because they're compiled.

In fact, at the moment, user defined types are MORE efficient
than builtin ones! The reason is that the builtin types cannot
have a pointer to their type objects when they're constructed,
because these type objects don't exist when the interpreter is
started: they're created when the interpreter imports the module
'py_types' at startup, and while that importation is hard coded
into the interpreter, the file is parsed and executed like any
other python module: indeed, the same applies to exceptions.

Consequently, builtin types have a __typename__ attribute which
names the type as a string, and the type is obtained by
looking up in the py_types module, using that name as the
dictionary key. This extra indirection is not required for
client type instances, for which the 'new_object' method
can set the type object directly.

Just to re-emphaises: I have NOT tried the generic typing
scheme yet. However, the 'types are arbitrary objects' is already
implemented and works: the types of all my builtin types
are defined in python in, the python
module 'types' executes successfully,and some cursory
assertions in a test file yield the expected behaviour,
including the ability to call type methods as object methods.

John Max Skaller                ph:61-2-96600850              
mailto:skaller at       10/1 Toxteth Rd  Glebe 2037 NSW AUSTRALIA

More information about the Python-list mailing list