[Python-3000] Type Comparisons with Godel Numbers

Fri Apr 21 05:27:30 CEST 2006

With reference to the last Gfdl blog on type checking (http://www.artima.com/weblogs/viewpost.jsp?thread=87182). There is concern that type comparison at runtime using objects will be quite expensive, so this posting is about optimisation.
An idea is to compute a single canonical string which summarizes the _structural_ type of something. Think of it as a Godel number for the type. (http://diary.carolyn.org/godel2.html) Well not actually a Godel number, since they are very expensive to calculate and large. But close. Lets call them "type Godel strings" with the following properties:
* the godel string for identical types will be equal
* the godel strings for common types are small and very fast to compare.
* godel strings are stateless and serializable
You may have experienced something similar in C++ name mangling. (http://en.wikipedia.org/wiki/Name_mangling) There is an algorithm for creating a string from a type. However we are not proposing to mangle anything :-)
Here are C++ name mangling examples: 
   void h(int, char) // is mangled to:  _Z1hic
   void h(int) // is mangled to: _Z1hi
For Python types we do not need the names so we would have something like:
   h(int, str) -> void   -  '1is'
I don't want to guess what the strings would look like.  The algorithm design could use some kind of statistical analysis to come up with an optimally short encoding. Huffman maybe?
Type comparison operators would only need a deep inspection of the types when the godel strings don't match. If most comparisons will be an exact match (not a subtype) the lookup should be faster. My hunch is that in real systems, exact matches will be very common. Some statistical work on real programs would be a good idea... The subtype operator would do: 
   if object_type.godel() == required_type.godel(): 
                     # fast string comparison
      # exact match so OK proceed
   else:  
      if object_type <: type :  # slow to execute
         Yes proceed.
Godel strings would be cacheable. The type comparison operators could be memoized to store the type-subtype lattice, not as a lattice of type objects, but as a lattice type godel strings. The cache would also be able to remember relationships between types for which the original type expression objects have been deleted.
This idea implies that every type object in Python must provide a godel() method as a standard. 
Individual value objects could (provided they were immutable) cache their type's godel string accessible from a godel() method. This would speed up type comparison, since the type checking may not need to visit every sub- component of the object. Mutable objects would see little benefit, since its type can change with every update. At best this plan optimises away the analysis of the required type, but not the mutable instance to be compared. 
The idea also requires that type expressions be immutable, which is probably not a bad thing anyway. What possible use could there be for a mutable type expression?