[Python-Dev] type categories

Thu, 15 Aug 2002 00:28:25 +0300

On Wed, Aug 14, 2002 at 09:09:19AM -0400, Guido van Rossum wrote:
...
> Now I think you've lost me.  How can a category on the one hand be
... 
> Again you've lost me.  I expect there's something here that you assume
...

Oh dear. Here we go again. I'm afraid that it may take several frustrating 
iterations just to get our terminology and assumptions in sync and be able 
to start talking about the actual issues.

> > Type categories are fundamentally different from interfaces.  An 
> > interface must be declared by the type while a category can be an 
> > observation about an existing type. 
> 
> Yup.  (In Python these have often been called "protocols".  Jim Fulton
> calls them "lore protocols".)

Nope. For me protocols are conventions to follow for performing a certain 
task.  A type category is a formally defined set of types.  

For example, the 'iterable' protocol defines conventions for a programmer
to follow for doing iteration.  The 'iterable' category is a set defined
by the membership predicate "hasattr(t, '__iter__')".  The types in the
'iterable' category presumably conform to the 'iterable' protocol so there 
is a mapping between protocols and type categories but it's not quite 1:1.
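
A minimal sketch of that distinction, written in present-day Python syntax. The class LegacySeq and the helper is_iterable_type are made up for illustration; only the predicate itself comes from the paragraph above.

```python
def is_iterable_type(t):
    """Membership predicate for the 'iterable' category: a test on the type."""
    return hasattr(t, '__iter__')


class LegacySeq:
    """Hypothetical class that follows the iteration protocol the old way,
    through __getitem__ and IndexError, without defining __iter__."""

    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i


# The category is just the set of types that satisfy the predicate.
candidates = [list, dict, int, LegacySeq]
iterable_category = {t for t in candidates if is_iterable_type(t)}

# LegacySeq can be iterated (it follows the protocol) but falls outside
# the category, which is why the mapping is not quite 1:1.
legacy_items = list(LegacySeq())
```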

Protocols live in documentation and lore. Type categories live in the same 
place where vector spaces and other formal systems live.

> > Two types that are defined independently in different libraries may
> > in fact fit under the same category because they implement the same
> > protocol.  With named interfaces they may in fact be compatible but
> > they will not expose the same explicit interface. Requiring them to
> > import the interface from a common source starts to sound more like
> > Java than Python and would introduce dependencies and interface
> > version issues in a language that is wonderfully free from such
> > arbitrary complexities.
> 
> Hm, I'm not sure if you can solve the version incompatibility problem
> by ignoring it. :-)

Oops, I meant interface version *numbers*, not interface versions. A
version number is a one-dimensional entity. Variations on protocols and
subprotocols have many dimensions. I find that set theory ("an object that 
has a method called foo and another method called bar") works better than
arithmetic ("an object with version number 2.13 of interface voom").
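
To make the set-theory view concrete, here is a rough sketch in which a category is nothing more than a set of required attribute names. The names belongs, READABLE, SEEKABLE, Pipe and DiskFile are all hypothetical, chosen only for this example.

```python
def belongs(t, required):
    """True if type t has every attribute named in the set `required`."""
    return all(hasattr(t, name) for name in required)


# Each category is described by the names a type must provide.
READABLE = frozenset({'read'})
SEEKABLE = frozenset({'seek', 'tell'})

# A sub-protocol is the union of requirements, no version number needed.
RANDOM_ACCESS_READER = READABLE | SEEKABLE


class Pipe:
    def read(self, n=-1):
        return ''


class DiskFile(Pipe):
    def seek(self, pos):
        return pos

    def tell(self):
        return 0
```

Adding requirements can only shrink the set of matching types, so the relation between a protocol and its variations falls out of ordinary subset tests on the requirement sets.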

> Are you familiar with Zope's Interface package?  It solves this
> problem (nicely, IMO) by allowing you to place an interface
> declaration inside a class but also allowing you to make calls to an
> interface registry that declare interfaces for pre-existing classes.

I don't like the bureaucracy of declaring interfaces and maintaining 
registries. I like the ad-hoc nature of Python protocols and I want a 
type system that gives me the tools to use it better, not one that replaces 
it with something more formal.

> > A category is defined mathematically by a membership predicate. So
> > what we need for type categories is a system for writing predicates
> > about types.
> 
> Now I think you've lost me.  How can a category on the one hand be
> observed after the fact and on the other hand defined by a rigorous
> mathematical definition?  How could a program tell by looking at a
> class whether it really is an implementation of a given protocol?

A category is defined mathematically. A protocol is a somewhat more fuzzy
meatspace concept.  A protocol can be associated with a category with
reasonable accuracy so the result of a set operation on categories is
reasonably applicable to the associated protocols. 

Even a human can't always tell whether a class is *really* an implementation 
of a given protocol. But many protocols can be inferred with pretty good 
accuracy from the presence of methods or members. If that is not enough, you 
can always add a member as a flag indicating compliance with a certain 
protocol.
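
As an illustration of both cases, assuming a made-up flag attribute called keeps_insertion_order (nothing in Python defines it), the first predicate below infers a protocol from method names alone and the second relies on an explicit flag for a property that method names cannot reveal.

```python
def is_mapping_like(t):
    """Inferred from the presence of methods alone."""
    return hasattr(t, '__getitem__') and hasattr(t, 'keys')


def is_ordered_mapping(t):
    """Method names cannot show ordering, so look for an explicit flag.
    'keeps_insertion_order' is a hypothetical attribute for this sketch."""
    return is_mapping_like(t) and getattr(t, 'keeps_insertion_order', False) is True


class Bag:
    """Mapping-like, makes no promise about ordering."""

    def __getitem__(self, key):
        raise KeyError(key)

    def keys(self):
        return []


class Journal(Bag):
    """Same methods as Bag; the flag is the only difference."""
    keeps_insertion_order = True
```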

My basic assumption is that programmers are fundamentally lazy. It hasn't
ever failed me so far.

This way there is no need to declare all the protocols a class conforms to.
This is important since in many cases the protocol is only "discovered" 
later.  The user of the class knows what protocol is expected and only 
needs to declare that.  It should reduce the tendency to use relatively 
coarse-grained "fat" interfaces because there is no need to declare every 
minor protocol the type conforms to - it may be observed by users of this 
type using a type category.

> > Standard Python expressions should not be used for defining a
> > category membership predicate. A Python expression is not a pure
> > function. This makes it impossible to cache the results of which
> > type belongs to what category for efficiency. Another problem is
> > that many different expressions may be equivalent but if two
> > independently defined categories use equivalent predicates they
> > should *be* the same category.  They should be merged at runtime
> > just like interned strings.
> 
> Again you've lost me.  I expect there's something here that you assume
> well-known.  Can you please clarify this?  What on earth do you mean
> by "A Python expression is not a pure function" ?

A pure function is one whose result depends only on its inputs and which
has no side effects. In this case I would add "and can be evaluated without
triggering any Python code". Set operations on membership predicates, caching
and other optimizations need such guarantees.
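
One possible shape for such a restricted predicate, offered as a sketch and not as a worked-out design. The Category class is hypothetical. It limits predicates to "the type defines these names", checks them by looking in the class dictionaries along the MRO (which for ordinary classes runs no user-written code), caches the answer per type, and interns categories so that equivalent predicates are the same object. The cache assumes classes are not modified after they are first tested.

```python
class Category:
    """A category defined by the attribute names a type must define."""

    _interned = {}  # frozenset of names -> the one Category for that set

    def __new__(cls, *names):
        key = frozenset(names)
        existing = cls._interned.get(key)
        if existing is not None:
            return existing          # equivalent predicate: same object
        self = super().__new__(cls)
        self.required = key
        self._cache = {}             # type -> bool
        cls._interned[key] = self
        return self

    def __contains__(self, t):
        if t in self._cache:
            return self._cache[t]
        # Plain dictionary lookups on each class in the MRO; no descriptor
        # or __getattr__ hook of the tested class is invoked.
        result = all(
            any(name in vars(klass) for klass in t.__mro__)
            for name in self.required
        )
        self._cache[t] = result
        return result

    def __and__(self, other):
        # Intersecting two categories means requiring both sets of names.
        return Category(*(self.required | other.required))


iterable = Category('__iter__')
sized = Category('__len__')
sized_iterable = Category('__len__', '__iter__')
```

Because the predicate is plain data, the intersection of iterable and sized comes out as the very same object as sized_iterable, which is the "merged like interned strings" behaviour mentioned in the quoted paragraph.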

	Oren