[Types-sig] Re: PRE-PROPOSAL: Python Interfaces

Jim Fulton jim@Digicool.com
Tue, 24 Nov 1998 15:34:54 +0000


Note that this discussion should move to the Types SIG:
http://www.python.org/sigs/types-sig/.  I am cross posting to
types-sig@python.org.  Replies should be made to types-sig and omit
comp.lang.python.

John (Max) Skaller wrote:
> 
> On Fri, 06 Nov 1998 12:14:04 +0000, Jim Fulton <jim@digicool.com> wrote:

(meta comment snipped ;)
 
> Skip to the end for positive considerations!
> 
> >PROPOSED APPROACH
> >
> >  At a minimum, I'd like to get some sort of agreement on a basic
> >  high-level approach by the end of the Developer Day interfaces
> >  session.
> >
> >  Here is my proposal:
> >
> >    - There should be a way of defining interfaces in Python.
> 
>         Seems fair enough.
> 
> >      - An interface describes abstract behavior.
> 
>         I have no idea what that really means....

This is intentionally vague.  At minimum it is just a 
tag from a hierarchical tag space.  Where we take it beyond
that is a matter for discussion.

 
> >      - Interfaces are arranged in a hierarchy.  Implementation of a
> >        sub-interface implies implementation of base interfaces.
> 
>         NO! This is vastly too restrictive: this case is almost useless.

Why is this too restrictive.  This is merely a tool to organize
interfaces.  This approach has used by many systems, so I have
trouble buying that it is useless.

 
> >      - Interfaces will almost certainly not be *both* complete
> >        and formal.
> 
>         Why not? It can be done. Indeed, it has been.

I did not say "can not", I said "will not".  I have a little
experience with formal languages, mostly Z, and know that one
can built nifty formal behavioral models, but such formal models 
would be of little practical use to Python programmers and of 
even less use to the Python run-time.
 
> >For example, interfaces might provide syntactical
> >        information (signatures) formally, but may provide behavioral
> >        information informally, as documentation strings.
> 
>         No need. Any abstraction can be defined completely, and formally.
> If less is specified .. it is a weaker (less specific) abstraction.

It can be, but it won't be.  Very few Python programmers would have 
either the means or inclination.
 
> >    - There should be a way for a an object to assert that it
> >      implements one or more interfaces.
> 
>         I suspect this is wrong as a generalisation, although it may
> be a useful special case. To give a specific example, in the function
> call:
> 
>         f(a,b,c)
> 
> it is not enough that a, b, and c each has a particular interface.
> They must also stand in some relation (perhaps asserted by a precondition).

It depends on how complete you are being. In any case, preconditions
on a, b, and c can be simpler if they can test for certain interface
assertions.
 
> >     The mechanism for asserting
> >      conformance to interfaces is independent of an object's class.
> 
>         Agree.
> 
> >    - There should be a way for a class to assert the (default)
> >      interfaces that are implemented by it's instances.  This
> >      mechanism is independent of the classes implementation.  In
> >      particular, this mechanism does not use class inheritance.
> 
>         Yes. But you must understand that it is _not_ objects that
> have behaviours, and it is _not_ objects that have interfaces.
> It is _functions_ that have behaviour and collections of functions
> which must be related by semantic rules.
> 
>         It is fair to provide _convenience_ for functions collected
> as methods of objects .. but this is only a special case.

This is where we probably disagree is some fundamental (religious ;)
way.  I believe that objects should be the primary viewpoint for
reasoning about programs.  

> >    - The language may (is likely to) partially check assertions.
> 
>         The 'checking' side of specification is far less interesting
> than the constructive side!
> 
>         I want to give an example. A 'class' is an example.
> It is a mechanism to _construct_ multiple objects and can be
> leveraged to ensure they all obey some protocol -- because they
> all get the same set of methods. I don't mean to imply such obedience
> follows in all cases, but that with well written code use of classes
> can aid in verification: the construction is not strong enough for a guarrantee
> without programmer effort and reader analysis, but it helps a lot.
> 
>         So I'd lke to suggest that 'checking' interfaces is missing
> the most important application of your abstract goals: the very best
> checks are those that are not necessary! That is, ones that are
> assured _by construction_.
> 
> >    - There should be efficient tools to test:
> >
> >      - Whether an object implements an interface,
> 
>         Objects do not really implement interfaces.
> Interfaces are implemented by functions.

Objects implement object interfaces.  Object interfaces 
include method interfaces, as well as abstract state models
that model dynamic relations between method calls.
 
> >      - Whether instances of a class (that do not make their own
> >        interface assertions) implement an interface, and
> >
> >      - Whether an interface is a sub-interface of some other
> >        interface.
> 
>         This is the wrong terminology. Let's not get confused
> with OO hype here <grin> There is no hierarchy. That is not
> how it works. Interfaces (protocols) are related by categories.
> Just to be sure you underdstand: Python classes _already_
> transcend any notion of heirachy. Python supports
> multiple inheritance! You would have to move from a tree
> model to DAG just to account for python classes!

Of course, the interface "hierarchy" is really a DAG, just as
the class "hierarchy" is really a DAG.

>         In the end, there is a complete theory,
> in which a DAG is again only a trivial example.
> 
>         Category theory provides a complete theory of what you want.
> Consider a collection of nodes with a single pointer. Perhaps they
> form a tree.

The problem is that I don't know what category theory is, and the 
myriad posts you've made haven't shed much light on the subject for
me.  Some material at the end of your post has made it a little 
clearer.

I doubt that anyone else reading this thread who didn't
already know what this was is much clearer on it now.

The beauty of classification systems based on DAGs, which are,
for the sake of simplicity, typically called hierarchies, is that
they are very easily understood, and therefore useful.
 
>         Is there an object which 'is a' tree??

Yes.

> No. No single object has 'tree protocol' or 'tree interface'.

Depends on how you choose to define the interface.

> Is it the collection of objects which forms a tree??

I might, and typically would, provide an object that
manages the nodes and provides some assistance for navigating
to nodes.

> NO NO NO. It is the _pointers_ which form the tree!
> Isn't that obvious, now I've said it ?? <grin>

Your view is way to low-level for me.  The beauty of OO
is that it provides a system for building high-level
abstractions that let you see the tree as a whole, if that
is useful for the problem you are trying to solve.
 
>         OBJECTS ARE POINTS. THEY HAVE NO STRUCTURE.

You are obviously using the term "object" in a way that
is very different from the way that I use it.  To me, objects
are all about providing structure to data and algorithms.
 
>         They can't have structure -- they don't have enough 'dimension'.
> You need _arrows_  to build structures.
> 
> >  - Jargon
> >
> >    I like the term "interface", but I'd be happy to use other
> >    terminology.  A number of folks prefer the term "protocol".
> >    Let's agree on a term and stick to it.
> 
>         To me there is a distinction. A protocol is a dynamic thing.
> It involves not only static properties like having a set of methods
> with certain signatures, but dynamic ones like: "if you call
> method A, and then you call method B then ..." Calling those
> methods in that order is a matter of protocol. But this is only
> my personal taste :-)

I agree with you that we are talking about more than signatures.
This was one of the main points I was trying to make.  Even if you
can (or choose to) only express part of the behavior in the language, 
programmers need to be aware that there is more involved.  This is why
I think that the tagging nature of the scarecrow proposal is far more
important than details like signatures.
 
> >  - What's in an interface?
> >
> >    I've been intentionally vague about what's in an interface.  Most
> >    people would expect interfaces to, at a minimum, include
> >    signatures.  Frankly, I think interfaces would be valuable even if
> >    they only included hierarchy information (base interfaces) and doc
> >    strings.
> 
>         NO! Both ideas are wrong. Neither piece of information
> is really important. What is important is to provide the _correct_
> tools for a general specification.
> 
>         So lets analyse the two ideas. First, method signatures
> in static languages specify the codomain (return type) and
> a superset of the domain: a cartesian product of the parameter
> types. The actual domain is specified by refinement: usually
> using exception specifications (C++) or preconditions (Eiffel).
> 
>         But neither mechanism is really enough.

I agree.  BTW, the signatures need not have "type" information
at all.  For example:

  def f(a,b,c=None): ...

has a signature along the lines:

  accepts three positional arguments, a, b, and c, where c
  may be omitted.

> But specifying
> a set of categories which the method is an arrow of IS
> a complete specification. And it doesn't involving naming ANY types
> at all because it only involves laws of composition (chaining)
> of methods.

From what little I've seen, I can't see how to use it to form
useful specifications.  I'm sure if I worked hard enough at
it I could, but most people don't want to work that hard.

>         This is absolutely mandatory for genericity.
> I'm sure you will understand if I explain this by analogy to
> information hiding: it is NOT enough for an interface specification
> to elide implementation details to be abstract, it must not
> mention any specific types either.

"Type" is too loaded a word.  What do you mean by "type"?  Do you
mean a Python type, which in reality is another kind of class?
Or do you mean abstract data type?

> They're really implementation details too!

This discussion gets messy due to the overloading of the word 
type.

Let's suppose for a moment that, in the future, Python
supported some sort of "type" (ugh) annotations.  Then you
might have an interface like:

 interface Foo:

    def XInterface spam(YInterface a)

Now here we have assigned a return "type" and an argument "type"
but we haven't placed any restrictions on the implementation of
the argument or return value.

OTOH, even if we had:

 interface Foo2:

    def int spam(float a)
 
where 'int' and 'float' named specific implementations, we 
*still* haven't said anything about the implementation of
objects that implement Foo2.  Of course, objects that implement
Foo2 are less flexibly wrt to their arguments, but that may be
acceptable because the lack of flexibility is compensated for
by some higher level of service.

>         But... a 'non-specific' type is just a point with no attributes:
> the same as the next type! So how is this a 'type system'?

What do you mean by a type system?  Some people refer to the current
Python "type" and "class" implementations as a "type system".  They
are speaking mostly about implementation details.

The Interface proposal seeks to introduce a behavioral
type system so that objects can be composed without having
to query specific implementation details.


> That's the whole point. You cannot get 'polymorphism' UNLESS
> you abstract away ALL the properties of objects.

I disagree.

For example:

  def render(device, ob):
    assert (implements(device, DeviceInterface) 
            and implements(ob, DrawableInterface))

Here the render method imposes pretty specific restrictions on
it's arguments. It expects devices and drawables to implement specific
interfaces which are used in the render method.  OTOH, the render
method is highly polymorphic.  Devices might be CRTs, Postscript files, 
pen plotters, whatever.  Drawables might include geometric
shapes, maps, etc.

> You must vest
> all specification in the relationships of methods with each other.

You may find this approach useful, but I don't think it is the
only approach.  
 
>         I will to give a simple but "real world" example: a stack.
> A stack has methods 'push' and 'pop'. It is characterised by the rule:
> 
>         push . pop = identity
> 
> (roughly -- there is an _exact_ but more complex set of rules that
> accounts for empty stacks and stack overflow which I will ignore for clarity).

You're right, this *is* a simple example.  Formal systems can be
very entertaining for such simple examples, but it appears to me
that they don't scale very will.

What happens to the example above if the stack is bounded.

Also, the statement above doesn't tell me, at least not directly
what I consider to be some very useful facts, including:

  - What arguments does push take?  I'm not taking "types" here.
    I'd just like to know whether I have to pass one argument, two, 
    none, etc.

  - Why would I want to call push in the first place?

  - Does push return anything?

  - Same as above for pop. 

  - What is the relationship between the argument(s) passed to
    push and the value returned from pop?

  - Why do I care about identity? 

Frankly, I'd find the following interface to be more useful:

interface Stack:
   """First-in last-out object collections.
   """
\
   def push(x):
     """Add the argument x to the stack

     The object may be retrieved later with the pop method.
     Objects are retrieved in first-in last-out order.
     
     No value is returned.
     """

   def pop():
     """Remove and return an object from the stack.

     The object in the stack that was most recently added is returned.
     """
 
> The exposition above is important because unlike, for example,
> a C++ template class, it doesn't even name the type of the object
> being pushed.

This is a good point.  I certainly consider C++ and Java's "type
systems" to be kind of disgusting. ;)

An interface system that doesn't make you say things that you
don't want to say is a good thing, IMO.  Note that in the
informal interface definition I gave above, I didn't have
to say anything about the objects added to the stack either.
Unlike your example, I was able to say something that you failed 
to say, which was that aobjects are actually added to and
removed from the stack.

> In a C++ stack template, the type is named but arbitrary:
> this is equivalent but more verbose. It is _always_ possible to
> express it without mentioning the type, like above.
> 
> Not mentioning the type, even as a 'variable' is not necessary
> logically, but it is very necessary for a different reason:
> it provides a _syntactic_ guarrantee of abstraction.
> If the language constructions don't let you name the type,
> you cannot get any 'implementation details' into the specification
> for the simple reason there is no syntax for it!

This is a good thing.  OTOH, sometimes it is useful
to be able to set behavioral or even implementation restrictions
in the interface.
 
>         Just how abstract is the category above?
> Isn't that what you want for an interface!

No, it doesn't contain enough information.  And
it's not clear to me how to apply the same approach to make
less trivial statements.
 
>         What is an instance?
> 
>         Answer: any other category X which is the image
> of a functor from Stack. Such as that with the rule:
> 
>         push_int . pop_int = identity_int
> 
> >    A tempting addition would be to add behavioral information, like
> >    preconditions, postconditions, and invariants.  The intension
> >    would be to have these conditions checked at run time.  This is
> >    somewhat problematic, however, because many interesting conditions
> >    will depend on state.  In a specification language, abstract state
> >    is used, but Python run-time checks would need to use concrete
> >    state.  Using concrete state in an interface specification would
> >    violate the separation of interface and implementation.  Then
> >    again, if completeness is not a goal, which I think it can't be,
> >    then many interesting conditions *could* be expressed without
> >    reference to concrete state.
> 
>         No, you are missing the point.

I may be missing *your* point.  I'm pretty sure there are some points
I am able to grasp.  Probably these include *my* points.

> In Python, it is a matter
> of _state_ whether an object is one type or another. There is no
> hard distinction between the two. So checking type but not state
> is relying on a distinction that doesn't exist.

I strongly suspect that we (hm, well, maybe you) are running afoul 
of the badly overloaded term, "type".  I didn't use the term type
above.

In my (limited) experience (basically, just some grad school 
courses), to completely model (capture behavior of) an abstract
data type, you need to model abstract state.  This would add alot
of formality that would put it beyond the reach or interest of most
programmers.  Fortunately, formal completeness is a specific
non-goal of my proposal.
 
>         You suggest that checking state violates the abstractness
> of the specification and you're right.
> 
>         Checking TYPE is just as bad for the same reason.
> Consider Euclids algorithm which computes the gcd of two
> 'numbers'. Should we check the arguments are ints?

I have tried, for good reason, to avoid the word "type" in my
proposal.  I suggest it would be a good idea to eliminate
it from these discussions.  I never said that signatures 
included anything smelling like interface or implementation 
information, although I do not think this would necessarily
be a bad idea.

I certainly don't mind using interface that impose behavioral
restrictions on inputs.  I don't even mind using interfaces that
impose implementation restrictions on inputs if the lack of 
flexibility is compensated for by other features.  Polymorphism
is not my only goal.

For example, I might not mind using an object that implemented
an interface like:

  interface BigBadMath:

    def ArrayInterface invertHumoungousArray(
        DamnSpecificArrayOfFloatsClass x)

    ...

if objects that implemented this interface could
invert humoungous arrays 1000 times faster than objects
that supported more flexible interfaces.
 
(snip)

>         The fact that OO ties these methods to the arguments
> is unequivocably wrong.

a. OO does nothing of the sort.  Certain OO and non OO languages
   make you say things you don't want to say about arguments
   and return values. Other OO (and non-OO) languages don't.

b. It is useful at times to be able to restrict arguments, as 
   I have tried to demonstrate in examples above.

> Lets not compound the error by repeating
> Meyer's mistake.
> 
> -----------------------------------
> 
> Here's my approach. First, I have implemented a protocols module
> in interscript. Get interscript, if only to examine that one module.
> Or examine the documentation at:
> 
>         http://www.triode.net.au/~skaller/interscript
> 
> and look for the protocols module in the core subpackage of the
> implementation section of the document.

I did.

> This module just allows objects to be 'tagged' with the name of
> a protocol such as 'sequence' 'file' or 'mapping'.
> It has some conveniences such as class protocols, reflection
> of the protocols of bases, and using the names of types as
> protocols. But in essence the module does nothing more than
> allow tagging objects with protocols.
> 
> This _clearly_ is not enough because it doesn't provide any
> mechanism to express relationships between protocols.
> (Reflection of inheritance is a convenient hack!)

It is not only a convenient hack, it reflects the way
many people think about the world (classification) and
it is a model that is already familiar to Python
programmers.
 
> But it has an advantage -- it is implemented right now,
> it requires no language extensions, and it covers a lot of
> basic territory including the whole of the existing type
> system and all the defined and more abstract protocols.

As stated in the proposal, the Scarecrow proposal also
doesn't require any language changes, and FWIW the implementation
is trivial.  It also provides:

  - Organization of interfaces in an interface "hierarchy", 
    which is, of course, really a DAG,

  - Interface objects, rather than just string tags, that
    provide a basis for managing and using meta data, like
    doc strings, (at least minimal "generic") signatures, 
    pre-and post conditions, whatever....
 
> Now, according to category theory, the _meaning_ of a protocol
> cannot be embodied in the protocol -- it is just an arbitrary label.
> It is the relationships between protocols that are important.

Ok, you've talked me into not using category theory.  Fortunately, 
I didn't use it in the Scarecrow proposal. ;)
 
> So what is required -- given the protocol tagging mechanism I've
> implemented -- is a way of describing these relationships.
> 
> The 'is-a' relationship is not be ignored, even though it isn't everything.
> If one can say Protocol A implies Protocol B, this is important information
> (even if it is not fundamental).
> 
> Example: mutable sequence implies sequence.
> 
> Another construction is composition: if one can say a collection of functions
> obeys protocol A and also protocol B, then it obeys protocol C, this is also
> important information. For example:
> 
> Mutable and sequence implies mutable sequence.
> 
> It's fairly obvious that by fixing the object (set of methods!!) we're just dealing
> with predicates and restricted first order logic.  (No negation!) Deductions in this
> system can be trivially handled by existing python. [More complex deductions
> can be done by logic systems like Prolog, or better, Mercury]
> 
> What is much more interesting is _intra-object_protocol: protocols obeyed by
> two, three, or familiies of objects NOTof the same 'type'.

I assume that by 'type' you mean type in the formal sense (ADT).

Yes, that would be interesting.  This can be captured, at least partially
through restrictions in the interfaces themselves.  For example:

  interface Renderer:

    def render(display, drawable):

      assert (implements(display, Display and
              implements(drawable, Drawable)

does express an interrelation between the Rendered, Display, and
Drawable interfaces.

> These protocols _cannot_ be expressed by tagging objects because the set of
> methods obeying the protocol aren't all methods of the same class.

I think I did express a mult-interface protocol above through
an essentially tagging mechanism. 
 
> For example: a list constructed with a list header and nodes has two types,
> and the protocols involved in managing the collection are functions
> which may operate on several nodes as well as the header.
> 
> We can specify the protocol by writing global functions
> like insert and delete, and defining a _category_: that is, specifying
> the relationships between these methods directly.
> 
> In this scenario, if one writes a generic algorirthm, one passes _functions_
> to it as arguments, and one must test that the function stand in the relationship
> defined by a category if one is to determine whether the algorithm will work.
> 
> Example:
> 
>         def dup(push, pop):
>                 x = pop()
>                 push(x)
>                 push(x)
> 
> will duplicate the top element of a stack. What is the requirement?
> Obviously, 'it' has to 'be' a stack! What does that mean??
> Do you remember??
> 
>         push.pop = identity
> 
> So what is 'the stack'?? It is the pair of functions (push, pop). It isn't an object!!
> 
> I am developing the technology to allow construction of categories in Python.
> Here's roughly what I see happening:
> 
>         1) construct an abstract stack category 'stack'. The members, push, pop, and identity
> are class instances with no attributes. The category records the rule push.pop = identity
> in a table.
> 
>         2) Attach to each arrow of the category an attribute named 'F' whose value is
> a function. F is a functor. The assertion which must be tested is that for all x:
> 
>         s = stack.push.F (s,x)
>         stack.pop.F(s) == x
> 
> ['stack' is not the stack, s is the stack. 'stack' is the abstract category!]
> 
> A routine is passed the functor F by passing:
> 
>         stack, 'F'
> 
> ie, the abstract category together with the string name of the functor.
> The routine doesn't check anything. It can't: the universal quantifier
> is involved here.(for all x). But there is an implicit assertion that F
> is a functor and it COMPLETELY DEFINES ALL CONSTRAINTS ON
> THE INSTANCES stack.pop.F and stack.push.F.
> 
> This is the crucial point: the specification is complete, formal, and also
> available at run time (even though it cannot be proven).
> Note that it can be _disproven_ by a single case, and so is
> amenable to testing.
> 
> So it isn't possible to check everything but it _is_ possible to
> provide an utterly formal complete specification.
> 
> In all realistic cases, proof of correctness is possible,
> in some cases it may be possible to verify the proof,
> and in some it may be possible for an inference engine
> to deduce it with limited or no assistance. I think this
> is the 'best case' scenario: complete formal interface
> specifications with the potential for mechanical verification
> in some cases, possibly with assistance, by existing
> logic engines.

People have built languages for doing this sort of thing.
Why don't you use one of these languages?

You don't seem to like OO.  OK.  I do find OO to be very useful.
I also think formal languages are kind of fun too.  I've had some
fun with Z.  Formal languages can be useful in "critical systems"
where you just absolutely have to get algorithms right.  Formal
systems are really pretty darn hard for not trivial cases and
require a good bit of training.  I don't think that these are 
tools that are suitable for the vast majority of Python 
programmers.

Also note that you don't have to turn Python into a 
formal specification language to use a formal specification
language to model algorithms that get implemented in Python.
You can use various formal languages to reason about and 
even provide the correctness of algorithms and then map
those algorithms into Python.
 
> The _difficult_ part of all this is integrating the most trivial
> cases -- tagging objects and classes -- with the more
> general cases, and it is hampered by the lack of
> operators that act on function objects in python.
> Category theory tells what is required. Only 10% of it is there.
> For example, there is no way to simple compose two functions.
> It can be emulated with a class .. but it should be a fundamental
> primative.

Have fun. I'm interested to see what you come up with.
I'd be a lot more interested if you applied this energy to 
more OO formal systems, as I think that there'd be a higher
payoff for Python.
 
Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (540) 371-6909              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.