While I was driving to work today, I had a thought about the iterator/iterable discussion of a few weeks ago. My impression is that that discussion was inconclusive, but a few general principles emerged from it:

1) Some types are iterators -- that is, they support calls to next() and raise StopIteration when they have no more information to give.

2) Some types are iterables -- that is, they support calls to __iter__() that yield an iterator as the result.

3) Every iterator is also an iterable, because iterators are required to implement __iter__() as well as next().

4) The way to determine whether an object is an iterator is to call its next() method and see what happens.

5) The way to determine whether an object is an iterable is to call its __iter__() method and see what happens.

I'm uneasy about (4) because if an object is an iterator, calling its next() method is destructive. The implication is that you had better not use this method to test if an object is an iterator until you are ready to take irrevocable action based on that test. On the other hand, calling __iter__() is safe, which means that you can test nondestructively whether an object is an iterable, which includes all iterators.

Here is what I realized this morning. It may be obvious to you, but it wasn't to me (until after I realized it, of course): ``iterator'' and ``iterable'' are just two of many type categories that exist in Python. Some other categories:

    callable, sequence, generator, class, instance, type, number,
    integer, floating-point number, complex number, mutable, tuple,
    mapping, method, built-in

As far as I know, there is no uniform method of determining into which category or categories a particular object falls. Of course, there are non-uniform ways of doing so, but in general, those ways are, um, nonuniform. Therefore, if you want to check whether an object is in one of these categories, you haven't necessarily learned much about how to check if it is in a different one of these categories.
So what I wonder is this: Has there been much thought about making these type categories more explicitly part of the type system?
On Tuesday 13 August 2002 02:02 pm, Andrew Koenig wrote: [...]
I'm uneasy about (4) because if an object is an iterator, calling its next() method is destructive. The implication is that you had better not use this method to test if an object is an iterator until you are ready to take irrevocable action based on that test.
The test would be non-destructive if it only checked for the existence of the next() method:

    hasattr(f, "next")
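The attribute-presence idea sketches out in a few lines. This is an illustrative guess at what such non-destructive checks might look like, not anything from the thread itself; note that the iterator method was spelled next() in the Python of this discussion and is spelled __next__() in modern Python, so the sketch accepts either.

```python
def is_iterable(obj):
    """Non-destructive: iterables expose __iter__(); nothing is consumed."""
    return hasattr(obj, "__iter__")

def is_iterator(obj):
    """Non-destructive heuristic: an iterator is an iterable that also
    has a next method (spelled next() in 2002, __next__() today)."""
    return hasattr(obj, "__iter__") and (
        hasattr(obj, "next") or hasattr(obj, "__next__"))
```

Unlike calling next(), these tests can be applied repeatedly without advancing the object.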
Here is what I realized this morning. It may be obvious to you, but it wasn't to me (until after I realized it, of course):
``iterator'' and ``iterable'' are just two of many type categories that exist in Python.
Some other categories:
    callable, sequence, generator, class, instance, type, number,
    integer, floating-point number, complex number, mutable, tuple,
    mapping, method, built-in
As far as I know, there is no uniform method of determining into which category or categories a particular object falls. Of course, there are non-uniform ways of doing so, but in general, those ways are, um, nonuniform. Therefore, if you want to check whether an object is in one of these categories, you haven't necessarily learned much about how to check if it is in a different one of these categories.
So what I wonder is this: Has there been much thought about making these type categories more explicitly part of the type system?
The category names look like general purpose interface names. The addition of interfaces has been discussed quite a bit. While many people are interested in having interfaces added to Python, there are many design issues that will have to be resolved before it happens. Hopefully the removal of the class/type wart and the use of interfaces in Zope will hasten the addition of interfaces.

I like your list of the basic Python interfaces. Perhaps a weak version of interface definitions could be added to Python prior to a full-featured capability. The weak version would simply add a __category__ attribute to each type definition. This attribute would reference an object that defines the distinguishing features of the category interface. Enforcement would be optional, but at least the definition would be published. Adding just the definition of the type interface would create a direct benefit, and it would provide a hook for developers to use in work on optimization and testing.
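A minimal sketch of what this "weak version" might look like. Everything here is hypothetical -- the Iterable category object, its required_methods spelling, and the conforms() checker are invented for illustration; the proposal only specifies a published __category__ attribute with optional enforcement.

```python
class Iterable:
    """Hypothetical category object: publishes the distinguishing
    features of the category so tools can inspect them."""
    required_methods = ("__iter__",)

class Count:
    __category__ = Iterable   # published intent; nothing enforces it

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

def conforms(cls):
    """The optional enforcement a checker or optimizer might bolt on:
    does the class actually have what its category advertises?"""
    cat = getattr(cls, "__category__", None)
    return cat is not None and all(
        hasattr(cls, m) for m in cat.required_methods)
```

The point of the hook is that conforms() is optional: the attribute alone already documents intent.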
On Tue, Aug 13, 2002 at 03:45:29PM -0400, Michael McLay wrote:
So what I wonder is this: Has there been much thought about making these type categories more explicitly part of the type system?
The category names look like general purpose interface names. The addition of interfaces has been discussed quite a bit. While many people are interested in having interfaces added to Python, there are many design issues that will have to be resolved before it happens.
Nope. Type categories are fundamentally different from interfaces. An interface must be declared by the type while a category can be an observation about an existing type. Two types that are defined independently in different libraries may in fact fit under the same category because they implement the same protocol. With named interfaces they may in fact be compatible but they will not expose the same explicit interface. Requiring them to import the interface from a common source starts to sound more like Java than Python and would introduce dependencies and interface version issues in a language that is wonderfully free from such arbitrary complexities.

Python is a dynamic language. It deserves a dynamic type category system, not static interfaces that must be declared. It's fine to write a class and somehow say "I intend this class to be in category X, please warn me if I write a method that will make it incompatible". But I don't want declarations to be a *requirement* for being considered compatible with a protocol. I have noticed that a lot of protocols are defined retroactively by observation of the behavior of existing code. There shouldn't be any need to go tag someone else's code as conforming to a protocol or put a wrapper around it just to be able to use it.

A category is defined mathematically by a membership predicate. So what we need for type categories is a system for writing predicates about types.

Standard Python expressions should not be used for defining a category membership predicate. A Python expression is not a pure function. This makes it impossible to cache the results of which type belongs to what category for efficiency. Another problem is that many different expressions may be equivalent, but if two independently defined categories use equivalent predicates they should *be* the same category. They should be merged at runtime just like interned strings.
About a year ago I worked on a system for predicates having a canonical representation for security applications. While I was working on it I realized that it would be perfect for implementing a type category system for Python. It would be useful at runtime for error detection and runtime queries of protocols. It would also be useful at compile time for early detection of some errors and possibly for optimization. By implementing an optional strict mode the early error detection could be improved to the point where it's effectively a static type system.

Just a quick example of the usefulness of canonical predicates: if I calculate the intersection of two predicates and reduce it to canonical form, it will reduce to the FALSE predicate if no input will satisfy both predicates. It will be equal to one of the predicates if it is contained by the other.

I spent countless hours thinking about these issues, probably more than most people on this list... I think I have the foundation for a powerful yet unobtrusive type category system. Unfortunately it will take me some time to put it in writing and I don't have enough free time (who does?)

	Oren
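The membership-predicate idea in the message above can be made concrete with a couple of toy predicates over types. These particular predicates are illustrative guesses at what such categories might test, not Oren's actual system:

```python
def iterable_category(tp):
    # membership predicate over *types*: does the type expose __iter__?
    return hasattr(tp, "__iter__")

def mapping_category(tp):
    # a rough guess at a "mapping" predicate: subscripting plus keys()
    return hasattr(tp, "__getitem__") and hasattr(tp, "keys")
```

Note that both predicates observe existing types -- list and dict need no declarations to be classified by them.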
[Oren]
Type categories are fundamentally different from interfaces. An interface must be declared by the type while a category can be an observation about an existing type.
Yup. (In Python these have often been called "protocols". Jim Fulton calls them "lore protocols".)
Two types that are defined independently in different libraries may in fact fit under the same category because they implement the same protocol. With named interfaces they may in fact be compatible but they will not expose the same explicit interface. Requiring them to import the interface from a common source starts to sound more like Java than Python and would introduce dependencies and interface version issues in a language that is wonderfully free from such arbitrary complexities.
Hm, I'm not sure if you can solve the version incompatibility problem by ignoring it. :-)
Python is a dynamic language. It deserves a dynamic type category system, not static interfaces that must be declared. It's fine to write a class and somehow say "I intend this class to be in category X, please warn me if I write a method that will make it incompatible". But I don't want declarations to be a *requirement* for being considered compatible with a protocol. I have noticed that a lot of protocols are defined retroactively by observation of the behavior of existing code. There shouldn't be any need to go tag someone else's code as conforming to a protocol or put a wrapper around it just to be able to use it.
Are you familiar with Zope's Interface package? It solves this problem (nicely, IMO) by allowing you to place an interface declaration inside a class but also allowing you to make calls to an interface registry that declare interfaces for pre-existing classes.
A category is defined mathematically by a membership predicate. So what we need for type categories is a system for writing predicates about types.
Now I think you've lost me. How can a category on the one hand be observed after the fact and on the other hand defined by a rigorous mathematical definition? How could a program tell by looking at a class whether it really is an implementation of a given protocol?
Standard Python expressions should not be used for defining a category membership predicate. A Python expression is not a pure function. This makes it impossible to cache the results of which type belongs to what category for efficiency. Another problem is that many different expressions may be equivalent but if two independently defined categories use equivalent predicates they should *be* the same category. They should be merged at runtime just like interned strings.
Again you've lost me. I expect there's something here that you assume well-known. Can you please clarify this? What on earth do you mean by "A Python expression is not a pure function" ?
About a year ago I worked on a system for predicates having a canonical representation for security applications. While I was working on it I realized that it would be perfect for implementing a type category system for Python. It would be useful at runtime for error detection and runtime queries of protocols. It would also be useful at compile time for early detection of some errors and possibly for optimization. By implementing an optional strict mode the early error detection could be improved to the point where it's effectively a static type system.
So let's see a proposal already. I can't guess what you are proposing from this description except that you think highly of your own invention. I wouldn't expect you to mention it otherwise, so that's zero bits of information. :-)
Just a quick example of the usefulness of canonical predicates: if I calculate the intersection of two predicates and reduce it to canonical form, it will reduce to the FALSE predicate if no input will satisfy both predicates. It will be equal to one of the predicates if it is contained by the other.
I spent countless hours thinking about these issues, probably more than most people on this list...
How presumptuous.
I think I have the foundation for a powerful yet unobtrusive type category system. Unfortunately it will take me some time to put it in writing and I don't have enough free time (who does?)
I say vaporware. :-)  Tell us about it when you have time.

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Aug 14, 2002 at 09:09:19AM -0400, Guido van Rossum wrote: ...
Now I think you've lost me. How can a category on the one hand be ... Again you've lost me. I expect there's something here that you assume ...
Oh dear. Here we go again. I'm afraid that it may take several frustrating iterations just to get our terminology and assumptions in sync and be able to start talking about the actual issues.
Type categories are fundamentally different from interfaces. An interface must be declared by the type while a category can be an observation about an existing type.
Yup. (In Python these have often been called "protocols". Jim Fulton calls them "lore protocols".)
Nope. For me protocols are conventions to follow for performing a certain task. A type category is a formally defined set of types. For example, the 'iterable' protocol defines conventions for a programmer to follow for doing iteration. The 'iterable' category is a set defined by the membership predicate "hasattr(t, '__iter__')". The types in the 'iterable' category presumably conform to the 'iterable' protocol so there is a mapping between protocols and type categories but it's not quite 1:1. Protocols live in documentation and lore. Type categories live in the same place where vector spaces and other formal systems live.
Two types that are defined independently in different libraries may in fact fit under the same category because they implement the same protocol. With named interfaces they may in fact be compatible but they will not expose the same explicit interface. Requiring them to import the interface from a common source starts to sound more like Java than Python and would introduce dependencies and interface version issues in a language that is wonderfully free from such arbitrary complexities.
Hm, I'm not sure if you can solve the version incompatibility problem by ignoring it. :-)
Oops, I meant interface version *numbers*, not interface versions. A version number is a unidimensional entity. Variations on protocols and subprotocols have many dimensions. I find that set theory ("an object that has a method called foo and another method called bar") works better than arithmetic ("an object with version number 2.13 of interface voom").
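The set-theoretic view can be sketched by representing a category canonically as the set of attribute names it requires; intersection and containment then reduce to ordinary set operations. This toy covers only conjunctive "has these attributes" predicates -- the canonical predicates discussed in the thread are presumably richer (including negations, where an intersection can reduce to FALSE):

```python
# A category is the set of types having every attribute in `required`.
iterable = frozenset({"__iter__"})
iterator = frozenset({"__iter__", "__next__"})   # modern spelling of next()

def member(tp, required):
    return all(hasattr(tp, name) for name in required)

def intersect(p, q):
    # types satisfying both predicates must meet both requirement sets
    return p | q

def contained_in(p, q):
    # every type meeting p also meets q iff p requires at least q's attrs
    return p >= q
```

With this encoding, intersecting a category with one that contains it gives back the smaller category, exactly the reduction described above.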
Are you familiar with Zope's Interface package? It solves this problem (nicely, IMO) by allowing you to place an interface declaration inside a class but also allowing you to make calls to an interface registry that declare interfaces for pre-existing classes.
I don't like the bureaucracy of declaring interfaces and maintaining registries. I like the ad-hoc nature of Python protocols and I want a type system that gives me the tools to use it better, not replace it with something more formal.
A category is defined mathematically by a membership predicate. So what we need for type categories is a system for writing predicates about types.
Now I think you've lost me. How can a category on the one hand be observed after the fact and on the other hand defined by a rigorous mathematical definition? How could a program tell by looking at a class whether it really is an implementation of a given protocol?
A category is defined mathematically. A protocol is a somewhat more fuzzy meatspace concept. A protocol can be associated with a category with reasonable accuracy, so the result of a set operation on categories is reasonably applicable to the associated protocols.

Even a human can't always tell whether a class is *really* an implementation of a given protocol. But many protocols can be inferred with pretty good accuracy from the presence of methods or members. You can always add a member as a flag indicating compliance with a certain protocol if that is not enough. My basic assumption is that programmers are fundamentally lazy. It hasn't ever failed me so far.

This way there is no need to declare all the protocols a class conforms to. This is important since in many cases the protocol is only "discovered" later. The user of the class knows what protocol is expected and only needs to declare that. It should reduce the tendency to use relatively coarse-grained "fat" interfaces, because there is no need to declare every minor protocol the type conforms to - it may be observed by users of the type via a type category.
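The "no tagging required" point can be illustrated: the category check lives with the *user* of the objects, so a class from another library conforms simply by having the right shape. The deliberately small file-like category below is an invented example, listing only what one hypothetical caller uses:

```python
def looks_file_like(obj):
    # only the features this particular caller actually needs
    return all(hasattr(obj, m) for m in ("read", "close"))

class ThirdParty:
    # imagine this is defined in someone else's library; its author
    # never declared any interface and we never modify or wrap it
    def read(self, size=-1):
        return ""
    def close(self):
        pass
```

The user-side predicate also discourages fat interfaces: nothing forces ThirdParty to advertise every minor protocol it happens to satisfy.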
Standard Python expressions should not be used for defining a category membership predicate. A Python expression is not a pure function. This makes it impossible to cache the results of which type belongs to what category for efficiency. Another problem is that many different expressions may be equivalent but if two independently defined categories use equivalent predicates they should *be* the same category. They should be merged at runtime just like interned strings.
Again you've lost me. I expect there's something here that you assume well-known. Can you please clarify this? What on earth do you mean by "A Python expression is not a pure function" ?
A function whose result depends only on its inputs and has no side effects. In this case I would add "and can be evaluated without triggering any Python code". Set operations on membership predicates, caching, and other optimizations need such guarantees.

	Oren
Oren Tirosh <oren-py-d@hishome.net> writes:
Nope. For me protocols are conventions to follow for performing a certain task. A type category is a formally defined set of types.
ODP (Reference Model For Open Distributed Processing, ISO 10746) defines that a type is a predicate; it implies a set (of which it is the characteristic function). By your definition, a type category, as a formally-defined means to determine whether something belongs to the category, is a predicate, and thus still a type.
For example, the 'iterable' protocol defines conventions for a programmer to follow for doing iteration. The 'iterable' category is a set defined by the membership predicate "hasattr(t, '__iter__')".
It is not so clear that this is what defines the iterable category. It could also be defined as "something the programmer can use for doing iteration, by means of the iterable protocol".
Protocols live in documentation and lore. Type categories live in the same place where vector spaces and other formal systems live.
By that definition, I'd say that Andrew's list enumerates protocols, not type categories: they all live in lore, not in a formalism.
A category is defined mathematically. A protocol is a somewhat more fuzzy meatspace concept.
A protocol can certainly be formalized, if there is need. Of all the possible interaction sequences, you define those that follow the protocol. Then, an object that follows the protocol in all interaction sequences in which it participates is said to implement the protocol.
Again you've lost me. I expect there's something here that you assume well-known. Can you please clarify this? What on earth do you mean by "A Python expression is not a pure function" ?
A function whose result depends only on its inputs and has no side effects. In this case I would add "and can be evaluated without triggering any Python code".
For a pure function, requiring that it not trigger Python code seems a bit too restrictive. In any case, I think it is incorrect to say that a Python expression is not a function. Instead, it is correct to say that it is not necessarily a function. There are certainly expressions that are functions.

Regards,
Martin
On Thu, Aug 15, 2002 at 12:42:46AM +0200, Martin v. Loewis wrote:
Oren Tirosh <oren-py-d@hishome.net> writes:
Nope. For me protocols are conventions to follow for performing a certain task. A type category is a formally defined set of types.
ODP (Reference Model For Open Distributed Processing, ISO 10746) defines that a type is a predicate; it implies a set (of which it is the characteristic function).
A type is a predicate about an object. A category is a predicate about a type. Objects have a type. References have a category. Well, Python references currently all have the 'any' category because Python has no type checking. Any Python reference may point to an object of any type. In a dynamically typed language there is no such thing as an 'integer variable' but it can be simulated by a reference that may only point to objects in the 'integer' category.
It is not so clear that this is what defines the iterable category. It could also be defined as "something the programmer can use for doing iteration, by means of the iterable protocol".
Your definition is not formal and cannot be evaluated by a program. The iterable category matches the set of types implementing the iterable protocol with reasonable accuracy. It doesn't have to be perfect to be useful.
Protocols live in documentation and lore. Type categories live in the same place where vector spaces and other formal systems live.
By that definition, I'd say that Andrew's list enumerates protocols, not type categories: they all live in lore, not in a formalism.
Exactly.
For being a pure function, requiring that it does not trigger Python code seems a bit too restrictive.
That's not a formal requirement; it is for robustness and efficiency.

	Oren
In a dynamically typed language there is no such thing as an 'integer variable' but it can be simulated by a reference that may only point to objects in the 'integer' category.
This seems a game with words. I don't see the difference between an integer variable and a reference that must point to an integer. (Well, I see a difference, in the sharing semantics, but that's just the difference between a value and a pointer in C. They're both variables.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Thu, Aug 15, 2002 at 08:20:06AM -0400, Guido van Rossum wrote:
In a dynamically typed language there is no such thing as an 'integer variable' but it can be simulated by a reference that may only point to objects in the 'integer' category.
This seems a game with words. I don't see the difference between an integer variable and a reference that must point to an integer. (Well, I see a difference, in the sharing semantics, but that's just the difference between a value and a pointer in C. They're both variables.)
In C a pointer and a value are both "objects". But Python references are not objects. In a language where almost everything is an object they are a conspicuous exception. A slot in a list is bound to an object but there is no introspectable object that represents the slot *itself*. And yes, sharing semantics make a big difference.

My basic distinction is that type categories are not a property of objects. An object is what it is. It doesn't need "type checking". Type categories are useful to check *references* and ensure that operations on a reference are meaningful. A useful type checking system can be built that makes no change at all to objects and types, only applying tests to references.

The __category__ attribute I proposed for classes is not much more than a convenient way to spell:

    class Foo:
        ...

    assert Foo in category

The category is not stored inside the class. It is an observation about the class, not a property of the class.

	Oren
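The `assert Foo in category` spelling falls out naturally if a category object implements __contains__. The class below is a hypothetical illustration of that spelling; note that membership is observed, with nothing stored on Foo itself:

```python
class Category:
    def __init__(self, *required):
        self.required = required

    def __contains__(self, cls):
        # an observation about cls; cls is never modified or tagged
        return all(hasattr(cls, name) for name in self.required)

iterable = Category("__iter__")

class Foo:
    def __iter__(self):
        return iter(())
```

Because the test is an observation, it works equally well on classes defined long before the category existed.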
[I'm just jumping into this thread -- please forgive me if my reply does not make sense in the context of the past discussions on this thread -- I've only had time to read part of the archive.] On Thu, 15 Aug 2002, Oren Tirosh wrote:
In C a pointer and a value are both "objects". But Python references are not objects.
References are so transparent that you can treat them as 'instance aliases'.
In a language where almost everything is an object they are a conspicuous exception.
What semantics do you propose for "reference objects"?
A slot in a list is bound to an object but there is no introspectable object that represents the slot *itself*.
Of course there is -- slots bind a memory location in an object via a descriptor object stored in its class.
And yes, sharing semantics make a big difference. My basic distinction is that type categories are not a property of objects.
Why not deconstruct these ideas a little more and really explore what we want, and not how we want to go about implementing it. Let us consider the following characteristics:

1) Static interfaces/categories vs. dynamic interfaces/categories? I.e., does a particular instance always implement a given interface or category, or can it change over the lifetime of an object (due to state changes within the object, or changes within the interface/category registry system)?

2) Unified interface/category registry vs. distributed interface/category registry? I.e., are the supported interfaces/categories intrinsic to a class, or are they reflections of the desires of the code wishing to use an instance? For example, there are many possible and sensible subsets of the "file-like" interface. Must any class implementing "file-like" objects know about and advertise that it implements a predefined set of interfaces? Or should the code wishing to use the object instance be able to construct a specialized interface/category query with which to check the specific capabilities of the instance?

3) (related) Should interfaces/categories of a particular class/instance be enumerable?

4) Should interfaces/categories be monotonic with respect to inheritance? I.e., is it sufficient that a base class of an instance implements a given interface/category for a derived instance to implement that interface/category?
An object is what it is. It doesn't need "type checking". Type categories are useful to check *references* and ensure that operations on a reference are meaningful.
This distinction is meaningless for Python. Object references are not typed, and thus any reference can potentially be to any type. Putting too fine a point on the semantic difference between the type of an object and the type of an object referred to by some reference is just playing games.
A useful type checking system can be built that makes no change at all to objects and type, only applying tests to references. The __category__ attribute I proposed for classes is not much more than a convenient way to spell:
class Foo: ...
assert Foo in category
The category is not stored inside the class. It is an observation about the class, not a property of the class.
Given this description, I am guessing that these are your answers to my previous questions:

1) dynamic, since your categories are not fixed at class creation
2) ?, either is possible, though I suspect you are advocating a standard unified category registry
3) yes, depending on implementation details
4) no

Please correct my guesses if I am mistaken.

Thanks,
-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19    E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714         WWW:    http://www.theopalgroup.com
In a language where almost everything is an object they are a conspicuous exception.
Kevin> What semantics do you propose for "reference objects"?

I think the idea is to constrain the circumstances under which a reference can be created in the first place.

--
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
On Thu, Aug 15, 2002 at 09:55:04AM -0400, Kevin Jacobs wrote:
[I'm just jumping into this thread -- please forgive me if my reply does not make sense in the context of the past discussions on this thread -- I've only had time to read part of the archive.]
On Thu, 15 Aug 2002, Oren Tirosh wrote:
In C a pointer and a value are both "objects". But Python references are not objects.
References are so transparent that you can treat them as 'instance aliases'.
In a language where almost everything is an object they are a conspicuous exception.
What semantics do you propose for "reference objects"?
No no no! I am not proposing anything like that. What I'm saying is that interfaces/categories/whateveryouwannacallit are more about references to objects than about the objects themselves, and I pointed out that references are not even Python objects. Two references to the same object may have very different expectations about what they are pointing to.

I went a step further and decided to completely decouple it from the object: all the intelligence is in the category, which makes observations about the object's form without requiring any change to the objects or types.

	Oren
Oren Tirosh <oren-py-d@hishome.net>:
Two references to the same object may have very different expectations about what they are pointing to.
It sounds a bit odd to talk about references having expectations. I think all Oren is trying to say is that different pieces of code may have different requirements of the same object, or the same piece of code at different times, and that it's not practical to precalculate all the requirements that might exist and put information about them in the object itself or its class, or anywhere else for that matter. So he wants to check the requirements procedurally whenever such a check is needed.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
I don't like the bureaucracy of declaring interfaces and maintaining registries. I like the ad-hoc nature of Python protocols and I want a type system that gives me the tools to use it better, not replace it with something more formal.
But you seem to want *something* that's more formal. How formal exactly do you have in mind? How does it differ from what Zope does? (Which I know nothing about, by the way...)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
The category names look like general purpose interface names. The addition of interfaces has been discussed quite a bit. While many people are interested in having interfaces added to Python, there are many design issues that will have to be resolved before it happens.
Oren> Nope. Type categories are fundamentally different from interfaces. An interface must be declared by the type while a category can be an observation about an existing type.

Why? That is, why can't you imagine making a claim that type X meets interface Y, even though the author of neither X nor Y made that claim?

However, now that you bring it up... One difference I see between interfaces and categories is that I can imagine categories carrying semantic information to the human reader of the code that is not actually expressed in the category itself. As a simple example, I can imagine a PartialOrdering category that I might like as part of the specification for an argument to a sort function.

Oren> Two types that are defined independently in different libraries may in fact fit under the same category because they implement the same protocol. With named interfaces they may in fact be compatible but they will not expose the same explicit interface. Requiring them to import the interface from a common source starts to sound more like Java than Python and would introduce dependencies and interface version issues in a language that is wonderfully free from such arbitrary complexities.

Why is importing an interface any worse than importing a library? I see both interfaces and categories as claims about types. Those claims might be made by the types' authors, or they might be made by the types' users. I see no reason why they should have to be any more static than the definitions of the types themselves.

Oren> Python is a dynamic language. It deserves a dynamic type category system, not static interfaces that must be declared. It's fine to write a class and somehow say "I intend this class to be in category X, please warn me if I write a method that will make it incompatible". But I don't want declarations to be a *requirement* for being considered compatible with a protocol. I have noticed that a lot of protocols are defined retroactively by observation of the behavior of existing code. There shouldn't be any need to go tag someone else's code as conforming to a protocol or put a wrapper around it just to be able to use it.

Oren> A category is defined mathematically by a membership predicate. So what we need for type categories is a system for writing predicates about types.

Indeed, that's what I was thinking about initially. Guido pointed out that the notion could be expanded to making concrete assertions about the interface to a class. I had originally considered that those assertions could be just that -- assertions -- but then when Guido started talking about interfaces, I realized that my original thought of expressing satisfaction of a predicate by inheriting it could be extended by simply adding methods to those predicates. Of course, this technique has the disadvantage that it's not easy to add base classes to a class after it has been defined.

Oren> Standard Python expressions should not be used for defining a category membership predicate. A Python expression is not a pure function. This makes it impossible to cache the results of which type belongs to what category for efficiency. Another problem is that many different expressions may be equivalent but if two independently defined categories use equivalent predicates they should *be* the same category. They should be merged at runtime just like interned strings.

Yes.

Oren> About a year ago I worked on a system for predicates having a canonical representation for security applications. While I was working on it I realized that it would be perfect for implementing a type category system for Python. It would be useful at runtime for error detection and runtime queries of protocols. It would also be useful at compile time for early detection of some errors and possibly for optimization. By implementing an optional strict mode the early error detection could be improved to the point where it's effectively a static type system.

Oren> Just a quick example of the usefulness of canonical predicates: if I calculate the intersection of two predicates and reduce it to canonical form it will reduce to the FALSE predicate if no input will satisfy both predicates. It will be equal to one of the predicates if it is contained by the other.

Oren> I spent countless hours thinking about these issues, probably more than most people on this list... I think I have the foundation for a powerful yet unobtrusive type category system. Unfortunately it will take me some time to put it in writing and I don't have enough free time (who does?)

Is there room to scribble it in the margin somewhere? <wink>

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
Andrew Koenig wrote:
The category names look like general purpose interface names. The addition of interfaces has been discussed quite a bit. While many people are interested in having interfaces added to Python, there are many design issues that will have to be resolved before it happens.
Oren> Nope. Type categories are fundamentally different from interfaces. An interface must be declared by the type while a category can be an observation about an existing type.
Why? That is, why can't you imagine making a claim that type X meets interface Y, even though the author of neither X nor Y made that claim?
That's entirely possible, and as Guido mentioned earlier in the thread, the Zope 3 interface package allows that. I think that currently doesn't work with built-in types, but that's an implementation detail, not a fundamental problem. (It's in Interface.Implements, the implements() function.)
However, now that you bring it up... One difference I see between interfaces and categories is that I can imagine categories carrying semantic information to the human reader of the code that is not actually expressed in the category itself. As a simple example, I can imagine a PartialOrdering category that I might like as part of the specification for an argument to a sort function.
But isn't that exactly what interfaces are? Of course you may not want to make all interfaces explicit, as that is too much programming overhead; that's in part what's nice about a dynamically typed language. However, an interface does carry semantic information to the human reader of the code that is not actually expressed in the interface itself. By making interfaces explicit, the human reader can also write code that introspects interface information.

Or do you mean that sometimes it is not useful to make interfaces explicit at all, as you're never going to introspect on them anyway? I'd say they may still be useful as documentation, in which case they seem to work like your 'category'. Or, of course, you can choose not to specify them at all.

Regards, Martijn
However, now that you bring it up... One difference I see between interfaces and categories is that I can imagine categories carrying semantic information to the human reader of the code that is not actually expressed in the category itself. As a simple example, I can imagine a PartialOrdering category that I might like as part of the specification for an argument to a sort function.
Martijn> But isn't that exactly what interfaces are?

Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
Andrew Koenig <ark@research.att.com> writes:
Martijn> But isn't that exactly what interfaces are?
Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
An interface can certainly claim such things, in its documentation - and indeed, the documentation of interfaces typically associates certain semantics with the objects implementing the interface (and in some cases, even semantics for objects using the interface). Of course, there is typically no way to automatically *validate* such claims; you can only validate conformance to signatures. It turns out that, in Python, you cannot even do that. Regards, Martin
Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
That's done in the docs, usually. Zope even has the notion of a "marker" interface -- an interface that says "this object has property such-and-such" but which does not assert any methods or attributes. --Guido van Rossum (home page: http://www.python.org/~guido/)
Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
Guido> That's done in the docs, usually. Zope even has the notion of a "marker" interface -- an interface that says "this object has property such-and-such" but which does not assert any methods or attributes.

So perhaps what I mean by a category is the set of all types that implement a particular marker interface.
On Wed, Aug 14, 2002 at 09:38:18PM -0400, Andrew Koenig wrote:
Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
Guido> That's done in the docs, usually. Zope even has the notion of a "marker" interface -- an interface that says "this object has property such-and-such" but which does not assert any methods or attributes.
So perhaps what I mean by a category is the set of all types that implement a particular marker interface.
I propose that any method or attribute may serve as a marker. This makes it possible to use an existing practice as a marker so protocols can be defined retroactively for an existing code base. It's also possible, of course, to add an attribute called 'has_property_such_and_such' to serve as an explicit marker.

A type category is defined by a predicate that tests for the presence of one or more markers. Predicates can test not only for the presence of markers but also for the type category of the marker object and for call signatures. When optional type checking is implemented they should also be able to test for the categories of arguments and return values.

A new category may be defined as a union or intersection of two existing categories. This is done by ANDing or ORing the membership predicates of the two categories and reducing them back to canonical form. Canonizing a predicate is done by conversion into Disjunctive Normal Form, elimination of redundant terms and products, sorting and a few other steps. A global dictionary of canonical predicates is kept (similar to interning of strings) so any equivalent categories are merged. Each type object can store a cache of categories in which it is a member so evaluation of a membership predicate only needs to be done once for each type.

This may sound complicated but here's how it might work in practice:

Extracting a category from an existing class:

    foobarlike = like(FooBar)

The members of the foobarlike category are any classes that implement the same methods and attributes as FooBar, whether or not they are actually descended from it. They may be defined independently in another library. FooBar may be an abstract class used just as a template for a category.

Asserting that a class must be a member of a category:

    class SomeClass:
        __category__ = like(AnotherClass)
        ...
At the end of the class definition it will be checked whether it really is a member of that category (like(SomeClass) issubsetof like(AnotherClass)).

This attribute is inherited by subclasses. Any subclass of this class will be checked whether it is still a member of the category. A subclass may also override this attribute:

    class InheritImplementationButNotTheCategoryCheckFrom(SomeClass):
        __category__ = some_other_category
        ...

    class AddAdditionalRestrictionsTo(SomeClass):
        __category__ = __category__ & like(YetAnotherClass)

If there is a conflict between the two categories the new category will reduce to the empty set and an error will be generated. The error can be quite informative by extracting a category from the new class, subtracting it from the defined category and printing the difference.

When a backward compatible change is made to a protocol (e.g. adding a new method) any modules that use the old category should still work because the new category is a subcategory of the old one. When a non backward compatible change is made (e.g. removing a method, changing its call signature) existing code may still run without complaining, depending on the category it uses to do the checking. If it's a wider category that doesn't check for the method it should be ok.

A non backward compatible change must change the exposed interface. This may be ensured by adding an attribute or method that serves as an explicit marker and includes a version number or is renamed in some other way when making incompatible changes. Category union may be used to check for two incompatible versions that are known to implement a common subset even if it has never been given a name, etc.

Oren
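Oren's canonical-predicate machinery (intersection, interning, reduction to FALSE) could be sketched with sets of required and forbidden attribute names standing in for his canonical form. Everything below -- the Category class, its operators, the marker names -- is a hypothetical illustration of the idea, not his actual design:

```python
_interned = {}  # global dictionary of canonical predicates (like interned strings)

class Category(object):
    """Toy canonical predicate: a set of required and a set of
    forbidden attribute names.  Equivalent categories are merged."""
    def __new__(cls, required=(), forbidden=()):
        key = (frozenset(required), frozenset(forbidden))
        if key not in _interned:
            self = object.__new__(cls)
            self.required, self.forbidden = key
            _interned[key] = self
        return _interned[key]

    def __and__(self, other):
        # Intersection: a type must satisfy both sets of constraints.
        req = self.required | other.required
        forb = self.forbidden | other.forbidden
        if req & forb:        # contradictory constraints: nothing qualifies
            return FALSE
        return Category(req, forb)

    def __call__(self, cls):
        names = set(dir(cls))
        return self.required <= names and not (self.forbidden & names)

# The always-false category: it requires and forbids the same marker.
FALSE = Category(required=['__contradiction__'],
                 forbidden=['__contradiction__'])

seq = Category(required=['__getitem__'])
immutable = Category(forbidden=['__setitem__'])
frozen_seq = seq & immutable

assert frozen_seq(tuple) and not frozen_seq(list)
assert (seq & Category(required=['__getitem__'])) is seq   # merged, like interning
assert (immutable & Category(required=['__setitem__'])) is FALSE
```

Because equivalent categories are interned to one object, the intersection of a category with itself is literally the same object, and a contradictory intersection collapses to FALSE -- the two properties Oren highlights above.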
Extracting a category from an existing class: foobarlike = like(FooBar)
The members of the foobarlike category are any classes that implement the same methods and attributes as FooBar, whether or not they are actually descended from it. They may be defined independently in another library.
This seems fairly useless in practice -- you almost never want to use *all* methods and attributes of a class as the characteristic. (Even if you skip names starting with _.)
FooBar may be an abstract class used just as a template for a category.
Then the like() syntax seems unnecessary. It then becomes similar to Zope's Interfaces.
Asserting that a class must be a member of a category:
    class SomeClass:
        __category__ = like(AnotherClass)
        ...
In Zope:

    class SomeClass:
        __implements__ = AnotherClass

By convention, AnotherClass usually has a name that indicates it is an interface: IAnotherClass.
At the end of the class definition it will be checked whether it really is a member of that category (like(SomeClass) issubsetof like(AnotherClass)). This attribute is inherited by subclasses. Any subclass of this class will be checked whether it is still a member of the category.
I've been mulling over another way to spell this; perhaps you can add categories to the inheritance list:

    class SomeClass(IAnotherClass):
        ...

There's ambiguity here though: extending an interface already uses the same syntax:

    class IExtendedClass(IAnotherClass):
        ...

Disambiguating based on name conventions seems wrong and unpythonic. In C++, abstract classes are those that have one or more abstract methods; maybe we can borrow from that.
A subclass may also override this attribute:
    class InheritImplementationButNotTheCategoryCheckFrom(SomeClass):
        __category__ = some_other_category
        ...
My alternative spelling idea currently has no way to do this; but one is needed, and preferably one that's not too ugly.
    class AddAdditionalRestrictionsTo(SomeClass):
        __category__ = __category__ & like(YetAnotherClass)
There's a (shallow) problem here, in that __category__ is not initially in your class's namespace: at the start of executing the class statement, you begin with an empty local namespace. --Guido van Rossum (home page: http://www.python.org/~guido/)
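Oren's like() can be read as building a membership predicate from a template class. Here is a minimal sketch; the attribute-set representation and the decision to skip underscore names are my assumptions, not code from the thread:

```python
def like(template):
    """Category predicate: accept any class that exposes at least the
    public (non-underscore) attributes of the template class.
    Hypothetical sketch of Oren's like(), not an actual implementation."""
    required = frozenset(n for n in dir(template) if not n.startswith('_'))
    def predicate(cls):
        return required <= frozenset(dir(cls))
    return predicate

class FooBar:
    def spam(self): pass
    def eggs(self): pass

class Lookalike:              # defined independently, same protocol
    def spam(self): pass
    def eggs(self): pass
    def extra(self): pass     # extra methods don't hurt membership

foobarlike = like(FooBar)
assert foobarlike(Lookalike)  # structural match, no inheritance needed
assert not foobarlike(dict)   # dict exposes no spam/eggs
```

Guido's objection is visible in the sketch: like(Lookalike) would also demand extra(), which is why using *all* attributes of a class as the characteristic is rarely what you want.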
Oren Tirosh wrote:
On Wed, Aug 14, 2002 at 09:38:18PM -0400, Andrew Koenig wrote:
Not really. I can see how an interface can claim that a particular method exists, but not how it can claim that the method implements a function that is antisymmetric and transitive.
Guido> That's done in the docs, usually. Zope even has the notion of a "marker" interface -- an interface that says "this object has property such-and-such" but which does not assert any methods or attributes.
So perhaps what I mean by a category is the set of all types that implement a particular marker interface.
I propose that any method or attribute may serve as a marker. This makes it possible to use an existing practice as a marker so protocols can be defined retroactively for an existing code base. It's also possible, of course, to add an attribute called 'has_property_such_and_such' to serve as an explicit marker.
This is an interesting idea. I'd say you could plug such a thing into an interface system by making 'interface.isImplementedBy()' call some hooks that may dynamically claim an object implements an interface, based on methods and attributes. I'm not sure it's a good idea: if you're going to state this in code anyway, it seems to me it's clearer to explicitly use marker interfaces instead of writing some code that guesses based on the presence of particular attributes. But it's definitely an interesting idea. [snip]
A new category may be defined as a union or intersection of two existing categories. This is done by ANDing or ORing the membership predicates of the two categories and reducing them back to canonical form.
This is similar to some ideas I came up with a couple of years ago on the types-SIG, and Guido told me to talk about it at a conference over some beer, instead. :) http://mail.python.org/pipermail/types-sig/1999-December/000633.html http://mail.python.org/pipermail/types-sig/2000-January/000923.html (see bottom: Interfaces can be implied:) and here's Guido's finding my idea too absurd: http://mail.python.org/pipermail/types-sig/2000-January/000932.html But they're still interesting ideas. :) You can basically deduce what attributes (and methods) are on an interface by just giving a whole bunch of classes that you claim implement the interface, for instance. Regards, Martijn
Martijn> Oren Tirosh wrote:

Oren> I propose that any method or attribute may serve as a marker. This makes it possible to use an existing practice as a marker so protocols can be defined retroactively for an existing code base. It's also possible, of course, to add an attribute called 'has_property_such_and_such' to serve as an explicit marker.

Martijn> This is an interesting idea. I'd say you could plug such a thing into an interface system, by making 'interface.isImplementedBy()' call some hooks that may dynamically claim an object implements an interface, based on methods and attributes.

In that case, a marker is really just an interface with a single element.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
On Wed, Aug 14, 2002 at 10:08:59AM -0400, Andrew Koenig wrote:
The category names look like general purpose interface names. The addition of interfaces has been discussed quite a bit. While many people are interested in having interfaces added to Python, there are many design issues that will have to be resolved before it happens.
Oren> Nope. Type categories are fundamentally different from Oren> interfaces. An interface must be declared by the type while a Oren> category can be an observation about an existing type.
Why? That is, why can't you imagine making a claim that type X meets interface Y, even though the author of neither X nor Y made that claim?
It's not a failure of imagination, it's a failure of terminology. In contexts where the term 'interface' is used (Java, COM, etc) it usually means something you explicitly expose from your objects. I find that the term 'category' implies something you observe after the fact without modifying the object - "these objects both have property so-and-so, let's group them together and call it a category".
However, now that you bring it up... One difference I see between interfaces and categories is that I can imagine categories carrying semantic information to the human reader of the code that is not actually expressed in the category itself. As a simple example, I can imagine a PartialOrdering category that I might like as part of the specification for an argument to a sort function.
You can define any category you like and attach a semantic meaning to it as long as you can write a membership predicate for the category. It may be based on a marker that the type must have or, in case you can't change the type (e.g. a builtin type) you can write a membership predicate that also tests for some set of specific types.
Oren> A category is defined mathematically by a membership Oren> predicate. So what we need for type categories is a system for Oren> writing predicates about types.
Indeed, that's what I was thinking about initially. Guido pointed out that the notion could be expanded to making concrete assertions about the interface to a class. I had originally considered that those assertions could be just that--assertions, but then when Guido started talking about interfaces, I realized that my original thought of expressing satisfaction of a predicate by inheriting it could be extended by simply adding methods to those predicates. Of course, this technique has the disadvantage that it's not easy to add base classes to a class after it has been defined.
That's why the intelligence should be in the membership predicate, not in the classes it selects. Nothing needs to be changed about types. Conceptually, categories apply to *references*, not to *objects*. They help you ensure that during execution certain references may only point to objects from a limited category of types so that the operations you perform on them are meaningful (though not necessarily correct). A situation that may lead to a reference pointing to an object outside the valid category should be detected as early as possible. Detecting this during compilation is great. On module import is good. At runtime it's ok.

Can you think of a better name than 'categories' to describe a set of types selected by a membership predicate?

Oren
Oren Tirosh wrote:
On Wed, Aug 14, 2002 at 10:08:59AM -0400, Andrew Koenig wrote: [snip]
Why? That is, why can't you imagine making a claim that type X meets interface Y, even though the author of neither X nor Y made that claim?
It's not a failure of imagination, it's a failure of terminology. In contexts where the term 'interface' is used (Java, COM, etc) it usually means something you explicitly expose from your objects. I find that the term 'category' implies something you observe after the fact without modifying the object - "these objects both have property so-and-so, let's group them together and call it a category".
Okay, but with Python interfaces as we know them (the Scarecrow-descended interfaces in use in Zope 2 and Zope 3), you can say things like this (with some current limitations concerning basic types). Usually, however, one uses the __implements__ class attribute, but that's because it's often clearer and easier when one is writing a new class. [snip]
That's why the intelligence should be in the membership predicate, not in the classes it selects. Nothing needs to be changed about types. Conceptually, categories apply to *references*, not to *objects*. They help you ensure that during execution certain references may only point to objects from a limited category of types so that the operations you perform on them are meaningful (though not necessarily correct).
This is quite different from the Zope interfaces approach, I think. Zope interfaces do talk about objects, not references. This is leaning towards a more static feel of typing (even though it may be quite different from static typing in the details), which I think should be clearly marked as quite independent from a discussion on interfaces. Though I'd still call your beasties interfaces and not categories, even though you want to use them in a statically typed way -- but please let's not reject simple interfaces just because we may want to do something complicated and involved with static typing later...

Regards, Martijn
I think we may have retired the types-sig a week or two too early... :-) This kind of discussion is great for sharpening our intellect; the types-sig had outbursts like this maybe twice a year. But I have yet to see something come out of it that was practical enough to be added to Python "now". Maybe Zope Interfaces are our best bet. They surely have been used and refined for almost four years now... --Guido van Rossum (home page: http://www.python.org/~guido/)
Why? That is, why can't you imagine making a claim that type X meets interface Y, even though the author of neither X nor Y made that claim?
Oren> It's not a failure of imagination, it's a failure of terminology. In contexts where the term 'interface' is used (Java, COM, etc) it usually means something you explicitly expose from your objects. I find that the term 'category' implies something you observe after the fact without modifying the object - "these objects both have property so-and-so, let's group them together and call it a category".

But what if it is possible to express property so-and-so as an interface? It's like observing that a particular set, that someone else defined, is a group, so now all the group theorems apply to it. Similarly, if someone has defined a class, and I happen to notice that that class is really a reversible iterator, I would like a way of saying so that will let anyone who wants to use that class in a context that requires a reversible iterator do so.
However, now that you bring it up... One difference I see between interfaces and categories is that I can imagine categories carrying semantic information to the human reader of the code that is not actually expressed in the category itself. As a simple example, I can imagine a PartialOrdering category that I might like as part of the specification for an argument to a sort function.
Oren> You can define any category you like and attach a semantic meaning to it as long as you can write a membership predicate for the category. It may be based on a marker that the type must have or, in case you can't change the type (e.g. a builtin type), you can write a membership predicate that also tests for some set of specific types.

Or perhaps a membership predicate that tests whether a type satisfies a particular interface.

Oren> A category is defined mathematically by a membership predicate. So what we need for type categories is a system for writing predicates about types.

And, perhaps, a way for defining predicates that determine whether types meet interfaces.
Indeed, that's what I was thinking about initially. Guido pointed out that the notion could be expanded to making concrete assertions about the interface to a class. I had originally considered that those assertions could be just that--assertions, but then when Guido started talking about interfaces, I realized that my original thought of expressing satisfaction of a predicate by inheriting it could be extended by simply adding methods to those predicates. Of course, this technique has the disadvantage that it's not easy to add base classes to a class after it has been defined.
Oren> That's why the intelligence should be in the membership predicate, not in the classes it selects. Nothing needs to be changed about types. Conceptually, categories apply to *references*, not to *objects*.

I don't see why categories should not also apply to class objects.

Oren> They help you ensure that during execution certain references may only point to objects from a limited category of types so that the operations you perform on them are meaningful (though not necessarily correct). A situation that may lead to a reference pointing to an object outside the valid category should be detected as early as possible. Detecting this during compilation is great. On module import is good. At runtime it's ok.

Agreed.

Oren> Can you think of a better name than 'categories' to describe a set of types selected by a membership predicate?

Not offhand.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
Andrew Koenig <ark@research.att.com> writes:
So what I wonder is this: Has there been much thought about making these type categories more explicitly part of the type system?
Certainly. Such a feature has been called "interface" or "protocol"; I usually associate with "interface" a static property (a type implements an interface, by means of a declaration) and with "protocol" a dynamic property (an object conforms to a protocol, by acting according to the rules that the protocol sets).

Your question exists in many variations. One of them led to the creation of the types-sig; another one triggered papers titled "Optional Static Typing", see http://www.python.org/~guido/static-typing/

The most recent version of an attempt to make interfaces part of Python is PEP 245, http://python.org/peps/pep-0245.html

I believe there is agreement by now that there will be a difference between declared interfaces and implemented protocols: an object may follow a protocol even if it did not declare the interface, and an object may violate a protocol even if its type did declare the interface. Beyond that, there is little agreement.

Regards, Martin
While I was driving to work today, I had a thought about the iterator/iterable discussion of a few weeks ago. My impression is that that discussion was inconclusive, but a few general principles emerged from it:
1) Some types are iterators -- that is, they support calls to next() and raise StopIteration when they have no more information to give.
2) Some types are iterables -- that is, they support calls to __iter__() that yield an iterator as the result.
3) Every iterator is also an iterable, because iterators are required to implement __iter__() as well as next().
4) The way to determine whether an object is an iterator is to call its next() method and see what happens.
5) The way to determine whether an object is an iterable is to call its __iter__() method and see what happens.
I'm uneasy about (4) because if an object is an iterator, calling its next() method is destructive. The implication is that you had better not use this method to test if an object is an iterator until you are ready to take irrevocable action based on that test. On the other hand, calling __iter__() is safe, which means that you can test nondestructively whether an object is an iterable, which includes all iterators.
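Principles (1)-(5) can be illustrated with a minimal class. This is my sketch, not code from the thread; note that in the Python of this 2002 discussion the iterator method was spelled next(), where modern Python spells it __next__():

```python
class Counter:
    """Both an iterator (1) and an iterable (2), per point (3)."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):      # point (5): safe, nondestructive to call
        return self
    def __next__(self):      # spelled next() in 2002-era Python
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

c = Counter(3)
assert iter(c) is c          # (5): testing iterability is harmless
assert next(c) == 2          # (4): the test itself consumed a value...
assert list(c) == [1, 0]     # ...which is exactly Andrew's complaint
```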
Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome for your uneasiness with (4) (and (5), I might add -- I don't know that __iter__ is always safe). He contrasts it with a different attitude, which might be summarized as "It's easier to ask forgiveness than permission." In many cases, there is no reason for LBYL syndrome, and it can actually cause subtle bugs. For example, a LBYL programmer could write

    if not os.path.exists(fn):
        print "File doesn't exist:", fn
        return
    fp = open(fn)
    ...use fp...

A "forgiveness" programmer would write this as follows instead:

    try:
        fp = open(fn)
    except IOError, msg:
        print "Can't open", fn, ":", msg
        return
    ...use fp...

The latter is safer; there are many reasons why the open() call in the first version could fail despite the exists() test succeeding, including insufficient permissions, lack of operating resources, a hard file lock, or another process that removed the file in the mean time.

While it's not an absolute rule, I tend to dislike interface/protocol checking as an example of LBYL syndrome. I prefer to write this:

    def f(x):
        print x[0]

rather than this:

    def f(x):
        if not hasattr(x, "__getitem__"):
            raise TypeError, "%r doesn't support __getitem__" % x
        print x[0]

Admittedly this is an extreme example that looks rather silly, but similar type checks are common in Python code written by people coming from languages with stronger typing (and a bit of paranoia). The exception is when you need to do something different based on the type of an object and you can't add a method for what you want to do. But that is relatively rare.
Here is what I realized this morning. It may be obvious to you, but it wasn't to me (until after I realized it, of course):
``iterator'' and ``iterable'' are just two of many type categories that exist in Python.
Some other categories:
callable
sequence
generator
class
instance
type
number
integer
floating-point number
complex number
mutable
tuple
mapping
method
built-in
You missed the two that are most commonly needed in practice: string and file. :-) I believe that the notion of an informal or "lore" (as Jim Fulton likes to call it) protocol first became apparent when we started to use the idea of a "file-like object" as a valid value for sys.stdout.
As far as I know, there is no uniform method of determining into which category or categories a particular object falls. Of course, there are non-uniform ways of doing so, but in general, those ways are, um, nonuniform. Therefore, if you want to check whether an object is in one of these categories, you haven't necessarily learned much about how to check if it is in a different one of these categories.
So what I wonder is this: Has there been much thought about making these type categories more explicitly part of the type system?
I think this has been answered by other respondents. Interestingly enough, Jim Fulton asked me to critique the Interface package as it exists in Zope 3, from the perspective of adding (something like) it to Python 2.3. This is a descendant of the "scarecrow" proposal, http://www.foretec.com/python/workshops/1998-11/dd-fulton.html (see also http://www.zope.org/Members/jim/PythonInterfaces/Summary). The Zope3 implementation can be viewed here: http://cvs.zope.org/Zope3/lib/python/Interface/ --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome
Guido> for your uneasiness with (4) (and (5), I might add -- I don't know
Guido> that __iter__ is always safe). He contrasts it with a different
Guido> attitude, which might be summarized as "It's easier to ask forgiveness
Guido> than permission." In many cases, there is no reason for LBYL
Guido> syndrome, and it can actually cause subtle bugs. For example, a LBYL
Guido> programmer could write
Guido>
Guido>     if not os.path.exists(fn):
Guido>         print "File doesn't exist:", fn
Guido>         return
Guido>     fp = open(fn)
Guido>     ...use fp...
Guido>
Guido> A "forgiveness" programmer would write this as follows instead:
Guido>
Guido>     try:
Guido>         fp = open(fn)
Guido>     except IOError, msg:
Guido>         print "Can't open", fn, ":", msg
Guido>         return
Guido>     ...use fp...
Guido>
Guido> The latter is safer; there are many reasons why the open() call in the
Guido> first version could fail despite the exists() test succeeding,
Guido> including insufficient permissions, lack of operating resources, a
Guido> hard file lock, or another process that removed the file in the
Guido> meantime.
Guido>
Guido> While it's not an absolute rule, I tend to dislike interface/protocol
Guido> checking as an example of LBYL syndrome. I prefer to write this:
Guido>
Guido>     def f(x):
Guido>         print x[0]
Guido>
Guido> rather than this:
Guido>
Guido>     def f(x):
Guido>         if not hasattr(x, "__getitem__"):
Guido>             raise TypeError, "%r doesn't support __getitem__" % x
Guido>         print x[0]

I completely agree with you so far. If you have an object that you know you intend to use in only a single way, it is usually right to just go ahead and use it that way rather than asking first.

Guido> Admittedly this is an extreme example that looks rather silly,
Guido> but similar type checks are common in Python code written by
Guido> people coming from languages with stronger typing (and a bit of
Guido> paranoia).
Guido> The exception is when you need to do something different based
Guido> on the type of an object and you can't add a method for what
Guido> you want to do. But that is relatively rare.

Perhaps the reason it's rare is that it's difficult to do. One of the cases I was thinking of was the built-in * operator, which does something completely different if one of its operands is an integer. Another one was the buffering iterator we were discussing earlier, which ideally would omit buffering entirely if asked to buffer a type that already supports multiple iteration.
Some other categories:
    callable
    sequence
    generator
    class
    instance
    type
    number
    integer
    floating-point number
    complex number
    mutable
    tuple
    mapping
    method
    built-in
Guido> You missed the two that are most commonly needed in practice:
Guido> string and file. :-)

Actually, I thought of them but omitted them to avoid confusion between a type and a category with a single element.

Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.

OK. So what I'm asking about is a way of making notions such as "file-like object" more formal and/or automatic. Of course, one reason for my interest is my experience with a language that supports compile-time overloading -- what I'm really seeing on the horizon is some kind of notion of overloading in Python, perhaps along the lines of ML's clausal function definitions (which I think are truly elegant).

Guido> Interestingly enough, Jim Fulton asked me to critique the
Guido> Interface package as it exists in Zope 3, from the perspective
Guido> of adding (something like) it to Python 2.3.
Guido>
Guido> This is a descendant of the "scarecrow" proposal,
Guido> http://www.foretec.com/python/workshops/1998-11/dd-fulton.html
Guido> (see also http://www.zope.org/Members/jim/PythonInterfaces/Summary).
Guido>
Guido> The Zope3 implementation can be viewed here:
Guido> http://cvs.zope.org/Zope3/lib/python/Interface/

I'll have a look; thanks!
Guido> The exception is when you need to do something different based Guido> on the type of an object and you can't add a method for what Guido> you want to do. But that is relatively rare.
Perhaps the reason it's rare is that it's difficult to do.
Perhaps... Is it the chicken or the egg?
One of the cases I was thinking of was the built-in * operator, which does something completely different if one of its operands is an integer.
Really? I suppose you're thinking of sequence repetition. I consider that one of my early mistakes (it didn't make it to my "regrets" list but probably should have). It would have been much simpler if sequences simply supported multiplication, and in fact repeated changes to the implementation (and subtle edge cases of the semantics) are slowly nudging it in this direction.
Another one was the buffering iterator we were discussing earlier, which ideally would omit buffering entirely if asked to buffer a type that already supports multiple iteration.
How do you do that in C++? I guess you overload the function that asks for the iterator, and call that function in a template. I think in Python we can ask the caller to provide a buffering iterator when a function needs one. Since we really have very little power at compile time, we sometimes need to do a little more work at run time. But the resulting language appears to be easier to understand (for most people anyway) despite the theoretical deficiency.

I'm not quite sure why that is, but I am slowly developing a theory, based on a remark by Samuele Pedroni; at least I believe it was he who remarked at some point "Python has only run time", which got me thinking. My theory, partially developed though it is, is that it is much harder (again, for most people :-) to understand in your head what happens at compile time than it is to understand what goes on at run time. Or perhaps that understanding *both* is harder than understanding only one. But I believe that for most people acquiring a sufficient mental model for what goes on at run time is simpler than the mental model for what goes on at compile time. Possibly this is because compilers really *do* rely on very sophisticated algorithms (such as deciding which overloading function is called based upon type information and available conversions). Run time on the other hand is dead simple most of the time -- it has to be, since it has to be executed by a machine that has a very limited time to make its decisions.

All this reminds me of a remark that I believe is due to John Ousterhout at the VHLL conference in '94 in Santa Fe, where you & I first met. (Strangely it was Perl's Tom Christiansen who was in large part responsible for the eclectic program.) You gave a talk about ML, and I believe it was in response to your talk that John remarked that ML was best suited for people with an IQ of over 150.
That rang true to me, since the only other person besides you that I know who is a serious ML user definitely falls into that category. And ML is definitely a language that does more than the average language at compile time.
Some other categories:
    callable
    sequence
    generator
    class
    instance
    type
    number
    integer
    floating-point number
    complex number
    mutable
    tuple
    mapping
    method
    built-in
Guido> You missed the two that are most commonly needed in practice:
Guido> string and file. :-)
Actually, I thought of them but omitted them to avoid confusion between a type and a category with a single element.
Can you explain? Neither string (which has Unicode and 8-bit, plus a few other objects that are sufficiently string-like to be regex-searchable, like arrays) nor file (at least in the "lore protocol" interpretation of file-like object) are categories with a single element.
Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.
OK. So what I'm asking about is a way of making notions such as "file-like object" more formal and/or automatic.
Yeah, that's the holy Grail of interfaces in Python.
Of course, one reason for my interest is my experience with a language that supports compile-time overloading -- what I'm really seeing on the horizon is some kind of notion of overloading in Python, perhaps along the lines of ML's clausal function definitions (which I think are truly elegant).
Honestly, I hadn't read this far ahead when I brought up ML above. :-) I really hope that the holy grail can be found at run time rather than compile time. Python's compile time doesn't have enough information easily available, and to gather the necessary information is very expensive (requiring whole-program analysis) and not 100% reliable (due to Python's extreme dynamic side).
Guido> Interestingly enough, Jim Fulton asked me to critique the
Guido> Interface package as it exists in Zope 3, from the perspective
Guido> of adding (something like) it to Python 2.3.

Guido> This is a descendant of the "scarecrow" proposal,
Guido> http://www.foretec.com/python/workshops/1998-11/dd-fulton.html
Guido> (see also http://www.zope.org/Members/jim/PythonInterfaces/Summary).

Guido> The Zope3 implementation can be viewed here:
Guido> http://cvs.zope.org/Zope3/lib/python/Interface/
I'll have a look; thanks!
BTW, the original scarecrow proposal is at http://www.foretec.com/python/workshops/1998-11/dd-fulton-sum.html

--Guido van Rossum (home page: http://www.python.org/~guido/)
Perhaps the reason it's rare is that it's difficult to do.
Guido> Perhaps... Is it the chicken or the egg?

Did you hear about the two philosophers in the diner? One ordered a chicken-salad sandwich and the other ordered an egg-salad sandwich, because they wanted to see which would come first.
One of the cases I was thinking of was the built-in * operator, which does something completely different if one of its operands is an integer.
Guido> Really? I suppose you're thinking of sequence repetition.

Right.

Guido> I consider that one of my early mistakes (it didn't make it to
Guido> my "regrets" list but probably should have). It would have
Guido> been much simpler if sequences simply supported multiplication,
Guido> and in fact repeated changes to the implementation (and subtle
Guido> edge cases of the semantics) are slowly nudging it in this
Guido> direction.

It's still a plausible example, I think.
Another one was the buffering iterator we were discussing earlier, which ideally would omit buffering entirely if asked to buffer a type that already supports multiple iteration.
Guido> How do you do that in C++? I guess you overload the function
Guido> that asks for the iterator, and call that function in a
Guido> template. I think in Python we can ask the caller to provide a
Guido> buffering iterator when a function needs one. Since we really
Guido> have very little power at compile time, we sometimes need to do
Guido> a little more work at run time. But the resulting language
Guido> appears to be easier to understand (for most people anyway)
Guido> despite the theoretical deficiency.

I understand that, I think. The C++ library has a notion of ``iterator traits'' that is implemented by a template class named, of all things, iterator_traits. So, for example, if T is an iterator type, then iterator_traits<T>::value_type is the type that dereferencing an object of type T will yield.

To reveal what operations an iterator supports, iterator_traits<T>::iterator_category is one of the following five types, depending on the iterator:

    input_iterator_tag
    output_iterator_tag
    forward_iterator_tag
    bidirectional_iterator_tag
    random_access_iterator_tag

Each of the last three of these types is derived from the one before it. It is possible to instantiate objects of any of these types, but the objects carry no information beyond their type and identity.

Now, suppose you want to implement an algorithm that requires a bidirectional iterator, but can be done more efficiently with a random-access iterator. Then you might write something like this:

    // The bidirectional-iterator case
    template<class It>
    void foo_aux(It begin, It end, bidirectional_iterator_tag)
    {
        // ...
    }

    // The random-access-iterator case
    template<class It>
    void foo_aux(It begin, It end, random_access_iterator_tag)
    {
        // ...
    }

and then you can select the appropriate algorithm at compile time this way:

    template<class It>
    void foo(It begin, It end)
    {
        foo_aux(begin, end,
                typename iterator_traits<It>::iterator_category());
    }

This code creates an extra object (the anonymous object created by the expression ``typename iterator_traits<It>::iterator_category()'') for the sole purpose of using its type to distinguish between the two overloaded versions of foo_aux. This distinction is made at compile time, and if the compiler is smart enough, it will also optimize away the empty, anonymous object.

So this is an example of what I mean by ``dispatching based on a type category.'' In C++ it's done at compile time, but what I care about in the context of Python is not when it is done, but rather how convenient it is to express. (I don't think the C++ mode of expression is particularly convenient, but at least it's possible.)

Guido> I'm not quite sure why that is, but I am slowly developing a
Guido> theory, based on a remark by Samuele Pedroni; at least I
Guido> believe it was he who remarked at some point "Python has only
Guido> run time", which got me thinking. My theory, partially
Guido> developed though it is, is that it is much harder (again, for
Guido> most people :-) to understand in your head what happens at
Guido> compile time than it is to understand what goes on at run time.
Guido> Or perhaps that understanding *both* is harder than
Guido> understanding only one.

I have no problem believing that.

Guido> But I believe that for most people acquiring a sufficient
Guido> mental model for what goes on at run time is simpler than the
Guido> mental model for what goes on at compile time. Possibly this
Guido> is because compilers really *do* rely on very sophisticated
Guido> algorithms (such as deciding which overloading function is
Guido> called based upon type information and available conversions).
Guido> Run time on the other hand is dead simple most of the time --
Guido> it has to be, since it has to be executed by a machine that has
Guido> a very limited time to make its decisions.

That's OK with me. But I'd still like a less ad-hoc way of making those run-time tests.

Guido> All this reminds me of a remark that I believe is due to John
Guido> Ousterhout at the VHLL conference in '94 in Santa Fe, where you & I
Guido> first met. (Strangely it was Perl's Tom Christiansen who was in a
Guido> large part responsible for the eclectic program.) You gave a talk
Guido> about ML, and I believe it was in response to your talk that John
Guido> remarked that ML was best suited for people with an IQ of over 150.

I'm still not convinced that's necessarily true -- I think it depends a great deal on how ML is taught. I do believe that most of what has been written about ML is hard to follow for people who have grown up in the imperative world, but I don't think it has to be that way.

Guido> That rang true to me, since the only other person besides you
Guido> that I know who is a serious ML user definitely falls into that
Guido> category.

Thanks for the compliment!

Guido> And ML is definitely a language that does more than the average
Guido> language at compile time.

Yes. One of the reasons I find it interesting, incidentally, is that it still manages to generate surprisingly efficient machine code.
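For comparison, the C++ tag dispatch above has a straightforward run-time analogue in Python for the buffering-iterator case mentioned earlier: skip buffering when the argument already supports multiple iteration. (A sketch only; multi_iter is a hypothetical name, and the "iterators return themselves from iter()" heuristic is just the informal protocol test, not an official category check.)

```python
import itertools

def multi_iter(obj, n=2):
    # Run-time analogue of dispatching on an iterator's category.
    it = iter(obj)
    if it is not obj:
        # obj is re-iterable (e.g. a list): iter() produced a fresh
        # iterator, so no buffering is needed -- just ask again n times.
        return tuple(iter(obj) for _ in range(n))
    # obj is a one-shot iterator (iter() returned the object itself):
    # fall back to buffering with tee().
    return itertools.tee(it, n)
```

The decision that C++ makes at compile time from iterator_category is made here at call time from the object's observed behavior.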
Actually, I thought of them but omitted them to avoid confusion between a type and a category with a single element.
Guido> Can you explain? Neither string (which has Unicode and 8-bit,
Guido> plus a few other objects that are sufficiently string-like to
Guido> be regex-searchable, like arrays) nor file (at least in the
Guido> "lore protocol" interpretation of file-like object) are
Guido> categories with a single element.

Fair enough. I just didn't have examples at my fingertips and thought at first that using those examples would confuse matters. I don't mind using them.

Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.
OK. So what I'm asking about is a way of making notions such as "file-like object" more formal and/or automatic.
Guido> Yeah, that's the holy Grail of interfaces in Python. Cool! (I care much less about type checking because, as I mentioned in another message, there are uncheckable things such as being an order relation that I would like to use for dispatching anyway).
Of course, one reason for my interest is my experience with a language that supports compile-time overloading -- what I'm really seeing on the horizon is some kind of notion of overloading in Python, perhaps along the lines of ML's clausal function definitions (which I think are truly elegant).
Guido> Honestly, I hadn't read this far ahead when I brought up ML
Guido> above. :-)

:-)

Guido> I really hope that the holy grail can be found at run time
Guido> rather than compile time. Python's compile time doesn't have
Guido> enough information easily available, and to gather the
Guido> necessary information is very expensive (requiring
Guido> whole-program analysis) and not 100% reliable (due to Python's
Guido> extreme dynamic side).

I have no problem with that. So here's a simple example of ML's clausal functions:

    fun len([]) = 0
      | len(h::t) = len(t) + 1

Here, [] is an empty list, and h::t is ML's way of spelling cons(h,t). The two clauses (one per line) are checked in order *at run time* -- we're dispatching on the *value*, not the type, of the argument. If you like, this example is equivalent to the following:

    fun len(x) = if x = [] then 0 else len(tl(x))+1

(well, not really, but only an ML expert will see why, and it's not germane)

In the Python domain, I imagine something like this:

    def f(arg: Category1):
        ....
    or f(arg: Category2):
        ....
    or f(arg: Category3):
        ....

I would like the implementation to try each version of f until it finds one that passes the constraints, and then execute that one. If none of them fits the bill, then it should throw an exception.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

PS: Please forgive the erratic replies -- apparently our mail gateway decided to hang onto a bunch of messages for a day or so...
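The try-each-clause semantics proposed above can already be approximated at run time. A sketch (clauses and length are hypothetical names, and plain predicates stand in for the categories):

```python
def clauses(*pairs):
    # Try each (predicate, implementation) clause in order, ML-style,
    # and raise if none matches -- a run-time emulation of the
    # proposed "def f(arg: Category1): ... or f(arg: Category2): ..."
    def dispatch(arg):
        for pred, impl in pairs:
            if pred(arg):
                return impl(arg)
        raise TypeError("no clause matches %r" % (arg,))
    return dispatch

# The ML len() example, transcribed clause for clause:
length = clauses(
    (lambda x: x == [], lambda x: 0),                           # len([]) = 0
    (lambda x: isinstance(x, list), lambda x: 1 + length(x[1:])),
)
```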
Andrew Koenig <ark@research.att.com>:
In the Python domain, I imagine something like this:
    def f(arg: Category1):
        ....
    or f(arg: Category2):
        ....
    or f(arg: Category3):
        ....
I would like the implementation to try each version of f until it finds one that passes the constraints
Would all the versions of f have to be written together like that? I think when most people talk of multiple dispatch they have something more flexible in mind. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Greg> Would all the versions of f have to be written together like
Greg> that?

I'm not sure. In ML, they do, but in ML, the tests are on values, not types (ML has neither inheritance nor overloading). Obviously, it would be nice not to have to write the versions of f together, but I haven't thought about how such a feature would be defined or implemented.

Greg> I think when most people talk of multiple
Greg> dispatch they have something more flexible in mind.

Probably true.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
Greg> Would all the versions of f have to be written together like
Greg> that?
I'm not sure. In ML, they do, but in ML, the tests are on values, not types (ML has neither inheritance nor overloading). Obviously, it would be nice not to have to write the versions of f together, but I haven't thought about how such a feature would be defined or implemented.
Greg> I think when most people talk of multiple
Greg> dispatch they have something more flexible in mind.
Probably true.
I can see how it could be done using some additional syntax similar to what ML uses, e.g.:

    def f(a: Cat1):
        ...code for Cat1...
    else f(a: Cat2):
        ...code for Cat2...
    else f(a: Cat3):
        ...code for Cat3...

Don't take this syntax too seriously! I just mean that there is a single statement that provides the different alternative versions.

Another approach would be more in the spirit of properties in 2.2:

    def f1(a: Cat1):
        ...code for Cat1...
    def f2(a: Cat2):
        ...code for Cat2...
    def f3(a: Cat3):
        ...code for Cat3...
    f = multimethod(f1, f2, f3)

(There could be a way to spell this without having the type declaration syntax in the argument list, and do it in the multimethod() call instead, e.g. with keyword arguments or passing a list of tuples: [(f1, Cat1), (f2, Cat2), ...]. I suppose this could be extended to more arguments as well.)

It might also be possible to modify a multimethod dynamically, e.g. later one could write:

    def f4(a: Cat4):
        ...code for Cat4...
    f.add(f4)

This is more in the spirit of Python than your original proposal, which appeared like the compiler would have to gather all the definitions from different places and fuse them. That would be too complex for Python's simple-minded compiler!

--Guido van Rossum (home page: http://www.python.org/~guido/)
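A minimal run-time sketch of such a multimethod() object is easy to write today. (Since there is no argument-type syntax, the category is passed alongside each function, and isinstance() stands in for whatever category test is ultimately used; all names here are illustrative.)

```python
class multimethod:
    # Holds (function, category) pairs and tries them in order,
    # as in the "f = multimethod(f1, f2, f3)" sketch above.
    def __init__(self, *pairs):
        self.pairs = list(pairs)

    def add(self, func, cat):
        # Later registration, as in "f.add(f4)".
        self.pairs.append((func, cat))

    def __call__(self, arg):
        for func, cat in self.pairs:
            if isinstance(arg, cat):
                return func(arg)
        raise TypeError("no applicable method for %r" % (arg,))
```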
Guido:
I can see how it could be done using some additional syntax similar to what ML uses, e.g.:
    def f(a: Cat1):
        ...code for Cat1...
    else f(a: Cat2):
        ...code for Cat2...
    else f(a: Cat3):
        ...code for Cat3...
As long as all the implementations have to be in one place, this is equivalent to

    def f(a):
        if belongstocategory(a, Cat1):
            ...
        elif belongstocategory(a, Cat2):
            ...
        elif belongstocategory(a, Cat3):
            ...

so you're not gaining much from the new syntax.
It might also be possible to modify a multimethod dynamically, e.g. later one could write:
    def f4(a: Cat4):
        ...code for Cat4...
f.add(f4)
This sort of scheme makes me uneasy, because it means that any module can change the behaviour of any call of f() in any other module. Currently, if you know the types involved in a method call, you can fairly easily track down in the source which piece of code will be called. With this sort of generic function, that will no longer be possible. It's kind of like an "import *" in reverse -- you won't know what's coming from where, and you can get things that you never even asked for. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
I can see how it could be done using some additional syntax similar to what ML uses, e.g.:
Guido> def f(a: Cat1):
Guido>     ...code for Cat1...
Guido> else f(a: Cat2):
Guido>     ...code for Cat2...
Guido> else f(a: Cat3):
Guido>     ...code for Cat3...

Greg> As long as all the implementations have to be in one place,
Greg> this is equivalent to
Greg>
Greg>     def f(a):
Greg>         if belongstocategory(a, Cat1):
Greg>             ...
Greg>         elif belongstocategory(a, Cat2):
Greg>             ...
Greg>         elif belongstocategory(a, Cat3):
Greg>             ...
Greg>
Greg> so you're not gaining much from the new syntax.

I'm not so sure. The code is already somewhat simpler here, and it would be substantially simpler in examples such as

    def arctan(x):
        ...
    else arctan(y, x):
        ...
It might also be possible to modify a multimethod dynamically, e.g. later one could write:
    def f4(a: Cat4):
        ...code for Cat4...
f.add(f4)
Greg> This sort of scheme makes me uneasy, because it means that any
Greg> module can change the behaviour of any call of f() in any other
Greg> module.

It makes me uneasy because the behavior of programs might depend on the order in which modules are loaded. That's why I didn't suggest a way of defining the variations on f in separate places.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
From: "Andrew Koenig" <ark@research.att.com>
Greg> so you're not gaining much from the new syntax.
I'm not so sure. The code is already somewhat simpler here, and it would be substantially simpler in examples such as
    def arctan(x):
        ...
    else arctan(y, x):
        ...
It might also be possible to modify a multimethod dynamically, e.g. later one could write:
    def f4(a: Cat4):
        ...code for Cat4...
f.add(f4)
Greg> This sort of scheme makes me uneasy, because it means that any
Greg> module can change the behaviour of any call of f() in any other
Greg> module.
It makes me uneasy because the behavior of programs might depend on the order in which modules are loaded. That's why I didn't suggest a way of defining the variations on f in separate places.
This concern seems most un-pythonic to my eye, since there are already all kinds of ways any module can change the behavior of any call in another module. The most direct way is by rebinding the implementation of another module's function. Python is a dynamic language, and that is usually seen as a strength.

More importantly, though, forcing all the definitions to be in one place prevents an important (you might even say the most important) use case: the author of a new type should be able to provide a multimethod implementation corresponding to that type. For example, if I write a rational number class, I should be able to plug in a corresponding arctan implementation.

I'm extra-surprised to see that Andy's uneasy about this, since a C++ feature which (colloquially) bears his name was purpose-built to make this sort of thing possible. Koenig lookup raises a similar issue: that the behavior of a function call can be changed depending on which headers are #included, and even the order they're #included in. [I personally have many other concerns about how that feature worked out in C++ - paper available on request - but the Python implementation being suggested here suffers none of those problems because of its explicit nature]

-Dave
David Abrahams <dave@boost-consulting.com>:
This concern seems most un-pythonic to my eye, since there are already all kinds of ways any module can change the behavior of any call in another module.
Yes, but most of the time you don't have to use them! With this feature, it would be the *normal* way of using it.
forcing all the definitions to be in one place prevents an important (you might even say the most important) use case: the author of a new type should be able to provide a multimethod implementation corresponding to that type.
You can get that without the notion of a generic function as a separate entity. Just have a dispatch mechanism that looks in all the arguments of a call for a method to use, instead of just the first one. That would be relatively tractable, since at least you'd know that the method must be found in one of the argument classes somewhere. It also doesn't suffer from the who-imports-the-module problem, since someone must have imported it in order to get an object of that class in the first place. The use case that this doesn't cover is where you're not defining a new class, just trying to add behaviour to handle a previously unanticipated combination of existing classes. The organisational problems involved in that aren't unique to Python, and seem to me an inherent feature of the problem itself. Where does functionality belong that isn't owned by any class? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
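A sketch of that symmetric dispatch mechanism (all names here are hypothetical): search each argument's class, left to right, for a method of the given name, so the implementation can live in any participating class rather than in a separate generic-function object:

```python
def symmetric_call(name, *args):
    # Look in every argument's class (not just the first) for an
    # implementation -- the method must be found in one of the
    # argument classes somewhere.
    for a in args:
        meth = getattr(type(a), name, None)
        if meth is not None:
            return meth(*args)
    raise TypeError("no %s() found in any argument's class" % name)

# Illustrative classes: only Vec supplies the method, but the call
# succeeds regardless of which argument position the Vec occupies.
class Vec:
    def combine(self, other):
        return "handled by Vec"

class Plain:
    pass
```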
From: "Greg Ewing" <greg@cosc.canterbury.ac.nz>
David Abrahams <dave@boost-consulting.com>:
This concern seems most un-pythonic to my eye, since there are already all kinds of ways any module can change the behavior of any call in another module.
Yes, but most of the time you don't have to use them! With this feature, it would be the *normal* way of using it.
I don't understand. You still don't have to use it. Nobody would force you to add or encourage multimethod overloads. In fact, I think it would be most appropriate if multimethods meant to be overloaded had to be declared explicitly.
forcing all the definitions to be in one place prevents an important (you might even say the most important) use case: the author of a new type should be able to provide a multimethod implementation corresponding to that type.
You can get that without the notion of a generic function as a separate entity. Just have a dispatch mechanism that looks in all the arguments of a call for a method to use, instead of just the first one.
That would be relatively tractable, since at least you'd know that the method must be found in one of the argument classes somewhere.
Nooooo-o-o-o-o....! (Sorry, I'm overreacting... but just a little.) That approach suffers from all the problems of Koenig lookup in C++. Namely, if I provide a method foo in my class, and two different modules are invoking "foo", whose idea of the "foo" semantics am I implementing? That really becomes a problem for authors of generic functions (the ones that call the multimethods), because every time they call a function it essentially reserves the name of that function for the semantics they're expecting it to have. This is currently, IMO, one of the most intractable problems in C++, and I'd hate to see Python go down that path.** If you want to see the gory details, ask me to send you my paper about it.
It also doesn't suffer from the who-imports-the-module problem, since someone must have imported it in order to get an object of that class in the first place.
I don't think that's a serious problem. Multimethod definitions that apply to a given type will typically be supplied by the same module as the type.
The use case that this doesn't cover is where you're not defining a new class, just trying to add behaviour to handle a previously unanticipated combination of existing classes. The organisational problems involved in that aren't unique to Python, and seem to me an inherent feature of the problem itself. Where does functionality belong that isn't owned by any class?
Often there's behavior associated with combinations of classes from the same package or module. It's reasonable to supply that at module scope.

Besides the practical problems mentioned above, I think it's unnatural to try to tie MULTImethod implementations to a single class. When you try to generalize that arrangement to two arguments, you end up with something like the __add__/__radd__ system, and generalizing it to three arguments is next to impossible.

Where to supply multimethods that work for types defined in different modules/packages is an open question, but it's a question that applies to both the class-scope and module-scope approaches.

-Dave

** Python is very nice about using explicit qualification to associate semantics with implementation (i.e. we write self.foo(x) and not just foo(x)), and this would be a major break with that tradition. Explicit is better than implicit.
David Abrahams <dave@boost-consulting.com>:
I think it's unnatural to try to tie MULTImethod implementations to a single class.
I'm not sure what you mean by that. What I was talking about wouldn't be tied to a single class. Any given method implementation would have to reside in some class, but the dispatch mechanism would be symmetrical with respect to all its arguments.
When you try to generalize that arrangement to two arguments, you end up with something like the __add__/__radd__ system, and generalizing it to three arguments is next to impossible.
But it's exactly the same problem as implementing a generic function dispatch mechanism. If you can solve one, you can solve the other. I'm talking about replacing f(a, b, c) where f is a generic function, with (a, b, c).f (not necessarily that syntax, but that's more or less what it would mean.) The dispatch mechanism -- whatever it is -- is the same, but the generic function entity itself doesn't exist.
Where to supply multimethods that work for types defined in different modules/packages is an open question, but it's a question that applies to both the class-scope and module-scope approaches
The class-scope approach would be saying effectively that you're not allowed to have a method that doesn't belong in any class -- you have to pick a class and put it there. That doesn't solve the problem, I know, but at least it would be explicit about not solving it!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
From: "Greg Ewing" <greg@cosc.canterbury.ac.nz>
David Abrahams <dave@boost-consulting.com>:
I think it's unnatural to try to tie MULTImethod implementations to a single class.
I'm not sure what you mean by that. What I was talking about wouldn't be tied to a single class. Any given method implementation would have to reside in some class,
"reside in" is approximately equivalent to what I meant by "tied to". I think it's unnatural to force users to associate a function designed to be considered symmetrically over a combination of types (and "type categories", I hope) with a single one of those types. That approach also prevents another important use-case: I want to use type X with generic function F, and can write a plausible implementation of some multimethod call used by F for X, but the author of X didn't supply it
When you try to generalize that arrangement to two arguments, you end up with something like the __add__/__radd__ system, and generalizing it to three arguments is next to impossible.
But it's exactly the same problem as implementing a generic function dispatch mechanism. If you can solve one, you can solve the other.
I'm talking about replacing
f(a, b, c)
where f is a generic function, with
(a, b, c).f
(not necessarily that syntax, but that's more or less what it would mean.) The dispatch mechanism -- whatever it is -- is the same, but the generic function entity itself doesn't exist.
Are you talking about allowing the "self" argument of a multimethod to appear in any position in the argument list? Otherwise you get a proliferation of __add__, __radd__, __r2add__, etc. methods.
Where to supply multimethods that work for types defined in different modules/packages is an open question, but it's a question that applies to both the class-scope and module-scope approaches
The class-scope approach would be saying effectively that you're not allowed to have a method that doesn't belong in any class -- you have to pick a class and put it there.
That doesn't solve the problem, I know, but at least it would be explicit about not solving it!
And that's progress? Anyway, you've managed to avoid the most important problem with this approach (and this is a way of rephrasing my analogy to the problems with Koenig lookup in C++): it breaks namespaces. When a module author defines a generic function he shouldn't be forced to go to some name distribution authority to find a unique name: some_module.some_function should be enough to ensure uniqueness. Class authors had better be able to say precisely which module's idea of "some_function" they're implementing. If you want class authors to write something like:

    def some_module.some_function(T1: x, T2: y)

within the class body, I guess it's OK with me, but it seems rather pointless to force the association with a class in that case, since the really important association is with the module defining the generic function's semantics. Explicit is better than implicit.

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com
David Abrahams <dave@boost-consulting.com>:
I think it's unnatural to force users to associate a function designed to be considered symmetrically over a combination of types ... with a single one of those types.
I agree, but the alternative seems to be for it to reside "out there" somewhere in an unknown location from the point of view of code which uses it.
Are you talking about allowing the "self" argument of a multimethod to appear in any position in the argument list?
Something like that. I haven't really thought it through very much. Maybe the method name could be systematically modified to indicate which argument position is "self", or something like that.
That doesn't solve the problem, I know, but at least it would be explicit about not solving it!
And that's progress?
Maybe. I don't know. At least it would generalise and make available to the user what's already there in an ad-hoc way to deal with numeric types.
Class authors had better be able to say precisely which module's idea of "some_function" they're implementing. If you want class authors to write something like:
def some_module.some_function(T1: x, T2: y)
That would only be an issue if T1 and T2 were already both using the name some_function for incompatible purposes. That can happen now anyway with multiple inheritance -- name clashes can occur whenever you inherit from two pre-existing classes. I don't see that it's any more of a problem here.

-- Greg Ewing, greg@cosc.canterbury.ac.nz
David Abrahams <dave@boost-consulting.com>:
Koenig lookup raises a similar issue: that the behavior of a function call can be changed depending on which headers are #included, and even the order they're #included in.
But at least you can, in principle, figure out what will be done by a particular call in the source, by examining the files included by that source file. With the proposed generic function mechanism in Python, that wouldn't be true.

-- Greg Ewing, greg@cosc.canterbury.ac.nz
From: "Greg Ewing" <greg@cosc.canterbury.ac.nz>
David Abrahams <dave@boost-consulting.com>:
Koenig lookup raises a similar issue: that the behavior of a function call can be changed depending on which headers are #included, and even the order they're #included in.
But at least you can, in principle, figure out what will be done by a particular call in the source, by examining the files included by that source file.
With the proposed generic function mechanism in Python, that wouldn't be true.
Like anything in Python, you can figure out what will happen by:

a. examining all the source that will be executed
b. examining the state of things at runtime.

What's new?

-Dave
ark> It makes me uneasy because the behavior of programs might depend
ark> on the order in which modules are loaded. That's why I didn't
ark> suggest a way of defining the variations on f in separate places.

David> This concern seems most un-pythonic to my eye, since there are
David> already all kinds of ways any module can change the behavior of
David> any call in another module. The most direct way is by rebinding
David> the implementation of another module's function. Python is a
David> dynamic language, and that is usually seen as a strength.

Indeed. What concerns me is not dynamic behavior, but order-dependent behavior that might be occurring behind the scenes. I would really like to be confident that if I write

    import x, y

it has the same effect as

    import y, x

I understand that there is no guarantee of that property now, but I suspect that most people write programs in a way that does guarantee it. I would hate to see the language evolve in ways that make it substantially more difficult to avoid such order dependencies, so I am reluctant to propose a feature that would increase that difficulty.

David> More importantly, though, forcing all the definitions to be in
David> one place prevents an important (you might even say the most
David> important) use case: the author of a new type should be able to
David> provide a multimethod implementation corresponding to that
David> type. For example, if I write a rational number class, I should
David> be able to plug in a corresponding arctan implementation.

Yes. I'm not saying such a feature shouldn't exist; just that I don't know what form it should take.

David> I'm extra-surprised to see that Andy's uneasy about this, since
David> a C++ feature which (colloquially) bears his name was
David> purpose-built to make this sort of thing possible. Koenig
David> lookup raises a similar issue: that the behavior of a function
David> call can be changed depending on which headers are #included,
David> and even the order they're #included in.

The C++ #include mechanism, based as it is on copying source text, offers almost no hope of sensible behavior without active collaboration from programmers.

David> [I personally have many other concerns about how that feature
David> worked out in C++ - paper available on request - but the Python
David> implementation being suggested here suffers none of those
David> problems because of its explicit nature]

Which doesn't mean it can't suffer from other problems. In particular, if I know that modules x and y overload the same function, and I want to be sure that x's case is tested first, one would think I could ensure it by writing

    import x, y

But in fact I can't, because someone else may have imported y already, in which case the second import is a no-op.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
From: "Andrew Koenig" <ark@research.att.com>
ark> It makes me uneasy because the behavior of programs might depend
ark> on the order in which modules are loaded. That's why I didn't
ark> suggest a way of defining the variations on f in separate places.
David> This concern seems most un-pythonic to my eye, since there are
David> already all kinds of ways any module can change the behavior of
David> any call in another module. The most direct way is by rebinding
David> the implementation of another module's function. Python is a
David> dynamic language, and that is usually seen as a strength.
Indeed. What concerns me is not dynamic behavior, but order-dependent behavior that might be occurring behind the scenes. I would really like to be confident that if I write
import x, y
it has the same effect as
import y, x
I understand that there is no guarantee of that property now, but I suspect that most people write programs in a way that does guarantee it. I would hate to see the language evolve in ways that make it substantially more difficult to avoid such order dependencies, so I am reluctant to propose a feature that would increase that difficulty.
Oh, easily solved: "in the face of ambiguity, refuse the temptation to guess". There should be a best match rule, and if there are two best matches, it's an error.

-Dave
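A minimal sketch of the "refuse the temptation to guess" rule being suggested here: rank candidate overloads by specificity (crudely measured as distance along the argument type's MRO) and raise when two candidates tie for best. All names below are hypothetical illustration, not part of any proposal in this thread.

```python
# Hypothetical sketch: rank overloads by how specific their
# parameter types are, and refuse to guess on a tie.
def mro_distance(arg_type, param_type):
    """How far param_type sits in arg_type's MRO, or None if no match."""
    try:
        return arg_type.__mro__.index(param_type)
    except ValueError:
        return None  # param_type doesn't match this argument at all

def best_match(overloads, args):
    """overloads: list of (parameter_types, implementation) pairs."""
    scored = []
    for types, impl in overloads:
        if len(types) != len(args):
            continue
        dists = [mro_distance(type(a), t) for a, t in zip(args, types)]
        if None in dists:
            continue  # this overload doesn't apply
        scored.append((sum(dists), impl))
    if not scored:
        raise TypeError("no matching overload")
    scored.sort(key=lambda pair: pair[0])
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        raise TypeError("ambiguous call: two equally good matches")
    return scored[0][1]

class A: pass
class B(A): pass

overloads = [((A,), lambda x: "A version"),
             ((B,), lambda x: "B version")]
print(best_match(overloads, (B(),))(B()))   # B version -- more specific wins
```

Two overloads registered for the same type tuple would tie on score and raise, which is exactly the behavior Koenig is worried about ordering silently deciding.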
David> Oh, easily solved: "in the face of ambiguity, refuse the
David> temptation to guess". There should be a best match rule, and
David> if there are two best matches, it's an error.

In the ML example I showed earlier:

    fun len([]) = 0
      | len(h::t) = len(t) + 1

ordering is crucial: As long as the argument is not empty, both cases match, so the language is defined to test the clauses in sequence. My intuition is that people will often want to define category tests to be done in a particular order. There is no problem with such ordering as long as all of the tests are specified together. Once the tests are distributed, ordering becomes a problem, because one person's intentional order dependency is another person's ambiguity. Which means that how one specifies distributed tests will probably be different from how one specifies tests all in one place. That's yet another reason I think it may be right to consider the two problems separately.
fre 2002-08-16 klockan 15.37 skrev Andrew Koenig:
David> Oh, easily solved: "in the face of ambiguity, refuse the
David> temptation to guess". There should be a best match rule, and
David> if there are two best matches, it's an error.
In the ML example I showed earlier:
fun len([]) = 0 | len(h::t) = len(t) + 1
ordering is crucial: As long as the argument is not empty, both cases match, so the language is defined to test the clauses in sequence. My intuition is that people will often want to define category tests to be done in a particular order. There is no problem with such ordering as long as all of the tests are specified together.
What does "not empty" mean in this context? "not []"? Does h::t match [] or does [2] match []? Why is the ordering crucial? In Haskell:

    f [] = 0
    f (x:xs) = 1 + f xs

is totally equivalent with:

    f (x:xs) = 1 + f xs
    f [] = 0

Of course, if the different patterns overlap, THEN ordering is crucial; I just find it odd that [] and h::t would overlap...

Martin

--
Martin Sjögren  martin@strakt.com
ICQ : 41245059  Phone: +46 (0)31 7710870  Cell: +46 (0)739 169191
GPG key: http://www.strakt.com/~martin/gpg.html
In the ML example I showed earlier:
fun len([]) = 0 | len(h::t) = len(t) + 1
ordering is crucial: As long as the argument is not empty, both cases match, so the language is defined to test the clauses in sequence. My intuition is that people will often want to define category tests to be done in a particular order. There is no problem with such ordering as long as all of the tests are specified together.
Martin> What does "not empty" mean in this context? "not []"? Does h::t match []
Martin> or does [2] match []? Why is the ordering crucial? In Haskell:
Martin>
Martin> f [] = 0
Martin> f (x:xs) = 1 + f xs
Martin>
Martin> is totally equivalent with:
Martin>
Martin> f (x:xs) = 1 + f xs
Martin> f [] = 0

I'm sorry, you're right. In this particular example, there is no overlap, so order doesn't matter. However, the general point still stands: ML patterns are order-sensitive in cases where there is overlap.

-- Andrew Koenig, ark@research.att.com
Andrew Koenig <ark@research.att.com> writes:
David> Oh, easily solved: "in the face of ambiguity, refuse the
David> temptation to guess". There should be a best match rule, and
David> if there are two best matches, it's an error.
In the ML example I showed earlier:
fun len([]) = 0 | len(h::t) = len(t) + 1
ordering is crucial: As long as the argument is not empty, both cases match, so the language is defined to test the clauses in sequence. My intuition is that people will often want to define category tests to be done in a particular order. There is no problem with such ordering as long as all of the tests are specified together.
If multimethods make it into Python, I think (hope!) it's a safe bet that they will look more like CLOS's multimethods than ML's pattern matching. Cheers, M. -- ZAPHOD: You know what I'm thinking? FORD: No. ZAPHOD: Neither do I. Frightening isn't it? -- The Hitch-Hikers Guide to the Galaxy, Episode 11
Andrew Koenig <ark@research.att.com>:
if I know that modules x and y overload the same function, and I want to be sure that x's case is tested first, one would think I could ensure it by writing
import x, y
But in fact I can't, because someone else may have imported y already, in which case the second import is a no-op.
So far no-one has addressed the other importing problem I mentioned, which is how to ensure that the relevant modules get imported *at all*. Currently in Python, a module gets imported because some other module needs to use a name from it. If no other module needs to do so, the module is not needed. But with generic functions, this will no longer be true. It will be possible for a module to be needed by the system as a whole, yet no other module knows that it is needed!

-- Greg Ewing, greg@cosc.canterbury.ac.nz
From: "Guido van Rossum" <guido@python.org> to more arguments as well.)
It might also be possible to modify a multimethod dynamically, e.g. later one could write:
def f4(a: Cat4): ...code for Cat4...
f.add(f4)
This is more in the spirit of Python than your original proposal, which appeared like the compiler would have to gather all the definitions from different places and fuse them. That would be too complex for Python's simple-minded compiler!
This is most like what I had in mind.

-Dave
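The dynamically-extensible multimethod sketched above (`f.add(f4)`) can be rendered in a few lines of Python; since the annotation syntax in the example doesn't exist, the argument types are passed to add() explicitly here. MultiMethod and Cat4 are hypothetical names for illustration only.

```python
# Hypothetical sketch of a multimethod that implementations can
# be added to after the fact, from any module.
class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.impls = []          # list of (types, implementation)

    def add(self, impl, *types):
        self.impls.append((types, impl))

    def __call__(self, *args):
        for types, impl in self.impls:
            if len(types) == len(args) and all(
                    isinstance(a, t) for a, t in zip(args, types)):
                return impl(*args)
        raise TypeError("no applicable implementation of %s" % self.name)

class Cat4: pass

def f4(a):
    return "code for Cat4"

f = MultiMethod("f")
f.add(f4, Cat4)          # later, from anywhere: extend f dynamically
print(f(Cat4()))         # code for Cat4
```

Because registration is an ordinary runtime call, no compiler support is needed, which is the point Guido makes about Python's simple-minded compiler.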
Guido van Rossum wrote:
Guido> The exception is when you need to do something different based
Guido> on the type of an object and you can't add a method for what
Guido> you want to do. But that is relatively rare.
Perhaps the reason it's rare is that it's difficult to do.
Perhaps... Is it the chicken or the egg?
Once you've defined interfaces you do end up using them this way, I've found in my own experience. It can be more clear than the alternative if you have a set of objects in some collection that fall into a number of kinds -- 'content' versus 'container' type things in Zope for instance. It's nice to be able to say 'is this a container' without having to think about implementation inheritance hierarchies or trying to call a method that should exist on a container and not on a content object. And of course Zope3 uses interfaces in more advanced ways to associate objects together automatically -- a view for a content object is looked up automatically by interface, and you can automatically hook together adapters that translate one interface to another by looking them up in an interface registry as well. [snip]
BTW, the original scarecrow proposal is at http://www.foretec.com/python/workshops/1998-11/dd-fulton-sum.html
I recall looking at that for the first time and not understanding too much about the reasoning behind it, but by now I have some decent experience with the descendant of that behind me (the interface package in Zope), and it's quite nice. Many people seem to react to interfaces by associating them with static types and then rejecting the notion, but Python interface checking is just as run-time as anything else. By the way, the Twisted people are starting to use interfaces in their package; a home grown very simple implementation at first but they are trying to stay compatible with the Zope ones and are looking into adopting the Zope interface package proper. When I first discussed interfaces with some Twisted developers a year ago or so their thinking seemed quite negative, but they seem to be changing their minds, at least slowly. That's a good sign for interfaces, and I imagine it will happen with more people. Interfaces in Python are almost too trivial to understand, but surprisingly useful. I imagine this is why so many smart Python users don't get it; they either reject the notion because it seems too trivial and 'therefore useless', or because they think it must involve far more complication (static typing) and therefore it's too complicated and not in the spirit of Python. :) Regards, Martijn
Interfaces in Python are almost too trivial to understand, but surprisingly useful. I imagine this is why so many smart Python users don't get it; they either reject the notion because it seems too trivial and 'therefore useless', or because they think it must involve far more complication (static typing) and therefore it's too complicated and not in the spirit of Python. :)
No, I think it's because they only work well if they are used pervasively (not necessarily everywhere). That's why they work in Zope: not only does almost everything in Zope have an interface, but interfaces are used to implement many Zope features. I haven't made up my mind yet whether Python could benefit as much as Zope, but I am cautiously looking into adding something derived from Zope's interface package. Jim & I have rather different ideas on what the ideal interfaces API should look like though, so it'll be a while. Maybe I should pull down the Twisted interfaces package and see how I like their subset (I'm sure it must be a subset -- the Zope package is a true kitchen sink :-). --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Interfaces in Python are almost too trivial to understand, but surprisingly useful. I imagine this is why so many smart Python users don't get it; they either reject the notion because it seems too trivial and 'therefore useless', or because they think it must involve far more complication (static typing) and therefore it's too complicated and not in the spirit of Python. :)
No, I think it's because they only work well if they are used pervasively (not necessarily everywhere). That's why they work in Zope: not only does almost everything in Zope have an interface, but interfaces are used to implement many Zope features.
That is not true for Zope 2, and I do use interfaces in Zope 2. Zope 2 certainly doesn't use interfaces pervasively. I use them pervasively in some of my code which is a framework on top of Zope, which may weaken my argument, but it's still not true interfaces are not useful unless they're used pervasively. It's definitely a lot more powerful if you do, of course, though. I also think it may help that one can declare a class implements an interface outside said class itself, in a different section of the code. I do not have any practical experience with that outside some Zope 3 hackery, however, so I can't really defend this one very well.
I haven't made up my mind yet whether Python could benefit as much as Zope, but I am cautiously looking into adding something derived from Zope's interface package. Jim & I have rather different ideas on what the ideal interfaces API should look like though, so it'll be a while. Maybe I should pull down the Twisted interfaces package and see how I like their subset (I'm sure it must be a subset -- the Zope package is a true kitchen sink :-).
It's an extremely small subset and very trivial, and last I checked they used 'implements' in a different way than Zope, unfortunately (I pointed it out and they may have fixed that by now, not sure). But if you are looking for another API then the Twisted version doesn't help (except for the inadvertent 'implements()' difference). I don't consider the Zope 3 interface package to be a kitchen sink myself, but I've been working with it for a while now. I would note that some of its extensibility and introspection features are quite useful when implementing Schema (a special kind of interfaces with descriptions about non-method attributes). If a new package is to be designed I hope that those use cases will be taken into account. Regards, Martijn
From: "Guido van Rossum" <guido@python.org>
Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome for your uneasiness with (4) (and (5), I might add -- I don't know that __iter__ is always safe). He contrasts it with a different attitude, which might be summarized as "It's easier to ask forgiveness than permission." In many cases, there is no reason for LBYL syndrome, and it can actually cause subtle bugs.
While it's not an absolute rule, I tend to dislike interface/protocol checking as an example of LBYL syndrome.
<snip>
The exception is when you need to do something different based on the type of an object and you can't add a method for what you want to do. But that is relatively rare.
The main reason I want to be able to LBYL (and, AFAICT, it's the same as Alex's reason) is to support multiple dispatch. In other words, it wouldn't be user code doing the looking. The best reason to support protocol introspection is so that we can provide users with a way to write more-elegant code, instead of messing around with manual type inspection. What's your position on multiple dispatch? -Dave
The main reason I want to be able to LBYL (and, AFAICT, it's the same as Alex's reason) is to support multiple dispatch.
But isn't your application one where the types are mapped from C++? Then you should be able to dispatch on type() of the arguments. Or am I misunderstanding, and do you want to make multi-dispatch a standard paradigm in Python?
In other words, it wouldn't be user code doing the looking. The best reason to support protocol introspection is so that we can provide users with a way to write more-elegant code, instead of messing around with manual type inspection. What's your position on multiple dispatch?
That it's too inefficient in a language with run-time dispatch to even think about it. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
The main reason I want to be able to LBYL (and, AFAICT, it's the same as Alex's reason) is to support multiple dispatch.
But isn't your application one where the types are mapped from C++?
Not all of them, not hardly! Boost.Python is about interoperability, not just about wrapping C++. My users are writing functions that want to accept any Python sequence as one argument (for some definition of "sequence"). They'd like to dispatch to different implementations of that function based on whether that argument is a sequence or a scalar numeric type.
Then you should be able to dispatch on type() of the arguments. Or am I misunderstanding, and do you want to make multi-dispatch a standard paradigm in Python?
Absolutely.
In other words, it wouldn't be user code doing the looking. The best reason to support protocol introspection is so that we can provide users with a way to write more-elegant code, instead of messing around with manual type inspection. What's your position on multiple dispatch?
That it's too inefficient in a language with run-time dispatch to even think about it.
That's funny, my users are very happy with how fast it works in Boost.Python. I don't see any reason it should have to be much less efficient in pure Python for most cases... the important "type categories" could be builtins. And as others have pointed out, it could even be used to get certain optimizations.

-Dave
That's funny, my users are very happy with how fast it works in Boost.Python. I don't see any reason it should have to be much less efficient in pure Python for most cases... the important "type categories" could be builtins. And as others have pointed out, it could even be used to get certain optimizations.
Time to write a PEP. Maybe there's an implementation trick you haven't told us about? --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
That's funny, my users are very happy with how fast it works in Boost.Python. I don't see any reason it should have to be much less efficient in pure Python for most cases... the important "type categories" could be builtins. And as others have pointed out, it could even be used to get certain optimizations.
Time to write a PEP.
I don't know how these things usually work, but isn't it a bit early for that? I would like to have some discussion about multiple dispatch (and especially matching criteria) before investing in a formal proposal. That's what my earlier posting which got banished to the types-sig was trying to do. Getting a feel for what people are thinking about this, and getting feedback from those with lots more experience than I in matters Pythonic is important to me.
Maybe there's an implementation trick you haven't told us about?
There's not all that much to what I'm doing. I have a really simple-minded dispatching scheme which checks each overload in sequence, and takes the first one which can get a match for all arguments. That causes some problems for people who want to overload on Python float vs. int types, for example, because each one matches the other. When I get some time I plan to move to a more-sophisticated scheme which rates each match and picks the best one. It doesn't seem like it should cause a significant slowdown, but that's just intuition (AKA bullshit) talking.

My users generally think

    C++ = fast (but hard)
    Python = slow (but easy)

[no rude remarks from the peanut gallery, please!] So they don't tend to worry too much about the speed at the Python/C++ boundary, where this mechanism lies. It could be that they don't notice the cost because they're putting all time-critical functionality completely inside the C++ part.

-Dave
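The float-vs-int trouble described above can be illustrated with a toy first-match dispatcher in Python (all names hypothetical; this is not the actual Boost.Python code): because each numeric type satisfies the other's matching criterion, registration order silently decides the outcome.

```python
# Hypothetical toy: first-match dispatch where int and float
# each "match" the other's converter, so order of registration
# decides which overload wins.
def matches_float(obj):
    # an int converts cleanly to float, so it "matches" too
    return isinstance(obj, (int, float))

def matches_int(obj):
    # a whole-valued float converts cleanly to int
    return isinstance(obj, int) or (
        isinstance(obj, float) and obj.is_integer())

overloads = [(matches_float, "float overload"),
             (matches_int, "int overload")]

def dispatch(x):
    for matches, impl in overloads:
        if matches(x):
            return impl          # first match wins, rightly or not
    raise TypeError("no match")

print(dispatch(3))    # float overload -- despite int being the exact type
```

A rating scheme of the kind mentioned would instead score `int -> int` higher than `int -> float` and pick the exact match.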
Time to write a PEP.
I don't know how these things usually work, but isn't it a bit early for that?
Not at all. If you want multiple dispatch to go into the language, you'll have to educate the rest of us here, both about the advantages, and how it can be implemented with reasonable, Pythonic semantics. A PEP is the perfect vehicle for that. A PEP doesn't have to *start* as a full formal proposal. It can go through stages and eventually end up being rejected before there ever was a full formal proposal, *or* it will eventually evolve into a full formal proposal. (For example, PEPs 245 and 246 are examples of PEPs in the very early stages. I expect PEP 245 was too early, but PEP 246 strikes me as just the right thing to get a meaningful discussion started.) --Guido van Rossum (home page: http://www.python.org/~guido/)
David Abrahams wrote:
There's not all that much to what I'm doing. I have a really simple-minded dispatching scheme which checks each overload in sequence, and takes the first one which can get a match for all arguments.
Can you explain in more detail how the matching is done? Wouldn't having some kind of type declarations be a precondition to implementing multiple dispatch?

Neil
From: "Neil Schemenauer" <nas-dated-1029772483.f08c64@python.ca>

David Abrahams wrote:

There's not all that much to what I'm doing. I have a really simple-minded dispatching scheme which checks each overload in sequence, and takes the first one which can get a match for all arguments.

Can you explain in more detail how the matching is done? Wouldn't having some kind of type declarations be a precondition to implementing multiple dispatch?
Since in Boost.Python we are ultimately wrapping C++ function and member function pointers, the type declarations are available to us. For each C++ type, any number of from_python converters may be registered with the system. Each converter can have its own matching criterion. For example, there is a pre-registered converter for each of the built-in C++ integral types which checks the source object's tp_int field to decide convertibility. When you wrap a C++ class, a from_python converter is registered whose convertibility criterion checks to see if the source object is one of my extension classes, then asks if it contains a C++ object of the appropriate type. Since we have C++ types corresponding to some of the built-in Python types (e.g. list, dict, str), the convertibility criterion for those just checks to see whether the Python object has the appropriate type. However, we're not limited to matching precise types: we could easily make a C++ type called "sequence" whose converter would match any Python sequence (if we could decide exactly what constitutes a Python sequence <.02 wink>).

HTH,
Dave

P.S. If you want even /more/ gory details, just ask: I have plenty ;-)
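A rough Python rendering of the converter registry just described, including a "sequence" criterion of the kind mentioned. The real machinery is C++ inside Boost.Python; every name below is made up for illustration.

```python
# Hypothetical sketch of a from_python-style converter registry:
# each target type has a list of converters, each with its own
# matching criterion.
converters = {}   # target type name -> list of (matches, convert)

def register_converter(target, matches, convert):
    converters.setdefault(target, []).append((matches, convert))

# A converter for a hypothetical C++ 'sequence' type, whose
# criterion accepts any Python object with len() and indexing.
register_converter(
    "sequence",
    lambda obj: hasattr(obj, "__len__") and hasattr(obj, "__getitem__"),
    lambda obj: list(obj))

def from_python(obj, target):
    # try each registered converter's criterion in turn
    for matches, convert in converters.get(target, []):
        if matches(obj):
            return convert(obj)
    raise TypeError("no converter from %r to %s" % (obj, target))

print(from_python((1, 2, 3), "sequence"))   # [1, 2, 3]
```

The open question in the parenthetical -- what exactly counts as "a Python sequence" -- is precisely the choice of matching criterion here.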
On Tue, Aug 13, 2002 at 05:15:58PM -0400, Guido van Rossum wrote:
Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome for your uneasiness with (4) (and (5), I might add -- I don't know that __iter__ is always safe). He contrasts it with a different attitude, which might be summarized as "It's easier to ask forgiveness than permission." In many cases, there is no reason for LBYL syndrome, and it can actually cause subtle bugs. For example, a LBYL programmer could write
if not os.path.exists(fn):
    print "File doesn't exist:", fn
    return
fp = open(fn)
...use fp...
A "forgiveness" programmer would write this as follows instead:
try:
    fp = open(fn)
except IOError, msg:
    print "Can't open", fn, ":", msg
    return
...use fp...
So far I have proposed two "forgiveness" solutions to the re-iterability issue. One was to raise an error if .next() is called after StopIteration, so an attempt to iterate twice over an iterator would fail noisily. You have rejected this idea, probably because too much code depends on the current documented behavior.

My other proposed solution is at http://mail.python.org/pipermail/python-dev/2002-July/026960.html I suspect it got lost in the noise, though.

Oren
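[Oren's first proposal, failing noisily on a second pass over an exhausted iterator, can be sketched as a wrapper. This is an illustrative sketch in modern Python, not the behavior he actually proposed patching into the core.]

```python
# Wrap an iterator so that calling next() again after it has raised
# StopIteration fails noisily instead of silently yielding nothing.

class StrictIterator:
    def __init__(self, it):
        self.it = iter(it)
        self.exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.exhausted:
            raise RuntimeError("iterator already exhausted")
        try:
            return next(self.it)
        except StopIteration:
            self.exhausted = True
            raise
```

A second attempt to iterate over the same StrictIterator raises RuntimeError, which is exactly the "noisy failure" the proposal wanted for accidental re-iteration.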
In the message that started all of this type-category discussion, I said:

    As far as I know, there is no uniform method of determining into
    which category or categories a particular object falls. Of course,
    there are non-uniform ways of doing so, but in general, those ways
    are, um, nonuniform. Therefore, if you want to check whether an
    object is in one of these categories, you haven't necessarily
    learned much about how to check if it is in a different one of
    these categories.

As it happens, I'm presently working on a program in which I would like to be able to determine whether a given value is:

-- a member of a particular class hierarchy that I've defined;
-- a callable object;
-- a compiled regular expression; or
-- anything else.

and do something different in each of these four cases.

Testing for the first category is easy: I evaluate isinstance(x, B), where B is the base class of my hierarchy. Testing for the second is also easy: I evaluate callable(x). How do I test for the third? I guess I need to know the name of the type of a compiled regular expression object. Hmmm... A quick scan through the documentation doesn't reveal it. So I do an experiment:

>>> import re
>>> re.compile("foo")
<_sre.SRE_Pattern object at 0x111018>

Hmmm... This doesn't look good -- Can I really count on a compiled regular expression being an instance of _sre.SRE_Pattern for the future?
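[Andrew's four-way test can be written without naming _sre.SRE_Pattern at all, using the type(re.compile("")) idiom suggested later in the thread. Base and the category names are stand-ins for his actual hierarchy.]

```python
import re

class Base:
    # Stand-in for the poster's own class hierarchy.
    pass

# Recover the pattern type from an instance, without relying on the
# undocumented module/class name _sre.SRE_Pattern.
PatternType = type(re.compile(""))

def categorize(x):
    # Test the hierarchy and the pattern type before callable(), since
    # a class in the hierarchy could itself be callable.
    if isinstance(x, Base):
        return "hierarchy"
    if isinstance(x, PatternType):
        return "regex"
    if callable(x):
        return "callable"
    return "other"
```

This hinges, of course, on the very question debated below: whether re.compile always returns objects of that one type.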
"AK" == Andrew Koenig <ark@research.att.com> writes:
AK> How do I test for the third? I guess I need to know the name of
AK> the type of a compiled regular expression object. Hmmm... A
AK> quick scan through the documentation doesn't reveal it. So I do
AK> an experiment:
AK> >>> import re
AK> >>> re.compile("foo")
AK> <_sre.SRE_Pattern object at 0x111018>
AK> Hmmm... This doesn't look good -- Can I really count on a
AK> compiled regular expression being an instance of
AK> _sre.SRE_Pattern for the future?

I'd put this at the module level:

    compiled_re_type = type(re.compile(""))

Then you can use isinstance() to test:

    isinstance(re.compile("spam+"), compiled_re_type)

Jeremy
Jeremy> I'd put this at the module level:
Jeremy>     compiled_re_type = type(re.compile(""))
Jeremy> Then you can use isinstance() to test:
Jeremy>     isinstance(re.compile("spam+"), compiled_re_type)

But is it guaranteed that re.compile will always yield an object of the same type?
"AK" == Andrew Koenig <ark@research.att.com> writes:
Jeremy> I'd put this at the module level:
Jeremy>     compiled_re_type = type(re.compile(""))
Jeremy> Then you can use isinstance() to test:
Jeremy>     isinstance(re.compile("spam+"), compiled_re_type)

AK> But is it guaranteed that re.compile will always yield an object
AK> of the same type?

Hard to say. I can read the code and see that the current implementation will always return objects of the same type. In fact, it's using type(sre_compile.compile("", 0)) internally to represent that type.

That's not a guarantee. Perhaps Fredrik wants to reserve the right to change this in the future. It's not unusual for Python modules to be under-specified in this way.

Jeremy
Jeremy> Hard to say. I can read the code and see that the current
Jeremy> implementation will always return objects of the same type.
Jeremy> In fact, it's using type(sre_compile.compile("", 0))
Jeremy> internally to represent that type.

Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve the
Jeremy> right to change this in the future. It's not unusual for
Jeremy> Python modules to be under-specified in this way.

The real point is that this is an example of why a uniform way of checking for such types would be nice. I shouldn't have to read the source to figure out how to tell if something is a compiled regular expression.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
"ARK" == Andrew Koenig <ark@research.att.com> writes:
Jeremy> Hard to say. I can read the code and see that the current
Jeremy> implementation will always return objects of the same type.
Jeremy> In fact, it's using type(sre_compile.compile("", 0))
Jeremy> internally to represent that type.

Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve
Jeremy> the right to change this in the future. It's not unusual
Jeremy> for Python modules to be under-specified in this way.

ARK> The real point is that this is an example of why a uniform way
ARK> of checking for such types would be nice. I shouldn't have to
ARK> read the source to figure out how to tell if something is a
ARK> compiled regular expression.

Let's assume for the moment that the re module wants to define an explicit type of compiled regular expression objects. This seems a sensible thing to do, and it already has such a type internally.

I'm not sure how this relates to your real point. You didn't have to read the source code to figure out if something is a compiled regular expression. Instead, I recommended that you use type(obj) where obj was a compiled regular expression. It might have been convenient if there was a module constant, such that re.CompiledRegexType == type(re.compile("")).

Then you asked if re.compile() was guaranteed to return an object of the same type. That question is all about the contract of the re module. The answer might have been: "No. In version X, it happens to always return objects of the same type, but in version Z, I may want to change this."

I suppose we could get at the general question of checking types by assuming that re.compile() returned instances of two apparently unrelated classes and that we wanted a way to declare their relationship. I'm thinking of something like Haskell typeclasses here.

Jeremy
Jeremy> Then you asked if re.compile() was guaranteed to return an
Jeremy> object of the same type. That question is all about the
Jeremy> contract of the re module. The answer might have been: "No.
Jeremy> In version X, it happens to always return objects of the same
Jeremy> type, but in version Z, I may want to change this."

Jeremy> I suppose we could get at the general question of checking
Jeremy> types by assuming that re.compile() returned instances of two
Jeremy> apparently unrelated classes and that we wanted a way to
Jeremy> declare their relationship. I'm thinking of something like
Jeremy> Haskell typeclasses here.

Right. And the classes don't even have to be unrelated -- it's enough that neither one is derived from the other (for instance, that they both be derived from a third class).

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
On Monday 19 August 2002 06:45 pm, Andrew Koenig wrote:
Jeremy> Hard to say. I can read the code and see that the current
Jeremy> implementation will always return objects of the same type.
Jeremy> In fact, it's using type(sre_compile.compile("", 0))
Jeremy> internally to represent that type.

Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve the
Jeremy> right to change this in the future. It's not unusual for
Jeremy> Python modules to be under-specified in this way.
The real point is that this is an example of why a uniform way of checking for such types would be nice. I shouldn't have to read the source to figure out how to tell if something is a compiled regular expression.
In general you wouldn't care whether it was an sre_foo or an sre_bar, just whether it acts like a compiled regular expression, and therefore supports that interface. So, the real solution would be to have re assert that interface on whatever the compiler returns so that you can check for it, something like:

if ISre.isImplementedBy(unknown_ob):
    # It's a regex

Where ISre is the compiled regular expression interface object. If the implementation varies the test would still work. Even if the interface varied, the test would work (but it might break other stuff).

-Casey
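[Casey's ISre.isImplementedBy test is in the style of Zope interfaces. A minimal stand-in of the idea, with a hand-rolled Interface class and a registry rather than Zope's real machinery, might look like this:]

```python
import re

# Minimal stand-in for a Zope-style interface object. isImplementedBy
# just consults a set of registered implementing classes; real Zope
# interfaces carry much more (method signatures, documentation, etc.).

class Interface:
    def __init__(self, name):
        self.name = name
        self._implementers = set()

    def register(self, cls):
        self._implementers.add(cls)
        return cls

    def isImplementedBy(self, obj):
        return any(isinstance(obj, c) for c in self._implementers)

ISre = Interface("ISre")

# The re module (or anyone else) asserts the interface on whatever
# the compiler returns, so the concrete type no longer matters.
ISre.register(type(re.compile("")))
```

If re.compile started returning a different type, only the one register() call would need to change; every ISre.isImplementedBy check elsewhere would keep working.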
Casey> In general you wouldn't care whether it was an sre_foo or an
Casey> sre_bar, just whether it acts like a compiled regular
Casey> expression, and therefore supports that interface. So, the
Casey> real solution would be to have re assert that interface on
Casey> whatever the compiler returns so that you can check for it,
Casey> something like:
Casey>
Casey> if ISre.isImplementedBy(unknown_ob):
Casey>     # It's a regex
Casey>
Casey> Where ISre is the compiled regular expression interface
Casey> object. If the implementation varies the test would still
Casey> work. Even if the interface varied, the test would work (but
Casey> it might break other stuff).

Exactly.
[Jeremy Hylton]
"AK" == Andrew Koenig <ark@research.att.com> writes:
Jeremy> I'd put this at the module level:
Jeremy>     compiled_re_type = type(re.compile(""))
Jeremy> Then you can use isistance() to test:
Jeremy> isinstance(re.compile("spam+"), compiled_re_type)
AK> But is it guaranteed that re.compile will always yield an object
AK> of the same type?
Hard to say. I can read the code and see that the current implementation will always return objects of the same type. In fact, it's using type(sre_compile.compile("", 0)) internally to represent that type.
That's not a guarantee. <snip>
This might be a stupid question, but why wouldn't isinstance(re.compile("spam+"), type(re.compile(''))) always work (this is Jeremy's code, just inlined)? Unless the instance being tested was marshalled (I think), the test should always work. Even using an unpickled instance (I think, again) should work since it would use the current implementation of a pattern object. So as long as the instance being tested is not somehow being stored and then brought back using a newer version of Python it should always work.

If not true, then I have been lied to. =)

-Brett C.
brett wrote:
This might be a stupid question, but why wouldn't isinstance(re.compile("spam+"), type(re.compile(''))) always work?
re.compile is a factory function, and it might (in theory) return different types for different kind of patterns. </F>
But is it guaranteed that re.compile will always yield an object of the same type?
There are no guarantees in life, but I expect that that is something that plenty of code depends on, so it will likely stay that way. --Guido van Rossum (home page: http://www.python.org/~guido/)
But is it guaranteed that re.compile will always yield an object of the same type?
Guido> There are no guarantees in life, but I expect that that is something
Guido> that plenty of code depends on, so it will likely stay that way.

The kind of situation I imagine is that a regular expression might be implemented not just as a single type but as a whole hierarchy of them, with the particular type used for a regular expression depending on the value of the regular expression. For example:

class Compiled_regexp(object):
    # ...

class Anchored_regexp(Compiled_regexp):
    # ...

class Unanchored_regexp(Compiled_regexp):
    # ...

where whether a regexp is anchored or unanchored depends on whether it begins with "^". (Contrived, but you get the idea.)

In that case, it is entirely possible that re.compile("") and re.compile("^foo") return types such that neither is an instance of the other. I understand that the regexp library doesn't work this way, and will probably never work this way, but I'm using this example to show why the technique of using the type returned by a particular library function call to identify the results of future calls doesn't work in general.
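[Andrew's contrived example is easy to make concrete. With a factory whose return type depends on the pattern's value, the type(factory("")) idiom breaks while isinstance against the common base still works; the class and function names are his, the factory body is a sketch.]

```python
# Factory whose concrete return type depends on the *value* of the
# pattern, mirroring the anchored/unanchored example above.

class Compiled_regexp(object):
    pass

class Anchored_regexp(Compiled_regexp):
    pass

class Unanchored_regexp(Compiled_regexp):
    pass

def compile_regexp(pattern):
    cls = Anchored_regexp if pattern.startswith("^") else Unanchored_regexp
    obj = cls()
    obj.pattern = pattern
    return obj
```

Here type(compile_regexp("")) is Unanchored_regexp, so an isinstance test against it wrongly rejects anchored patterns, while isinstance(x, Compiled_regexp) accepts both.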
This discussion appears about over, but I haven't seen any solutions via inheritance. Other languages that lack true interfaces use abstract base classes. Python supports multiple inheritance. Isn't this enough?

If the basic types are turned into abstract base classes and inserted into the builtin name space, and library and user-defined classes are reparented to the appropriate base class, then isinstance becomes the test for category inclusion. A partial example:

class FileType:
    def __init__(self, *args, **kwargs):
        raise AbstractClassError, \
            "You cannot instantiate abstract classes"

    def readline(self, *args, **kwargs):
        raise NotImplementedError, \
            "Methods must be overridden by their children"

All "file-like" objects, beginning with file itself and StringIO, can extend FileType (or AbstractFile or File or whatever). A function expecting a file-like object or a filename can test the parameter to see if it is an instance of FileType rather than seeing if it has a readline method.

Type hierarchies could be obvious (or endlessly debated):

Object --> Collection --> Sequence --> List
       \              \            \--> Tuple
       \              \            \--> String
       \              \--> Set
       \              \--> Mapping --> Dict
       \--> FileLike --> File
       \             \--> StringIO
       \--> Number --> Complex
       \           \--> Real --> Integer --> Long
       \                     \--> Float
       \--> Iterator
       \--> Iterable

etc.

The hierarchy could be further complicated with mutability (MutableSequence (e.g. list), ImmutableSequence (e.g. tuple, string)), or perhaps mutability could be a property of classes or even objects (allowing runtime marking of objects read-only? by contract? enforced?).

This seems to be a library (not language) solution to the problem posed. Can the low level types implemented completely in C still descend from a python parent class without any performance hit? Can someone please point out the inferiority or infeasibility of this method? Or is it just "ugly"?

-- Nathan Clegg
GeerBox
nathan@geerbox.com
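[Nathan's partial example can be made runnable. This sketch uses TypeError in place of his hypothetical AbstractClassError, and MemoryFile is an invented stand-in for a StringIO-like child; Python's later abc module formalizes exactly this pattern.]

```python
# Abstract base class as category marker: FileType cannot be
# instantiated directly, children must override readline, and
# isinstance(x, FileType) becomes the category test.

class FileType(object):
    def __init__(self, *args, **kwargs):
        if type(self) is FileType:
            raise TypeError("You cannot instantiate abstract classes")

    def readline(self):
        raise NotImplementedError("Methods must be overridden by their children")

class MemoryFile(FileType):
    # A toy file-like child, standing in for StringIO.
    def __init__(self, text):
        self.lines = text.splitlines(True)

    def readline(self):
        return self.lines.pop(0) if self.lines else ""
```

A function expecting a file-like object now tests isinstance(x, FileType) instead of probing for a readline attribute.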
This discussion appears about over, but I haven't seen any solutions via inheritance. Other languages that lack true interfaces use abstract base classes. Python supports multiple inheritance. Isn't this enough?
I haven't given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different. I wish I understood why he thinks so.
If the basic types are turned into abstract base classes and inserted into the builtin name space, and library and user-defined classes are reparented to the appropriate base class, then isinstance becomes the test for category inclusion.
A partial example:
class FileType:
    def __init__(self, *args, **kwargs):
        raise AbstractClassError, \
            "You cannot instantiate abstract classes"

    def readline(self, *args, **kwargs):
        raise NotImplementedError, \
            "Methods must be overridden by their children"
Except that the readline signature should be shown here.
All "file-like" objects, beginning with file itself and StringIO, can extend FileType (or AbstractFile or File or whatever). A function expecting a file-like object or a filename can test the parameter to see if it is an instance of FileType rather than seeing if it has a readline method.
Type hierarchies could be obvious (or endlessly debated):
Endlessly debated is more like it. Do you need separate types for readable files and writable files? For seekable files? For text files? Etc.
Object --> Collection --> Sequence --> List
       \              \            \--> Tuple
       \              \            \--> String
Is a string really a collection?
       \              \--> Set
       \              \--> Mapping --> Dict
How about readonly mappings? Should every mapping support keys()? values()? items()? iterkeys(), itervalues(), iteritems()?
       \--> FileLike --> File
       \             \--> StringIO
       \--> Number --> Complex
       \           \--> Real --> Integer --> Long
Where does short int go?
       \                     \--> Float
       \--> Iterator
       \--> Iterable
etc.
The hierarchy could be further complicated with mutability (MutableSequence (e.g. list), ImmutableSequence (e.g. tuple, string)), or perhaps mutability could be a property of classes or even objects (allowing runtime marking of objects read-only? by contract? enforced?).
Exactly. Endless debate will be yours. :-)
This seems to be a library (not language) solution to the problem posed. Can the low level types implemented completely in C still descend from a python parent class without any performance hit?
Not easily, no, but it would be possible to put most of the abstract hierarchy in C.
Can someone please point out the inferiority or infeasibility of this method? Or is it just "ugly"?
Agreeing on an ontology seems the hardest part to me. --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
This discussion appears about over, but I haven't seen any solutions via inheritance. Other languages that lack true interfaces use abstract base classes. Python supports multiple inheritance. Isn't this enough?
GvR> I haven't given up the hope that inheritance and interfaces
GvR> could use the same mechanisms. But Jim Fulton, based on years
GvR> of experience in Zope, claims they really should be different.
GvR> I wish I understood why he thinks so.

Here's a go at explaining why the mechanisms need to be separate. I'm loath to channel Jim, but I think he'd agree.

We'd like to use interfaces to make fairly strong claims. If a class A implements an interface I, then we should be able to use an instance of A anywhere that an I is needed. This is just the straightforward notion of substitutability. I'm saying this is a strong claim because we want an A to behave like an I. By behave, I mean that the interface I can describe behavior beyond just a method name or signature.

Why can't we use the current inheritance mechanism to implement the interface concept? Because the inheritance mechanism is too general. If we take the class A, anyone can create a subclass of it, regardless of whether that subclass implements I. Say you wanted to write LBYL code that tests whether an object implements an interface. If you use a marker class and isinstance() for the test, the inheritance rules make it impossible to express some relationships. In particular, it is impossible to write a class B that inherits from A but does not implement I. Since our test is isinstance(), any subclass of A will appear to implement I. This is unfortunate, because inheritance is a great implementation trick that shouldn't have anything to do with the interface.

Think about it briefly in terms of types. (Python doesn't have explicit types, but sometimes we reason about programs as if it did.) Strongly typed OO languages have to deal in some way with subclasses that are not subtypes. Some type systems require covariance or contravariance or invariance. In some cases, you can write a class that is a subclass but is not a subtype. The latter is what we're hoping to achieve with interfaces.
If we imagined an interface statement that was explicit and not inherited, then we'd be okay:

class A(SomeBase):
    implements I

class B(A):
    implements J

Now we've got a class A that implements I and a subclass B that implements J. The test isinstance(B(), A) is true, but the test implements(B(), I) is not. It's quite helpful to have the implements() predicate which uses a rule different from isinstance().

If we don't have the separate interface concept, the language just isn't as expressive. We would have to establish a convention to sacrifice one of -- a) being able to inherit from a class just for implementation purposes, or b) being able to reason about interfaces using isinstance(). a) is error prone, because the language wouldn't prevent anyone from making the mistake. b) is unfortunate, because we'd have interfaces but no formal way to reason about them.

Jeremy
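[Jeremy's imagined non-inherited "implements" statement can be emulated today: store each class's interfaces in its own __implements__ and make the implements() predicate look only at the object's exact class, deliberately ignoring base classes. The attribute and predicate names follow his example; the mechanism is a sketch.]

```python
class I: pass
class J: pass

def implements(obj, iface):
    # Consult only the object's own class dict, so the declaration is
    # deliberately NOT inherited -- unlike isinstance().
    return iface in type(obj).__dict__.get('__implements__', ())

class SomeBase: pass

class A(SomeBase):
    __implements__ = (I,)

class B(A):
    __implements__ = (J,)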
If we don't have the separate interface concept, the language just isn't as expressive. We would have to establish a convention to sacrifice one of -- a) being able to inherit from a class just for implementation purposes or b) being able to reason about interfaces using isinstance(). a) is error prone, because the language wouldn't prevent anyone from making the mistake. b) is unfortunate, because we'd have interfaces but no formal way to reason about them.
So the point is that it's possible to have a class D that picks up interface I somewhere in its inheritance chain, by inheriting from a class C that implements I, where D doesn't actually satisfy the invariants of I (or of C, probably). I can see that that is a useful feature.

But it shouldn't have to preclude us from using inheritance for interfaces, if there was a way to "shut off" inheritance as far as isinstance() (or issubclass()) testing is concerned. C++ does this using private inheritance. Maybe we can add a similar convention to Python for denying inheritance from a given class or interface.

Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.

For example, if we ever are going to add argument type declarations to Python, it will probably look like this:

def foo(a: classA, b: classB):
    ...body...

It would be convenient if this could be *defined* as

assert isinstance(a, classA) and isinstance(b, classB)

so that programs that have a simple class hierarchy can use their classes directly as argument types, without having to go through the trouble of declaring a parallel set of interfaces.

I also think that it should be possible to come up with a set of standard "abstract" classes representing concepts like number, sequence, etc., in which the standard built-in types are nicely embedded.

--Guido van Rossum (home page: http://www.python.org/~guido/)
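[Guido's proposed definition of argument type declarations as isinstance asserts can be prototyped with a decorator (a later language feature, used here only as scaffolding; typechecked, ClassA and ClassB are invented names):]

```python
# Emulate "def foo(a: classA, b: classB)" being *defined* as
# "assert isinstance(a, classA) and isinstance(b, classB)".

def typechecked(*types):
    def deco(func):
        def wrapper(*args):
            assert len(args) == len(types), "wrong number of arguments"
            for a, t in zip(args, types):
                assert isinstance(a, t), "expected %s, got %r" % (t.__name__, a)
            return func(*args)
        return wrapper
    return deco

class ClassA: pass
class ClassB: pass

@typechecked(ClassA, ClassB)
def foo(a, b):
    return "ok"
```

Because the check is plain isinstance(), any subclass of ClassA is accepted for a, which is precisely the "classes double as argument types" convenience being argued for.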
On Saturday 24 August 2002 08:44 am, Guido van Rossum wrote: ...
For example, if we ever are going to add argument type declarations to Python, it will probably look like this:
def foo(a: classA, b: classB):
    ...body...
It would be convenient if this could be *defined* as
assert isinstance(a, classA) and isinstance(b, classB)
I was hoping this could be defined as:

a = adapt(a, classA)
b = adapt(b, classB)

but I fully agree that, if we have an elegant way to say "I'm inheriting JUST for implementation, don't let that be generally known" (C++'s private inheritance is not a perfect mechanism, because in C++ 'private' affects _accessibility_, not _visibility_, sigh), it can indeed be handier and more productive to have interfaces and classes merge into each other rather than be completely separate. Substantial experience with C++ (merged) and Java (separate) suggests that to me.

From the point of view of the hypothetical 'adapt', that suggests the 'protocol' argument should be allowed to be a type (class), rather than an 'interface' that is a different entity than a type, and also the useful implication:

isinstance(i, T) ==> adapt(i, T) is i
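[A minimal adapt() in the spirit Alex describes (and of the PEP 246 proposal) can be sketched with a registry; register_adapter, FileLike and the string adapter are invented for illustration. Note how the first branch makes the implication isinstance(i, T) ==> adapt(i, T) is i hold by construction.]

```python
# Minimal object-adaptation sketch: if the object already is an
# instance of the protocol, hand it back unchanged; otherwise look for
# a registered adapter along the object's type's MRO.

_adapters = {}

def register_adapter(from_cls, protocol, adapter):
    _adapters[(from_cls, protocol)] = adapter

def adapt(obj, protocol):
    if isinstance(obj, protocol):
        return obj                      # adapt(i, T) is i
    for cls in type(obj).__mro__:
        adapter = _adapters.get((cls, protocol))
        if adapter is not None:
            return adapter(obj)         # hopefully thin wrapper
    raise TypeError("cannot adapt %r to %s" % (obj, protocol.__name__))

class FileLike:
    def __init__(self, read):
        self.read = read

# Declare, outside the str class, that a str can act as a FileLike.
register_adapter(str, FileLike, lambda s: FileLike(lambda: s))
```

This also answers Jeremy's second requirement below: the adapter for str is registered entirely outside any class statement.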
so that programs that have a simple class hierarchy can use their classes directly as argument types, without having to go through the trouble of declaring a parallel set of interfaces.
The "parallel set of interfaces" (which I had to do in Java) *was* indeed somewhat of a bother. Any time you need to develop and maintain two separate but strongly parallel trees (here, one of interfaces, and a separate parallel one of typical/suggested partial or total implementations to be used e.g. in inner classes that supply those interfaces), you're in for a spot of trouble. I even did some of that with a hand-kludged "code generator" which read a single description file and generated both the interface AND the class from it (but then of course I ended up editing the generated code and back in trouble again when maintenance was needed -- seems to happen regularly to me with code generators). Surely making the target language directly able to digest a unified description would be nicer.
I also think that it should be possible to come up with a set of standard "abstract" classes representing concepts like number, sequence, etc., in which the standard built-in types are nicely embedded.
If you manage to pull that off, it will be a WONDERFUL trick indeed. Alex
From: "Guido van Rossum" <guido@python.org>
If we don't have the separate interface concept, the language just isn't as expressive. We would have to establish a convention to sacrifice one of -- a) being able to inherit from a class just for implementation purposes or b) being able to reason about interfaces using isinstance(). a) is error prone, because the language wouldn't prevent anyone from making the mistake. b) is unfortunate, because we'd have interfaces but no formal way to reason about them.
So the point is that it's possible to have a class D that picks up interface I somewhere in its inheritance chain, by inheriting from a class C that implements I, where D doesn't actually satisfy the invariants of I (or of C, probably).
<snip>
def foo(a: classA, b: classB):
    ...body...
It would be convenient if this could be *defined* as
assert isinstance(a, classA) and isinstance(b, classB)
so that programs that have a simple class hierarchy can use their classes directly as argument types, without having to go through the trouble of declaring a parallel set of interfaces.
I also think that it should be possible to come up with a set of standard "abstract" classes representing concepts like number, sequence, etc., in which the standard built-in types are nicely embedded.
Ah, but not all numbers are created equal! Can I write x << 1? Not if x is a float. Somebody will eventually want to categorize numeric types more finely, e.g. Monoid, Euclidean Ring, ...

It sounds to my C++ ear like you're trying to make this analogous to runtime polymorphism in C++. I think Python's polymorphism is a lot closer to what we do at compile-time in C++, and it should stay that way: no inheritance relationship needed... at least, not on the surface. Here's why: people inevitably discover new type categories in the objects and types they're already using. In C++ this happened when Stepanov et al discovered that built-in pointers matched his mental model of random-access iterators. A similar thing will happen in Python when you make all numbers inherit from Number but someone wants to impose the real mathematical categories (or heck: Integer vs. Fractional) on them.

What Stepanov's crew did was to invent iterator traits, which decouple the type's category from the type itself. Each category is represented by a class, and those classes do have an inheritance relationship (i.e. every random_access_iterator IS-A bidirectional_iterator).

Actually, I have no problem with collecting type category info from an object's MRO: as Guido implies, that will often be the simplest way to do it. However, I think there ought to be a parallel mechanism which allows additional categorization non-intrusively, and it was my understanding that the PEP Alex has been promoting does that.

-Dave
-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com
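[The iterator-traits idea transplants naturally: category classes form their own inheritance lattice, and a registry maps concrete types to categories without touching the types themselves. All names below are illustrative, not any proposed API.]

```python
# Traits-style, non-intrusive categorization: categories are classes
# with their own IS-A relationships, kept in a registry keyed by type.

class Iterable: pass
class Iterator(Iterable): pass   # every iterator IS-A iterable

_category = {}

def set_category(typ, cat):
    _category[typ] = cat

def category_of(obj):
    # Walk the object's type's MRO so subclasses inherit a category
    # unless they register their own.
    for cls in type(obj).__mro__:
        if cls in _category:
            return _category[cls]
    return None

def in_category(obj, cat):
    c = category_of(obj)
    return c is not None and issubclass(c, cat)

# Categorize built-ins without modifying them:
set_category(list, Iterable)
set_category(type(iter([])), Iterator)
```

A list-iterator thus tests positive for Iterator and, through the category lattice rather than the concrete type's bases, for Iterable as well.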
David> It sounds to my C++ ear like you're trying to make this
David> analogous to runtime polymorphism in C++. I think Python's
David> polymorphism is a lot closer to what we do at compile-time in
David> C++, and it should stay that way: no inheritance relationship
David> needed... at least, not on the surface. Here's why: people
David> inevitably discover new type categories in the objects and
David> types they're already using. In C++ this happened when Stepanov
David> et al discovered that built-in pointers matched his mental
David> model of random-access iterators. A similar thing will happen
David> in Python when you make all numbers inherit from Number but
David> someone wants to impose the real mathematical categories (or
David> heck: Integer vs. Fractional) on them.

David> What Stepanov's crew did was to invent iterator traits,
David> which decouple the type's category from the type itself. Each
David> category is represented by a class, and those classes do have
David> an inheritance relationship (i.e. every random_access_iterator
David> IS-A bidirectional_iterator).

In other words, there *is* an inheritance relationship in C++'s compile-time polymorphism, and iterator traits are one way of expressing that polymorphism.

So we have two desirable properties:

1) Guido's suggestion that interface specifications are close enough to classes that they should be classes, and should be inherited like classes, possibly with a way of hiding that inheritance for special cases;

2) Dave's suggestion that people other than a class author might wish to make claims about the interface that the class supports.

I now remember that in one of my earlier messages, I said something related to (2) as well. Is there a way of merging these two ideas?

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
On Sat, Aug 24, 2002 at 02:44:27AM -0400, Guido van Rossum wrote:
Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.
Inheriting the implementation without implementing the same interfaces is only one reason to want an interface mechanism that is not 100% tied to inheritance. Objects written by different authors on two sides of the globe often implement the same protocol without actually inheriting the definition from a common module. I can pass these objects to a method expecting this protocol and it will work just fine (most of the time...) I would like to be able to declare that I need an object with a specific interface even if the object was written long before and I don't want to modify an existing library just to make it conform to my interface names. Strictly defined named interfaces like Zope interfaces are also important but most of the interfaces I use in everyday programming are more ad-hoc in nature and are often defined retroactively.
For example, if we ever are going to add argument type declarations to Python, it will probably look like this:
def foo(a: classA, b: classB):
    ...body...
It would be convenient if this could be *defined* as
assert isinstance(a, classA) and isinstance(b, classB)
In your Optional Static Typing presentation slides you define "type expressions". If the isinstance method accepted a type expression object as its second argument this assertion would work for interfaces that are not defined by strict hierarchical inheritance.
so that programs that have a simple class hierarchy can use their classes directly as argument types, without having to go through the trouble of declaring a parallel set of interfaces.
...and classes could be used too. They are just type expressions that match a single class. BTW, isinstance already supports a simple form of this: a tuple is interpreted as an "OR" type expression. You can say that isinstance returns True if the object is an instance of one of the types matched by the type expression. Oren
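[The tuple form Oren mentions is the simplest "OR" type expression isinstance already understands:]

```python
# isinstance accepts a tuple of types, interpreted as "any of these".
def is_number(x):
    return isinstance(x, (int, float, complex))
```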
Good point, Oren. We now have two requirements for interfaces that are different from the standard inheritance mechanism. It should be possible to:

- inherit from a class without implementing that class's interfaces
- declare that a class implements an interface outside the class statement

It's harder to support the second requirement using the current inheritance mechanism.

Jeremy
On Saturday 24 August 2002 05:15 pm, Jeremy Hylton wrote:
Good point, Oren. We now have two requirements for interfaces that are different than the standard inheritance mechanism. It should be possible to:
- inherit from a class without implementing that class's interfaces
- declare that a class implements an interface outside the class statement
It's harder to support the second requirement using the current inheritance mechanism.
The second requirement is a good part of what adaptation is meant to do. As I understand, that's exactly what Zope3 already provides for its interfaces. You don't just "declare" the fact -- you register an adapter that can provide whatever is needed to make it so. I.e., if object X does already implement interface Y without ANY need for tweaking/renaming/whatever, I guess the registered adapter can just return the object X it receives as an argument. More often, the adapter will return some (hopefully thin) wrapper over X that deals with renaming, signature-adaptation, and the like. That's how it works in Zope3 (at least as I understood from several discussions with Jim Fulton and Guido -- haven't studied Zope3 yet), and I think that such "external adaptation" functionality, however dressed up, should definitely be a part of whatever Python ends up with. Alex
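A toy sketch of the external-adaptation idea Alex describes. None of the names below are real Zope APIs; the registry, register_adapter, and adapt are invented for illustration:

```python
# Hypothetical adapter registry, loosely modeled on the idea above.
# A (class, interface) pair maps to an adapter callable; the adapter
# may return the object unchanged or a thin wrapper around it.
_adapters = {}

def register_adapter(cls, interface, factory):
    _adapters[(cls, interface)] = factory

def adapt(obj, interface):
    factory = _adapters.get((type(obj), interface))
    if factory is None:
        raise TypeError("no adapter from %s to %s"
                        % (type(obj).__name__, interface.__name__))
    return factory(obj)

class IReadable: pass        # marker interface (invented)

class Socketish:
    def recv(self, n): return "x" * n

class SocketReader:
    """Thin wrapper that renames recv() to read()."""
    def __init__(self, sock): self._sock = sock
    def read(self, n): return self._sock.recv(n)

register_adapter(Socketish, IReadable, SocketReader)
r = adapt(Socketish(), IReadable)
print(r.read(3))  # 'xxx'
```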
On Sat, Aug 24, 2002 at 11:15:56AM -0400, Jeremy Hylton wrote:
Good point, Oren. We now have two requirements for interfaces that are different than the standard inheritance mechanism. It should be possible to:
- inherit from a class without implementing that class's interfaces
- declare that a class implements an interface outside the class statement
It's harder to support the second requirement using the current inheritance mechanism.
I want to go a step further. I don't want to declare that a class implements an interface outside the class statement. I don't want to declare *anything* about classes. My approach centers on the user of the class rather than the provider. The user can declare what he *expects* from the class and the interface checking will verify that the class meets these requirements. In a way this is what you already do in Python - you use the object and if it doesn't meet your expectations it raises an exception. Exceptions are raised for both bad form and bad content. Bad content will still trigger an exception when you try to use it, but bad form can be detected much earlier. See http://www.tothink.com/python/predicates I originally developed this for rulebases in security applications. I am now porting it to Python and cleaning it up. I think it should be an effective way to write assertions about the form of class objects based on methods, call signatures, etc. If/when type checking is added to Python it should also be possible to specify specific types for arguments and return values. Oren
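A rough sketch of this user-side checking (not Oren's actual predicates library; has_methods is an invented helper): the caller states the form it expects, and the check inspects the object nondestructively:

```python
def has_methods(obj, *names):
    """Nondestructively check that obj provides callable attributes
    with the given names: a check on form, not content."""
    return all(callable(getattr(obj, name, None)) for name in names)

class MyFile:
    def read(self): return ""
    def close(self): pass

print(has_methods(MyFile(), "read", "close"))  # True
print(has_methods(MyFile(), "write"))          # False
```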
"Oren" == Oren Tirosh <oren-py-d@hishome.net> writes:
Oren> I would like to be able to declare that I need an object Oren> with a specific interface even if the object was written Oren> long before and I don't want to modify an existing library Oren> just to make it conform to my interface names.

    class InterfaceWrapper(ExistingClass, AbstractInterfaceClass):
        pass

I'm not saying this is a good idea :), but I believe this problem is already solvable in the current language. The wrapper class should pass the test of isinstance for the interface class, but the existing class as the first parent should implement all of the calls. Note that most other languages that actually support proper interfaces (i.e. Java) would have similar trouble adding an interface to a prior existing class without modifying its definition. Python actually provides a much simpler solution than others might, it seems to me. -- Nathan Clegg GeerBox nathan@geerbox.com
From: "Nathan Clegg" <nathan@geerbox.com>
"Oren" == Oren Tirosh <oren-py-d@hishome.net> writes:
Oren> I would like to be able to declare that I need an object Oren> with a specific interface even if the object was written Oren> long before and I don't want to modify an existing library Oren> just to make it conform to my interface names.
class InterfaceWrapper(ExistingClass, AbstractInterfaceClass): pass
I'm not saying this is a good idea :), but I believe this problem is already solvable in the current language. The wrapper class should pass the test of isinstance for the interface class, but the existing class as the first parent should implement all of the calls.
Note that most other languages that actually support proper interfaces (i.e. Java) would have similar trouble adding an interface to a prior existing class without modifying its definition. Python actually provides a much simpler solution than others might, it seems to me.
The problem is that we want to use ExistingClass *objects* where AbstractInterfaceClass is required. If someone else has written a module containing:

    def some_fantastic_function(x: AbstractInterfaceClass):
        ...

And I have written a function:

    def my_func(generator):
        for x in generator:
            some_fantastic_function(x)

If there's a generator lying about which produces ExistingClass, I ought to be able to pass it to my_func. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com
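The gap can be shown directly: the wrapper trick makes instances of the new wrapper class pass isinstance, but instances of the original class still fail. A minimal sketch with invented class names:

```python
class AbstractInterface: pass

class Existing:
    def method(self): return 42

# Nathan's wrapper trick: a new class that passes the isinstance test.
class Wrapper(Existing, AbstractInterface):
    pass

print(isinstance(Wrapper(), AbstractInterface))   # True
print(isinstance(Existing(), AbstractInterface))  # False: the problem
```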
"Oren" == Oren Tirosh <oren-py-d@hishome.net> writes: Oren> I would like to be able to declare that I need an object Oren> with a specific interface even if the object was written Oren> long before and I don't want to modify an existing library Oren> just to make it conform to my interface names.
Nathan> class InterfaceWrapper(ExistingClass, AbstractInterfaceClass): Nathan> pass Nathan> I'm not saying this is a good idea :), but I believe this problem is Nathan> already solvable in the current language. Not quite. You are creating a new class with the desired property, but it can sometimes be desirable to assert properties about types that already exist. For example, suppose I invent a GroupUnderPlus property for types for which the + operator has group properties. I would like to be able to say that int has that property, and not have to derive a new class from int in order to do so. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
Guido> For example, if we ever are going to add argument type declarations to Guido> Python, it will probably look like this: Guido> def foo(a: classA, b: classB): Guido> ...body... Guido> It would be convenient if this could be *defined* as Guido> assert isinstance(a, classA) and isinstance(b, classB) Guido> so that programs that have a simple class hierarchy can use their Guido> classes directly as argument types, without having to go through the Guido> trouble of declaring a parallel set of interfaces. Guido> I also think that it should be possible to come up with a set of Guido> standard "abstract" classes representing concepts like number, Guido> sequence, etc., in which the standard built-in types are nicely Guido> embedded. I agree completely. Any use of inheritance that satisfies Liskov substitutability will satisfy interface inheritance too, and although it is possible to think of uses of inheritance that aren't substitutable, they're unusual enough that they should probably require (syntactic) special pleading, if only to alert the reader. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
On Sat, Aug 24, 2002, Guido van Rossum wrote:
Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.
For example, if we ever are going to add argument type declarations to Python, it will probably look like this:
def foo(a: classA, b: classB): ...body...
I'm curious, and I don't recall having seen anything about this: why wouldn't we simply use attributes to hold this information, like __slots__? After all, attributes get inherited, too, and there's no need to pretzel the syntax. Using attributes IMO would make it easier to handle the case where derived classes need to mangle type and interface declarations. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
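A sketch of the attribute-based approach Aahz suggests, by analogy with __slots__; the __interfaces__ name and the conforms helper are invented for illustration:

```python
class ISequence: pass
class IMapping: pass

class Base:
    # Hypothetical attribute, declared like __slots__.
    __interfaces__ = (ISequence,)

class Derived(Base):
    # Attributes are inherited; a subclass can extend the declaration,
    # though it must explicitly reference the base class's tuple.
    __interfaces__ = Base.__interfaces__ + (IMapping,)

def conforms(obj, interface):
    return interface in getattr(obj, "__interfaces__", ())

print(conforms(Derived(), ISequence))  # True
print(conforms(Base(), IMapping))      # False
```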
From: "Aahz" <aahz@pythoncraft.com>
On Sat, Aug 24, 2002, Guido van Rossum wrote:
Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.
For example, if we ever are going to add argument type declarations to Python, it will probably look like this:
def foo(a: classA, b: classB): ...body...
I'm curious, and I don't recall having seen anything about this: why wouldn't we simply use attributes to hold this information, like __slots__? After all, attributes get inherited, too, and there's no need to pretzel the syntax. Using attributes IMO would make it easier to handle the case where derived classes need to mangle type and interface declarations.
A few weeks ago I realized there was reason in principle that declaring a class satisfies an interface shouldn't just amount to adding the interface to the class' __bases__ (as Guido has been suggesting all along). Why not? Are we missing something? -Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com
A few weeks ago I realized there was reason in principle that ^^^^^^^^^^ Did you mean "was no reason"???
declaring a class satisfies an interface shouldn't just amount to adding the interface to the class' __bases__ (as Guido has been suggesting all along).
Why not? Are we missing something?
We'd need a trick to deny an interface that would be inherited by default. Something like private inheritance. There's also the ambiguity of inheriting from a single interface: does that create a sub-interface or an implementation of the interface? Of course with your C++ hat on you probably don't care. On Mondays, Wednesdays, Fridays and alternating Sundays I don't care either. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
A few weeks ago I realized there was reason in principle that ^^^^^^^^^^ Did you mean "was no reason"???
declaring a class satisfies an interface shouldn't just amount to adding the interface to the class' __bases__ (as Guido has been suggesting all along).
Why not? Are we missing something?
We'd need a trick to deny an interface that would be inherited by default. Something like private inheritance.
I think it's more than that. You might need to "uninherit": Say Interface A begets class B which begets class C. What if C doesn't fulfill A?
There's also the ambiguity of inheriting from a single interface: does that create a sub-interface or an implementation of the interface? Of course with your C++ hat on you probably don't care. On Mondays, Wednesdays, Fridays and alternating Sundays I don't care either.
With my C++ hat on I can't even imagine this. In C++ we don't express interfaces in code: they're written down as "concepts" in some documentation somewhere (no, I don't think an abstract class in C++ is a good analogy for these Python interfaces). -Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com
A few weeks ago I realized there was reason in principle that ^^^^^^^^^^ Did you mean "was no reason"???
So did you?
declaring a class satisfies an interface shouldn't just amount to adding the interface to the class' __bases__ (as Guido has been suggesting all along).
Why not? Are we missing something?
We'd need a trick to deny an interface that would be inherited by default. Something like private inheritance.
I think it's more than that. You might need to "uninherit": Say Interface A begets class B which begets class C. What if C doesn't fulfill A?
Sorry, I meant to include that case. How do you do that in C++? Inherit privately from B and publicly from A, and make A a virtual base everywhere?
There's also the ambiguity of inheriting from a single interface: does that create a sub-interface or an implementation of the interface? Of course with your C++ hat on you probably don't care. On Mondays, Wednesdays, Fridays and alternating Sundays I don't care either.
With my C++ hat on I can't even imagine this. In C++ we don't express interfaces in code: they're written down as "concepts" in some documentation somewhere (no, I don't think an abstract class in C++ is a good analogy for these Python interfaces).
What's the difference between an abstract class and an interface in C++? --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
We'd need a trick to deny an interface that would be inherited by default. Something like private inheritance.
I think it's more than that. You might need to "uninherit": Say Interface A begets class B which begets class C. What if C doesn't fulfill A?
Sorry, I meant to include that case. How do you do that in C++?
We don't use inheritance for this kind of interface. When we're making a Java-style interface, sure, inheritance works fine in C++. However, because of Python's dynamic, generic nature what we've been calling an interface for Python is much more like a "concept", which has no direct expression in code: http://www.boost.org/more/generic_programming.html#concept Actually if you read a little further on down the page (the Traits and Tag Dispatching sections), you'll see that it's possible to create an expression in code of a concept in C++. Usually you want to do that when concepts form a refinement hierarchy (e.g. bidirectional_iterator refines forward_iterator) which may or may not correspond to inheritance relationships.
Inherit privately from B and publicly from A, and making A virtual base everywhere?
I guess you /could/ do that. I don't think anyone does, though ;-) I was going to say that it seems to me if you can dynamically inject base classes in Python there's no problem using inheritance to do this sort of labelling. However, on third thought, maybe there is a problem. Suppose you have an inheritance chain A->B->C...->Z and I come along later to say that A fulfills interface II and add II to A's bases. Which of A's subclasses also fulfill II? I might not know. I might not even know about them. For this, maybe you'd need a way to express inheritance that goes just "one level deep" (i.e. A inherits II publicly, but nothing else does). And that might just screw with the notion of inheritance enough that you want a separate parallel mechanism. So I guess I'm back to where I was before. Inheritance probably doesn't work out too well for expressing "satisfies interface". ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com
"DA" == David Abrahams <dave@boost-consulting.com> writes:
DA> I was going to say that it seems to me if you can dynamically DA> inject base classes in Python there's no problem using DA> inheritance to do this sort of labelling. However, on third DA> thought, maybe there is a problem. Suppose you have an DA> inheritance chain A->B->C...->Z and I come along later to say DA> that A fulfills interface II and add II to A's bases. Which of DA> A's subclasses also fulfill II? I might not know. I might not DA> even know about them. For this, maybe you'd need a way to DA> express inheritance that goes just "one level deep" (i.e. A DA> inherits II publicly, but nothing else does). And that might DA> just screw with the notion of inheritance enough that you want a DA> separate parallel mechanism. DA> So I guess I'm back to where I was before. Inheritance probably DA> doesn't work out too well for expressing "satisfies interface". I had similar third thoughts a couple of weeks ago :-). So I guess I agree with you. Jeremy
"JH" == Jeremy Hylton <jeremy@alum.mit.edu> writes:
"DA" == David Abrahams <dave@boost-consulting.com> writes:
DA> I was going to say that it seems to me if you can dynamically DA> inject base classes in Python there's no problem using DA> inheritance to do this sort of labelling. However, on third DA> thought, maybe there is a problem. Suppose you have an DA> inheritance chain A->B->C...->Z and I come along later to say DA> that A fulfills interface II and add II to A's bases. Which of DA> A's subclasses also fulfill II? I might not know. I might not DA> even know about them. For this, maybe you'd need a way to DA> express inheritance that goes just "one level deep" (i.e. A DA> inherits II publicly, but nothing else does). And that might DA> just screw with the notion of inheritance enough that you want DA> a separate parallel mechanism. DA> So I guess I'm back to where I was before. Inheritance DA> probably doesn't work out too well for expressing "satisfies DA> interface". JH> I had similar third thoughts a couple of weeks ago :-). So I JH> guess I agree with you. I tend to agree as well. But to play devil's advocate for a moment: I think Guido said that inheritance won't be the only way to spell conforms-to, but it'll be the predominant way. So you'd definitely need a way to spell that outside of inheritance, as your example clearly shows. Which means that any conformsto() function will have to be more complicated because it'll need to check both mechanisms. Is that a worthwhile price to pay to allow conforms-to-by-inheritance? What I don't like about the inheritance mechanism is that the syntax isn't explicit. I look at a class definition and I don't really know what's a base class for implementation purposes and what's an interface assertion. It might even be difficult if I had the source code for all the classes in the base class list if there's little except convention to syntactically distinguish between a class definition and an interface definition (no keyword, but just a stylized bunch of defs).
I think it's going to be important to know what's an interface and what's a base class. Naming conventions (IThingie) can help but aren't enforced. -Barry
From: "Guido van Rossum" <guido@python.org>
A few weeks ago I realized there was reason in principle that ^^^^^^^^^^ Did you mean "was no reason"???
Oh. Yup. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com
Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.
For example, if we ever are going to add argument type declarations to Python, it will probably look like this:
def foo(a: classA, b: classB): ...body...
I'm curious, and I don't recall having seen anything about this: why wouldn't we simply use attributes to hold this information, like __slots__? After all, attributes get inherited, too, and there's no need to pretzel the syntax. Using attributes IMO would make it easier to handle the case where derived classes need to mangle type and interface declarations.
That's exactly what Zope does with the __implements__ attribute. But it's got limitations: there's only one __implements__ attribute, so it isn't automatically merged properly on multiple inheritance, and adding one new interface to it means you have to copy or reference the base class __implements__ attribute. Also, __slots__ is provisional. The plan is for this to eventually get nicer syntax (when I get over my fear of adding new keywords :-). --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, Sep 13, 2002, Guido van Rossum wrote:
Aahz:
I'm curious, and I don't recall having seen anything about this: why wouldn't we simply use attributes to hold this information, like __slots__? After all, attributes get inherited, too, and there's no need to pretzel the syntax. Using attributes IMO would make it easier to handle the case where derived classes need to mangle type and interface declarations.
That's exactly what Zope does with the __implements__ attribute.
But it's got limitations: there's only one __implements__ attribute, so it isn't automatically merged properly on multiple inheritance, and adding one new interface to it means you have to copy or reference the base class __implements__ attribute.
Isn't that what metaclasses are for? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
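Aahz's metaclass suggestion can be sketched as follows (in the modern Python 3 spelling; all names invented): a metaclass that merges __implements__ declarations from every base class, so multiple inheritance combines them automatically:

```python
class InterfaceMeta(type):
    """Merge __implements__ declarations from all base classes."""
    def __new__(mcls, name, bases, ns):
        merged = set(ns.get("__implements__", ()))
        for base in bases:
            merged |= set(getattr(base, "__implements__", ()))
        ns["__implements__"] = frozenset(merged)
        return type.__new__(mcls, name, bases, ns)

class IFoo: pass
class IBar: pass

class A(metaclass=InterfaceMeta):
    __implements__ = (IFoo,)

class B(metaclass=InterfaceMeta):
    __implements__ = (IBar,)

class C(A, B):   # both declarations are merged automatically
    pass

print(IFoo in C.__implements__, IBar in C.__implements__)  # True True
```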
On Fri, 23 Aug 2002, Guido van Rossum wrote:
I haven't given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different. I wish I understood why he thinks so.
If i may hazard a guess, i'd imagine that Jim's answer would simply be that inheritance (of implementation) doesn't imply subtyping, and subtyping doesn't imply inheritance. That is, you may often want to re-use the implementation of one class in another class, but this doesn't mean the new class will meet all of the commitments of the old. Conversely, you may often want to declare that different classes adhere to the same set of commitments (i.e. provide the same interface) even if they have different implementations. (A common situation where the latter occurs is when the implementations are written by different people.)
Agreeing on an ontology seems the hardest part to me.
Indeed. One of the advantages of separating inheritance and subtyping is that this can give you a bit more flexibility in setting up the ontology, which may make it easier to settle on something good. -- ?!ng
[Guido]
I haven't given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different. I wish I understood why he thinks so.
[Ping]
If i may hazard a guess, i'd imagine that Jim's answer would simply be that inheritance (of implementation) doesn't imply subtyping, and subtyping doesn't imply inheritance.
Well, yes, of course. But I strongly believe that in *most* cases, inheritance and subtyping go hand in hand. I'd rather invent a mechanism to deal with the exceptions rather than invent two parallel mechanisms that must both be deployed separately to get the full benefit out of them.
That is, you may often want to re-use the implementation of one class in another class, but this doesn't mean the new class will meet all of the commitments of the old. Conversely, you may often want to declare that different classes adhere to the same set of commitments (i.e. provide the same interface) even if they have different implementations. (A common situation where the latter occurs is when the implementations are written by different people.)
Nevertheless, these are exceptions to the general rule.
Agreeing on an ontology seems the hardest part to me.
Indeed. One of the advantages of separating inheritance and subtyping is that this can give you a bit more flexibility in setting up the ontology, which may make it easier to settle on something good.
Really? Given that there are no inheritance relationships between the existing built-in types, I would think that you could define an ontology consisting entirely of abstract types, and then graft the concrete types on it. I don't see what having separate interfaces would buy you. But perhaps you can give an example that shows your point? --Guido van Rossum (home page: http://www.python.org/~guido/)
[GvR]
[Ping]
If i may hazard a guess, i'd imagine that Jim's answer would simply be that inheritance (of implementation) doesn't imply subtyping, and subtyping doesn't imply inheritance.
Well, yes, of course. But I strongly believe that in *most* cases, inheritance and subtyping go hand in hand. I'd rather invent a mechanism to deal with the exceptions rather than invent two parallel mechanisms that must both be deployed separately to get the full benefit out of them.
One exception being able to declare conformance to an interface after the fact in some sweet way.
Agreeing on an ontology seems the hardest part to me.
Indeed. One of the advantages of separating inheritance and subtyping is that this can give you a bit more flexibility in setting up the ontology, which may make it easier to settle on something good.
Really? Given that there are no inheritance relationships between the existing built-in types, I would think that you could define an ontology consisting entirely of abstract types, and then graft the concrete types on it. I don't see what having separate interfaces would buy you. But perhaps you can give an example that shows your point?
E.g. my ideas of declaring partial conformance and of super-interfaces identified as a base-interface plus a subset of signatures do not fit so well in a just-abstract-classes model. But OTOH I insist, IMO, given how python code is written now, they would be handy although complex. regards.
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
SP> One exception being able to declare conformance to an SP> interface after the fact in some sweet way. This is a very important use case, IMO. I'm leery of trying to weave some interface taxonomy into the standard library and types without having a lot of experience in using this for real world applications. Even then, it's possible <wink> that there will be a lot of disagreement on the shape of the type hierarchy. So one strategy would be to not classify the existing types and classes ahead of time, but to provide a way for an application to declare conformance to existing types in a way that makes sense for the application (or library). The downside of this is that it may lead to a raft of incompatible interface declarations, but I also think that eventually we'd see convergence as we gain more experience. My guess would be that of all the interfaces that get defined and used in the Python community, only a few that are commonly agreed on or become ubiquitous idioms will migrate into the core. I don't think we need to solve this "problem" for the core types right away. Let's start by providing mechanism and not policy. -Barry
I'm leery of trying to weave some interface taxonomy into the standard library and types without having a lot of experience in using this for real world applications. Even then, it's possible <wink> that there will be a lot of disagreement on the shape of the type hierarchy.
That's what I said when I predicted it would be hard to come up with an ontology. :-)
So one strategy would be to not classify the existing types and classes ahead of time, but to provide a way for an application to declare conformance to existing types in a way that makes sense for the application (or library). The downside of this is that it may lead to a raft of incompatible interface declarations, but I also think that eventually we'd see convergence as we gain more experience.
This is what Zope does. One problem is that it has its own notion of what makes a "sequence", a "mapping", etc. that don't always match the Pythonic convention.
My guess would be that of all the interfaces that get defined and used in the Python community, only a few that are commonly agreed on or become ubiquitous idioms will migrate into the core. I don't think we need to solve this "problem" for the core types right away. Let's start by providing mechanism and not policy.
Sure. But does that mean the mechanism needs to be necessarily separate from the inheritance mechanism? --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
>> I'm leery of trying to weave some interface taxonomy into the >> standard library and types without having a lot of experience >> in using this for real world applications. Even then, it's >> possible <wink> that there will be a lot of disagreement on the >> shape of the type hierarchy. GvR> That's what I said when I predicted it would be hard to come GvR> up with an ontology. :-) Actually, it'll be easy, that's why we'll do it a hundred times. :) >> So one strategy would be to not classify the existing types and >> classes ahead of time, but to provide a way for an application >> to declare conformance to existing types in a way that makes >> sense for the application (or library). The downside of this >> is that it may lead to a raft of incompatible interface >> declarations, but I also think that eventually we'd see >> convergence as we gain more experience. GvR> This is what Zope does. One problem is that it has its own GvR> notion of what makes a "sequence", a "mapping", etc. that GvR> don't always match the Pythonic convention. Yep, that's a problem. One approach might be to provide some blessed or common interfaces in a module, but don't weave them into the types. OTOH, I suspect that big apps and frameworks like Zope may want their own notion anyway, and hopefully it'll be fairly easy for components that want to play with Zope to add the proper interface conformance assertions. >> My guess would be that of all the interfaces that get defined >> and used in the Python community, only a few that are commonly >> agreed on or become ubiquitous idioms will migrate into the >> core. I don't think we need to solve this "problem" for the >> core types right away. Let's start by providing mechanism and >> not policy. GvR> Sure. But does that mean the mechanism needs to be GvR> necessarily separate from the inheritance mechanism?
It definitely means that there has to be a way to separate them that is largely transparent to all the code that checks, uses, asserts, etc. interfaces. IOW, if we allow all of inheritance, __implements__, and a registry to assert conformance to an interface, the built-in conformsto() -- or whatever we call it -- has to know about all these accepted variants and should return True for any match. -Barry
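Barry's point can be sketched as a single conformsto() that consults all three mechanisms: inheritance, an __implements__ attribute, and an external registry. The registry and helper names are invented here:

```python
_registry = set()   # (class, interface) pairs declared after the fact

def declare_conformance(cls, interface):
    _registry.add((cls, interface))

def conformsto(obj, interface):
    if isinstance(obj, interface):                       # inheritance
        return True
    if interface in getattr(obj, "__implements__", ()):  # attribute
        return True
    # external registry, honoring inheritance via the MRO
    return any((c, interface) in _registry for c in type(obj).__mro__)

class IQuacker: pass

class Duck:
    __implements__ = (IQuacker,)

class Goose: pass
declare_conformance(Goose, IQuacker)     # declared after the fact

print(conformsto(Duck(), IQuacker))    # True, via __implements__
print(conformsto(Goose(), IQuacker))   # True, via the registry
print(conformsto(object(), IQuacker))  # False
```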
On Mon, Aug 26, 2002 at 11:29:55AM -0400, Barry A. Warsaw wrote:
I'm leery of trying to weave some interface taxonomy into the standard library and types without having a lot of experience in using this for real world applications. Even then, it's possible <wink> that there will be a lot of disagreement on the shape of the type hierarchy.
So one strategy would be to not classify the existing types and classes ahead of time, but to provide a way for an application to declare conformance to existing types in a way that makes sense for the application (or library). The downside of this is that it may lead to a raft of incompatible interface declarations, but I also think that eventually we'd see convergence as we gain more experience.
My guess would be that of all the interfaces that get defined and used in the Python community, only a few that are commonly agreed on or become ubiquitous idioms will migrate into the core. I don't think we need to solve this "problem" for the core types right away. Let's start by providing mechanism and not policy.
+1 for a non-creationist approach to type categories. Oren
[GvR]
[Ping]
If i may hazard a guess, i'd imagine that Jim's answer would simply be that inheritance (of implementation) doesn't imply subtyping, and subtyping doesn't imply inheritance.
Well, yes, of course. But I strongly believe that in *most* cases, inheritance and subtyping go hand in hand. I'd rather invent a mechanism to deal with the exceptions rather than invent two parallel mechanisms that must both be deployed separately to get the full benefit out of them.
[Samuele]
One exception being able to declare conformance to an interface after the fact in some sweet way.
I've heard of people who add mix-in base classes after the fact by using assignment to __bases__. (This is currently not supported by new-style classes, but it's on my list of things to fix.) If that's not acceptable (it certainly looks questionable to me :-), I guess a separate registry may have to be created; ditto for deviations in the other direction (implementation inheritance without interface conformance).
E.g. my ideas of declaring partial conformance and of super-interfaces identified as a base-interface plus a subset of signatures do not fit so well in a just-abstract-classes model. But OTOH I insist, IMO, given how python code is written now, they would be handy although complex.
Yes, I'll have to think about that idea some more. It's appealing because it matches current Pythonic practice better than anything else. OTOH I want a solution that can be verified at compile time. --Guido van Rossum (home page: http://www.python.org/~guido/)
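The "separate registry" Guido mentions -- covering deviations in both directions -- could be sketched as follows. All names here (declare, disclaim, implements) are made up for illustration:

```python
# Hypothetical registry for both deviations Guido describes:
# interface conformance declared after the fact, and implementation
# inheritance with conformance explicitly disclaimed.

_declared = {}    # interface -> classes declared to conform
_disclaimed = {}  # interface -> classes declared NOT to conform

def declare(cls, interface):
    _declared.setdefault(interface, set()).add(cls)

def disclaim(cls, interface):
    _disclaimed.setdefault(interface, set()).add(cls)

def implements(cls, interface):
    if cls in _disclaimed.get(interface, set()):
        return False   # inherits implementation, not the interface
    if cls in _declared.get(interface, set()):
        return True    # conformance asserted after the fact
    return issubclass(cls, interface)   # the common case: they coincide
```

In the common case inheritance and subtyping go hand in hand, as Guido wants; the registry only records the exceptions.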
[GvR]
[me]
E.g. my ideas of declaring partial conformance and of super-interfaces identified as a base-interface plus a subset of signatures do not fit so well in a just-abstract-classes model. But OTOH I insist, IMO, given how python code is written now, they would be handy although complex.
Yes, I'll have to think about that idea some more. It's appealing because it matches current Pythonic practice better than anything else.
Thanks, I was under the impression nobody cared. In the end you could discard the notion -- the semantics are maybe too complex -- but I think it is really worth some thinking.
OTOH I want a solution that can be verified at compile time.
Here I don't get what you are referring to. I have indicated some possible sloppy interpretations, but just in order to care for transitioning code. But under the precise interpretation they are checkable (maybe it is costly and complex to do so, and that's your point?):

    class Source:
        def read(self):
            ...
        # other methods

e.g. could declare to implement partially FileLike (that means the matching subset of signatures), or be very precise and declare that it implements FileLike{read}, and FileLike{read} given FileLike has a very precise interpretation even at compile-time. regards.
OTOH I want a solution that can be verified at compile time.
Here I don't get what you are referring to.
Not specifically to your proposal.
I have indicated some possible sloppy interpretations but just in order to care for transitioning code. But under the precise interpretation they are checkable (maybe it is costly and complex to do so and that's your point?):
    class Source:
        def read(self):
            ...
        # other methods
e.g. could declare to implement partially FileLike (that means the matching subset of signatures), or be very precise and declare that it implements FileLike{read}
and FileLike{read} given FileLike has a very precise interpretation even at compile-time.
That's great. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Monday 26 August 2002 10:45 am, Guido van Rossum wrote:
Indeed. One of the advantages of separating inheritance and subtyping is that this can give you a bit more flexibility in setting up the ontology, which may make it easier to settle on something good.
Really? Given that there are no inheritance relationships between the existing built-in types, I would think that you could define an ontology consisting entirely of abstract types, and then graft the concrete types on it. I don't see what having separate interfaces would buy you. But perhaps you can give an example that shows your point?
Several posts have expressed a need to support multiple ontologies for a given set of classes. This doesn't preclude using the ontology defined by the class hierarchy as one method for defining an ontology; that could be the default ontology. What is missing is the ability to also place a class into an alternate ontology that may be specific to an application. The problem could be solved if applications had the ability to add and delete references to the type interface definitions that apply to a class:

    Aclass.declaretype(AnInterface)
    Aclass.deletetype(AnInterface)

Perhaps the interface definitions should also be able to add themselves to class definitions. That way common interface patterns that apply to standard libraries could be defined in the standard library. This would eliminate the repeated addition of interfaces to classes in each application.

    Interface I:
        pass

    I.appliesto(Aclass)

Removing an interface from a class might not be possible, or it may require a second class implementation to be created at compile time, because usage of that interface may be required in some other module. I suspect having two implementations of the same class might be somewhat confusing to the user. Perhaps removal of a required interface would trigger an exception.
Guido> Well, yes, of course. But I strongly believe that in *most* cases,
Guido> inheritance and subtyping go hand in hand. I'd rather invent a
Guido> mechanism to deal with the exceptions rather than invent two parallel
Guido> mechanisms that must both be deployed separately to get the full
Guido> benefit out of them.

I think there are (at least) three important kinds of problems, not just two.

1) You want to define a new class that inherits from a class that already exists, and you intend your class to be in all of the categories of its base class(es). This case is probably the most common, so whatever mechanism we adopt should make it easy to write.

2) You want to define a new class that inherits from a class that already exists, but you do *not* want your class to be in all of the categories of its base classes. Perhaps you are inheriting for implementation purposes only. I think we all agree that this case is relatively rare--but it is not completely unheard of, so there should be a way of coping with it.

3) You want to define a new type category, and assert that some classes that already exist are members of this category. You do not want to modify the definitions of these classes in order to do so. Defining new classes that inherit from the existing ones does not solve the problem, because you would then have to change code all over the place to make it use the new classes.

Here is an example of (3). I might want to define a TotallyOrdered category, together with a sort function that accepts only a container with elements that are TotallyOrdered. For example:

    def sort(x):
        if __debug__:
            for i in x:
                assert i in TotallyOrdered  # or whatever
        # continue with the sort here.

If someone uses my sort function to sort a container of objects that are not of built-in types, I don't mind imposing the requirement on the user to affirm that those types do indeed meet the TotallyOrdered requirement.
What I do not want to do, however, is require that the person making the claim is also the author of the class about which the claim is being made. Incidentally, it just occurred to me that if we regard categories as claims about types (or, if you like, predicate functions with type arguments), then it makes sense to include (Cartesian) product types. What I mean by this is that the TotallyOrdered category is really a category of pairs of types. Note that the comparison operators generally work on arguments of different type, so to make a really appropriate claim about total ordering, I really need a way to say not just that a single type (such as int) is totally ordered, but that all combinations of types from a particular set are totally ordered (or not -- recall that on most implementations it is possible to find three numbers x, y, z such that total ordering fails, as long as you mix int and float with sufficiently evil intent). -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
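Andrew's requirement -- that the person making the claim need not be the author of the class -- can be sketched with a category as a plain set of types. The spelling below (TotallyOrdered as a set, claim_totally_ordered) is illustrative only, not anything proposed in the thread:

```python
# A category as a set of types; anyone, not just the class author,
# can extend it with a claim about an existing class.
TotallyOrdered = {int, float, str}   # claims about built-in types

def claim_totally_ordered(cls):
    TotallyOrdered.add(cls)

def sort(x):
    if __debug__:
        for i in x:
            assert type(i) in TotallyOrdered, \
                "%r is not claimed to be TotallyOrdered" % (i,)
    return sorted(x)

class Money:                      # someone else's class, unmodified...
    def __init__(self, cents):
        self.cents = cents
    def __lt__(self, other):
        return self.cents < other.cents

claim_totally_ordered(Money)      # ...claimed by the user of sort()
```

Sorting a list of Money now passes the check, while an unclaimed type such as complex trips the assertion.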
On Mon, Aug 26, 2002 at 11:57:31AM -0400, Andrew Koenig wrote:
Incidentally, it just occurred to me that if we regard categories as claims about types (or, if you like, predicate functions with type arguments), then it makes sense to include (Cartesian) product types.
Would such a product type be anything more than a predicate about tuples? Something like the (T1, T2, T3) case in Guido's static typing presentation[1], where T1, T2 and T3 are type predicates rather than just types.

[1] http://www.python.org/~guido/static-typing/sld008.htm

Oren
Oren> On Mon, Aug 26, 2002 at 11:57:31AM -0400, Andrew Koenig wrote:
Incidentally, it just occurred to me that if we regard categories as claims about types (or, if you like, predicate functions with type arguments), then it makes sense to include (Cartesian) product types.
Oren> Would such a product type be anything more than a
Oren> predicate about tuples?

No, I don't think it would. Indeed, ML completely unifies Cartesian product types and tuples in a very, very cool way: Every function takes exactly one argument and yields exactly one result. However, the argument or result can be a tuple. So in ML, when I write

    f(x,y)

that really means to bundle x and y into a tuple, and call f with that tuple as its argument. So, for example, if I write

    val xy = (x,y)

which defines a variable named xy and binds it to the tuple (x,y), then

    f xy

means exactly the same thing as

    f(x,y)

The parentheses are really tuple constructors, and ML doesn't require parentheses for function calls at all.

However, if you're going to define predicates over tuples of (Python) types, then you had better not try to define those predicates as part of the tuples' class definitions, because they don't have one.
Indeed, ML completely unifies Cartesian product types and tuples in a very, very cool way:
Every function takes exactly one argument and yields exactly one result.
However, the argument or result can be a tuple.
So in ML, when I write
f(x,y)
that really means to bundle x and y into a tuple, and call f with that tuple as its argument. So, for example, if I write
val xy = (x,y)
which defines a variable named xy and binds it to the tuple (x,y), then
f xy
means exactly the same thing as
f(x,y)
The parentheses are really tuple constructors, and ML doesn't require parentheses for function calls at all.
ABC did this, and very early Python did this, too (but Python always required parentheses for calls). However, adding optional arguments caused trouble: after

    def f(a, b=1):
        print a*b

    t = (1, 2)

what should

    f(t)

mean? It could mean either f((1, 2), 1) or f(1, 2). So we had to get rid of that. I suppose ML doesn't have optional arguments (in the sense of Python), so the problem doesn't occur there; that's why it wasn't a problem in ABC. --Guido van Rossum (home page: http://www.python.org/~guido/)
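For the record, today's Python resolves the ambiguity Guido describes by never unpacking implicitly -- the tuple is bound to the first parameter, and unpacking must be spelled out:

```python
# Modern Python: a tuple argument is just one argument.
def f(a, b=1):
    return (a, b)

t = (1, 2)

print(f(t))    # ((1, 2), 1) -- i.e. f((1, 2), 1), never f(1, 2)
print(f(*t))   # (1, 2)      -- explicit unpacking is required
```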
Guido> ABC did this, and very early Python did this, too (but Python always
Guido> required parentheses for calls). However, adding optional arguments
Guido> caused trouble: after
Guido>
Guido>     def f(a, b=1):
Guido>         print a*b
Guido>
Guido>     t = (1, 2)
Guido>
Guido> what should f(t) mean? It could mean either f((1, 2), 1) or f(1, 2).
Guido> So we had to get rid of that. I suppose ML doesn't have optional
Guido> arguments (in the sense of Python), so the problem doesn't occur
Guido> there; that's why it wasn't a problem in ABC.

Right -- ML doesn't have optional arguments. It does, however, have clausal definitions, which can serve a similar purpose:

    fun f[a, b] = a*b
      | f[a] = a

Here, the square brackets denote lists, much as they do in Python, so you can call this function with a list that has one or two elements. The list's elements must be integers, because if you don't say what type the operands of * are, it assumes int. If you were to call this function with a list with other than one or two elements, it would raise an exception.

You can't do the analogous thing with tuples in ML:

    fun f(a, b) = a*b
      | f(a) = a

for a rather surprising reason: The ML type inference mechanism sees from the first clause (f(a, b) = a*b) that the argument to f must be a 2-element tuple, which means that in the *second* clause, `a' must also be a 2-element tuple. Otherwise the argument of f would not have a single, well-defined type. But if `a' is a 2-element tuple, that means that the type of the result of f is also a 2-element tuple. That type is inconsistent with the type of a*b, which is int. So the compiler will complain about this definition because the function f cannot return both an int and a tuple at the same time.

If we were to define it this way:

    fun f(a, b) = a*b
      | f(a) = 42

the compiler would now accept it.
However, it would give a warning that the second clause is irrelevant, because there is no argument you can possibly give to f that would cause the second clause to match without first causing the first clause to match.
On Mon, Aug 26, 2002 at 03:10:59PM -0400, Andrew Koenig wrote:
Oren> Would such a product type be anything more than a
Oren> predicate about tuples?
No, I don't think it would.
(explanation about ML functions deleted)
Can you give a more concrete example of what could a cartesian product of type predicates actually stand for in Python?
However, if you're going to define predicates over tuples of (Python) types, then you had better not try to define those predicates as part of the tuples' class definitions, because they don't have one.
Naturally. Andrew, in reply to your "scribble in the margin" question of about two weeks ago, see http://www.tothink.com/python/predicates

Oren
Oren> Can you give a more concrete example of what could a cartesian
Oren> product of type predicates actually stand for in Python?

Consider my TotallyOrdered suggestion from before. I would like to have a way of saying that for any two types T1 and T2 (where T1 might equal T2) chosen from the set {int, long, float}, < imposes a total ordering on values of those types.

Come to think of it, that's not really a Cartesian product. Rather, it's a claim about the members of the set union(int, union(long, float)).
On Mon, Aug 26, 2002 at 03:51:13PM -0400, Andrew Koenig wrote:
Oren> Can you give a more concrete example of what could a cartesian Oren> product of type predicates actually stand for in Python?
Consider my TotallyOrdered suggestion from before. I would like to have a way of saying that for any two types T1 and T2 (where T1 might equal T2) chosen from the set {int, long, float}, < imposes a total ordering on values of those types.
Come to think of it, that's not really a Cartesian product. Rather, it's a claim about the members of the set union(int,union(long, float)).
Isn't it easier to just spell it union(int, long, float)?

Your example helped me make the distinction between two very different kinds of type categories:

1. Type categories based on form: presence of methods, call signatures, etc.

2. Type categories based on semantics.

Semantic categories only live within a single form category. A method call cannot possibly be semantically correct if it isn't well-formed: it will cause a runtime error. But a method call that is well-formed may or may not be semantically correct.

A language *can* verify well-formedness. It cannot verify semantic correctness, but it can provide tools to help developers communicate their semantic expectations.

Form-based categories may be used to convey semantic categories: just add a dummy method or member to serve as a marker. It can force an interface with an otherwise identical form to be intentionally incompatible, to help you detect semantic categorization errors. The opposite is not true: semantic categories cannot be used to enforce well-formedness. You can mark a class as implementing the "TotallyOrdered" interface when it doesn't even have a comparison method.

A similar case can happen when using inheritance for categorization: a subclass may modify the call signatures, making the class form-incompatible, but it still retains its ancestry, which may be interpreted in some cases as a marker of a semantic category.

Oren
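Oren's dummy-marker idea can be sketched in a couple of lines. The marker name below is made up for illustration:

```python
# A semantic claim carried by form: a dummy class attribute acts as
# the marker, so an ordinary form-based check (hasattr) can see it.
def has_marker(obj, marker):
    return hasattr(type(obj), marker)

class Celsius(float):
    __totally_ordered__ = True    # hypothetical marker name

# The marker deliberately makes Celsius's "form" differ from float's,
# even though every method signature is identical.
```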
Oren> On Mon, Aug 26, 2002 at 03:51:13PM -0400, Andrew Koenig wrote:

Oren> Can you give a more concrete example of what could a cartesian
Oren> product of type predicates actually stand for in Python?
Consider my TotallyOrdered suggestion from before. I would like to have a way of saying that for any two types T1 and T2 (where T1 might equal T2) chosen from the set {int, long, float}, < imposes a total ordering on values of those types.
Come to think of it, that's not really a Cartesian product. Rather, it's a claim about the members of the set union(int,union(long, float)).
Oren> Isn't it easier to just spell it union(int, long, float)?

Yes -- but I have a cold today, so I'm not thinking clearly.

Oren> Your example helped me make the distinction between two very
Oren> different kinds of type categories:
Oren>
Oren> 1. Type categories based on form: presence of methods, call
Oren>    signatures, etc.
Oren> 2. Type categories based on semantics.
Oren>
Oren> Semantic categories only live within a single form category. A
Oren> method call cannot possibly be semantically correct if it isn't
Oren> well-formed: it will cause a runtime error. But a method call
Oren> that is well-formed may or may not be semantically correct.

Yes.

Oren> A language *can* verify well-formedness. It cannot verify
Oren> semantic correctness but it can provide tools to help
Oren> developers communicate their semantic expectations.

Yes.

Oren> Form-based categories may be used to convey semantic categories:
Oren> just add a dummy method or member to serve as a marker. It can
Oren> force an interface with an otherwise identical form to be
Oren> intentionally incompatible to help you detect semantic
Oren> categorization errors.

Remember that one thing I consider important is the ability to claim that classes written by others belong to a category defined by me. I do not want to have to modify those classes in order to do so. So, for example, if I want to say that int is TotallyOrdered, I do not want to have to modify the definition of int to do so.

Oren> The opposite is not true: semantic categories cannot be used to
Oren> enforce well-formedness. You can mark a class as implementing
Oren> the "TotallyOrdered" interface when it doesn't even have a
Oren> comparison method.

Yes. But semantic categories are useful anyway.

Oren> A similar case can happen when using inheritance for
Oren> categorization: a subclass may modify the call signatures,
Oren> making the class form-incompatible but it still retains its
Oren> ancestry which may be interpreted in some cases as a marker of a
Oren> semantic category.
Right. And several people have noted that it can be desirable for subclasses sometimes not to be members of all of their base classes' categories.
On Mon, Aug 26, 2002 at 04:33:53PM -0400, Andrew Koenig wrote:
Oren> Form-based categories may be used to convey semantic categories:
Oren> just add a dummy method or member to serve as a marker. It can
Oren> force an interface with an otherwise identical form to be
Oren> intentionally incompatible to help you detect semantic
Oren> categorization errors.
Remember that one thing I consider important is the ability to claim that classes written by others belong to a category defined by me. I do not want to have to modify those classes in order to do so.
How about union(int, long, float, has_marker("TotallyOrdered")) ? This basically means "I know that int, long and float are totally ordered and I'm willing to take your word for it if you claim that your type is also totally ordered". If the set of types that match a predicate is cached it should be at least as efficient as any other form of runtime interface checking.
Oren> The opposite is not true: semantic categories cannot be used to
Oren> enforce well-formedness. You can mark a class as implementing
Oren> the "TotallyOrdered" interface when it doesn't even have a
Oren> comparison method.
Yes. But semantic categories are useful anyway.
Sure they are, but the fact that form-based categories can be used to define semantic categories, while the other way around is not possible, is a point in favor of using form-based categories as the basic kind of category implemented by the language.

Inheritance of implementation also inherits the form (methods and call signatures). If you don't go out of your way to modify it, a subclass will usually also be a subcategory, so this should be pretty transparent most of the time.

Form-based categories are a tool for making claims about code: "under condition X the method Y should not raise NameError or TypeError". If you want, you can also use this tool to make semantic claims about your data types. With compile-time type inference these claims can be upgraded to the level of formal proofs.

Oren
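Oren's union(int, long, float, has_marker("TotallyOrdered")) spelling, together with the per-type caching he mentions, might look like the sketch below. Every name is hypothetical, and since long no longer exists in Python 3 the sketch uses int and float only:

```python
# Predicates over types, combined with union() and cached per type so
# a repeated check costs one dictionary lookup.
def is_type(T):
    return lambda t: issubclass(t, T)

def has_marker(name):
    return lambda t: bool(getattr(t, name, False))

def union(*predicates):
    cache = {}
    def check(obj):
        t = type(obj)
        if t not in cache:
            cache[t] = any(p(t) for p in predicates)
        return cache[t]
    return check

totally_ordered = union(is_type(int), is_type(float),
                        has_marker("TotallyOrdered"))
```

A third-party class opts in simply by carrying the marker attribute; int and float are covered by the explicit claims.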
Remember that one thing I consider important is the ability to claim that classes written by others belong to a category defined by me. I do not want to have to modify those classes in order to do so.
Oren> How about union(int, long, float, has_marker("TotallyOrdered")) ?

How about it? There is still the question of how to make such claims.

Oren> Inheritance of implementation also inherits the form (methods
Oren> and call signatures). If you don't go out of your way to modify
Oren> it a subclass will usually also be a subcategory so this should
Oren> be pretty transparent most of the time.

Right. So how do you define a subclass that you do not want to be a subcategory?
On Mon, Aug 26, 2002 at 05:17:28PM -0400, Andrew Koenig wrote:
Remember that one thing I consider important is the ability to claim that classes written by others belong to a category defined by me. I do not want to have to modify those classes in order to do so.
Oren> How about union(int, long, float, has_marker("TotallyOrdered")) ?
How about it? There is still the question of how to make such claims.
Oren> Inheritance of implementation also inherits the form (methods
Oren> and call signatures). If you don't go out of your way to modify
Oren> it a subclass will usually also be a subcategory so this should
Oren> be pretty transparent most of the time.
Right. So how do you define a subclass that you do not want to be a subcategory?
If you make an incompatible change to a method call signature, it will just happen by itself. If that's what you meant -- good. If that wasn't what you meant, this serves as a form of error checking.

It gets harder if you want to remove a method or marker. The problem is that there is currently no way to mask inherited attributes. This will require either a language extension that will allow you to del them, or using some other convention for this purpose.

Oren
It gets harder if you want to remove a method or marker. The problem is that there is currently no way to mask inherited attributes. This will require either a language extension that will allow you to del them or using some other convention for this purpose.
Can't you use this?

    class B:
        def foo(self): pass

    class C(B):
        foo = None  # Don't implement foo

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Aug 26, 2002 at 05:56:52PM -0400, Guido van Rossum wrote:
It gets harder if you want to remove a method or marker. The problem is that there is currently no way to mask inherited attributes. This will require either a language extension that will allow you to del them or using some other convention for this purpose.
Can't you use this?
    class B:
        def foo(self): pass

    class C(B):
        foo = None  # Don't implement foo
This comes closer:

    def raise_attributeerror(self):
        raise AttributeError

    RemoveAttribute = property(raise_attributeerror)

    class A:
        def f(self): print "method A.f"
        def g(self): print "method A.g"

    class B(A):
        f = RemoveAttribute

    a = A()
    b = B()

    a.f()
    a.g()

    print hasattr(b, "f"), hasattr(B, "f"), hasattr(b, "g"), hasattr(B, "g")

    try:
        b.f
    except AttributeError:
        print "b.f does not exist (correctly)"
    else:
        print "Expected AttributeError not raised"

    b.g()

Writing 'b.f' will raise AttributeError, but unfortunately hasattr(B, 'f') will still return True.

Jeff
On Mon, Aug 26, 2002 at 08:39:44PM -0500, jepler@unpythonic.net wrote:
On Mon, Aug 26, 2002 at 05:56:52PM -0400, Guido van Rossum wrote:
It gets harder if you want to remove a method or marker. The problem is that there is currently no way to mask inherited attributes. This will require either a language extension that will allow you to del them or using some other convention for this purpose.
Can't you use this?
    class B:
        def foo(self): pass

    class C(B):
        foo = None  # Don't implement foo
This comes closer:
    def raise_attributeerror(self):
        raise AttributeError

    RemoveAttribute = property(raise_attributeerror)

    class A:
        def f(self): print "method A.f"
        def g(self): print "method A.g"

    class B(A):
        f = RemoveAttribute
Yes, that's a good solution. But it should be some special builtin out-of-band value, not a user-defined property.
writing 'b.f' will raise AttributeError, but unfortunately hasattr(B, 'f') will still return True.
This isn't necessarily a problem, but hasattr could be taught about this out-of-band value.

--

Proposed hierarchy for categories, types and interfaces:

    +category
        +type
            +int
            +str
            etc.
        +interface
            +Iattribute
            +Icallsignature
            +Iunion
            +Iintersection
            etc.

Both types and interfaces define a set. The set membership test is the 'isinstance' function (implemented by a new slot). For types, set membership is defined by inheritance: the isinstance handler will get the first argument's type and crawl up the __bases__ DAG to see if it finds itself. Interfaces check the object's form instead of its ancestry. An Iattribute interface checks for the presence of a single attribute and applies another interface check to its value. An Icallsignature interface checks if the argument is a callable object with a specified number of arguments, default arguments, etc. An Iintersection interface checks that the argument matches a set of categories.

Example:

    interface readable:
        def read(bytes: int): str
        def readline(): str
        def readlines(): [str]

is just a more convenient way to write:

    readable = Iintersection(
        Iattribute('read', Icallsignature(str, ('bytes', int))),
        Iattribute('readline', Icallsignature(str)),
        Iattribute('readlines', Icallsignature(Ilistof(str)))
    )

The name 'readable' is simply bound to the resulting object; interfaces are defined by their value, not their name. The types of arguments and return values will not be checked at first and only serve as documentation. Note that they don't necessarily have to be types -- they can be interfaces, too. For example, 'str|int' in an interface declaration will be converted to Iunion(str, int).

    >>> isinstance(file('/dev/null'), readable)
    True
    >>> isinstance(MyFileLikeClass(), readable)
    True

The MyFileLikeClass or file classes do not have to be explicitly declared as implementing the readable interface. The benefit of explicit interface declarations is that you will get an error if you write a method that does not match the declaration.
If you try to implement two conflicting interfaces this can also be detected immediately - the intersection of the two interfaces will reduce to the empty interface. For now this will only catch the same method name with different number of arguments but in the future it may detect conflicting argument or return value types. doesn't-have-anything-better-to-do-at-6-am-ly yours, Oren
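Modern Python grew exactly the slot Oren asks for (__instancecheck__), so a stripped-down version of his Iattribute/Iintersection -- without the call-signature checks -- can be sketched today. The class names follow his proposal; everything else is an assumption:

```python
# Minimal form-based interfaces in the spirit of Oren's proposal.
# isinstance(obj, I) consults type(I).__instancecheck__, which here
# inspects the object's form instead of its ancestry.

class Interface:
    def __instancecheck__(self, obj):
        return self.check(obj)

class Iattribute(Interface):
    """Checks one attribute, optionally applying an inner interface."""
    def __init__(self, name, inner=None):
        self.name, self.inner = name, inner
    def check(self, obj):
        if not hasattr(obj, self.name):
            return False
        return self.inner is None or self.inner.check(getattr(obj, self.name))

class Iintersection(Interface):
    """Matches objects that match every component interface."""
    def __init__(self, *parts):
        self.parts = parts
    def check(self, obj):
        return all(p.check(obj) for p in self.parts)

readable = Iintersection(Iattribute('read'),
                         Iattribute('readline'),
                         Iattribute('readlines'))
```

With this, isinstance(io.StringIO(), readable) is true without StringIO ever declaring the interface, just as in Oren's file example.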
participants (24)
- Aahz
- Alex Martelli
- Andrew Koenig
- barry@python.org
- Brett Cannon
- Casey Duncan
- David Abrahams
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- jepler@unpythonic.net
- Jeremy Hylton
- jeremy@alum.mit.edu
- Ka-Ping Yee
- Kevin Jacobs
- Martijn Faassen
- Martin Sjögren
- martin@v.loewis.de
- Michael Hudson
- Michael McLay
- Nathan Clegg
- Neil Schemenauer
- Oren Tirosh
- Samuele Pedroni