Providing a guarantee that instances of user-defined classes have distinct identities

Suppose I model a game of cards where suits don't matter. I might find the integer representation of cards (14 for "Ace", 13 for "King", ..., 2 for "2") convenient. The deck has 4 copies of each card, which I need to distinguish. I was thinking of modeling a card as follows:

    class Card(int):
        __hash__ = int.__hash__
        def __eq__(self, other):
            return self is other
        # int defines its own __ne__, which would otherwise be inherited
        # and compare by value, so != has to be overridden as well.
        def __ne__(self, other):
            return self is not other

This works precisely as I want (at least in CPython 3.2):

    x = Card(14)
    y = Card(14)
    assert x != y  # x and y are two different Aces
    z = x
    assert x == z  # x and z are bound to the same Ace

But this behavior is implementation dependent, so the above code may one day break (very painfully for whoever happens to maintain it at the time). Is it possible to add a guarantee to the language that would make the above code safe to use?

Currently the language promises:

    "For immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed."

Nowhere in the documentation is it clearly defined which objects are considered "immutable" for the purpose of this promise. As a result, a Python implementation, now or in the future, may decide that it's OK to return a reference to an existing object when a Card instance is created, since arguably class Card is immutable (it derives from an immutable base class and doesn't add any new attributes).

Perhaps a change like this would be fine (it obviously won't break any existing code):

    "For certain types, operations that compute new values may actually return a reference to an existing object with the same type and value. The only types for which this may happen are:

    - built-in immutable types
    - user-defined classes that explicitly override __new__ to return a reference to an existing object

    Note that a user-defined class that inherits from a built-in immutable type, without overriding __new__, will not exhibit this behavior."
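A self-contained sketch of the design described above, assuming ranks are plain ints from 2 to 14 with four copies of each. It overrides __ne__ as well as __eq__: int defines its own __ne__, so overriding __eq__ alone would leave != comparing by value.

```python
class Card(int):
    """A rank (2-14) whose equality is object identity, not value."""

    __hash__ = int.__hash__  # all copies of a rank share a hash value

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):  # int.__ne__ would otherwise be inherited
        return self is not other


# Four copies of each rank, 2 through 14 (Ace).
deck = [Card(rank) for rank in range(2, 15) for _ in range(4)]

x, y = Card(14), Card(14)
z = x
print(x == y, x != y, x == z)     # False True True
print(len(deck), len(set(deck)))  # 52 52
print(x + y)                      # 28: arithmetic still uses the int value
```

Because __hash__ is still int.__hash__, all Aces collide in a set; the set stays correct because, after the hash match, membership is decided by the identity-based __eq__.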

On Wed, Apr 18, 2012 at 11:23 AM, Max Moroz <maxmoroz@gmail.com> wrote:
It's up to the objects themselves (and their metaclasses): any such optimisation must be implemented in cls.__new__ or metacls.__call__. So, no, you're not going to get a stronger guarantee than is already in place (and you'd be better off just writing Card properly: inheriting from int for an object that should model a "value, suit" 2-tuple is a bad idea, and using collections.namedtuple would be a much better option).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
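Nick's alternative can be sketched with collections.namedtuple. The field names rank and suit are assumptions for illustration; giving every card a suit makes the four copies of each rank distinct values, so ordinary value equality is enough and no identity tricks are needed.

```python
from collections import namedtuple

# Hypothetical field names; suits are single letters purely for illustration.
Card = namedtuple("Card", ["rank", "suit"])

# One card per (rank, suit): 13 ranks x 4 suits.
deck = [Card(rank, suit) for suit in "SHDC" for rank in range(2, 15)]

ace_of_spades = Card(14, "S")
ace_of_hearts = Card(14, "H")

print(ace_of_spades == ace_of_hearts)            # False: different suits
print(ace_of_spades.rank == ace_of_hearts.rank)  # True: same rank
print(len(deck), len(set(deck)))                 # 52 52
```

Game logic that ignores suits can compare card.rank directly, while equality and hashing stay value-based and do not depend on any detail of object identity.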

On Wed, 18 Apr 2012 11:44:47 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
First, the Python docs don't clearly tell you which objects are immutable because, well, it's an extensible language. Given that constraint, the best they can do is what they say not far above the section you quoted:

    An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.

That is: mutability is determined by the type, and the docs give a list of built-in types that are immutable.
> So, no, you're not going to get a stronger guarantee than is already in place
I believe that this guarantee is strong enough to ensure that classes inheriting from immutable types won't reuse existing instances unless the class code does something to make that happen. The type of such an object is *not* the type it inherits from; it's a Python class type. As demonstrated, such classes aren't immutable, so Python needs to make different instances different objects even if they share the same value. If you want different behavior, the class or metaclass has to provide it.

<mike
--
Mike Meyer <mwm@mired.org> http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
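Mike's point that the subclass has its own type can be checked directly. This is a minimal sketch using a bare int subclass; the identities it reports reflect CPython's current behavior rather than a documented promise.

```python
class Card(int):
    """A bare int subclass with no extra behavior."""


a = Card(14)
b = Card(14)

print(type(a) is Card, type(a) is int)  # True False: the type is the subclass
print(isinstance(a, int))               # True: it is still an int
print(a == b)                           # True: value equality is inherited from int
print(a is b)                           # False: two constructor calls, two objects

# For plain ints, object reuse is an implementation detail (CPython caches
# small integers), which is why identity must not be relied on there.
print(int("14") is int("14"))
```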

On 18/04/2012 03:23, Max Moroz wrote:
Hi,

I agree that the definition of "immutable" is not very clear, but I don’t think that your Card class is immutable. Because Card is defined without __slots__, it gets a __dict__ and can hold arbitrary attributes. Even if none of its methods do so, this is perfectly okay:

    x = Card(14)
    y = Card(14)
    x.foo = 42
    y.foo  # AttributeError

Because of their __dict__, Card objects can never be considered immutable. (Now, I’m not sure what would happen with an empty __slots__.)

Regards,
--
Simon Sapin
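Simon's observation, and his open question about an empty __slots__, can be explored with a short sketch comparing a plain int subclass with one that declares __slots__ = () (both class names are hypothetical). Note that int subclasses only accept an empty __slots__; a non-empty one raises TypeError when the class is created. The final identity check reflects CPython's current behavior, not a language guarantee.

```python
class Card(int):
    pass


class SlottedCard(int):
    __slots__ = ()  # no per-instance __dict__


x = Card(14)
x.foo = 42                     # allowed: instances carry a __dict__
print(hasattr(x, "__dict__"))  # True

s = SlottedCard(14)
print(hasattr(s, "__dict__"))  # False

try:
    s.foo = 42
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False
print(assignment_allowed)      # False

# Even with an empty __slots__, CPython currently builds a new object per call.
print(SlottedCard(14) is SlottedCard(14))  # False
```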

Simon Sapin <simon.sapin@kozea.fr> wrote:
> I agree that the definition of "immutable" is not very clear, but I don’t think that your Card class is immutable.
I shouldn't have used the word "immutable"; it is a bit confusing and distracts from my real concern. I'm really just trying to get a guarantee from the language that would make my original code safe. As is, it relies on a very reasonable, but undocumented, assumption about the behavior of built-in classes' __new__ method. The exact guarantee I need is:

    "Any built-in class's __new__ method, called with the cls argument set to a user-defined subclass, will always return a new instance of type cls."

(Emphasis on "new instance", as opposed to a reference to an existing object.)

Nick Coghlan <ncoghlan@gmail.com> wrote:
My example might have been poor. Still, I have use cases for objects nearly identical to `int`, `tuple`, etc., but where I want to distinguish two objects created at different times (place them both in a set, compare them as unequal, etc.). Thanks for your comments. Max

On Wed, 18 Apr 2012 at 02:57:37 -0700, Max Moroz wrote:
Simon's point is that your current code *is* safe since your instances are not immutable.
As long as your class does not set `__slots__` to an empty sequence, you already have this guarantee, since your type is not immutable. And while the current documentation might suggest that built-in types would be allowed to check for an empty `__slots__` and reuse already-created instances of a subclass in that case, it's very unlikely they will ever implement such a mechanism. So just don't define `__slots__` if you want this kind of guarantee, or, better yet, add an ID to your instances to make the differences you rely on explicit.

Cheers,
Sven
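Sven's suggestion of an explicit ID could look like the following sketch, written with a frozen dataclass from the standard library (Python 3.7 or later). The uid field and the module-level counter are assumptions for illustration; the point is that two cards differ because their data differs, not because of object identity.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical process-wide source of unique IDs.
_uids = itertools.count()


@dataclass(frozen=True)
class Card:
    rank: int  # 2..14, the value the game logic cares about
    uid: int = field(default_factory=lambda: next(_uids))  # explicit identity


x = Card(14)
y = Card(14)
z = x

print(x == y)            # False: same rank, different uid
print(x == z)            # True: same object, same data
print(x.rank == y.rank)  # True

deck = [Card(rank) for rank in range(2, 15) for _ in range(4)]
print(len(set(deck)))    # 52
```

Because the distinction lives in the data, it also survives copying or pickling, which would break an identity-based __eq__.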

Max Moroz wrote:
I can't help feeling that you are worrying about nothing. Why would a built-in class ever return an existing instance of a subclass? While technically possible, it would require the built-in class to keep a cache of instances for each subclass. Who is going to do that, and why would they bother?

It seems to me that you're asking for a guarantee like:

    "Calling len() on a list will never randomly shuffle the list as a side-effect."

The fact that len() doesn't shuffle the list as a side-effect is not a documented promise of the language. But does it have to be? Some things would be just stupid for any implementation to do. There is no limit to the number of stupid things an implementation might do, and it is impossible for the language to specify that it does none of them.

I think that __new__ returning an existing instance of a subclass would be one of those stupid things. After all, it is a *constructor*: it is supposed to construct a new instance, and if it doesn't do so when called with a subclass argument, it isn't living up to the implied contract.

I guess what this comes down to is that I'm quite satisfied with the implied promise that constructors will construct new instances, and don't think it is necessary to make that explicit. It's not so much that I object to your request as that I think it's unnecessary.

--
Steven
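Steven's "constructor contract" can be illustrated with a hypothetical base class that interns its own instances but leaves subclasses alone. CachedRank and its cache are assumptions made for this sketch, not anything in the standard library; the key line is the check that caches only when cls is exactly the base class and otherwise delegates to the ordinary constructor.

```python
class CachedRank(int):
    """Hypothetical int subclass that interns its own exact instances."""

    _cache = {}

    def __new__(cls, value):
        if cls is not CachedRank:
            # Called on behalf of a subclass: behave like a plain constructor.
            return super().__new__(cls, value)
        try:
            return CachedRank._cache[value]
        except KeyError:
            obj = super().__new__(cls, value)
            CachedRank._cache[value] = obj
            return obj


class Card(CachedRank):
    __hash__ = int.__hash__

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other


print(CachedRank(14) is CachedRank(14))  # True: served from the cache
print(Card(14) is Card(14))              # False: subclasses are never cached
print(Card(14) != Card(14))              # True: identity-based comparison
```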

After reading the comments, and especially the one below, I am now persuaded that this implied guarantee is sufficient. Part of the problem was that I didn't have a clear picture of what __new__ is supposed to do when called with a (proper) subclass argument. Now I (hopefully correctly) understand that a well-behaved __new__ should in this case simply pass the call on to object.__new__, or at least do something very similar: the subclass has every right to expect this behavior to remain unchanged whether or not one of its parent classes defines a custom __new__. No explicit guarantee is needed to confirm that this is what built-in classes do.

Steven D'Aprano wrote:

On Wed, 18 Apr 2012 at 23:22:55 +1000, Steven D'Aprano wrote:
This was also my first reaction. There is one case, though, where you wouldn't need a cache: if the constructor is called with an instance of the subclass as an argument. As an example, the tuple implementation does not have a cache of instances, and only reuses tuples that are passed directly to the constructor:

    >>> a = 1, 2
    >>> b = 1, 2
    >>> a is b
    False
    >>> b = tuple(a)
    >>> a is b
    True

It wouldn't be completely unthinkable for a Python implementation to extend this behaviour to immutable subclasses of immutable types. I don't think there is any reason to disallow such an implementation, either.

Cheers,
Sven
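Sven's tuple example can be extended to a subclass. In this sketch (the Pair name is hypothetical), tuple() must return an exact tuple when given a subclass instance, so it cannot hand back its argument, while calling the subclass itself currently builds a new object. The reuse of exact tuples is a CPython implementation detail, not a language guarantee.

```python
class Pair(tuple):
    """A tuple subclass with no extra behavior (hypothetical name)."""


a = (1, 2)
print(tuple(a) is a)            # CPython reuses an exact tuple passed to tuple()

p = Pair((1, 2))
print(type(tuple(p)) is tuple)  # True: tuple() always returns an exact tuple
print(tuple(p) is p)            # False: so it cannot return the subclass instance
print(Pair(p) is p)             # False in CPython: a new subclass instance is built
print(Pair(p) == p)             # True: the values are still equal
```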

On Thu, 19 Apr 2012 11:35:13 +0100 Sven Marnach <sven@marnach.net> wrote:
How would the implementation determine that the subclass was immutable?

<mike

participants (7)
- Amaury Forgeot d'Arc
- Max Moroz
- Mike Meyer
- Nick Coghlan
- Simon Sapin
- Steven D'Aprano
- Sven Marnach