[Python-Dev] subclassing builtin data structures

Sat Feb 14 13:23:32 CET 2015

On Fri, Feb 13, 2015 at 06:03:35PM -0500, Neil Girdhar wrote:
> I personally don't think this is a big enough issue to warrant any changes,
> but I think Serhiy's solution would be the ideal best with one additional
> parameter: the caller's type.  Something like
> 
> def __make_me__(self, cls, *args, **kwargs)
> 
> and the idea is that any time you want to construct a type, instead of
> 
> self.__class__(assumed arguments…)
> 
> where you are not sure that the derived class' constructor knows the right
> argument types, you do
> 
> class SomeCls:
>      def some_method(self, ...):
>            return self.__make_me__(SomeCls, assumed arguments…)
> 
> Now the derived class knows who is asking for a copy.

What if you wish to return an instance from a classmethod? You don't 
have a `self` available.

class SomeCls:
    def __init__(self, x, y, z):
        ...
    @classmethod
    def from_spam(cls, spam):
        x, y, z = process(spam)
        return cls.__make_me__(self, cls, x, y, z)  # oops, no self

Even if you are calling from an instance method, and self is available, 
you cannot assume that the information needed for the subclass 
constructor is still available. Perhaps that information is used in the 
constructor and then discarded.
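
A minimal sketch of that second problem, using hypothetical classes Money
and ParsedMoney that are not from this thread. The subclass constructor
needs a decimal separator, uses it once, and never stores it, so nothing
called on the instance later can supply it again:

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def doubled(self):
        # Assumes every subclass keeps the (cents) signature.
        return type(self)(self.cents * 2)


class ParsedMoney(Money):
    def __init__(self, text, decimal_sep):
        # decimal_sep is used here and then discarded.
        # Assumes a two-digit fraction such as "12,50".
        whole, _, frac = text.partition(decimal_sep)
        super().__init__(int(whole) * 100 + int(frac or 0))


price = ParsedMoney("12,50", ",")
try:
    price.doubled()
except TypeError as exc:
    # type(self)(2500) cannot supply the missing decimal_sep argument.
    print("cannot rebuild:", exc)
```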

The problem we wish to solve is that when subclassing, the methods of a
base class blindly return instances of that base class, instead of
instances of type(self):

py> class MyInt(int):
...     pass
...
py> n = MyInt(23)
py> assert isinstance(n, MyInt)
py> assert isinstance(n+1, MyInt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

This means that subclasses often have to override all the parent's 
methods, just to ensure the type is correct:

class MyInt(int):
    def __add__(self, other):
        o = super().__add__(other)
        if o is not NotImplemented:
            o = type(self)(o)
        return o

Something like that, repeated for all the int methods, should work:

py> n = MyInt(23)
py> type(n+1)
<class '__main__.MyInt'>
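
One way to cut down the repetition is to generate the overrides in a loop.
This is only a sketch: the helper _retyped and the list of method names are
my own, and the list is deliberately incomplete.

```python
def _retyped(name):
    """Wrap int's method `name` so plain int results become type(self)."""
    base_method = getattr(int, name)

    def method(self, *args):
        result = base_method(self, *args)
        # NotImplemented and non-int results pass through unchanged.
        if type(result) is int:
            result = type(self)(result)
        return result

    method.__name__ = name
    return method


class MyInt(int):
    pass


# Hypothetical, partial list; a real subclass would need many more.
for _name in ("__add__", "__radd__", "__sub__", "__rsub__",
              "__mul__", "__rmul__", "__neg__", "__abs__"):
    setattr(MyInt, _name, _retyped(_name))

print(type(MyInt(23) + 1))
print(type(1 + MyInt(23)))
```

Mixed arithmetic still falls back to the other operand: MyInt(2) + 1.5
returns a plain float, because int.__add__ returns NotImplemented there.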

This is tedious and error prone, but at least once it is done, 
subclasses of MyInt will Just Work:

py> class MyOtherInt(MyInt):
...     pass
...
py> a = MyOtherInt(42)
py> type(a + 1000)
<class '__main__.MyOtherInt'>

(At least, *in general* they will work. See below.)

So, why not have int's methods use type(self) instead of hard coding 
int? The answer is that *some* subclasses might override the 
constructor with a different signature, which would cause the __add__ 
method to fail:

    # this will fail if the constructor has a different signature
    o = type(self)(o)

Okay, but changing the constructor signature is quite unusual. Mostly, 
people subclass to add new methods or attributes, or to override a 
specific method. The dict/defaultdict situation is relatively uncommon.
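
For the record, the dict/defaultdict case looks like this. The function
naive_copy is a hypothetical helper of mine, standing in for a base class
method that assumes type(self) accepts the same arguments as dict:

```python
from collections import defaultdict


def naive_copy(mapping):
    # Assumes the constructor takes a mapping as its first argument.
    return type(mapping)(mapping)


plain = naive_copy({"a": 1})          # fine: dict(mapping)

counts = defaultdict(int, {"a": 1})
try:
    naive_copy(counts)                # defaultdict(mapping) is not valid
except TypeError as exc:
    # defaultdict expects default_factory (a callable or None) first.
    print("failed:", exc)
```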

Instead of requiring *every* subclass to override all the methods, 
couldn't we require the base classes (like int) to assume that the 
signature is unchanged and call type(self), and leave it up to the 
subclass to override all the methods *only* if the signature has 
changed? (Which they probably would have to do anyway.)
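
Roughly what I have in mind, as a pure-Python sketch with made-up classes
(Vector, LoggedVector and NamedVector are illustrations, not real library
types):

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def scaled(self, k):
        # The base class assumes subclasses keep the (x, y) signature.
        return type(self)(self.x * k, self.y * k)


class LoggedVector(Vector):
    # Same constructor signature: nothing to override.
    def describe(self):
        return "Vector(%r, %r)" % (self.x, self.y)


class NamedVector(Vector):
    # Changed signature, so this class takes on the overriding itself.
    def __init__(self, name, x, y):
        super().__init__(x, y)
        self.name = name

    def scaled(self, k):
        return type(self)(self.name, self.x * k, self.y * k)


print(type(LoggedVector(1, 2).scaled(3)).__name__)
print(type(NamedVector("v", 1, 2).scaled(3)).__name__)
```

Only NamedVector pays the cost of overriding, and only because it chose to
change the constructor.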

As the MyInt example above shows, and as datetime in the standard 
library also shows, this actually works fine in practice:

py> from datetime import datetime
py> class MySpecialDateTime(datetime):
...     pass
...
py> t = MySpecialDateTime.today()
py> type(t)
<class '__main__.MySpecialDateTime'>
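
The same holds for datetime's other alternate constructors that I have
tried. A small sketch, assuming that strptime and now are classmethods
which build their result through cls, as today does:

```python
from datetime import datetime


class MySpecialDateTime(datetime):
    pass


# Both calls go through the subclass, so the subclass type is kept.
parsed = MySpecialDateTime.strptime("2015-02-14 13:23", "%Y-%m-%d %H:%M")
current = MySpecialDateTime.now()

print(type(parsed).__name__)
print(type(current).__name__)
```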

Why can't int, str, list, tuple etc. be more like datetime?

-- 
Steve