[Python-Dev] subclassing builtin data structures
Steven D'Aprano
steve at pearwood.info
Sat Feb 14 13:23:32 CET 2015
On Fri, Feb 13, 2015 at 06:03:35PM -0500, Neil Girdhar wrote:
> I personally don't think this is a big enough issue to warrant any changes,
> but I think Serhiy's solution would be the ideal best with one additional
> parameter: the caller's type. Something like
>
> def __make_me__(self, cls, *args, **kwargs)
>
> and the idea is that any time you want to construct a type, instead of
>
> self.__class__(assumed arguments…)
>
> where you are not sure that the derived class' constructor knows the right
> argument types, you do
>
> class SomeCls:
>     def some_method(self, ...):
>         return self.__make_me__(SomeCls, assumed arguments…)
>
> Now the derived class knows who is asking for a copy.
What if you wish to return an instance from a classmethod? You don't
have a `self` available.
    class SomeCls:
        def __init__(self, x, y, z):
            ...
        @classmethod
        def from_spam(cls, spam):
            x, y, z = process(spam)
            return cls.__make_me__(self, cls, x, y, z)  # oops, no self
Even if you are calling from an instance method, and self is available,
you cannot assume that the information needed for the subclass
constructor is still available. Perhaps that information is used in the
constructor and then discarded.
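A minimal sketch of that last point, using a hypothetical `Digest` class (the name, its `secret`/`salt` arguments and the `truncated` method are all invented for illustration). The constructor consumes its arguments and keeps only the derived bytes, so a method that tries to rebuild an instance from `self` has nothing sensible to pass back in:

```python
import hashlib


class Digest(bytes):
    """Hypothetical class: stores only the hash, not the inputs."""

    def __new__(cls, secret, salt):
        # secret and salt are used here and then thrown away.
        return super().__new__(cls, hashlib.sha256(salt + secret).digest())

    def truncated(self, n):
        # Blindly assumes the constructor accepts the raw bytes.
        return type(self)(self[:n])


d = Digest(b"hunter2", b"salt")

error = None
try:
    d.truncated(8)
except TypeError as exc:
    # The constructor wants (secret, salt), which the instance no longer has.
    error = exc
```

Passing `self` and the caller's class to a hook does not help here, because the missing arguments cannot be recovered from either.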
The problem we wish to solve is that when subclassing, methods of the
base class blindly return instances of the base class, instead of
self's type:
py> class MyInt(int):
...     pass
...
py> n = MyInt(23)
py> assert isinstance(n, MyInt)
py> assert isinstance(n+1, MyInt)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError
This means that subclasses often have to override all the parent's
methods, just to ensure the type is correct:
    class MyInt(int):
        def __add__(self, other):
            o = super().__add__(other)
            if o is not NotImplemented:
                o = type(self)(o)
            return o
Something like that, repeated for all the int methods, should work:
py> n = MyInt(23)
py> type(n+1)
<class '__main__.MyInt'>
This is tedious and error prone, but at least once it is done,
subclasses of MyInt will Just Work:
py> class MyOtherInt(MyInt):
...     pass
...
py> a = MyOtherInt(42)
py> type(a + 1000)
<class '__main__.MyOtherInt'>
(At least, *in general* they will work. See below.)
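One way to cut down the repetition is to generate the overrides in a loop. The following is only a sketch: `returns_own_type` is a hypothetical helper, not anything in the standard library, and the list of method names is deliberately incomplete. It wraps each named `int` method so that a plain `int` result is converted back to `type(self)`, while anything else (including `NotImplemented`) is passed through untouched.

```python
def returns_own_type(*names):
    """Hypothetical class decorator: re-wrap plain int results as type(self)."""
    def decorate(cls):
        for name in names:
            base_method = getattr(cls, name)  # inherited from int

            def wrapper(self, *args, _base=base_method):
                result = _base(self, *args)
                # Leave NotImplemented, floats, etc. alone.
                if type(result) is int:
                    result = type(self)(result)
                return result

            wrapper.__name__ = name
            setattr(cls, name, wrapper)
        return cls
    return decorate


@returns_own_type("__add__", "__radd__", "__sub__", "__mul__", "__neg__")
class MyInt(int):
    pass


class MyOtherInt(MyInt):
    pass


n = MyInt(23)
a = MyOtherInt(42)
```

With this in place `type(n + 1)` is `MyInt` and `type(a + 1000)` is `MyOtherInt`, while mixed arithmetic such as `n + 1.5` still falls through to `float`. It still relies on the subclass constructor accepting a single int.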
So, why not have int's methods use type(self) instead of hard coding
int? The answer is that *some* subclasses might override the
constructor, which would cause the __add__ method to fail:
# this will fail if the constructor has a different signature
o = type(self)(o)
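A short sketch of that failure, assuming a hypothetical `Measurement` subclass whose constructor requires an extra `unit` argument (the class and argument names are invented for illustration):

```python
class MyInt(int):
    def __add__(self, other):
        o = super().__add__(other)
        if o is not NotImplemented:
            o = type(self)(o)
        return o


class Measurement(MyInt):
    """Hypothetical subclass with a changed constructor signature."""

    def __new__(cls, value, unit):
        self = super().__new__(cls, value)
        self.unit = unit
        return self


m = Measurement(5, "kg")

error = None
try:
    m + 1  # MyInt.__add__ calls Measurement(6), with no unit
except TypeError as exc:
    error = exc
```

The inherited `__add__` has no way of knowing what to pass for `unit`, so `Measurement` would need its own `__add__` regardless.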
Okay, but changing the constructor signature is quite unusual. Mostly,
people subclass to add new methods or attributes, or to override a
specific method. The dict/defaultdict situation is relatively uncommon.
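For reference, the dict/defaultdict case looks like this. `defaultdict` takes the default factory as its first positional argument, so code that assumes the plain `dict` signature and calls `type(d)(d)` fails, whereas defaultdict's own `copy()` knows to carry the factory across:

```python
from collections import defaultdict

d = defaultdict(list)
d["spam"].append(1)

error = None
try:
    # Assumes dict's signature; defaultdict sees d as the default_factory.
    type(d)(d)
except TypeError as exc:
    error = exc

# defaultdict's own copy() preserves both the type and the factory.
c = d.copy()

# Spelling out the changed signature by hand also works.
e = type(d)(d.default_factory, d)
```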
Instead of requiring *every* subclass to override all the methods,
couldn't we require the base classes (like int) to assume that the
signature is unchanged and call type(self), and leave it up to the
subclass to override all the methods *only* if the signature has
changed? (Which they probably would have to do anyway.)
As the MyInt example above and datetime in the standard library both
show, this actually works fine in practice:
py> from datetime import datetime
py> class MySpecialDateTime(datetime):
...     pass
...
py> t = MySpecialDateTime.today()
py> type(t)
<class '__main__.MySpecialDateTime'>
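The same can be checked for a few of datetime's other alternate constructors. This sketch only covers classmethod constructors. Arithmetic such as adding a timedelta is left out on purpose, since whether it preserves the subclass has, as far as I know, differed between Python versions.

```python
from datetime import datetime


class MySpecialDateTime(datetime):
    pass


# Each of these is a classmethod that builds its result via cls.
samples = [
    MySpecialDateTime.today(),
    MySpecialDateTime.now(),
    MySpecialDateTime.fromtimestamp(1000000000),
    MySpecialDateTime.strptime("2015-02-14", "%Y-%m-%d"),
]

types = {type(t) for t in samples}
```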
Why can't int, str, list, tuple etc. be more like datetime?
--
Steve