[Python-ideas] Eliminating special method lookup (was Re: Missing Core Feature: + - * / | & do not call __getattr__)
Andrew Barnert
abarnert at yahoo.com
Sat Dec 5 00:21:26 EST 2015
On Dec 4, 2015, at 18:38, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> On Fri, Dec 04, 2015 at 12:02:46PM -0800, Andrew Barnert via Python-ideas wrote:
>>
>> So, maybe the way to get from here to there is to explicitly document
>> the methods CPython treats as magic methods,
>
> That's probably a good idea regardless of anything else.
>
>
>> and only allow
>> other implementations to do the same for (a subset of) the same, and
>> people can gradually tackle and remove parts of that list as people
>> come up with ideas, and if the list eventually becomes empty (or gets
>> to the point where it's 2 rare things that aren't important enough to
>> keep extra complexity in the language), then the whole notion of
>> special method lookup can finally go away.
>
> Well, maybe. I haven't decided whether special method lookup is an
> annoying inconsistency or a feature. It seems to me that operators, at
> least, are somewhat different from method calls, and the special
> handling should be considered a deliberate design feature. My
> reasoning goes like this:
>
> Take a method call:
>
> obj.spam(arg)
>
> That's clearly called on the instance itself (obj), and syntactically we
> can see that the method call "belongs" to obj, and so semantically we
> should give obj a chance to use its own custom method before the method
> defined in the class.
>
> (At least in languages like Python where this feature is supported. I
> believe that the "Design Patterns" world actually gives this idiom a
> name, and in Java you have to jump through flaming hoops to make it
> work, but I can never remember what it is called.)
Of course in JavaScript, you have to jump through flaming hoops to make it _not_ work that way, and yet most JS developers jump through those hoops.
But anyway, Python is obviously the best of both worlds here.
> So it makes sense that ordinary methods should be first looked up on the
> instance, then the class. But now take an operator call:
>
> obj + foo
>
> (where, for simplicity, both obj and foo are instances of the same
> class).
You've simplified away the main problem here. Binary dispatch is what makes designing operator overloading in OO languages hard (except OO languages that are multimethod-based from the start, like Dylan or Common Lisp, or that just ban operator overloading, like Java). Not recognizing how well Python handled this is missing something big.
Anyway, saying that they belong to "the class" doesn't solve the problem; it leaves you with exactly the same question of who defines 2+3.5 that you started with. And now it seems a lot more artificial to say that it's type(3.5), not 3.5, that does so.
Meanwhile, let's look at a concrete example. Why would I want an instance of some type that has arithmetic operations to override methods from the class in the first place?
One obvious possibility for such special values is mathematical special values--e.g., datetimes at positive and negative infinity. They can return float infinities when you ask to convert them to seconds-since-epoch, format themselves as special strings, raise on isdst, etc. But what you really want them to do is be > any normal value. And to return themselves when any timedelta is added. And so on. The operators are exactly what you _do_ want to overload.
And you definitely want both values to participate in that overloading, not just the types, because otherwise you have to write all the logic for how infinities work with each other twice. (If that doesn't sound bad, imagine you had 5 special values instead of 2. Or imagine that some of the special values were written by one person, and you came along and created new ones later.)
And of course this would simplify the language, not add extra complexity--instead of two different rules for method lookup, we have one.
So, I think Python would be better if it allowed instances to overload special methods.
The obvious alternative today is to make each special value a singleton instance of a new subclass. That gets you most of the same benefits as instance overloads, but at the cost of writing more code and occasionally making debugging a bit clumsier. Exactly the same cost as using singleton subclasses instead of instance overloading for normal methods. If eliminating that cost for normal methods is good, why not for special methods?
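A minimal sketch of that singleton-subclass workaround, for illustration only. The `Moment` class, its `seconds` attribute, and the `POS_INF` name are hypothetical stand-ins for a datetime-like type, not anything in the standard library, and deltas are plain numbers of seconds to keep the example short:

```python
class Moment:
    """Hypothetical datetime-like value: seconds since some epoch."""

    def __init__(self, seconds):
        self.seconds = seconds

    def __add__(self, delta):
        # In this sketch a delta is just a number of seconds.
        return Moment(self.seconds + delta)

    def __lt__(self, other):
        if not isinstance(other, Moment):
            return NotImplemented
        return self.seconds < other.seconds

    def __gt__(self, other):
        if not isinstance(other, Moment):
            return NotImplemented
        return self.seconds > other.seconds


class _PositiveInfinity(Moment):
    """Singleton subclass standing in for a per-instance overload."""

    def __init__(self):
        super().__init__(float("inf"))

    def __add__(self, delta):
        return self  # adding any delta leaves infinity unchanged

    def __gt__(self, other):
        if not isinstance(other, Moment):
            return NotImplemented
        return other is not self  # greater than every other Moment

    def __lt__(self, other):
        if not isinstance(other, Moment):
            return NotImplemented
        return False


POS_INF = _PositiveInfinity()
now = Moment(1_449_292_886)

print(POS_INF + 3600 is POS_INF)                    # True
print(POS_INF > now, now < POS_INF, now > POS_INF)  # True True False
```

The comparisons with `POS_INF` on the right work because Python tries the reflected comparison of a subclass operand first, so the special value's class gets to answer even when it is not on the left.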
> Syntactically, that's *not* clearly a call on a specific
> instance. There's no good reason to think that the + operator "belongs"
> to obj, or foo, or either of them.
No, it belongs to _both_ of them. That's the problem.
I think your intuition here is that we can solve the problem by saying it belongs to some thing they both have in common, their type, which sounds good at first--but once you remember that everything that makes this hard arises from the fact that we can't assume they have the same type, and therefore they don't share anything, the solution no longer has anything going for it.
>
> We could nevertheless arbitrarily decide that the left hand instance obj
> gets first crack at this, and if it doesn't customise the call to
> __add__, the right hand instance foo is allowed to customise __radd__,
> and if that doesn't exist we go back to the class __add__ (and if that
> also fails to exist we try the class __radd__).
> That's okay, I guess,
> but the added complexity feels like it would be a bug magnet.
Why make it so complex?
The existing rule is: if issubclass(type(b), type(a)), look for type(b).__radd__ first; otherwise, look for type(a).__add__ first.
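A small demonstration of how current Python 3 orders these lookups. The class names are made up for the example. One refinement shows up in practice: the reflected method only takes priority when the right operand's type is a proper subclass that overrides it.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Derived(Base):
    # Proper subclass that overrides the reflected method.
    def __radd__(self, other):
        return "Derived.__radd__"


class Inheriting(Base):
    # Proper subclass that does NOT override __radd__.
    pass


class Unrelated:
    def __radd__(self, other):
        return "Unrelated.__radd__"


print(Base() + Derived())     # Derived.__radd__ (subclass override goes first)
print(Base() + Inheriting())  # Base.__add__
print(Base() + Base())        # Base.__add__
print(Base() + Unrelated())   # Base.__add__
```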
If we allowed instance overloading, the rule would be: if isinstance(b, type(a)), look for b.__radd__ first; otherwise, look for a.__add__.
No more complex or arbitrary than what we already have, and more flexible, and it doesn't require learning a second lookup rule that applies to some magic methods but not all.
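As a rough sketch of what the instance-aware rule could look like if written out by hand: `proposed_add` is a hypothetical helper, not anything Python provides, and `Number` is a toy class invented for the example. It uses ordinary `getattr`, so an attribute stored on the instance shadows a method defined on the class.

```python
def proposed_add(a, b):
    """Hypothetical helper: evaluate a + b with instance-first lookup."""
    if isinstance(b, type(a)):
        attempts = [(b, "__radd__", a), (a, "__add__", b)]
    else:
        attempts = [(a, "__add__", b), (b, "__radd__", a)]
    for obj, name, arg in attempts:
        # Plain attribute lookup, so per-instance overrides are honored.
        method = getattr(obj, name, None)
        if method is None:
            continue
        result = method(arg)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types for +")


class Number:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Number(self.value + other.value)


# A special value that overrides addition on the instance only.
inf = Number(float("inf"))
inf.__add__ = lambda other: inf
inf.__radd__ = lambda other: inf

print(proposed_add(inf, Number(1)) is inf)    # True
print(proposed_add(Number(1), inf) is inf)    # True
print((Number(1) + inf) is inf)               # False: today's + ignores the instance
print(proposed_add(Number(1), Number(2)).value)  # 3
```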
Also, notice that this allows instancehooks instead of just subclasshooks to get involved, which I think is also desirable. If I implement an instancehook (or a register-for-my-instancehook method), it's because I want those objects to act like my instances as far as pythonly possible. Each place where they don't, like __radd__ lookup, is a hole in the abstraction.
> So *maybe* we should decide that the right design is to say that
> operators don't belong to either instance, they belong to the class, and
> neither obj nor foo get the chance to override __[r]add__ on a
> per-instance basis.
>
> Similar reasoning applies to __getattr__ and other special methods. We
> can, I think, convince ourselves that they belong to the class, not the
> instance, in a way that ordinary methods are not.
This seems even more dubious to me. The fact that instances' attributes are defined arbitrarily, rather than constrained by the class, is fundamental to Python. (Of course you can jump through hoops with slots, properties, or custom descriptors when you don't want that, but it's obviously the default behavior, and it's pretty uncommon that you don't want it.)
So, why should dynamic attribute lookup be any different than normal dict attribute lookup?
Of course very often that's more dynamic than you need, so you wouldn't write an instance __getattr__ that often--but that's exactly the same reason you don't write instance overloads that often in general (or __getattr__, for that matter); there's nothing new or different here.
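For reference, this is how current Python 3 treats a `__getattr__` stored on an instance. The `Record` class and the `dynamic` function are invented for the example:

```python
class Record:
    pass


def dynamic(name):
    return "dynamic " + name


r = Record()
r.__getattr__ = dynamic  # stored in the instance dict only

try:
    r.missing
    instance_hook_used = True
except AttributeError:
    # Failed lookups consult type(r).__getattr__, not the instance's.
    instance_hook_used = False

# Installing the hook on the class is what the interpreter honors.
Record.__getattr__ = lambda self, name: dynamic(name)
class_hook_result = r.missing

print(instance_hook_used)  # False
print(class_hook_result)   # dynamic missing
```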
> We define ordinary methods in the class definition for convenience and
> efficiency, but syntactically and semantically obj.spam looks up spam in
> the instance obj first, so obj has a chance to override any spam defined
> on the class. That's a good thing.
>
> But syntactically, special dunder methods like __getattr__ and __add__
> don't *clearly* belong to the instance, so maybe we should decide that
> it is a positive feature that they can't be overridden by the instance.
> By analogy, if we use an unbound method instead:
>
> TheClass.spam(obj, arg)
>
> then I think we would all agree that the per-instance obj.spam() method
> should *not* be called. It seems to me that operator methods "feel like"
> they are closer to unbound method calls than bound method calls. And
> likewise for __getattr__ etc.
I don't see the analogy at all. The bound __add__ call or __getattr__ call is a bound method, and we can always explicitly look up and pass around the unbound method if we want. (I've seen people pass around int.__neg__ the same way they'd pass around str.split; even though I think I'd use operator.neg instead for that, I don't see any problem with it.) The distinction is the same as with normal methods.
Also, consider that passing around spam.eggs always does what you expected, whether it's a bound method or a per-instance override, but spam.__add__ doesn't do what you expect unless it's a bound method. That's an extra thing to figure out and/or memorize about the language, and I don't see any benefit.
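A short illustration of that asymmetry in current Python 3, using a toy `Spam` class made up for the example:

```python
class Spam:
    def eggs(self):
        return "class eggs"

    def __add__(self, other):
        return "class add"


spam = Spam()
spam.eggs = lambda: "instance eggs"
spam.__add__ = lambda other: "instance add"

print(spam.eggs())      # instance eggs
print(spam.__add__(1))  # instance add (explicit attribute lookup sees the instance)
print(spam + 1)         # class add (the operator looks only at type(spam))
```

So the explicit attribute `spam.__add__` and the operator `spam + 1` can disagree, which is the extra rule a reader has to keep in mind.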
Well, except for the performance benefit to CPython, of course. In a new language, I don't think that would be enough to sway me no matter how big the benefit, but in a language where people have been relying on that optimization for years (whether we're talking about today, or when new-style classes were first added and Guido had to convince everyone they weren't going to slow down the language), it obviously is a major factor. So I don't think Python should change these methods until someone can work out a way to make the optimization unnecessary.
One other possible consideration is that, while a tracing JIT should be able to optimize instance overloads just as easily as class overloads, an AIT optimizer might not be able to, simply because types are so central to both the research and the industrial experience there (although JS might be changing that? I'm not up to date...). So, that might be another reason to treat these methods as special in a new language. But that doesn't really apply to Python. (Sure, someone _might_ write a traditional-style AIT optimizer for Python to give us C-competitive speeds some day, but at this point it doesn't seem like an important-enough prospect that we should design the language around it...)