Missing Core Feature: + - * / | & do not call __getattr__
Dear all,

I just stumbled upon a very weird behaviour of Python 2 and Python 3; at least I was not able to find a solution.

*The point is to dynamically define __add__, __or__ and so on via __getattr__* (for example by deriving them from __iadd__ or similar in a generic way). However, this very intuitive idea is currently NOT POSSIBLE, because + - * / & | and so on just bypass this standard procedure.

I found two stackoverflow contributions stating this:
http://stackoverflow.com/questions/11629287/python-how-to-forward-an-instanc...
http://stackoverflow.com/questions/33393474/lazy-evaluation-forward-operatio...

Neither the mentioned posts nor I myself can see any reason why this is the way it is, nor how the operators are actually implemented, to maybe bypass this unintuitive behaviour.

Any help? Comments? Ideas?

best, Stephan
On Fri, Dec 4, 2015 at 14:21, Stephan Sahm <Stephan.Sahm@gmx.de> wrote:
*The point is to dynamically define __add__, __or__ and so on via __getattr__* (for example by deriving them from __iadd__ or similar in a generic way). However, this very intuitive idea is currently NOT POSSIBLE, because + - * / & | and so on just bypass this standard procedure.
It is possible if you use indirection, and another name for the dynamic methods:

    class C:
        def __add__(self, other):
            try:
                method = self.dynamic_add
            except AttributeError:
                return NotImplemented
            else:
                return method(other)

    obj = C()
    print(obj + 1)  # raises TypeError
    obj.dynamic_add = lambda other: 42 + other
    print(obj + 1)  # 43
Thanks for the constructive response!

This is good to know; however, it still forces me to make such a dynamic placeholder explicit for every single operator. Still, I could put everything in a mixin class so that it is in fact useful to clean up code.

However, in the long run, such placeholders seem like an unnecessary overhead to me.

I myself wondered whether one can insert the methods into the class itself using __new__ or something else concerning metaclasses. My background is, however, not good enough to seriously explore this direction.

Any further suggestions?
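For what it's worth, one minimal sketch of that metaclass direction (all names here are illustrative, and it assumes the class provides a copy() method):

    class DeriveOps(type):
        # Hypothetical metaclass: at class-creation time, derive __add__
        # from __iadd__ plus copy(), if the class defines __iadd__.
        def __new__(mcls, clsname, bases, ns):
            cls = super().__new__(mcls, clsname, bases, ns)
            if '__iadd__' in ns and '__add__' not in ns:
                def __add__(self, other):
                    result = self.copy()  # assumed to exist on the class
                    return type(result).__iadd__(result, other)
                cls.__add__ = __add__
            return cls

    class Vec(metaclass=DeriveOps):
        def __init__(self, x):
            self.x = x
        def copy(self):
            return Vec(self.x)
        def __iadd__(self, other):
            self.x += other.x
            return self

    a, b = Vec(1), Vec(2)
    print((a + b).x, a.x)  # 3 1 -- __add__ was injected, and a is unchanged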
On Sat, Dec 5, 2015 at 12:55 AM, Stephan Sahm <Stephan.Sahm@gmx.de> wrote:
This is good to know; however, it still forces me to make such a dynamic placeholder explicit for every single operator. Still, I could put everything in a mixin class so that it is in fact useful to clean up code.
However, in the long run, such placeholders seem like an unnecessary overhead to me.
They might be, in the unusual case where the *instance* determines how it reacts to an operator. In the far more common case where the *type* determines it (that is, where all instances of a particular type behave the same way), the current system has less overhead. (Though I'm not sure how much less, nor even if it's been measured.)

If you really do need this kind of per-object dynamism, you still probably don't actually need it for _every_ operator, so you can simply create a bouncer for each dunder method that you actually need this dynamic dispatch on, and as you say, toss them into a mixin if necessary.

ChrisA
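To make the bouncer idea concrete, one possible shape for such a mixin, following the naming of Amaury's example (the attribute names are illustrative):

    class DynamicOpsMixin:
        # Each dunder bounces to a per-instance attribute, if present.
        def __add__(self, other):
            method = getattr(self, 'dynamic_add', None)
            if method is None:
                return NotImplemented
            return method(other)

        def __or__(self, other):
            method = getattr(self, 'dynamic_or', None)
            if method is None:
                return NotImplemented
            return method(other)

        # ...one bouncer per operator that actually needs dynamic dispatch.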
On 4 December 2015 at 23:55, Stephan Sahm <Stephan.Sahm@gmx.de> wrote:
Thanks for the constructive response!
This is good to know; however, it still forces me to make such a dynamic placeholder explicit for every single operator. Still, I could put everything in a mixin class so that it is in fact useful to clean up code.
However, in the long run, such placeholders seem like an unnecessary overhead to me.
The current behaviour is by design - special methods are looked up as slots on the object's class, not as instance attributes. This allows the interpreter to bypass several steps in the normal instance attribute lookup process.
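A quick demonstration of the class-level lookup:

    class C:
        pass

    c = C()
    c.__add__ = lambda other: 42        # set on the instance: '+' ignores it
    try:
        c + 1
    except TypeError as e:
        print(e)                        # unsupported operand type(s) for +

    C.__add__ = lambda self, other: 42  # set on the class: '+' finds it
    print(c + 1)                        # 42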
I myself wondered whether one can insert the methods into the class itself using __new__ or something else concerning metaclasses. My background is, however, not good enough to seriously explore this direction.
Any further suggestions?
Graham Dumpleton's wrapt module has a robust implementation of object proxies: http://wrapt.readthedocs.org/en/latest/wrappers.html

Even if that library isn't directly applicable to your use case, studying the implementation should be informative.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Thank you both - the wrapt link is on my future todo-list; it will need some more time than I currently have.

You both mentioned that the operators might be better tackled via class interfaces (__slots__, e.g.?). What would it look like to change things on this level?

My current use case is to implement a mixin ABC which shall combine all the available __iadd__, __ior__ and so on with a copy() function (that one is then the abstractmethod) to automatically produce the respective __add__, __or__ and so on. One possible shape is sketched below.
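A hedged sketch of what such a mixin could look like (the operator list is abbreviated, and it assumes subclasses define the corresponding in-place methods):

    from abc import ABC, abstractmethod

    class CopyOpsMixin(ABC):
        # Derives binary operators from their in-place versions plus copy().
        @abstractmethod
        def copy(self):
            ...

        def __add__(self, other):
            result = self.copy()
            return type(result).__iadd__(result, other)  # assumes __iadd__

        def __or__(self, other):
            result = self.copy()
            return type(result).__ior__(result, other)   # assumes __ior__

        # ...and so on for the other in-place operators.

    class Acc(CopyOpsMixin):
        def __init__(self, v):
            self.v = v
        def copy(self):
            return Acc(self.v)
        def __iadd__(self, other):
            self.v += other
            return self

    a = Acc(1)
    print((a + 2).v, a.v)  # 3 1 -- the copy is mutated, a is unchanged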
This is explained in the documentation (https://docs.python.org/3/reference/datamodel.html#special-method-lookup for 3.x; for 2.x it's largely the same, except that old-style classes exist and have a different rule).

There is also at least one StackOverflow answer that explains this (I know, because I wrote one...), but it's not even on the first page of a search, while the docs answer is the first result that shows up. Plus, the docs are written by the Python dev team in an open community process, while my SO answer is only as good as one user's understanding and writing ability; even if it weren't thrown in with a dozen answers that are wrong or just say "Python does this because it's dumb" or whatever, I'd go to the docs first.

On Friday, December 4, 2015 6:22 AM, Stephan Sahm <Stephan.Sahm@gmx.de> wrote:
the wrapt link is on my future todo-list; it will need some more time than I currently have
The wrapt link has the sample code to do exactly what you want. Pop it off your todo list. If you have more time in the future, you may want to look at some bridging libraries like PyObjC; once you can understand how to map between Python's and ObjC's different notions of method lookup, you'll understand this well enough to teach a class on it. :) But for now, wrapt is more than enough.
You both mentioned that the operators might be better tackled via class interfaces (__slots__, e.g.?)
Not __slots__. The terminology gets a bit confusing here. What they're referring to is the C-API notion of slots. In particular, every builtin (or C-extension) type is defined by a C struct which contains, among other things, members like "nb_add", which is a pointer to an adding function. For example, the int type's nb_add member points to a function that adds integers to other things. Slightly oversimplified, the nb_add slot of every class implemented in Python just points to a function that does a stripped-down lookup for '__add__' instead of the usual __getattribute__ mechanism. (Which means you get no __getattr__, __slots__, @properties from the metaclass, etc.)

So, why are these two things both called "slots"? Well, the point of __slots__ is to give you the space savings, and static collection of members that exist on every instance, that builtin types' instances get by using slots in C structs. So the intuitive notion of C struct layout rather than dict lookup is central in both cases, just in different ways. (If you really want to, you could think about nb_add as being a member of the __slots__ of a special metaclass that all builtin classes use. But that's probably more misleading than helpful, because CPython isn't actually implemented that way.)
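Roughly, that stripped-down lookup behaves like this Python-level sketch (a simplification of what CPython does in C):

    def lookup_special(obj, name):
        # Walk the MRO of the *class* directly: the instance dict,
        # __getattr__, and the metaclass's __getattribute__ are all skipped.
        for klass in type(obj).__mro__:
            if name in klass.__dict__:
                value = klass.__dict__[name]
                descr_get = getattr(type(value), '__get__', None)
                if descr_get is not None:
                    # Bind descriptors (e.g. plain functions) to the instance.
                    return descr_get(value, obj, type(obj))
                return value
        raise AttributeError(name)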
My current usecase is to implement a Mixin abc class which shall combine all the available __iadd__ __ior__ and so on with a copy() function (this one is then the abstractmethod) to produce automatically the respective __add__, __or__ and so on
OK, so if you can't handle __add__ dynamically at method lookup time, how do you deal with that?

Of course you can just write lots of boilerplate, but presumably you're using Python instead of Java for a reason. You could also write Python code that generates the boilerplate (as a module, or as code to exec), but presumably you're using Python instead of Tcl for a reason. If you need methods that are dynamically generated at class creation time, just dynamically generate your methods at class creation time:

    def mathify(cls):
        for name in ('add', 'sub', 'mul', 'div'):
            ifunc = getattr(cls, '__i{}__'.format(name))
            def wrapper(self, other, _ifunc=ifunc):  # bind now, not at call time
                self_copy = self.copy()
                _ifunc(self_copy, other)
                return self_copy
            setattr(cls, '__{}__'.format(name), wrapper)
        return cls

    @mathify
    class Quaternion:
        def __iadd__(self, other):
            self.x += other.x
            self.y += other.y
            self.z += other.z
            self.w += other.w
            return self
        # etc.

Obviously in real life you'll want to fix up the name, docstring, etc. of wrapper (see functools.wraps for how to do this). And you'll want to use tested code rather than something I wrote in an email. You may also want to use a metaclass rather than a decorator (which can be easily hidden from the end-user, because they just need to inherit a mixin that uses that metaclass). Also, if you only ever need to do this dynamic stuff for exactly one class (your mixin), you may not want to use a decorator _or_ a metaclass; just munge the class up in module-level code right after the class definition. But you get the idea.

If you really need the lookup to be dynamic (but I don't think you do here, in which case you'd just be adding extra complexity and inefficiency for no reason), you just need to write your own protocol for this and dynamically generate the methods that bounce to that protocol. For example:

    def mathify(cls):
        for name in ('add', 'sub', 'mul', 'div'):
            def wrapper(self, other, _name=name):
                return self._get_math_method(_name)(other)
            setattr(cls, '__{}__'.format(name), wrapper)
        return cls

    @mathify
    class Quaternion:
        def _get_math_method(self, name):
            def method(other):
                ...  # see previous example
            return method

In fact, if you really wanted to, you could even abuse __getattr__. I think that would be more likely to confuse people than to help them, but...

    def mathify(cls):
        for name in ('add', 'sub', 'mul', 'div'):
            def wrapper(self, other, _name=name):
                return self.__getattr__('__{}__'.format(_name))(other)
            setattr(cls, '__{}__'.format(name), wrapper)
        return cls

    @mathify
    class Quaternion:
        def __getattr__(self, name):
            if name.startswith('__') and name.endswith('__'):
                op = name.strip('_')  # e.g. '__add__' -> 'add'
                def method(other):
                    ...  # see previous example
                return method
            raise AttributeError(name)

Again, don't do this last one. I think the first example (or even the simpler version where you just dynamically generate your mixin's members with simple module-level code) is all you need.
Thank you very much! Lovely fast and lovely extensive answers. I fear I myself cannot react as fast or as completely (i.e. read those references and respond with more background knowledge) - this needs more time, and the weekend is already quite busy in my case - but at least I want to seize the moment and thank all of you for creating this amazing community!
On 2015-12-04 06:14, Nick Coghlan wrote:
The current behaviour is by design - special methods are looked up as slots on the object's class, not as instance attributes. This allows the interpreter to bypass several steps in the normal instance attribute lookup process.
It is worth noting that the behavior is even more magical than this. Even when looked up on the class, implicit special method lookup bypasses __getattr__ and __getattribute__ of the metaclass. So the special method lookup is not just an ordinary lookup that happens to start on the class instead of the instance; it is a fully magic lookup that does not engage the usual attribute-access-customization hooks at any level.

This is documented (https://docs.python.org/3/reference/datamodel.html#special-method-lookup), but it is often surprising to new users. Personally I find it an annoying inconsistency and hope that at some time in the future Python will become fast enough overall that the extra overhead will be acceptable.

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
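A short demonstration of that extra level of magic:

    class Meta(type):
        def __getattribute__(cls, name):
            print('Meta.__getattribute__:', name)
            return super().__getattribute__(name)

    class C(metaclass=Meta):
        def __add__(self, other):
            return 42

    c = C()
    print(c + 1)      # 42 -- no 'Meta.__getattribute__: __add__' is printed
    print(C.__add__)  # explicit lookup on the class *does* go through Meta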
On Friday, December 4, 2015 11:05 AM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
This is documented (https://docs.python.org/3/reference/datamodel.html#special-method-lookup), but it is often surprising to new users. Personally I find it an annoying inconsistency and hope that at some time in the future Python will become fast enough overall that the extra overhead will be acceptable.
Unless it's changed since I last looked, Python doesn't actually define _which_ methods are looked up this way; it just allows implementations to do it for any subset of (presumably) the methods defined in the Data Model chapter. I don't think any of the major implementations besides CPython need this optimization at all (a JIT is generally going to do the lookup once and cache it, right?), but at least PyPy follows CPython exactly anyway, to avoid any possible compatibility problems.

Meanwhile, there might not be one solution once and for all. For example, __add__ has to switch on the other object's type, unbox the values, malloc and box up a result, etc.; __bool__ often just has to check a struct member != 0 and return the constant True or False, so reducing the overhead to 10% on __add__ may still mean 105% on __bool__. From another angle, the __add__ lookup is part of a larger thing that includes __radd__ lookup and comparing types and so on, and that process might end up with some way to optimize lookup of the methods that wouldn't apply to unary operators or non-operator methods.

So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods, and only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away.
On Fri, Dec 04, 2015 at 12:02:46PM -0800, Andrew Barnert via Python-ideas wrote:
So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods,
That's probably a good idea regardless of anything else.
and only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away.
Well, maybe. I haven't decided whether special method lookup is an annoying inconsistency or a feature. It seems to me that operators, at least, are somewhat different from method calls, and the special handling should be considered a deliberate design feature. My reasoning goes like this.

Take a method call:

    obj.spam(arg)

That's clearly called on the instance itself (obj), and syntactically we can see that the method call "belongs" to obj, and so semantically we should give obj a chance to use its own custom method before the method defined in the class. (At least in languages like Python where this feature is supported. I believe that the "Design Patterns" world actually gives this idiom a name, and in Java you have to jump through flaming hoops to make it work, but I can never remember what it is called.)

So it makes sense that ordinary methods should be first looked up on the instance, then the class. But now take an operator call:

    obj + foo

(where, for simplicity, both obj and foo are instances of the same class). Syntactically, that's *not* clearly a call on a specific instance. There's no good reason to think that the + operator "belongs" to obj, or foo, or either of them.

We could nevertheless arbitrarily decide that the left-hand instance obj gets first crack at this, and if it doesn't customise the call to __add__, the right-hand instance foo is allowed to customise __radd__, and if that doesn't exist we go back to the class __add__ (and if that also fails to exist we try the class __radd__). That's okay, I guess, but the added complexity feels like it would be a bug magnet. So *maybe* we should decide that the right design is to say that operators don't belong to either instance, they belong to the class, and neither obj nor foo gets the chance to override __[r]add__ on a per-instance basis.

Similar reasoning applies to __getattr__ and other special methods. We can, I think, convince ourselves that they belong to the class, not the instance, in a way that ordinary methods do not. We define ordinary methods in the class definition for convenience and efficiency, but syntactically and semantically obj.spam looks up spam in the instance obj first, so obj has a chance to override any spam defined on the class. That's a good thing. But syntactically, special dunder methods like __getattr__ and __add__ don't *clearly* belong to the instance, so maybe we should decide that it is a positive feature that they can't be overridden by the instance.

By analogy, if we use an unbound method instead:

    TheClass.spam(obj, arg)

then I think we would all agree that the per-instance obj.spam() method should *not* be called. It seems to me that operator methods "feel like" they are closer to unbound method calls than bound method calls. And likewise for __getattr__ etc.

As I said, I haven't decided myself which way I lean. If Python didn't already support per-instance method overriding, I would argue strongly that it should. But per-instance operator overloading? I'm undecided.

-- Steve
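A small illustration of the bound/unbound contrast:

    class Spam:
        def spam(self, arg):
            return 'class spam'

    obj = Spam()
    obj.spam = lambda arg: 'instance spam'  # per-instance override

    print(obj.spam(1))        # 'instance spam' -- the instance wins
    print(Spam.spam(obj, 1))  # 'class spam' -- the unbound call skips it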
On Dec 4, 2015, at 18:38, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Dec 04, 2015 at 12:02:46PM -0800, Andrew Barnert via Python-ideas wrote:
So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods,
That's probably a good idea regardless of anything else.
and only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away.
Well, maybe. I haven't decided whether special method lookup is an annoying inconsistency or a feature. It seems to me that operators, at least, are somewhat different from method calls, and the special handling should be considered a deliberate design feature. My reasoning goes like this:
Take a method call:
obj.spam(arg)
That's clearly called on the instance itself (obj), and syntactically we can see that the method call "belongs" to obj, and so semantically we should give obj a chance to use its own custom method before the method defined in the class.
(At least in languages like Python where this feature is supported. I believe that the "Design Patterns" world actually gives this idiom a name, and in Java you have to jump through flaming hoops to make it work, but I can never remember what it is called.)
Of course in JavaScript, you have to jump through flaming hoops to make it _not_ work that way, and yet most JS developers jump through those hoops. But anyway, Python is obviously the best of both worlds here.
So it makes sense that ordinary methods should be first looked up on the instance, then the class. But now take an operator call:
obj + foo
(where, for simplicity, both obj and foo are instances of the same class).
You've simplified away the main problem here. Binary dispatch is what makes designing operator overloading in OO languages hard (except OO languages that are multimethod-based from the start, like Dylan or Common Lisp, or that just ban operator overloading, like Java). Not recognizing how well Python handled this is missing something big. Anyway, saying that they belong to "the class" doesn't solve the problem; it leaves you with exactly the same question of who defines 2+3.5 that you started with. And now it seems a lot more artificial to say that it's type(3.5), not 3.5, that does so.

Meanwhile, let's look at a concrete example. Why would I want an instance of some type that has arithmetic operations to override methods from the class in the first place? One obvious possibility for such special values is mathematical special values--e.g., datetimes at positive and negative infinity. They can return float infinities when you ask to convert them to seconds-since-epoch, format themselves as special strings, raise on isdst, etc. But what you really want them to do is be > any normal value. And to return themselves when any timedelta is added. And so on. The operators are exactly what you _do_ want to overload. And you definitely want both values to participate in that overloading, not just the types, because otherwise you have to write all the logic for how infinities work with each other twice. (If that doesn't sound bad, imagine you had 5 special values instead of 2. Or imagine that some of the special values were written by one person, and you came along and created new ones later.)

And of course this would simplify the language, not add extra complexity--instead of two different rules for method lookup, we have one. So, I think Python would be better if it allowed instances to overload special methods.

The obvious alternative today is to make each special value a singleton instance of a new subclass. That gets you most of the same benefits as instance overloads, but at the cost of writing more code and occasionally making debugging a bit clumsier. Exactly the same cost as using singleton subclasses instead of instance overloading for normal methods. If eliminating that cost for normal methods is good, why not for special methods?
Syntactically, that's *not* clearly a call on a specific instance. There's no good reason to think that the + operator "belongs" to obj, or foo, or either of them.
No, it belongs to _both_ of them. That's the problem. I think your intuition here is that we can solve the problem by saying it belongs to some thing they both have in common, their type, which sounds good at first--but once you remember that everything that makes this hard arises from the fact that we can't assume they have the same type, and therefore they don't share anything, the solution no longer has anything going for it.
We could nevertheless arbitrarily decide that the left hand instance obj gets first crack at this, and if it doesn't customise the call to __add__, the right hand instance foo is allowed to customise __radd__, and if that doesn't exist we go back to the class __add__ (and if that also fails to exist we try the class __radd__). That's okay, I guess, but the added complexity feels like it would be a bug magnet.
Why make it so complex? The existing rule is: if issubclass(type(b), type(a)), look for type(b).__radd__ first; otherwise, look for type(a).__add__ first. If we allowed instance overloading, the rule would be: if isinstance(b, type(a)), look for b.__radd__ first; otherwise, look for a.__add__. No more complex or arbitrary than what we already have, and more flexible, and it doesn't require learning a second lookup rule that applies to some magic methods but not all. Also, notice that this allows instancehooks instead of just subclasshooks to get involved, which I think is also desirable. If I implement an instancehook (or a register-for-my-instancehook method), it's because I want those objects to act like my instances as far as pythonly possible. Each place where they don't, like __radd__ lookup, is a hole in the abstraction.
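For reference, the existing rule, roughly sketched at the Python level (simplified; the real logic also checks that the subclass actually *overrides* __radd__, and skips the reflected call when both types are identical):

    def simulated_add(a, b):
        ta, tb = type(a), type(b)
        if tb is not ta and issubclass(tb, ta):
            # Subclass on the right: its reflected method goes first.
            attempts = [(tb, '__radd__', b, a), (ta, '__add__', a, b)]
        else:
            attempts = [(ta, '__add__', a, b), (tb, '__radd__', b, a)]
        for t, name, x, y in attempts:
            method = getattr(t, name, None)
            if method is not None:
                result = method(x, y)
                if result is not NotImplemented:
                    return result
        raise TypeError('unsupported operand type(s) for +')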
So *maybe* we should decide that the right design is to say that operators don't belong to either instance, they belong to the class, and neither obj nor foo get the chance to override __[r]add__ on a per-instance basis.
Similar reasoning applies to __getattr__ and other special methods. We can, I think, convince ourselves that they belong to the class, not the instance, in a way that ordinary methods are not.
This seems even more dubious to me. The fact that instances' attributes are defined arbitrarily, rather than constrained by the class, is fundamental to Python. (Of course you can jump through hoops with slots, properties, or custom descriptors when you don't want that, but it's obviously the default behavior, and it's pretty uncommon that you don't want it.) So, why should dynamic attribute lookup be any different than normal dict attribute lookup? Of course very often that's more dynamic than you need, so you wouldn't write an instance __getattr__ that often--but that's exactly the same reason you don't write instance overloads that often in general (or __getattr__, for that matter); there's nothing new or different here.
We define ordinary methods in the class definition for convenience and efficency, but syntactically and semantically obj.spam looks up spam in the instance obj first, so obj has a chance to override any spam defined on the class. That's a good thing.
But syntactically, special dunder methods like __getattr__ and __add__ don't *clearly* belong to the instance, so maybe we should decide that it is a positive feature that they can't be overridden by the instance. By analogy, if we use an unbound method instead:
TheClass.spam(obj, arg)
then I think we would all agree that the per-instance obj.spam() method should *not* be called. It seems to me that operator methods "feel like" they are closer to unbound method calls than bound method calls. And likewise for __getattr__ etc
I don't see the analogy at all. The bound __add__ call or __getattr__ call is a bound method, and we can always explicitly look up and pass around the unbound method if we want. (I've seen people pass around int.__neg__ the same way they'd pass around str.split; even though I think I'd use operator.neg instead for that, I don't see any problem with it.) The distinction is the same as with normal methods.

Also, consider that passing around spam.eggs always does what you expected, whether it's a bound method or a per-instance override, but spam.__add__ doesn't do what you expect unless it's a bound method. That's an extra thing to figure out and/or memorize about the language, and I don't see any benefit.

Well, except for the performance benefit to CPython, of course. In a new language, I don't think that would be enough to sway me no matter how big the benefit, but in a language where people have been relying on that optimization for years (whether we're talking about today, or when new-style classes were first added and Guido had to convince everyone they weren't going to slow down the language), it obviously is a major factor. So I don't think Python should change these methods until someone can work out a way to make the optimization unnecessary.

One other possible consideration is that, while a tracing JIT should be able to optimize instance overloads just as easily as class overloads, an AOT optimizer might not be able to, simply because types are so central to both the research and the industrial experience there (although JS might be changing that? I'm not up to date...). So, that might be another reason to treat these methods as special in a new language. But that doesn't really apply to Python. (Sure, someone _might_ write a traditional-style AOT optimizer for Python to give us C-competitive speeds some day, but at this point it doesn't seem like an important-enough prospect that we should design the language around it...)
On 2015-12-04 18:38, Steven D'Aprano wrote:
Well, maybe. I haven't decided whether special method lookup is an annoying inconsistency or a feature. It seems to me that operators, at least, are somewhat different from method calls, and the special handling should be considered a deliberate design feature. My reasoning goes like this:
What you're describing is orthogonal to the issue I was raising. I agree that it makes sense for "obj1 + obj2" to look up __add__ on the class. What I'm saying is that currently the lookup of __add__ on the class is *still* not an ordinary lookup. An ordinary lookup on the class would make use of __getattr__/__getattribute__ on the *metaclass* (just as an ordinary lookup on the instance, like foo.attr, would make use of __getattr__ on the class). But special method lookup doesn't do this; type(foo).__getattribute__("__add__") is never called. There is no way to customize special method name lookup at all. You have to actually define the methods on the class.

(You can do that in labor-saving ways like writing a metaclass or class decorator that inserts appropriate methods into the class dict, but you still have to statically set those methods; you can't intercept each special method lookup as it comes in and decide what to do with it, as you can with other attributes.)

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On 2015-12-04 12:02, Andrew Barnert wrote:
So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods, and only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away.
Presumably you mean "treats as magic methods for the purposes of this lookup short-circuiting"? To me the term "magic methods" just means the ones that are implicitly invoked by syntax (e.g., operator overloading), and what those are is already documented. If that's what you mean, I agree that would be a good starting point.

I am always nervous when the docs say the kind of thing they say for this, which is "implicit special method lookup generally also bypasses the __getattribute__() method even of the object's metaclass". "Generally"? What does that mean? It seems to me that whether and when __getattribute__ is called should be a clearly-specified part of the semantics of magic methods, and it's somewhat disturbing that it doesn't seem to be.

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Dec 4, 2015, at 21:55, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2015-12-04 12:02, Andrew Barnert wrote:

So, maybe the way to get from here to there is to explicitly document the methods CPython treats as magic methods, and only allow other implementations to do the same for (a subset of) the same, and people can gradually tackle and remove parts of that list as people come up with ideas, and if the list eventually becomes empty (or gets to the point where it's 2 rare things that aren't important enough to keep extra complexity in the language), then the whole notion of special method lookup can finally go away.
Presumably you mean "treats as magic methods for the purposes of this lookup short-circuiting"?
Yes; sorry it's not clear without the context, but that's what I meant.
If that's what you mean, I agree that would be a good starting point. I am always nervous when the docs say the kind of thing they say for this, which is "implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass". "Generally"? What does that mean? It seems to me that whether and when __getattribute__ is called should be a clearly-specified part of the semantics of magic methods, and it's somewhat disturbing that it doesn't seem to be.
Well, I think it may be a good thing that Python allows other implementations to not skip the normal lookup steps. But yes, at the very least, there should be a list of exactly which methods are allowed to do that, rather than a vague description, and an implementation note mentioning exactly which ones do so in CPython.
I wish there was less armchair language design and post-rationalization on this topic, and more research into *why* we actually changed this. I recall we did not take the decision lightly. Surely correlating the source control logs and the mailing list archives can shed more insight on this topic than thought experiments about classes with five special values. :-)

Patches to the reference manual are also very welcome, BTW. We're not proud of the imprecision of much of the language there -- I would love for some language lawyers to help tighten the language (not just on this topic but all over the reference manual).

Finally. Regardless of what the reference manual may (not) say, other implementations do *not* have the freedom to change the operator lookup to look in the instance dict first. (However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.)

-- --Guido van Rossum (python.org/~guido)
On Dec 5, 2015, at 09:30, Guido van Rossum <guido@python.org> wrote:
I wish there was less armchair language design and post-rationalization on this topic, and more research into *why* we actually changed this.
I think the documentation already describes the reasoning just fine:

1. Performance. It "provides significant scope for speed optimizations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter)."

2. As a solution to the metaclass problem with the handful of methods implemented by both type and normal types--e.g., you don't want hash(int) to call int.__hash__, but type.__hash__. (This one obviously doesn't affect most operators; you can't generally add or xor types--but List[int] shows why that can't be ignored.)

Anyway, I had already looked at the relevant PEPs, what's new, descrintro, and the old PLAN.txt. But, at your suggestion, I decided to take a look back at the commit comments and python-dev archives as well, to see if there is some other reason for the change beyond those mentioned in the docs.

In fact, there was almost no discussion at all. The descr branch was merged to trunk on 2 August 2001, nobody mentioned either problem, and then on 28 August you committed #19535 and #19538.

The performance issue was raised in descrintro and the PEP, and a brief early thread, but it seems like everyone was happy to wait until you declared it done before griping about performance problems that may or may not result. Everyone was happy with 2.2a4 (which was after the change), so nobody had a reason to complain.

As far as I can see, nobody noticed the hash(int) issue before you found and fixed it. It wasn't mentioned in PLAN.txt, any mailing list threads, etc. It apparently didn't even come up after the fact (except in a brief aside about PEP 266) until much later, when people got used to using the new features in practice and started asking why they can't do instance overrides for some special methods (at which point the answer was already in the docs).

Unless I'm missing something (which is certainly possible), or you can remember your reasoning 14 years ago, I think what the docs say is pretty much all there is to say. In particular, do you actually have a reason that spam + eggs shouldn't look at spam.__dict__['__add__'] other than for performance and for consistency with the hash(int) solution?
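The hash(int) case is easy to demonstrate:

    class Meta(type):
        def __hash__(cls):
            return 1234

    class C(metaclass=Meta):
        def __hash__(self):
            return 42

    print(hash(C()))  # 42   -- instances use C.__hash__
    print(hash(C))    # 1234 -- the class object uses type(C).__hash__,
                      #         not the unbound C.__hash__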
I recall we did not take the decision lightly. Surely correlating the source control logs and the mailing list archives can shed more insight on this topic than thought experiments about classes with five special values. :-)
Patches to the reference manual are also very welcome, BTW. We're not proud of the imprecision of much of the language there -- I would love for some language lawyers to help tighten the language (not just on this topic but all over the reference manual).
Finally. Regardless of what the reference manual may (not) say, other implementations do *not* have the freedom to change the operator lookup to look in the instance dict first.
IIRC, Jython 2.2 looks at the instance first unless the type is a Java class, and nobody ever complained (in fact, I remember someone complaining about CPython breaking his code that worked in Jython...). But it would certainly be easier to tighten up the language in section 3.3.9 (special method lookup) if it applies to all Python implementations.
(However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.)
I'm not sure what you're suggesting here. That implementations can let a metaclass __getattribute__ hook special method lookup, but some implementations (including CPython 3.6) won't do so?
On 6 December 2015 at 11:56, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Dec 5, 2015, at 09:30, Guido van Rossum <guido@python.org> wrote:
(However, giving the metaclass more control is not unreasonable. There also seems to be some interesting behavior here related to slots.)
I'm not sure what you're suggesting here. That implementations can let a metaclass __getattribute__ hook special method lookup, but some implementations (including CPython 3.6) won't do so?
Ronald Oussoren has elaborated on that aspect of the problem in his __getdescriptor__ PEP: https://www.python.org/dev/peps/pep-0447/

The main reason it's separate from __getattribute__ is that this is necessary to avoid changing the semantics of super.__getattribute__, but it's also the case that things would otherwise get quite confusing, with object.__getattribute__ and super.__getattribute__ potentially calling type.__getattribute__, which then has the potential for strange consequences when you consider that "type" is itself an instance of "type".

My recollection of the previous round of discussions on that PEP is that we're actually pretty happy with the design - it's now dependent on someone with the roundtuits to update the reference implementation to match the current PEP text and the head of the current development branch.

Regards, Nick.

P.S. Now that I've realised PEP 447 is relevant to the current discussion, I've also realised that providing a base case in type.__getattribute__ that terminates the metaclass lookup chain for attributes is likely sufficient explanation for why this works the way it does now - if it didn't, type being its own metaclass would trigger a recursion error. PEP 447 extends the chain one step further (to the metaclass) by introducing a new magic method, but *that* method in turn still can't be altered by the metaclass of the metaclass.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
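For a rough picture of the hook PEP 447 proposes (sketched from memory of the PEP, so treat the details as approximate):

    class DynamicLookup(type):
        # PEP 447's proposed hook: consulted for each class on the MRO
        # during attribute lookup, in place of peeking into cls.__dict__.
        def __getdescriptor__(cls, name):
            try:
                return cls.__dict__[name]   # the default behaviour
            except KeyError:
                raise AttributeError(name)  # keep walking the MRO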
On 06 Dec 2015, at 13:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
My recollection of the previous round of discussions on that PEP is that we're actually pretty happy with the design - it's now dependent on someone with the roundtuits to update the reference implementation to match the current PEP text and the head of the current development branch.
Mark Shannon had some concerns about how my proposal affects the object model. I didn't quite get his concerns at the time, probably because I was thinking too much about the implementation. BTW, sorry about not following up at the time; I got sucked back into work :-(. My plan for the week is to get a version of PyObjC with support for OS X 10.11 onto PyPI; after that I hope to return to PEP 447.

Ronald
I distinctly remember that by the time I committed that code I had seen numerous examples (perhaps in proprietary code at Zope) of attempts to use per-instance overloading going wrong. I believe what was happening was that users were trying to overload operators using instance attributes named after operators, and were trying to come up with clever tricks to ensure that the instance argument was passed to the method (their use case was not special cases like infinities; they had code that needed access to both arguments). When I saw the resulting ugly code I figured there would be little loss if we just disallowed the practice. That's really all I can recall.

I don't think there's a need to make another U-turn on this topic.

Regarding the docs, we may still have to add language to make it clear that the behavior you describe for Jython is wrong. (I also believe there was a request earlier in this thread to clarify exactly which methods are not looked up on the instance.)
-- --Guido van Rossum (python.org/~guido)
participants (9): Amaury Forgeot d'Arc, Andrew Barnert, Brendan Barnwell, Chris Angelico, Guido van Rossum, Nick Coghlan, Ronald Oussoren, Stephan Sahm, Steven D'Aprano