Python-versus-CPython question for __mul__ dispatch
Hi all,
While attempting to clean up some of the more squamous aspects of numpy's operator dispatch code [1][2], I've encountered a situation where the semantics we want and are using are possible according to CPython-the-interpreter, but AFAICT ought not to be possible according to Python-the-language, i.e., it's not clear to me whether it's possible even in principle to implement an object that works the way numpy.ndarray does in any other interpreter. Which makes me a bit nervous, so I wanted to check if there was any ruling on this.
Specifically, the quirk we are relying on is this: in CPython, if you do
    [1, 2] * my_object
then my_object's __rmul__ gets called *before* list.__mul__, *regardless* of the inheritance relationship between list and type(my_object). This occurs as a side-effect of the weirdness involved in having both tp_as_number->nb_multiply and tp_as_sequence->sq_repeat in the C API -- when evaluating "a * b", CPython tries a's nb_multiply, then b's nb_multiply, then a's sq_repeat, then b's sq_repeat. Since list has an sq_repeat but not an nb_multiply, this means that my_object's nb_multiply gets called before any list method.
Here's an example demonstrating how weird this is. list.__mul__ wants an integer, and by "integer" it means "any object with an __index__ method". So here's a class that list is happy to be multiplied by -- according to the ordinary rules for operator dispatch, in the example below Indexable.__mul__ and __rmul__ shouldn't even get a look-in:
    In [3]: class Indexable(object):
       ...:     def __index__(self):
       ...:         return 2
       ...:
    In [4]: [1, 2] * Indexable()
    Out[4]: [1, 2, 1, 2]
But, if I add an __rmul__ method, then this actually wins:
    In [6]: class IndexableWithMul(object):
       ...:     def __index__(self):
       ...:         return 2
       ...:     def __mul__(self, other):
       ...:         return "indexable forward mul"
       ...:     def __rmul__(self, other):
       ...:         return "indexable reverse mul"
    In [7]: [1, 2] * IndexableWithMul()
    Out[7]: 'indexable reverse mul'
    In [8]: IndexableWithMul() * [1, 2]
    Out[8]: 'indexable forward mul'
NumPy arrays, of course, correctly define both an __index__ method (which raises an error on general arrays but coerces to int for arrays that contain exactly 1 integer) and an nb_multiply slot which accepts lists and performs elementwise multiplication:
    In [9]: [1, 2] * np.array(2)
    Out[9]: array([2, 4])
And that's all great! Just what we want. But the only reason this is possible, AFAICT, is that CPython 'list' is a weird type with undocumented behaviour that you can't actually define using pure Python code.
Should I be worried?
-n
[1] https://github.com/numpy/numpy/pull/5864
[2] https://github.com/numpy/numpy/issues/5844
-- Nathaniel J. Smith -- http://vorpus.org
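The slot order described in the message above can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions, not CPython's actual implementation: model_multiply, ModelList, ModelIndex and ModelArray are hypothetical names, the nb_multiply and sq_repeat attributes merely stand in for the C-level slots, and refinements such as same-type operands and subclass priority are left out.

```python
import operator


def model_multiply(a, b):
    """Toy model: a's nb_multiply, b's nb_multiply, a's sq_repeat, b's sq_repeat."""
    tried = []  # which stand-in slots were consulted, for illustration only
    # Numeric slots come first, left operand before right operand.
    for owner in (a, b):
        slot = getattr(type(owner), "nb_multiply", None)
        if slot is not None:
            tried.append(type(owner).__name__ + ".nb_multiply")
            result = slot(a, b)
            if result is not NotImplemented:
                return result, tried
    # Sequence slots are consulted only afterwards; the other operand
    # must then be usable as an index.
    for seq, count in ((a, b), (b, a)):
        slot = getattr(type(seq), "sq_repeat", None)
        if slot is not None:
            tried.append(type(seq).__name__ + ".sq_repeat")
            return slot(seq, operator.index(count)), tried
    raise TypeError("unsupported operand types for *")


class ModelList:
    """Stands in for list: a sequence slot but no numeric slot."""

    def __init__(self, items):
        self.items = list(items)

    @staticmethod
    def sq_repeat(seq, count):
        return ModelList(seq.items * count)


class ModelIndex:
    """Like Indexable in the message: only __index__."""

    def __index__(self):
        return 2


class ModelArray(ModelIndex):
    """Like IndexableWithMul: __index__ plus a numeric slot."""

    @staticmethod
    def nb_multiply(left, right):
        return "forward mul" if isinstance(left, ModelArray) else "reverse mul"


repeated, slots_for_index = model_multiply(ModelList([1, 2]), ModelIndex())
reverse, slots_for_array = model_multiply(ModelList([1, 2]), ModelArray())
```

In this model the list stand-in is never asked about ModelArray at all, because the numeric slot of the right operand answers first; only the plain ModelIndex operand falls through to sq_repeat.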
Should I be worried?
You mean should *I* be worried ;)
Stuff like this is highly relevant for JyNI, so thanks very much for clarifying this subtle behavior. It went onto my todo-list right now to ensure that JyNI will emulate this behavior as soon as I am done with gc-support. (Hopefully it will be feasible, but I can only tell in half a year or so since there are currently other priorities.) Still, this "essay" potentially will save me a lot of time.
So, everybody please feel encouraged to post things like this as they come up. Maybe there could be kind of a pitfalls-page somewhere in the docs collecting these things.
Best
Stefan
In reply to the message sent on Friday, 15 May 2015 at 02:45 from "Nathaniel Smith" to "Python Dev", subject: [Python-Dev] Python-versus-CPython question for __mul__ dispatch
I expect you can make something that behaves like list by defining __mul__
and __rmul__ and returning NotImplemented.
-- --Guido van Rossum (on iPad)
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum wrote:
I expect you can make something that behaves like list by defining __mul__ and __rmul__ and returning NotImplemented.
Hmm, it's fairly tricky, and part of the trick is that you can never return NotImplemented (because you have to pretty much take over and entirely replace the normal dispatch rules inside __mul__ and __rmul__), but see attached for something I think should work.
So I guess this is just how Python's list, tuple, etc. work, and PyPy and friends need to match...
-n
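The attachment mentioned in the message above is not part of this archive. As a rough illustration of the idea only, here is a minimal sketch of a pure-Python class that takes over dispatch inside __mul__ and __rmul__ and never returns NotImplemented. ListLike is a hypothetical name and this is not the attached code; Indexable and IndexableWithMul are the classes from the first message.

```python
import operator


class ListLike:
    """Pure-Python sequence mimicking the '*' dispatch that CPython's list shows."""

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        # Give the right operand's __rmul__ the first chance, regardless of
        # inheritance. The ordinary dispatch rules would not do this.
        rmul = getattr(type(other), "__rmul__", None)
        if rmul is not None and not isinstance(other, ListLike):
            result = rmul(other, self)
            if result is not NotImplemented:
                return result
        # Fall back to repetition. operator.index raises TypeError itself,
        # so NotImplemented is never returned from this method.
        return ListLike(self.items * operator.index(other))

    def __rmul__(self, other):
        # Reached only after the left operand has already declined.
        return ListLike(self.items * operator.index(other))


class Indexable:
    def __index__(self):
        return 2


class IndexableWithMul:
    def __index__(self):
        return 2

    def __mul__(self, other):
        return "indexable forward mul"

    def __rmul__(self, other):
        return "indexable reverse mul"


plain_repeat = ListLike([1, 2]) * Indexable()
reverse_wins = ListLike([1, 2]) * IndexableWithMul()
forward_wins = IndexableWithMul() * ListLike([1, 2])
int_on_left = 2 * ListLike([1])
```

With these definitions the three cases from the first message behave the same way for ListLike as they do for the built-in list on CPython, at the cost of reimplementing the dispatch by hand.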
-- Nathaniel J. Smith -- http://vorpus.org
On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum wrote:
I expect you can make something that behaves like list by defining __mul__ and __rmul__ and returning NotImplemented.
Hmm, it's fairly tricky, and part of the trick is that you can never return NotImplemented (because you have to pretty much take over and entirely replace the normal dispatch rules inside __mul__ and __rmul__), but see attached for something I think should work.
So I guess this is just how Python's list, tuple, etc. work, and PyPy and friends need to match...
For the record, it looks like PyPy does already have a hack to implement this -- they do it by having a hidden flag on the built-in sequence types which the implementations of '*' and '+' check for, and if it's found it triggers a different rule for dispatching to the __op__ methods: https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa...
-- Nathaniel J. Smith -- http://vorpus.org
On 15 May 2015 at 10:45, Nathaniel Smith wrote:
Hi all,
While attempting to clean up some of the more squamous aspects of numpy's operator dispatch code [1][2], I've encountered a situation where the semantics we want and are using are possible according to CPython-the-interpreter, but AFAICT ought not to be possible according to Python-the-language, i.e., it's not clear to me whether it's possible even in principle to implement an object that works the way numpy.ndarray does in any other interpreter. Which makes me a bit nervous, so I wanted to check if there was any ruling on this.
It's a known CPython operand precedence bug due to the fact that several of the builtin types only implement sq_concat & sq_repeat without implementing nb_add & nb_multiply: http://bugs.python.org/issue11477
There's then a related problem where we *don't* process "NotImplemented" results from sq_concat and sq_repeat properly, so all the builtin sequence types throw TypeError directly, instead of returning NotImplemented when they don't recognise the other type.
I wrote a preliminary patch attempting to fix it a few years back after the issue was discovered by Mike Bayer and Alex Gaynor when porting SQLAlchemy to PyPy, but never committed it because my own verdict on the approach I used was that it rendered the abstract object API implementation for __mul__ and __add__ utterly unmaintainable.
The better fix would be to make defining sq_concat and sq_repeat more like defining __add__ and __mul__ at the Python level: PyType_Ready should implicitly fill in nb_add and nb_multiply references to standard implementations that delegate to sq_concat and sq_repeat, and we should update the implementations of the latter for the standard library sequence types implemented in C to return NotImplemented rather than throwing TypeError directly.
However, my intermittent attempts to get anyone else interested in fixing it haven't borne any fruit, and I've prioritised other projects over coming up with a different patch myself.
Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
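For comparison, this is roughly what the ordinary protocol referred to above looks like at the Python level: the left operand's __mul__ is tried first and hands over to the other operand only by returning NotImplemented. It is a sketch for illustration, not the proposed C patch; OrdinarySeq and Scaler are hypothetical names, and IndexableWithMul is trimmed down from the first message.

```python
import operator


class OrdinarySeq:
    """Sequence that follows the ordinary operator protocol for '*'."""

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        try:
            count = operator.index(other)
        except TypeError:
            # Not usable as a repeat count: decline, so that Python goes on
            # to try the other operand's __rmul__.
            return NotImplemented
        return OrdinarySeq(self.items * count)

    __rmul__ = __mul__


class Scaler:
    """Has __rmul__ but no __index__, so OrdinarySeq declines it."""

    def __rmul__(self, other):
        return "Scaler.__rmul__"


class IndexableWithMul:
    def __index__(self):
        return 2

    def __rmul__(self, other):
        return "indexable reverse mul"


deferred = OrdinarySeq([1, 2]) * Scaler()
not_deferred = OrdinarySeq([1, 2]) * IndexableWithMul()
```

Under these rules an operand with __index__ is simply treated as a repeat count, so IndexableWithMul.__rmul__ is never consulted. That is the opposite of what the built-in list does on CPython in the first message.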
On 15 May 2015 at 16:53, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum wrote:
I expect you can make something that behaves like list by defining __mul__ and __rmul__ and returning NotImplemented.
Hmm, it's fairly tricky, and part of the trick is that you can never return NotImplemented (because you have to pretty much take over and entirely replace the normal dispatch rules inside __mul__ and __rmul__), but see attached for something I think should work.
So I guess this is just how Python's list, tuple, etc. work, and PyPy and friends need to match...
No, CPython is broken.
Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 16 May 2015 at 07:35, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum wrote:
I expect you can make something that behaves like list by defining __mul__ and __rmul__ and returning NotImplemented.
Hmm, it's fairly tricky, and part of the trick is that you can never return NotImplemented (because you have to pretty much take over and entirely replace the normal dispatch rules inside __mul__ and __rmul__), but see attached for something I think should work.
So I guess this is just how Python's list, tuple, etc. work, and PyPy and friends need to match...
For the record, it looks like PyPy does already have a hack to implement this -- they do it by having a hidden flag on the built-in sequence types which the implementations of '*' and '+' check for, and if it's found it triggers a different rule for dispatching to the __op__ methods: https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa...
Oh, that's rather annoying that the PyPy team implemented bug-for-bug compatibility there, and didn't follow up on the operand precedence bug report to say that they had done so.
We also hadn't previously been made aware that NumPy is relying on this operand precedence bug to implement publicly documented API behaviour, so fixing it *would* break end user code :(
I guess that means someone in the numeric community will need to write a PEP to make this "try the other operand first" "feature" part of the language specification, so that other interpreters can implement it up front, rather than all having to come up with their own independent custom hacks just to make NumPy work.
Regards,
Nick.
P.S. It would also be nice if someone could take on the PEP for a Python level buffer API for 3.6: http://bugs.python.org/issue13797
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, May 16, 2015 at 1:31 AM, Nick Coghlan wrote:
On 16 May 2015 at 07:35, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith wrote:
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum wrote:
I expect you can make something that behaves like list by defining __mul__ and __rmul__ and returning NotImplemented.
Hmm, it's fairly tricky, and part of the trick is that you can never return NotImplemented (because you have to pretty much take over and entirely replace the normal dispatch rules inside __mul__ and __rmul__), but see attached for something I think should work.
So I guess this is just how Python's list, tuple, etc. work, and PyPy and friends need to match...
For the record, it looks like PyPy does already have a hack to implement this -- they do it by having a hidden flag on the built-in sequence types which the implementations of '*' and '+' check for, and if it's found it triggers a different rule for dispatching to the __op__ methods: https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa...
Oh, that's rather annoying that the PyPy team implemented bug-for-bug compatibility there, and didn't follow up on the operand precedence bug report to say that they had done so. We also hadn't previously been made aware that NumPy is relying on this operand precedence bug to implement publicly documented API behaviour, so fixing it *would* break end user code :(
I don't think any of us were aware of it either :-). It is a fairly obscure case -- it only comes up specifically if you have a single-element integer array that you are trying to multiply by a list that you expect to be auto-coerced to an array. If Python semantics were such that this became impossible to handle correctly then we would survive. (We've certainly survived worse, e.g. arr[array_of_indices] += 1 silently gives the wrong/unexpected result when array_of_indices has duplicate entries, and this bites people constantly. Unfortunately I can't see any reasonable way to fix this within Python's semantics, so... oh well.) But yeah, given that we're at a point where list dispatch actually has worked this way forever and across multiple interpreter implementations, I think it's de facto going to end up part of the language specification unless someone does something pretty quick...
I guess that means someone in the numeric community will need to write a PEP to make this "try the other operand first" "feature" part of the language specification, so that other interpreters can implement it up front, rather than all having to come up with their own independent custom hacks just to make NumPy work.
I'll make a note...
P.S. It would also be nice if someone could take on the PEP for a Python level buffer API for 3.6: http://bugs.python.org/issue13797
At a guess, if you want to find people who have this itch strong enough to try scratching it, then probably numpy users are actually not your best bet, b/c if you have numpy then you already have workarounds. In particular, numpy still supports a legacy Python level buffer export API: http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#python-side
So if all you want is to hand a buffer to numpy (rather than to an arbitrary PEP 3118 consumer) then this works fine, and if you do need an arbitrary PEP 3118 consumer then you can use numpy as an adaptor (use __array_interface__ to convert your object to ndarray -> ndarray supports the PEP 3118 API).
-n
-- Nathaniel J. Smith -- http://vorpus.org
Hi Nick,
On 16 May 2015 at 10:31, Nick Coghlan wrote:
Oh, that's rather annoying that the PyPy team implemented bug-for-bug compatibility there, and didn't follow up on the operand precedence bug report to say that they had done so.
It's sadly not the only place, by far, where a behavior of CPython could be considered an implementation detail, but people rely on it and so we need to write a workaround. We don't report all of them, particularly not the ones that are clearly of the kind "won't be changed in CPython 2.7". Maybe we should?
Another example where this same bug occurs is:
    class T(tuple):
        def __radd__(self, other):
            return 42
    lst = []
    lst += T()
which calls T.__radd__ in contradiction to all the general rules. (Yes, if you print(lst) afterwards, you get 42. And oops, trying this out on PyPy does not give 42; only "lst + T()" does. Probably another corner case to fix...)
A bientôt,
Armin.
We have a similar experience -- Pyston runs into a similar issue with
sqlalchemy (with "str() + foo" calling foo.__radd__ before str.sq_concat)
and we are working to match CPython's behavior.
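The "str() + foo" case mentioned above can be reproduced with a small script. This is a sketch assuming CPython; Foo is a hypothetical stand-in for the object involved in the sqlalchemy case, not the actual class.

```python
class Foo:
    """Hypothetical right-hand operand that only defines __radd__."""

    def __radd__(self, other):
        return ("Foo.__radd__", other)


# Calling str's own method directly does not defer to the other operand:
# on CPython it raises TypeError instead of returning NotImplemented.
try:
    "abc".__add__(Foo())
except TypeError:
    direct_call = "TypeError"
else:
    direct_call = "no error"

# The '+' operator on CPython nevertheless reaches Foo.__radd__, because
# Foo's numeric slot is consulted before str's sequence slot.
via_operator = "abc" + Foo()
```

An interpreter that followed only the ordinary left-operand-first rule, with str.__add__ raising as it does here, would not get as far as Foo.__radd__, which is why alternative implementations end up special-casing this.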
participants (6)
- Armin Rigo
- Guido van Rossum
- Kevin Modzelewski
- Nathaniel Smith
- Nick Coghlan
- Stefan Richthofer