Is there a reason some of the PyLong_As* functions don't call an object's __int__?

IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case. I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not. Currently the following functions fall back on __int__ where available: PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask whereas the following (at least according to the docs--haven't checked the code in all cases) do not: PyLong_AsSsize_t PyLong_AsUnsignedLong PyLong_AsSize_t PyLong_AsUnsignedLongLong PyLong_AsDouble PyLong_AsVoidPtr I think this inconsistency should be fixed, unless there's some reason for it I'm not seeing. Thanks, Erik

On Fri, 8 Dec 2017 11:41:10 +0100 Erik Bray <erik.m.bray@gmail.com> wrote:
I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.
It should probably call PyNumber_AsSsize_t instead (which will call __index__, which is the right thing here).
I think this inconsistency should be fixed, unless there's some reason for it I'm not seeing.
That sounds reasonable to me. Regards Antoine.

08.12.17 12:41, Erik Bray пише:
IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case.
There is also a case of int subclasses. It is expected that PyLong_AsLong is atomic, and calling __int__ can lead to crashes or similar consequences.
I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.
PyLong_* functions provide an interface to PyLong objects. If they don't return the content of a PyLong object, how can it be retrieved? If you want to work with general numbers you should use PyNumber_* functions. In your particular case it is more reasonable to fallback to __index__ rather than __int__. Unlikely maxlen=4.2 makes sense.
Currently the following functions fall back on __int__ where available:
PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask
I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.

On Fri, 8 Dec 2017 13:26:48 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
Currently the following functions fall back on __int__ where available:
PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask
I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.
I think that's a bad idea. There are widely-used int-like classes out there and it will break actual code:
import numpy as np x = np.int64(5) isinstance(x, int) False x.__int__() 5
Regards Antoine.

08.12.17 13:36, Antoine Pitrou пише:
On Fri, 8 Dec 2017 13:26:48 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
Currently the following functions fall back on __int__ where available:
PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask
I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.
I think that's a bad idea. There are widely-used int-like classes out there and it will break actual code:
import numpy as np x = np.int64(5) isinstance(x, int) False x.__int__() 5
NumPy integers implement __index__.

On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
NumPy integers implement __index__.
That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().
Right--pointing to __index__ basically implies that PyIndex_Check and subsequent PyNumber_AsSsize_t than there currently are. That I could agree with but then it becomes a question of where are those cases? And what do with, e.g. interfaces like PyLong_AsLongAndOverflow(). Add more PyNumber_ conversion functions?

On Fri, 8 Dec 2017 15:12:30 +0100 Erik Bray <erik.m.bray@gmail.com> wrote:
On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
NumPy integers implement __index__.
That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().
Right--pointing to __index__ basically implies that PyIndex_Check and subsequent PyNumber_AsSsize_t than there currently are. That I could agree with but then it becomes a question of where are those cases? And what do with, e.g. interfaces like PyLong_AsLongAndOverflow(). Add more PyNumber_ conversion functions?
We would probably need more PyNumber_ conversion functions indeed. Regards Antoine.

On Fri, Dec 8, 2017 at 12:26 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
08.12.17 12:41, Erik Bray пише:
IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case.
There is also a case of int subclasses. It is expected that PyLong_AsLong is atomic, and calling __int__ can lead to crashes or similar consequences.
I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.
PyLong_* functions provide an interface to PyLong objects. If they don't return the content of a PyLong object, how can it be retrieved? If you want to work with general numbers you should use PyNumber_* functions.
By "you " I assume you meant the generic "you". I'm not the one who broke things in this case :)
In your particular case it is more reasonable to fallback to __index__ rather than __int__. Unlikely maxlen=4.2 makes sense.
That's true, but in Python 2 that was possible:
deque([], maxlen=4.2) deque([], maxlen=4)
More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used. It was mainly motivated by slices, but it *could* be used in general cases where it definitely wouldn't make sense to accept a float (I wonder if maybe the real problem here is that floats can be coerced automatically to ints....) In other words, there are probably countless other cases in the stdlib at all where it "doesn't make sense" to accept a float, but that otherwise should accept objects that can be coerced to int without having to manually wrap those objects with an int(o) call.
Currently the following functions fall back on __int__ where available:
PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask
I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.
Okay, but then there are probably many cases where they should be replaced with PyNumber_ equivalents or else who knows how much code would break.

On 12/08/2017 04:33 AM, Erik Bray wrote:
More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used.
__int__ is for coercion (float, fraction, etc) __index__ is for true integers Note that if __index__ is defined, __int__ should also be defined, and return the same value. https://docs.python.org/3/reference/datamodel.html#object.__index__ -- ~Ethan~

On Fri, Dec 8, 2017 at 7:20 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 12/08/2017 04:33 AM, Erik Bray wrote:
More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used.
__int__ is for coercion (float, fraction, etc)
__index__ is for true integers
Note that if __index__ is defined, __int__ should also be defined, and return the same value.
https://docs.python.org/3/reference/datamodel.html#object.__index__
This doesn't appear to be enforced, though I think maybe it should be. I'll also note that because of the changes I pointed out in my original post, it's now necessary for me to explicitly cast as int() objects that previously "just worked" when passed as arguments in some functions in itertools, collections, and other modules with C implementations. However, this is bad because if some broken code is passing floats to these arguments, they will be quietly cast to int and succeed, when really I should only be accepting objects that have __index__. There's no index() alternative to int(). I think changing all these functions to do the appropriate PyIndex_Check is a correct and valid fix, but I think it also stretches beyond the original purpose of __index__. I think that __index__ is relatively unknown, and perhaps there should be better documentation as to when and how it should be used over the better-known __int__.

On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
28.12.17 12:10, Erik Bray пише:
There's no index() alternative to int().
operator.index()
Okay, and it's broken. That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.

On 29 December 2017 at 22:58, Erik Bray <erik.m.bray@gmail.com> wrote:
On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
28.12.17 12:10, Erik Bray пише:
There's no index() alternative to int().
operator.index()
Okay, and it's broken.
Broken in what way? It has a fairly extensive test suite in https://github.com/python/cpython/blob/master/Lib/test/test_index.py (and some additional indirect testing in test_slice.py, which assumes that it works as advertised).
That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.
The size_t, ssize_t and void pointer conversions should only accept true integers (so either no fallback, or fall back to `__index__`). The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`. I'm less sure about the conversion to double, but allowing that to be used on float objects without reporting a type error seems like a bug magnet to me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, 30 Dec 2017 00:43:07 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.
The size_t, ssize_t and void pointer conversions should only accept true integers (so either no fallback, or fall back to `__index__`).
The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`.
That is the statu quo indeed... but the destination C type shouldn't be used as a criterion of which __dunder__ method is called. For example, let's assume I'm writing a piece of code that expects a pid number. The C type is `pid_t`, which presumably translates either to a C `int` or `long` (*). But it's not right to accept a float there... I think we really need a bunch a `PyIndex_AsXXX` functions (`PyIndex_AsLong`, etc.) to complement the current `PyLong_AsXXX` functions. That way, every `PyLong_AsXXX` function can continue calling `__int__` (if they ever did so), while `PyIndex_AsXXX` would only call `__index__`. (*) I got curious and went through the maze of type definitions on GNU/Linux. Which gives: #define __S32_TYPEDEF __signed__ int #define __PID_T_TYPE __S32_TYPE __STD_TYPE __PID_T_TYPE __pid_t; typedef __pid_t pid_t; Regards Antoine.

On 12/29/17 9:56 AM, Antoine Pitrou wrote:
(*) I got curious and went through the maze of type definitions on GNU/Linux. Which gives:
#define __S32_TYPEDEF __signed__ int #define __PID_T_TYPE __S32_TYPE __STD_TYPE __PID_T_TYPE __pid_t; typedef __pid_t pid_t;
Regards
Antoine.
One quick side note, just because it mapped to signed int on that Linux, doesn't mean it will always map to signed int on all Linuxes. One of the reasons for the multiple levels of indirection in types is to allow a given distribution to configure some parameter types to be 'optimal' for that implementation. -- Richard Damon

29.12.17 16:43, Nick Coghlan пише:
On 29 December 2017 at 22:58, Erik Bray <erik.m.bray@gmail.com> wrote:
Okay, and it's broken.
Broken in what way? It has a fairly extensive test suite in https://github.com/python/cpython/blob/master/Lib/test/test_index.py (and some additional indirect testing in test_slice.py, which assumes that it works as advertised).
Unfortunately the pure Python implementation doesn't work correctly in corner cases (https://bugs.python.org/issue18712). But in CPython the C implementation is used. Maybe Erik means something other.
The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`.
There is a code that relies on the atomicity of these functions. Calling __int__ or __index__ will introduce vulnerabilities in the existing code.

On 8 December 2017 at 22:33, Erik Bray <erik.m.bray@gmail.com> wrote:
In other words, there are probably countless other cases in the stdlib at all where it "doesn't make sense" to accept a float, but that otherwise should accept objects that can be coerced to int without having to manually wrap those objects with an int(o) call.
Updating these to call __index__ is fine (since that sets the expectation of a *lossless* conversion to an integer), but updating them to call __int__ generally isn't (since that conversion is allowed to be lossy, which may cause surprising behaviour). Indexing & slicing were the primary original use case for that approach (hence the method name), but it's also used for sequence repetition, and other operations. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (6)
-
Antoine Pitrou
-
Erik Bray
-
Ethan Furman
-
Nick Coghlan
-
Richard Damon
-
Serhiy Storchaka