Mailman 3 Is there a reason some of the PyLong_As* functions don't call an object's __int__? - Python-ideas

Is there a reason some of the PyLong_As* functions don't call an object's int?

Erik Bray

8 Dec 2017 8 Dec '17

10:41 a.m.

IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case. I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not. Currently the following functions fall back on __int__ where available: PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask whereas the following (at least according to the docs--haven't checked the code in all cases) do not: PyLong_AsSsize_t PyLong_AsUnsignedLong PyLong_AsSize_t PyLong_AsUnsignedLongLong PyLong_AsDouble PyLong_AsVoidPtr I think this inconsistency should be fixed, unless there's some reason for it I'm not seeing. Thanks, Erik

Show replies by date

Antoine Pitrou

8 Dec 8 Dec

10:59 a.m.

On Fri, 8 Dec 2017 11:41:10 +0100 Erik Bray <erik.m.bray@gmail.com> wrote:

...

I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.

It should probably call PyNumber_AsSsize_t instead (which will call __index__, which is the right thing here).

...

I think this inconsistency should be fixed, unless there's some reason for it I'm not seeing.

That sounds reasonable to me. Regards Antoine.

Serhiy Storchaka

11:26 a.m.

08.12.17 12:41, Erik Bray пише:

...

IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case.

There is also a case of int subclasses. It is expected that PyLong_AsLong is atomic, and calling __int__ can lead to crashes or similar consequences.

...

I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.

PyLong_* functions provide an interface to PyLong objects. If they don't return the content of a PyLong object, how can it be retrieved? If you want to work with general numbers you should use PyNumber_* functions. In your particular case it is more reasonable to fallback to __index__ rather than __int__. Unlikely maxlen=4.2 makes sense.

...

Currently the following functions fall back on __int__ where available:

PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask

I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.

Antoine Pitrou

11:36 a.m.

On Fri, 8 Dec 2017 13:26:48 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

...
Currently the following functions fall back on __int__ where available:

PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask

I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.

I think that's a bad idea. There are widely-used int-like classes out there and it will break actual code:

...

...
...
import numpy as np x = np.int64(5) isinstance(x, int) False x.__int__() 5

Regards Antoine.

Serhiy Storchaka

12:30 p.m.

08.12.17 13:36, Antoine Pitrou пише:

...

On Fri, 8 Dec 2017 13:26:48 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
...
Currently the following functions fall back on __int__ where available:

PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask

I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.

I think that's a bad idea. There are widely-used int-like classes out there and it will break actual code:

...
...
...
import numpy as np x = np.int64(5) isinstance(x, int) False x.__int__() 5

NumPy integers implement __index__.

Antoine Pitrou

12:52 p.m.

On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

NumPy integers implement __index__.

That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow(). Regards Antoine.

Erik Bray

2:12 p.m.

On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
NumPy integers implement __index__.

That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().

Right--pointing to __index__ basically implies that PyIndex_Check and subsequent PyNumber_AsSsize_t than there currently are. That I could agree with but then it becomes a question of where are those cases? And what do with, e.g. interfaces like PyLong_AsLongAndOverflow(). Add more PyNumber_ conversion functions?

Antoine Pitrou

6:11 p.m.

On Fri, 8 Dec 2017 15:12:30 +0100 Erik Bray <erik.m.bray@gmail.com> wrote:

...

On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...
On Fri, 8 Dec 2017 14:30:00 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
NumPy integers implement __index__.

That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().

Right--pointing to __index__ basically implies that PyIndex_Check and subsequent PyNumber_AsSsize_t than there currently are. That I could agree with but then it becomes a question of where are those cases? And what do with, e.g. interfaces like PyLong_AsLongAndOverflow(). Add more PyNumber_ conversion functions?

We would probably need more PyNumber_ conversion functions indeed. Regards Antoine.

Erik Bray

12:33 p.m.

On Fri, Dec 8, 2017 at 12:26 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:

...

08.12.17 12:41, Erik Bray пише:

...
IIUC, it seems to be carry-over from Python 2's PyLong API, but I don't see an obvious reason for it. In every case there's an explicit PyLong_Check first anyways, so not calling __int__ doesn't help for the common case of exact int objects; adding the fallback costs nothing in that case.

There is also a case of int subclasses. It is expected that PyLong_AsLong is atomic, and calling __int__ can lead to crashes or similar consequences.

...
I ran into this because I was passing an object that implements __int__ to the maxlen argument to deque(). On Python 2 this used PyInt_AsSsize_t which does fall back to calling __int__, whereas PyLong_AsSsize_t does not.

PyLong_* functions provide an interface to PyLong objects. If they don't return the content of a PyLong object, how can it be retrieved? If you want to work with general numbers you should use PyNumber_* functions.

By "you " I assume you meant the generic "you". I'm not the one who broke things in this case :)

...

In your particular case it is more reasonable to fallback to __index__ rather than __int__. Unlikely maxlen=4.2 makes sense.

That's true, but in Python 2 that was possible:

...

...
...
deque([], maxlen=4.2) deque([], maxlen=4)

More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used. It was mainly motivated by slices, but it *could* be used in general cases where it definitely wouldn't make sense to accept a float (I wonder if maybe the real problem here is that floats can be coerced automatically to ints....) In other words, there are probably countless other cases in the stdlib at all where it "doesn't make sense" to accept a float, but that otherwise should accept objects that can be coerced to int without having to manually wrap those objects with an int(o) call.

...

...
Currently the following functions fall back on __int__ where available:

PyLong_AsLong PyLong_AsLongAndOverflow PyLong_AsLongLong PyLong_AsLongLongAndOverflow PyLong_AsUnsignedLongMask PyLong_AsUnsignedLongLongMask

I think this should be deprecated (and there should be an open issue for this). Calling __int__ is just a Python 2 legacy.

Okay, but then there are probably many cases where they should be replaced with PyNumber_ equivalents or else who knows how much code would break.

Ethan Furman

6:20 p.m.

On 12/08/2017 04:33 AM, Erik Bray wrote:

...

More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used.

__int__ is for coercion (float, fraction, etc) __index__ is for true integers Note that if __index__ is defined, __int__ should also be defined, and return the same value. https://docs.python.org/3/reference/datamodel.html#object.__index__ -- ~Ethan~

Erik Bray

28 Dec 28 Dec

10:10 a.m.

On Fri, Dec 8, 2017 at 7:20 PM, Ethan Furman <ethan@stoneleaf.us> wrote:

...

On 12/08/2017 04:33 AM, Erik Bray wrote:

...
More importantly not as many objects that coerce to int actually implement __index__. They probably *should* but there seems to be some confusion about how that's to be used.

__int__ is for coercion (float, fraction, etc)

__index__ is for true integers

Note that if __index__ is defined, __int__ should also be defined, and return the same value.

https://docs.python.org/3/reference/datamodel.html#object.__index__

This doesn't appear to be enforced, though I think maybe it should be. I'll also note that because of the changes I pointed out in my original post, it's now necessary for me to explicitly cast as int() objects that previously "just worked" when passed as arguments in some functions in itertools, collections, and other modules with C implementations. However, this is bad because if some broken code is passing floats to these arguments, they will be quietly cast to int and succeed, when really I should only be accepting objects that have __index__. There's no index() alternative to int(). I think changing all these functions to do the appropriate PyIndex_Check is a correct and valid fix, but I think it also stretches beyond the original purpose of __index__. I think that __index__ is relatively unknown, and perhaps there should be better documentation as to when and how it should be used over the better-known __int__.

Serhiy Storchaka

7:42 p.m.

28.12.17 12:10, Erik Bray пише:

...

There's no index() alternative to int().

operator.index()

Erik Bray

29 Dec 29 Dec

12:58 p.m.

On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:

...

28.12.17 12:10, Erik Bray пише:

...
There's no index() alternative to int().

operator.index()

Okay, and it's broken. That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.

Nick Coghlan

2:43 p.m.

On 29 December 2017 at 22:58, Erik Bray <erik.m.bray@gmail.com> wrote:

...

On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:

...
28.12.17 12:10, Erik Bray пише:

...
There's no index() alternative to int().

operator.index()

Okay, and it's broken.

Broken in what way? It has a fairly extensive test suite in https://github.com/python/cpython/blob/master/Lib/test/test_index.py (and some additional indirect testing in test_slice.py, which assumes that it works as advertised).

...

That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.

The size_t, ssize_t and void pointer conversions should only accept true integers (so either no fallback, or fall back to `__index__`). The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`. I'm less sure about the conversion to double, but allowing that to be used on float objects without reporting a type error seems like a bug magnet to me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Antoine Pitrou

2:56 p.m.

On Sat, 30 Dec 2017 00:43:07 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

...
That doesn't change my other point that some functions that could previously take non-int arguments can no longer--if we agree on that at least then I can set about making a bug report and fixing it.

The size_t, ssize_t and void pointer conversions should only accept true integers (so either no fallback, or fall back to `__index__`).

The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`.

That is the statu quo indeed... but the destination C type shouldn't be used as a criterion of which __dunder__ method is called. For example, let's assume I'm writing a piece of code that expects a pid number. The C type is `pid_t`, which presumably translates either to a C `int` or `long` (*). But it's not right to accept a float there... I think we really need a bunch a `PyIndex_AsXXX` functions (`PyIndex_AsLong`, etc.) to complement the current `PyLong_AsXXX` functions. That way, every `PyLong_AsXXX` function can continue calling `__int__` (if they ever did so), while `PyIndex_AsXXX` would only call `__index__`. (*) I got curious and went through the maze of type definitions on GNU/Linux. Which gives: #define __S32_TYPEDEF __signed__ int #define __PID_T_TYPE __S32_TYPE __STD_TYPE __PID_T_TYPE __pid_t; typedef __pid_t pid_t; Regards Antoine.

Richard Damon

3:23 p.m.

On 12/29/17 9:56 AM, Antoine Pitrou wrote:

...

(*) I got curious and went through the maze of type definitions on GNU/Linux. Which gives:

#define __S32_TYPEDEF __signed__ int #define __PID_T_TYPE __S32_TYPE __STD_TYPE __PID_T_TYPE __pid_t; typedef __pid_t pid_t;

Regards

Antoine.

One quick side note, just because it mapped to signed int on that Linux, doesn't mean it will always map to signed int on all Linuxes. One of the reasons for the multiple levels of indirection in types is to allow a given distribution to configure some parameter types to be 'optimal' for that implementation. -- Richard Damon

Serhiy Storchaka

4:04 p.m.

29.12.17 16:43, Nick Coghlan пише:

...

On 29 December 2017 at 22:58, Erik Bray <erik.m.bray@gmail.com> wrote:

...
Okay, and it's broken.

Broken in what way? It has a fairly extensive test suite in https://github.com/python/cpython/blob/master/Lib/test/test_index.py (and some additional indirect testing in test_slice.py, which assumes that it works as advertised).

Unfortunately the pure Python implementation doesn't work correctly in corner cases (https://bugs.python.org/issue18712). But in CPython the C implementation is used. Maybe Erik means something other.

...

The unsigned long and unsigned long long conversions should likely be consistent with their signed counterparts and allow lossy conversions via `__int__`.

There is a code that relies on the atomicity of these functions. Calling __int__ or __index__ will introduce vulnerabilities in the existing code.

Nick Coghlan

9 Dec 9 Dec

7:09 a.m.

On 8 December 2017 at 22:33, Erik Bray <erik.m.bray@gmail.com> wrote:

...

In other words, there are probably countless other cases in the stdlib at all where it "doesn't make sense" to accept a float, but that otherwise should accept objects that can be coerced to int without having to manually wrap those objects with an int(o) call.

Updating these to call __index__ is fine (since that sets the expectation of a *lossless* conversion to an integer), but updating them to call __int__ generally isn't (since that conversion is allowed to be lossy, which may cause surprising behaviour). Indexing & slicing were the primary original use case for that approach (hence the method name), but it's also used for sequence repetition, and other operations. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2473

Age (days ago)

2494

Last active (days ago)

List overview

Download

17 comments

6 participants

participants (6)

Antoine Pitrou
Erik Bray
Ethan Furman
Nick Coghlan
Richard Damon
Serhiy Storchaka

Is there a reason some of the PyLong_As* functions don't call an object's __int__?

tags

participants (6)

Is there a reason some of the PyLong_As* functions don't call an object's int?