
I'm on NumPy 1.10.4 (mkl).
np.uint(3) // 2   # 1.0
3 // 2            # 1
Is this behavior expected? It's certainly not desired from my perspective. If this is not a bug, could someone explain the rationale to me? Thanks.

On 2016/04/04 9:23 AM, T J wrote:
I agree that it's almost always undesirable; one would reasonably expect some sort of int.

Here's what I think is going on: the odd behavior occurs only with np.uint, which is np.uint64, and when the denominator is a signed int. The problem is that if the denominator is negative, the result will be negative, so it can't have the same type as the numerator. Furthermore, if the denominator is -1, the result will be minus the numerator, which can't be represented by either np.uint or np.int. Therefore the result is returned as np.float64. The promotion rules are based on what *could* happen in an operation, not on what *is* happening in a given instance.

Eric
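Eric's "what *could* happen" rule can be seen at the dtype level with `np.result_type` (a quick illustrative check, using current NumPy spellings):

```python
import numpy as np

# Promotion looks only at the dtypes, never the values.  uint64 paired
# with any signed integer has no wide-enough common integer type, so
# NumPy falls back to float64.
print(np.result_type(np.uint64, np.int64))   # float64
print(np.result_type(np.uint64, np.uint32))  # uint64 (both unsigned)
print(np.result_type(np.int32, np.int64))    # int64  (both signed)
```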

Thanks, Eric. Also relevant: https://github.com/numba/numba/issues/909 It looks like Numba has found a way to avoid this edge case.

On Monday, April 4, 2016, Eric Firing <efiring@hawaii.edu> wrote:

This kind of issue (see also https://github.com/numpy/numpy/issues/3511) has become more annoying now that indexing requires integers (indexing with a float raises a VisibleDeprecationWarning). The argument "dividing an uint by an int may give a result that does not fit in an uint nor in an int" does not sound very convincing to me; after all, even adding two (sized) ints may give a result that does not fit in the same size, but numpy does not upcast everything there:

In [17]: np.int32(2**31 - 1) + np.int32(2**31 - 1)
Out[17]: -2

In [18]: type(np.int32(2**31 - 1) + np.int32(2**31 - 1))
Out[18]: numpy.int32

I'd think that overflowing operations should just overflow (and possibly raise a warning via the seterr mechanism), but their possibility should not be an argument for modifying the output type.

Antony

2016-04-12 17:57 GMT-07:00 T J <tjhnson@gmail.com>:
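The two behaviors Antony contrasts can be put side by side (an illustrative sketch, not from the thread): same-width signed addition silently wraps, while the mixed uint64/int64 case is promoted to float64:

```python
import numpy as np

a = np.int32(2**31 - 1)
with np.errstate(over="ignore"):          # silence the scalar overflow warning
    wrapped = a + a
print(wrapped, type(wrapped).__name__)    # -2 int32: overflow just wraps

mixed = np.uint64(1) + np.int64(1)
print(mixed, type(mixed).__name__)        # 2.0 float64: upcast instead
```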

Whatever the C rules are (which I don't know off the top of my head, but I guess the result must be one of uint64 or int64). It's not as if the conversion to float64 were lossless:

In [38]: 2**63 - (np.int64(2**62-1) + np.uint64(2**62-1))
Out[38]: 0.0

Note that the result of np.int64(2**62-1) + np.uint64(2**62-1) would actually fit in an int64 (or a uint64), so arguably the conversion to float makes things worse.

Antony

2016-04-12 19:56 GMT-07:00 Nathaniel Smith <njs@pobox.com>:
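To make the lossiness concrete, the same sum can be worked out explicitly (an illustrative sketch): the exact integer result fits in int64, but the float64 that comes back has already been rounded up to 2**63:

```python
import numpy as np

exact = (2**62 - 1) + (2**62 - 1)        # 2**63 - 2, representable in int64
promoted = np.int64(2**62 - 1) + np.uint64(2**62 - 1)

print(type(promoted).__name__)           # float64
print(int(promoted) == exact)            # False: rounded up to 2**63
print(int(promoted) - exact)             # off by 2
```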

On Wed, Apr 13, 2016 at 3:17 AM, Antony Lee <antony.lee@berkeley.edu> wrote:
> This kind of issue (see also https://github.com/numpy/numpy/issues/3511)
> has become more annoying now that indexing requires integers (indexing
> with a float raises a VisibleDeprecationWarning). The argument "dividing
> an uint by an int may give a result that does not fit in an uint nor in
> an int" does not sound very convincing to me,

It shouldn't, because that's not the rule that numpy follows. The range of the result is never considered. Both *inputs* are cast to the same type that can represent the full range of either input type (for that matter, the actual *values* of the inputs are also never considered). In the case of uint64 and int64, there is no really good common type (the integer hierarchy has to top out somewhere), but float64 merely loses resolution rather than cutting off half of the range of uint64.

--
Robert Kern
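A quick check that the values really are ignored: even a division whose result is trivially representable comes back as float64, because only the input dtypes enter into the decision (illustrative sketch assuming current NumPy):

```python
import numpy as np

r = np.uint64(0) // np.int64(1)   # the answer, 0, fits any integer type...
print(r, type(r).__name__)        # ...but the result is still a float64
```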

On Apr 13, 2016 9:08 AM, "Robert Kern" <robert.kern@gmail.com> wrote:
Let me play devil's advocate for a moment, since I've just been playing out this debate in my own mind and you've done a good job of articulating the case for that side :-).

The counterargument is: it doesn't really matter whether there is a common type or not; what matters is whether the operation can be defined sensibly. For uint64 <op> int64, this is actually not a problem: we provide two's-complement signed ints, so uint64 and int64 are both integers-mod-2**64, just choosing different representatives for the equivalence classes in the upper half of the ring. In particular, the uint64 and int64 ranges are isomorphic to each other.

Or, with less jargon: casting between uint64 and int64 commutes with all arithmetic operations, so you get the same result whether you perform the operation in infinite precision and then cast to uint64 or int64, or cast both operands to uint64 or int64 first and then cast the result. Basically, the operations are totally well-defined even if we stick within integers, and the casting is just another form of integer wraparound; we're already happy to tolerate wraparound for int64 <op> int64 or uint64 <op> uint64, so it's not entirely clear why we go all the way to float to avoid it for uint64 <op> int64.

[On second thought... I'm actually not 100% sure that the all-operations-commute-with-casting thing is true in the case of //'s rounding behavior. I would have to squint a lot to figure that out. I guess comparison operations are another exception -- a < b != np.uint64(a) < np.uint64(b) in general.]

-n
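Nathaniel's ring picture, and his caveat about //, can be checked in pure Python (a sketch; as_u64/as_i64 are ad-hoc helper names modeling the two representative choices):

```python
M = 2**64

def as_u64(x):
    """Representative of x mod 2**64 in [0, 2**64)."""
    return x % M

def as_i64(x):
    """Representative of x mod 2**64 in [-2**63, 2**63)."""
    x %= M
    return x - M if x >= 2**63 else x

a, b = 2**63 + 7, -5       # a fits only in uint64, b only in int64

# + and * commute with casting: wrap the exact result, or wrap the
# operands first -- either way you land on the same representative.
assert as_u64(a + b) == as_u64(as_i64(a) + as_u64(b))
assert as_u64(a * b) == as_u64(as_i64(a) * as_u64(b))

# ...but // does not: rounding happens before the wraparound, and the
# two representatives round differently.
print(as_u64(a // b), as_u64(as_i64(a) // b))   # two different values
```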

On Wed, Apr 13, 2016 at 2:48 PM, Nathaniel Smith <njs@pobox.com> wrote:
I looked this up once: C returns unsigned in the scalar case when both operands have the same width. See "Usual Arithmetic Conversions" <https://www.securecoding.cert.org/confluence/display/c/INT02-C.+Understand+i...>. I think that is not a bad choice, but there is the backward-compatibility problem, plus it is a bit exceptional.

Chuck

On Wed, 13 Apr 2016 15:49:15 -0600 Charles R Harris <charlesr.harris@gmail.com> wrote:
It may be a worse choice for Python. In the original use case (indexing with an integer), losing the sign is a bug, since negative indices have a well-defined meaning in Python. That is a far more likely issue than magnitude loss on a 64-bit integer. In Numba, we decided that combining signed and unsigned would return signed (see http://numba.pydata.org/numba-doc/dev/proposals/integer-typing.html#proposal...).

Regards

Antoine.
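The indexing hazard Antoine describes is easy to demonstrate (a hypothetical sketch, not from the thread): a signed result keeps Python's negative-index semantics working, while the same bit pattern read as unsigned points far out of bounds.

```python
import numpy as np

arr = np.arange(5)
i = np.int64(-1)
print(arr[i])               # 4: negative indices are meaningful in Python

# Reinterpreting the same bits as unsigned gives a huge out-of-bounds index:
print(i.astype(np.uint64))  # 18446744073709551615
```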

participants (7)
- Antoine Pitrou
- Antony Lee
- Charles R Harris
- Eric Firing
- Nathaniel Smith
- Robert Kern
- T J