abs for max negative integers - desired behavior?
Hi,

I recently ran into this:

In [68]: arr = np.array(-128, np.int8)
In [69]: arr
Out[69]: array(-128, dtype=int8)
In [70]: np.abs(arr)
Out[70]: -128

Of course, I can see why this happens, but it is still surprising, and it seems to me that it would be a confusing source of bugs, because of course it only happens for the maximum negative integer.

One particularly confusing result was:

In [71]: np.allclose(arr, arr)
Out[71]: False

I wanted to ask whether this is the desired behavior, and whether it might be worth planning a change in the long term?

Best,
Matthew
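For readers who want to reproduce the report, here is a minimal sketch. It uses a small 1-d array instead of the 0-d array from the post, and it assumes a NumPy build in which integer array operations wrap silently on overflow. `np.iinfo` is only there to show the range of the type.

```python
import numpy as np

# int8 covers -128..127, so +128 has no representation in this type.
info = np.iinfo(np.int8)

arr = np.array([-128, -127, 127], dtype=np.int8)
result = np.abs(arr)

# The most negative value comes back unchanged; the others behave normally.
print(info.min, info.max)
print(result.tolist(), result.dtype)
```

The same wrap-around applies to the minimum of every signed integer type, not just int8.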
This has come up for discussion before, but no consensus was ever reached. One solution is for abs to return an unsigned type, but then combining that with a signed type of the same number of bits will cause both to be cast to higher precision. IIRC, matlab was said to return +127 as abs(-128), which, if true, is quite curious.
Chuck
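The promotion Chuck describes can be checked directly with `np.result_type`. This is only an illustrative sketch of NumPy's type promotion, not a proposal for how abs should behave.

```python
import numpy as np

# Same-width signed and unsigned types have no common type of that width,
# so NumPy promotes to the next larger signed type.
small = np.result_type(np.int8, np.uint8)
medium = np.result_type(np.int32, np.uint32)

# At 64 bits there is no larger integer type, so the result is float64.
large = np.result_type(np.int64, np.uint64)

signed = np.array([1, 2, 3], dtype=np.int8)
unsigned = np.array([1, 2, 3], dtype=np.uint8)
combined = signed + unsigned

print(small, medium, large, combined.dtype)
```

So an unsigned result from abs doubles the memory footprint as soon as it is mixed back with signed data of the same width.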
Ah - sorry - I think I missed the previous discussion. The conversion to unsigned seemed like a great improvement. Are you saying that the cost down the line is an increase in memory use for arrays which are then combined with a signed type? That seems like a reasonable tradeoff to me. Was that the main objection?
See you,
Matthew
octave-3.2.3:1> a = int8([-128, -127])
a =
  -128  -127
octave-3.2.3:2> abs(a)
ans =
  127  127

Matlab is the same. That is curious...
See you,
Matthew
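For comparison, the saturating result that Octave gives can be imitated in NumPy. The helper name `saturating_abs` is made up for this sketch and is not part of NumPy; it only handles signed integer dtypes.

```python
import numpy as np

def saturating_abs(arr):
    """Absolute value that clips abs(min) to max (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.dtype.kind != "i":
        raise TypeError("saturating_abs expects a signed integer dtype")
    info = np.iinfo(arr.dtype)
    max_value = arr.dtype.type(info.max)  # keep the result in the input dtype
    # Replace the one value whose magnitude cannot be represented.
    return np.where(arr == info.min, max_value, np.abs(arr))

a = np.array([-128, -127], dtype=np.int8)
print(saturating_abs(a).tolist())  # [127, 127]
```

Note that this maps two different inputs to the same output, which is exactly the behavior found curious above.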
Well, it _is_ only off by 0.78%. That should be good enough for government work, right? Ben Root
So, which government is using numpy, only off by 200% Josef
Not government, but maybe Lockheed-Martin when they were doing that Mars probe? "What? It was negative? Well, that explains why it went down, not up!" ::rimshot:: Thank you folks! I will be here all week!
Ben Root
In C, abs(INT_MIN) is undefined, so both 127 and -128 work :)
David
From a pure user perspective, I would not expect the abs function to return a negative number. Returning +127 plus a warning the first time that happens seems to me a good compromise. Armando On 12/10/2011 09:46, David Cournapeau wrote:
In C, abs(INT_MIN) is undefined, so both 127 and -128 work :)
I guess the question is what's the common context to use small integers in the first place. If it is to save memory, then upcasting may not be the best solution. I may be wrong, but if you decide to use those types in the first place, you need to know about overflows. Abs is just one of them (dividing by -1 is another, although this one actually raises an exception). Detecting it may be costly, but this would need benchmarking. That being said, without context, I don't find 127 a better solution than -128.
cheers,
David
Well, that choice is just based on getting the closest positive number to the true value (128). The context can be anything; for instance, you could be using a look-up table based on the result of an integer operation ...

In terms of cost, it would imply evaluating the cost of something like:

a = abs(x); if (a < 0) {a = MAX_INT;} return a;

Basically it is the cost of evaluating an if condition, since the content of the block (with or without warning) will not be executed very often. I find that even raising an exception is better than returning a negative number as the result of the abs function.

Anyway, I have just tested numpy.array([-129], dtype=numpy.int8) and I got the array as [127] when I was expecting some sort of unsafe cast error/warning. I guess I will just stop here. In any case, I am very grateful to the mailing list and the original poster for exposing this behavior so that I can keep it in mind.
Best regards,
Armando
Yes, this is costly: it adds a branch to a trivial operation. I did some preliminary benchmarks (would need confirmation when I have more than one minute to spend on this):

int8, 2**16 long array. Before check: 16 us. After check: 92 us. 5-6 times slower
int8, 2**24 long array. Before check: 20ms. After check: 30ms. 30 % slower.

There is also the issue of signaling the error in the ufunc machinery. I forgot whether this is possible at that level.
cheers,
David
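A rough way to get a feel for the overhead from Python is sketched below. This is not the benchmark David ran: his check was a branch inside the C loop, whereas this one does a second pass over the result in Python, so the numbers are not comparable. The helper `checked_abs` is hypothetical.

```python
import timeit
import numpy as np

def checked_abs(arr):
    """np.abs plus a check for the most negative value (hypothetical helper)."""
    result = np.abs(arr)
    if (result < 0).any():
        raise OverflowError("abs overflowed for the most negative integer")
    return result

rng = np.random.default_rng(0)
for size in (2**16, 2**24):
    # Draw from -127..127 so the check never fires during timing.
    data = rng.integers(-127, 127, size=size, dtype=np.int8, endpoint=True)
    plain = min(timeit.repeat(lambda: np.abs(data), number=5, repeat=3)) / 5
    checked = min(timeit.repeat(lambda: checked_abs(data), number=5, repeat=3)) / 5
    print(f"n={size}: plain {plain * 1e6:.0f} us, checked {checked * 1e6:.0f} us")
```

Timings depend heavily on the machine and the NumPy version, so no expected output is given.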
I suppose that returning the equivalent uint type would be of zero cost though? I don't think the problem should be relegated to 'people should know about this', because this is a problem for any signed integer type, and it can lead to nasty errors which people are unlikely to test for.
See you,
Matthew
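One way to see why the unsigned result could be nearly free: the wrapped int8 result already has the right bit pattern, so reinterpreting it as uint8 gives the true magnitude. This is a sketch using the existing `ndarray.view` method, not a description of how NumPy would implement such a change.

```python
import numpy as np

arr = np.array([-128, -127, -1, 0, 127], dtype=np.int8)

# abs wraps for -128 but leaves the bit pattern 0x80, which reads as 128
# once the same memory is reinterpreted as uint8. view() makes no copy.
magnitudes = np.abs(arr).view(np.uint8)
print(magnitudes.tolist())  # [128, 127, 1, 0, 127]
```

The cost reappears later only if the unsigned result is combined with signed data of the same width.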
What about a parameter that allows the user to select the option they want? It would select between uint, upcasted_int, -MAX and +MAX. This way, at least it will be documented, and users who care will have the choice. Personally, when the option is available, I would prefer the safe version, uint, but I understand that is not everyone's position.
Frédéric Bastien
On Sat, Oct 15, 2011 at 3:00 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I suppose that returning the equivalent uint type would be of zero cost though?
I don't think the problem should be relegated to 'people should know about this', because this is a problem for any signed integer type, and it can lead to nasty errors which people are unlikely to test for.
Would there be any objection to the proposal to add a keyword to abs: always_positive=False or similar, which would have the effect, when True, of returning uints from an int? Best, Matthew
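To make the proposal concrete, here is a sketch of how such a keyword might behave, written as a wrapper. `np.abs` has no `always_positive` keyword; the function name and the keyword below are assumptions taken from the proposal, and only signed integer inputs are affected.

```python
import numpy as np

def abs_with_keyword(arr, always_positive=False):
    """Hypothetical wrapper; np.abs itself has no always_positive keyword."""
    arr = np.asarray(arr)
    result = np.abs(arr)
    if always_positive and arr.dtype.kind == "i":
        # Reinterpret as the unsigned type of the same width (int8 -> uint8, ...).
        unsigned = np.dtype(f"u{arr.dtype.itemsize}")
        result = result.view(unsigned)
    return result

a = np.array([-128, 5], dtype=np.int8)
print(abs_with_keyword(a).tolist())                        # [-128, 5]
print(abs_with_keyword(a, always_positive=True).tolist())  # [128, 5]
```

With the default of False, existing code would keep its current dtype and behavior.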
IIRC, matlab was said to return +127 as abs(-128), which, if true, is quite curious.
I just checked and this is indeed the case in Matlab 7.10.0 R2010a:
abs(int8(-128))
ans = 127 Cheers,  Daniele
