[Numpy-discussion] #2522 numpy.diff fails on unsigned integers

Sebastian sebix at sebix.at
Thu Nov 13 02:10:52 EST 2014


On 2014-11-04 19:44, Charles R Harris wrote:
> On Tue, Nov 4, 2014 at 11:19 AM, Sebastian <sebix at sebix.at> wrote:
>
>> On 2014-11-04 15:06, Todd wrote:
>>> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <sebix at sebix.at> wrote:
>>>
>>> Hello,
>>>
>>> I want to bring up Issue #2522 'numpy.diff fails on unsigned integers
>>> (Trac #1929)' [1], as it was responsible for an error in one of our
>>> programs. Short explanation of the bug: np.diff performs a subtraction
>>> on the input array. If this is of type uint and the data contains
>>> falling data, it results in an arithmetic underflow.
>>>
>>> >>> np.diff(np.array([0,1,0], dtype=np.uint8))
>>> array([ 1, 255], dtype=uint8)
>>>
>>> @charris proposed either
>>> - a note in the docstring and maybe an example to clarify things
>>> - or raising a warning
>>> but with a discussion on the list.
>>>
>>> I would like to start it now, as it is an error which is not easily
>>> detectable (no errors or warnings are thrown). In our case the type of
>>> a data sequence containing only zeros and ones, which had type f8 like
>>> every other one, was changed to u4. As the program looked for values
>>> ==1 and ==-1, it broke silently.
>>> In my opinion, a note in the docs is not enough and does not help if
>>> the type is changed or set after the program has been written.
>>> I'd go for automatic upcasting of uints by default and an option to
>>> turn it off, if this behavior is explicitly wanted. This wouldn't be
>>> correct from the point of view of a programmer, but most of the users
>>> have a scientific background and expect it 'to work', instead of
>>> something that is theoretically correct but not convenient. (I count
>>> myself in the first group.)
>>>
>>>
>>>
>>> When you say "automatic upcasting", that would be, for example, uint8
>>> to int16? What about uint64? There is no int128.
>> The upcast should go to the next bigger type, otherwise it would again
>> result in wrong values. For uint64 we can't do that, so it has to stay.
>>> Also, when you say "by default", is this only when an overflow is
>>> detected, or always?
>> I don't know how I could detect an overflow in the diff function. In
>> subtraction it should be possible, but that's very deep in the
>> numpy internals.
>>> How would the option to turn it off be implemented? An argument to
>>> np.diff or some sort of global option?
>> I thought of a parameter upcast_int=True for the function.
>
> Could check for non-decreasing sequence in the unsigned case. Note
> that differences of signed integers can also overflow. One way to
> check in general is to determine the expected sign using comparisons.

I think you mean a decreasing/non-increasing sequence rather than a
non-decreasing one?
That is the same check as testing whether the sequence is sorted. I
currently don't know how to do that efficiently in Python without
np.diff; in Cython it should be easy.
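
One possible way to do the sortedness check with comparisons only, so
that no subtraction (and hence no unsigned wraparound) is involved. This
is just a sketch, not anything that exists in NumPy; the helper name
is_non_decreasing is made up for illustration, and it assumes a 1-D
input:

```python
import numpy as np

def is_non_decreasing(a):
    """Return True if no element of the 1-D array `a` is smaller than its predecessor.

    Hypothetical helper: it compares neighbours instead of subtracting them,
    so the result is a boolean array and unsigned dtypes cannot wrap around.
    """
    a = np.asarray(a)
    return bool(np.all(a[1:] >= a[:-1]))

print(is_non_decreasing(np.array([0, 1, 1], dtype=np.uint8)))
print(is_non_decreasing(np.array([0, 1, 0], dtype=np.uint8)))
```

If this returns False for an unsigned input, np.diff on that input will
contain wrapped values.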


np.gradient has the same problem:
>>> np.random.seed(89)
>>> d = np.random.randint(0,2,size=10).astype(np.uint8); d
array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=uint8)
>>> np.diff(d)
array([255,   0,   1, 255,   1,   0, 255,   0,   0], dtype=uint8)
>>> np.gradient(d)
array([ 255. ,  127.5,    0.5,    0. ,    0. ,    0.5,  127.5,  127.5,
          0. ,    0. ])
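
For illustration, the upcasting idea discussed above could look roughly
like the following wrapper. This is only a sketch under my own
assumptions: diff_upcast and the _NEXT_SIGNED table are hypothetical
names, not part of NumPy, and it only handles first-order differences.
uint64 is left unchanged because there is no int128 to widen it to:

```python
import numpy as np

# Hypothetical mapping: each unsigned type goes to the next larger signed type.
# uint64 is deliberately absent, since no wider signed integer type exists.
_NEXT_SIGNED = {
    np.dtype(np.uint8): np.int16,
    np.dtype(np.uint16): np.int32,
    np.dtype(np.uint32): np.int64,
}

def diff_upcast(a, axis=-1):
    """First-order np.diff that widens small unsigned inputs before subtracting."""
    a = np.asarray(a)
    target = _NEXT_SIGNED.get(a.dtype)
    if target is not None:
        # The widened signed type can hold every possible difference.
        a = a.astype(target)
    return np.diff(a, axis=axis)

print(diff_upcast(np.array([0, 1, 0], dtype=np.uint8)))
```

With the uint8 input from the original report, this gives 1 and -1 with
dtype int16 instead of 1 and 255.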

---
gpg --keyserver keys.gnupg.net --recv-key DC9B463B


