New subject: #2522 numpy.diff fails on unsigned integers

13 Nov 2014


      On 2014-11-04 19:44, Charles R Harris wrote:
...
On Tue, Nov 4, 2014 at 11:19 AM, Sebastian  wrote:
...
On 2014-11-04 15:06, Todd wrote:
...
On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner 
...
mailto:sebix@sebix.at> wrote:
Hello,
I want to bring up Issue #2522 'numpy.diff fails on unsigned
integers
(Trac #1929)' [1], as it was resonsible for an error in one
of our
programs. Short explanation of the bug: np.diff performs a
subtraction
on the input array. If this is of type uint and the data
contains
falling data, it results in an artihmetic underflow.
...
...
...
np.diff(np.array([0,1,0], dtype=np.uint8))
array([ 1, 255], dtype=uint8)
@charris proposed either
- a note to the doc string and maybe an example to clarify
things
- or raise a warning
but with a discussion on the list.
I would like to start it now, as it is an error which is not
easily
detectable (no errors or warnings are thrown). In our case
the
type of a
data sequence, with only zeros and ones, had type f8 as also
every
other
one, has been changed to u4. As the programs looked for
values ==1 and
==-1, it broke silently.
In my opinion, a note in the docs is not enough and does not
help
if the
type changed or set after the program has been written.
I'd go for automatic upcasting of uints by default and an
option
to turn
it off, if this behavior is explicitly wanted. This wouldn't
be
correct
from the point of view of a programmer, but as most of the
users
have a
scientific background who excpect it 'to work', instead of
sth is
theoretically correct but not convenient. (I count myself to
the first
group)
When you say "automatic upcasting", that would be, for example
uint8
to int16? What about for uint64? There is no int128.
The upcast should go to the next bigger, otherwise it would again
result
in wrong values. uint64 we can't do that, so it has to stay.
Also, when you say "by default", is this only when an overflow is
detected, or always?
I don't know how I could detect an overflow in the diff-function.
In
subtraction it should be possible, but that's very deep in the
numpy-internals.
How would the option to turn it off be implemented? An argument
to
np.diff or some sort of global option?
I thought of a parameter upcast_int=True for the function.
Could check for non-decreasing sequence in the unsigned case. Note
that differences of signed integers can also overflow. One way to
check in general is to determine the expected sign using comparisons.
I think you mean a decreasing/non-increasing instead of non-decreasing
sequence?
It's also the same check as checking for a sorted sequence. But I
currently don't know how I could do that efficiently without np.diff in
Python, in Cython it should be easily possible.


np.gradient has the same problem:
...
...
...
np.random.seed(89)
d = np.random.randint(0,2,size=10).astype(np.uint8); d
array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=uint8)
np.diff(d)
array([255,   0,   1, 255,   1,   0, 255,   0,   0], dtype=uint8)
np.gradient(d)
array([ 255. ,  127.5,    0.5,    0. ,    0. ,    0.5,  127.5,  127.5,
          0. ,    0. ])
---
gpg --keyserver keys.gnupg.net --recv-key DC9B463B

Re: [Numpy-discussion] #2522 numpy.diff fails on unsigned integers

Sebastian

Todd

Jaime Fernández del Río

tags

participants (3)