[Numpy-discussion] #2522 numpy.diff fails on unsigned integers

Todd toddrjen at gmail.com
Thu Nov 13 08:57:03 EST 2014


On Thu, Nov 13, 2014 at 8:10 AM, Sebastian <sebix at sebix.at> wrote:

> On 2014-11-04 19:44, Charles R Harris wrote:
> > On Tue, Nov 4, 2014 at 11:19 AM, Sebastian <sebix at sebix.at> wrote:
> >
> >> On 2014-11-04 15:06, Todd wrote:
> >>> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <sebix at sebix.at
> >>
> >>> <mailto:sebix at sebix.at>> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I want to bring up Issue #2522 'numpy.diff fails on unsigned
> >> integers
> >>> (Trac #1929)' [1], as it was resonsible for an error in one
> >> of our
> >>> programs. Short explanation of the bug: np.diff performs a
> >> subtraction
> >>> on the input array. If this is of type uint and the data
> >> contains
> >>> falling data, it results in an artihmetic underflow.
> >>>
> >>> >>> np.diff(np.array([0,1,0], dtype=np.uint8))
> >>> array([ 1, 255], dtype=uint8)
> >>>
> >>> @charris proposed either
> >>> - a note to the doc string and maybe an example to clarify
> >> things
> >>> - or raise a warning
> >>> but with a discussion on the list.
> >>>
> >>> I would like to start it now, as it is an error which is not
> >> easily
> >>> detectable (no errors or warnings are thrown). In our case
> >> the
> >>> type of a
> >>> data sequence, with only zeros and ones, had type f8 as also
> >> every
> >>> other
> >>> one, has been changed to u4. As the programs looked for
> >> values ==1 and
> >>> ==-1, it broke silently.
> >>> In my opinion, a note in the docs is not enough and does not
> >> help
> >>> if the
> >>> type changed or set after the program has been written.
> >>> I'd go for automatic upcasting of uints by default and an
> >> option
> >>> to turn
> >>> it off, if this behavior is explicitly wanted. This wouldn't
> >> be
> >>> correct
> >>> from the point of view of a programmer, but as most of the
> >> users
> >>> have a
> >>> scientific background who excpect it 'to work', instead of
> >> sth is
> >>> theoretically correct but not convenient. (I count myself to
> >> the first
> >>> group)
> >>>
> >>>
> >>>
> >>> When you say "automatic upcasting", that would be, for example
> >> uint8
> >>> to int16? What about for uint64? There is no int128.
> >> The upcast should go to the next bigger, otherwise it would again
> >> result
> >> in wrong values. uint64 we can't do that, so it has to stay.
> >>> Also, when you say "by default", is this only when an overflow is
> >>> detected, or always?
> >> I don't know how I could detect an overflow in the diff-function.
> >> In
> >> subtraction it should be possible, but that's very deep in the
> >> numpy-internals.
> >>> How would the option to turn it off be implemented? An argument
> >> to
> >>> np.diff or some sort of global option?
> >> I thought of a parameter upcast_int=True for the function.
> >
> > Could check for non-decreasing sequence in the unsigned case. Note
> > that differences of signed integers can also overflow. One way to
> > check in general is to determine the expected sign using comparisons.
>
> I think you mean a decreasing/non-increasing instead of non-decreasing
> sequence?
> It's also the same check as checking for a sorted sequence. But I
> currently don't know how I could do that efficiently without np.diff in
> Python, in Cython it should be easily possible.
>
>
> np.gradient has the same problem:
> >>> np.random.seed(89)
> >>> d = np.random.randint(0,2,size=10).astype(np.uint8); d
> array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=uint8)
> >>> np.diff(d)
> array([255,   0,   1, 255,   1,   0, 255,   0,   0], dtype=uint8)
> >>> np.gradient(d)
> array([ 255. ,  127.5,    0.5,    0. ,    0. ,    0.5,  127.5,  127.5,
>           0. ,    0. ])
>
>

Consider it is generally an error, might it be good to have a general
warning built into the int dtypes regarding overflow errors?  That warning
can then be caught by the diff function.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20141113/904d0f83/attachment.html>


More information about the NumPy-Discussion mailing list