<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 13, 2014 at 8:10 AM, Sebastian <span dir="ltr"><<a href="mailto:sebix@sebix.at" target="_blank">sebix@sebix.at</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 2014-11-04 19:44, Charles R Harris wrote:<br>

> On Tue, Nov 4, 2014 at 11:19 AM, Sebastian <<a href="mailto:sebix@sebix.at">sebix@sebix.at</a>> wrote:<br>

><br>

>> On 2014-11-04 15:06, Todd wrote:<br>

>>> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <<a href="mailto:sebix@sebix.at">sebix@sebix.at</a><br>

>><br>

>>> <mailto:<a href="mailto:sebix@sebix.at">sebix@sebix.at</a>>> wrote:<br>

>>><br>

>>> Hello,<br>

>>><br>

>>> I want to bring up Issue #2522 'numpy.diff fails on unsigned<br>

>> integers<br>

>>> (Trac #1929)' [1], as it was responsible for an error in one<br>

>> of our<br>

>>> programs. Short explanation of the bug: np.diff performs a<br>

>> subtraction<br>

>>> on the input array. If this is of type uint and the data<br>

>> contains<br>

>>> falling data, it results in an arithmetic underflow.<br>

>>><br>

>>> >>> np.diff(np.array([0,1,0], dtype=np.uint8))<br>

>>> array([ 1, 255], dtype=uint8)<br>

>>><br>

>>> @charris proposed either<br>

>>> - a note to the doc string and maybe an example to clarify<br>

>> things<br>

>>> - or raise a warning<br>

>>> but with a discussion on the list.<br>

>>><br>

>>> I would like to start it now, as it is an error which is not<br>

>> easily<br>

>>> detectable (no errors or warnings are thrown). In our case<br>

>> the<br>

>>> type of a<br>

>>> data sequence, with only zeros and ones, had type f8 as also<br>

>> every<br>

>>> other<br>

>>> one, has been changed to u4. As the programs looked for<br>

>> values ==1 and<br>

>>> ==-1, it broke silently.<br>

>>> In my opinion, a note in the docs is not enough and does not<br>

>> help<br>

>>> if the<br>

>>> type changed or set after the program has been written.<br>

>>> I'd go for automatic upcasting of uints by default and an<br>

>> option<br>

>>> to turn<br>

>>> it off, if this behavior is explicitly wanted. This wouldn't<br>

>> be<br>

>>> correct<br>

>>> from the point of view of a programmer, but as most of the<br>

>> users<br>

>>> have a<br>

>>> scientific background who expect it 'to work', instead of<br>

>> sth is<br>

>>> theoretically correct but not convenient. (I count myself to<br>

>> the first<br>

>>> group)<br>

>>><br>

>>><br>

>>><br>

>>> When you say "automatic upcasting", that would be, for example<br>

>> uint8<br>

>>> to int16? What about for uint64? There is no int128.<br>

>> The upcast should go to the next bigger, otherwise it would again<br>

>> result<br>

>> in wrong values. For uint64 we can't do that, so it has to stay.<br>

>>> Also, when you say "by default", is this only when an overflow is<br>

>>> detected, or always?<br>

>> I don't know how I could detect an overflow in the diff-function.<br>

>> In<br>

>> subtraction it should be possible, but that's very deep in the<br>

>> numpy-internals.<br>

>>> How would the option to turn it off be implemented? An argument<br>

>> to<br>

>>> np.diff or some sort of global option?<br>

>> I thought of a parameter upcast_int=True for the function.<br>

><br>

> Could check for non-decreasing sequence in the unsigned case. Note<br>

> that differences of signed integers can also overflow. One way to<br>

> check in general is to determine the expected sign using comparisons.<br>

<br>

I think you mean a decreasing/non-increasing instead of non-decreasing<br>

sequence?<br>

It's also the same check as checking for a sorted sequence. But I<br>

currently don't know how I could do that efficiently without np.diff in<br>

Python, in Cython it should be easily possible.<br>

<br>

<br>

np.gradient has the same problem:<br>

>>> np.random.seed(89)<br>

>>> d = np.random.randint(0,2,size=10).astype(np.uint8); d<br>

array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=uint8)<br>

>>> np.diff(d)<br>

array([255,   0,   1, 255,   1,   0, 255,   0,   0], dtype=uint8)<br>

>>> np.gradient(d)<br>

array([ 255. ,  127.5,    0.5,    0. ,    0. ,    0.5,  127.5,  127.5,<br>

          0. ,    0. ])<br>

<br></blockquote></div><br><br></div><div class="gmail_extra">Considering that this is generally an error, might it be good to have a general warning built into the int dtypes regarding overflow errors? That warning could then be caught by the diff function.<br><br></div></div>
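For reference, here is a rough sketch of the two ideas quoted above: the upcast_int parameter and the comparison-based check for a falling sequence. None of this is existing NumPy API. The names diff_upcast and unsigned_diff_wraps are hypothetical, and the sketch only covers first-order differences.

```python
import numpy as np


def diff_upcast(a, axis=-1, upcast_int=True):
    """Hypothetical wrapper around np.diff (not part of NumPy).

    Unsigned inputs narrower than 64 bits are cast to the next bigger
    signed type before differencing, so falling data gives negative
    values instead of wrapping around.
    """
    a = np.asarray(a)
    if upcast_int and a.dtype.kind == "u" and a.dtype.itemsize != 8:
        # uint8 -> int16, uint16 -> int32, uint32 -> int64
        a = a.astype(np.dtype("i%d" % (a.dtype.itemsize * 2)))
    # uint64 has no bigger signed type, so it is passed through unchanged.
    return np.diff(a, axis=axis)


def unsigned_diff_wraps(a, axis=-1):
    """Hypothetical check: would np.diff wrap around for this input?

    Uses only comparisons, so it never performs the subtraction itself.
    """
    a = np.asarray(a)
    if a.dtype.kind != "u":
        return False
    a = np.moveaxis(a, axis, -1)
    # A wrap happens wherever an element is bigger than its successor.
    return bool(np.any(a[..., :-1] > a[..., 1:]))
```

With d = np.array([0, 1, 0], dtype=np.uint8), diff_upcast(d) gives the values 1 and -1 as int16, while diff_upcast(d, upcast_int=False) reproduces the current np.diff result of 1 and 255. The comparison check costs an extra pass over the data, so whether it is acceptable inside np.diff itself is a separate question.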