<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 4, 2014 at 11:19 AM, Sebastian <span dir="ltr"><<a href="mailto:sebix@sebix.at" target="_blank">sebix@sebix.at</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 2014-11-04 15:06, Todd wrote:<br>

> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <<a href="mailto:sebix@sebix.at">sebix@sebix.at</a><br>

</span><div><div class="h5">> <mailto:<a href="mailto:sebix@sebix.at">sebix@sebix.at</a>>> wrote:<br>

><br>

>     Hello,<br>

><br>

>     I want to bring up Issue #2522 'numpy.diff fails on unsigned integers<br>

>     (Trac #1929)' [1], as it was responsible for an error in one of our<br>

>     programs. Short explanation of the bug: np.diff performs a subtraction<br>

>     on the input array. If the array is of type uint and the data contains<br>

>     decreasing values, the result wraps around (arithmetic underflow).<br>

><br>

>     >>> np.diff(np.array([0,1,0], dtype=np.uint8))<br>

>     array([  1, 255], dtype=uint8)<br>

><br>

>     @charris proposed either<br>

>     - a note to the doc string and maybe an example to clarify things<br>

>     - or raise a warning<br>

>     but with a discussion on the list.<br>

><br>

>     I would like to start it now, as it is an error which is not easily<br>

>     detectable (no errors or warnings are thrown). In our case a<br>

>     data sequence containing only zeros and ones, which had type f8<br>

>     like every other sequence, was changed to u4. As the program<br>

>     looked for values ==1 and ==-1, it broke silently.<br>

>     In my opinion, a note in the docs is not enough and does not help<br>

>     if the<br>

>     type is changed or set after the program has been written.<br>

>     I'd go for automatic upcasting of uints by default, with an option<br>

>     to turn it off if the wrapping behavior is explicitly wanted. This<br>

>     wouldn't be correct from a programmer's point of view, but most<br>

>     users have a scientific background and expect it 'to work', rather<br>

>     than something that is theoretically correct but not convenient.<br>

>     (I count myself among the former group.)<br>

><br>

><br>

><br>

> When you say "automatic upcasting", that would be, for example uint8<br>

> to int16?  What about for uint64?  There is no int128.<br>

</div></div>The upcast should go to the next bigger signed type, otherwise it would again result<br>

in wrong values. For uint64 we can't do that, so it has to stay.<br>

<span class="">> Also, when you say "by default", is this only when an overflow is<br>

> detected, or always?<br>

</span>I don't know how I could detect an overflow in the diff-function. In<br>

subtraction it should be possible, but that's very deep in the<br>

numpy-internals.<br>

<span class="">> How would the option to turn it off be implemented?  An argument to<br>

> np.diff or some sort of global option?<br>

</span>I thought of a parameter upcast_int=True for the function.<br></blockquote><div><br></div><div>One could check for a non-decreasing sequence in the unsigned case. Note that differences of signed integers can also overflow. One way to check in general is to determine the expected sign of each difference using comparisons.<br><br></div><div>Chuck<br></div></div></div></div>
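<div>To make the two ideas in this thread concrete, here is a rough sketch combining the proposed upcast with the non-decreasing check. The function name diff_checked is hypothetical and not part of NumPy; upcast_int is the parameter name suggested above. The sketch only handles the unsigned case along the last axis and does nothing about overflow in signed differences.<br></div>

```python
import numpy as np

def diff_checked(a, upcast_int=True):
    """Hypothetical helper (not a NumPy function): first difference along
    the last axis that avoids silent wraparound for unsigned input."""
    a = np.asarray(a)
    if a.dtype.kind != "u":
        # Signed and float input is passed through unchanged; signed
        # overflow is NOT detected in this sketch.
        return np.diff(a)
    if upcast_int and a.dtype.itemsize < 8:
        # Next bigger signed type: uint8 -> int16, uint16 -> int32,
        # uint32 -> int64.
        signed = np.dtype("i%d" % (2 * a.dtype.itemsize))
        return np.diff(a.astype(signed))
    # uint64 (or upcasting disabled): there is no wider integer type, so
    # detect wraparound by comparing neighbours before subtracting.
    if np.any(a[..., 1:] < a[..., :-1]):
        raise OverflowError(
            "unsigned diff would wrap around: input is not non-decreasing")
    return np.diff(a)

# The example from the original report, now with signed output.
print(diff_checked(np.array([0, 1, 0], dtype=np.uint8)))
```

<div>Raising an error for uint64 is only one possible choice; issuing a warning instead, as discussed in the issue, would fit the same check.<br></div>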