<div dir="ltr">I agree with Gael. <div><br></div><div>I would also add that it seems to be solving a problem that doesn't exist in the wild. There has never been a report of hitting this issue, right? Users calling sum() or other raw operations are likely already familiar with numpy, and better placed to debug their own errors.</div><div><br></div><div>Adding complexity can introduce confusion for all users, including relatively numpy-naive ones. We would then have to field new questions, and I'm not sure there's any practical benefit.</div><div><br></div><div>Ben</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 8:55 AM, Gael Varoquaux <span dir="ltr"><<a href="mailto:gael.varoquaux@normalesup.org" target="_blank">gael.varoquaux@normalesup.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p dir="ltr">I think this is a very bad idea. First, because it's "magic": changing things behind the user's back. People will be surprised, and it will lead to bugs in their code. Second, because it is a loss of semantics. Something like a mask or a label image is stored as integers for a good reason.</p>
<p dir="ltr">The right solution to the problem is to teach people to use float data when relevant, but not to force this decision on them.</p>
<p dir="ltr">Gaël</p>
<p dir="ltr">Sent from my phone. Please forgive brevity and misspelling.</p>
<div class="gmail_quote"><span class="">On Jul 6, 2015, at 17:33, Matthew Brett <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<pre><span class="">Hi,<br><br>I wanted to ask y'all about an API change that I would like to make to nibabel.<br><br>In summary, I want to default to returning floating point arrays from<br>nibabel images.<br><br>Problem - different returned data types from img.get_data()<br></span><hr><div><div class="h5"><br><br>At the moment, if you do this:<br><br>img = nib.load('my_image.nii')<br>data = img.get_data()<br><br>Then the data type (dtype) of the returned data array depends on the<br>values in the header of `my_image.nii`. Specifically, if the raw<br>on-disk data type is np.int16 (as it often is) and the header<br>scalefactor values are the defaults (1 for slope, 0 for intercept), then you<br>will get back an array of the on-disk data type - here, np.int16.<br><br>This is very efficient on memory, but it's a real trap unless you're careful.<br><br>For example, let's say you had a pipeline where you did this:<br><br>sum = img.get_data().sum()<br><br>That would work fine most of the time, when the data on disk
is<br>floating point, or the scalefactors are not the defaults (1, 0). Then one<br>day, you get an image with int16 data type on disk and 1, 0<br>scalefactors, and your `sum` calculation silently overflows. I ran<br>into this when teaching - I had to cast some image arrays to floating<br>point to get sensible answers (a short illustration of this kind of<br>overflow is appended after the thread below).<br><br>Solution<br>-----------<br><br>I think that the default behavior of nibabel should be to do the thing<br>least likely to trip you up by accident, so I think that, in due course,<br>nibabel should always return a floating point array from `get_data()`<br>by default.<br><br>I propose to add a keyword-only argument to `get_data()` - `to_float`, as in:<br><br>data = img.get_data(to_float=False) # The current default behavior<br>data = img.get_data(to_float=True) # Integer arrays automatically<br>cast to float64<br><br>For this cycle (the nibabel 2.0 series), I propose to issue a warning<br>if you don't pass an explicit True or False, saying that the<br></div></div>default behavior for nibabel 3.0 will change from `to_float=False` to<span class=""><br>`to_float=True` (a sketch of this transition is also appended below).<br><br>The other, more fancy ways of getting the image data would continue as<br>they are, such as:<br><br>data = np.array(img.dataobj)<br>data = img.dataobj[:]<br><br>These will both return ints or floats depending on the raw data dtype<br>and the scalefactors. This is on the basis that people using these<br>will be more advanced, and therefore more likely to want memory<br>efficiency at the expense of having to be careful about the returned<br>data dtype.<br><br>Does this seem reasonable to y'all? Thoughts, suggestions?<br><br>Cheers,<br><br>Matthew<br><hr><br></span><span class="">Neuroimaging mailing list<br><a href="mailto:Neuroimaging@python.org" target="_blank">Neuroimaging@python.org</a><br><a href="https://mail.python.org/mailman/listinfo/neuroimaging" target="_blank">https://mail.python.org/mailman/listinfo/neuroimaging</a><br></span></pre></blockquote></div></div><br>_______________________________________________<br>
Neuroimaging mailing list<br>
<a href="mailto:Neuroimaging@python.org">Neuroimaging@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/neuroimaging" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/neuroimaging</a><br>
<br></blockquote></div><br></div>
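<div><br></div><div>A minimal sketch of the overflow trap described in the quoted message, using a synthetic int16 array rather than data loaded from a real image (the shape and values are illustrative only). Whether a plain .sum() overflows depends on numpy's accumulator dtype, so the sketch uses an element-wise product to make the silent wrap-around easy to reproduce:</div><pre>
import numpy as np

# Stand-in for what img.get_data() can return for an int16 image with
# default scalefactors (slope=1, intercept=0): the raw on-disk dtype.
data = np.full((64, 64, 30), 1000, dtype=np.int16)

# Element-wise arithmetic stays in int16 and wraps around silently:
# 1000 * 1000 = 1000000 does not fit in int16, so each product becomes 16960.
squared = data * data
print(squared.max())                         # 16960, not 1000000

# A sum of squares is therefore badly wrong unless you cast first.
print((data * data).sum())                   # silently wrong
print((data.astype(np.float64) ** 2).sum())  # correct after casting to float
</pre>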
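<div><br></div><div>And a rough sketch of the transition proposed for get_data(). The to_float keyword is the proposal under discussion, not an existing nibabel argument, and the wrapper below is a stand-alone illustration rather than nibabel's implementation: warn when the caller does not choose explicitly, keep the current behaviour in that case, and cast integer data to float64 when asked.</div><pre>
import warnings

import numpy as np


def get_data(img, to_float=None):
    """Illustrative wrapper only: a sketch of the proposed 2.x to 3.0 path."""
    if to_float is None:
        warnings.warn(
            "The default return dtype of get_data() is planned to change "
            "to float64 in nibabel 3.0; pass to_float=True or "
            "to_float=False explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        to_float = False  # current (2.x) behaviour: return the scaled dtype
    data = np.array(img.dataobj)  # dtype depends on on-disk type and scalefactors
    if to_float and not np.issubdtype(data.dtype, np.floating):
        data = data.astype(np.float64)
    return data


# Usage: get_data(img) warns and behaves as today; get_data(img, to_float=True)
# always returns floating point, so accumulations happen in float64.
</pre>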