<div dir="ltr">I agree with Gael. <div><br></div><div>I would also add that it seems to be solving a problem that doesn't exist in the wild. There has never been a report of hitting this issue, right? Users calling sum() or other raw operations are likely already familiar with numpy, and better placed to debug their own errors.</div><div><br></div><div>Adding complexity can introduce confusion for all users, including relatively numpy-naive ones. We would then have to field new questions, and I'm not sure there's any practical benefit.</div><div><br></div><div>Ben</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 8:55 AM, Gael Varoquaux <span dir="ltr"><<a href="mailto:gael.varoquaux@normalesup.org" target="_blank">gael.varoquaux@normalesup.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><p dir="ltr">I think this is a very bad idea. First, because it's "magic": changing things behind the user's back. People will be surprised, and it will lead to bugs in their code. Second, because it is a loss of semantics. Something like a mask or a label image is stored as integers for a good reason.</p>
<p dir="ltr">The right solution to the problem is to teach people to use float data when relevant, but not to force this decision on them.</p>
<p dir="ltr">Gaël</p>
<p dir="ltr">Sent from my phone. Please forgive brevity and misspelling.</p>
<div class="gmail_quote"><span class="">On Jul 6, 2015, at 17:33, Matthew Brett <<a href="mailto:matthew.brett@gmail.com" target="_blank">matthew.brett@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<pre><span class="">Hi,<br><br>I wanted to ask y'all about an API change that I would like to make to nibabel.<br><br>In summary, I want to default to returning floating point arrays from<br>nibabel images.<br><br>Problem - different returned data types from img.get_data()<br></span><hr><div><div class="h5"><br><br>At the moment, if you do this:<br><br>img = nib.load('my_image.nii')<br>data = img.get_data()<br><br>Then the data type (dtype) of the returned data array depends on the<br>values in the header of `my_image.nii`. Specifically, if the raw<br>on-disk data type is np.int16 (as it often is) and the header<br>scalefactor values are the defaults (1 for slope, 0 for intercept), then you<br>will get back an array of the on-disk data type - here, np.int16.<br><br>This is very efficient on memory, but it's a real trap unless you're careful.<br><br>For example, let's say you had a pipeline where you did this:<br><br>sum = img.get_data().sum()<br><br>That would work fine most of the time, when the data on disk
is<br>floating point, or the scalefactors are not the defaults (1, 0). Then one<br>day, you get an image with int16 data type on disk and 1, 0<br>scalefactors, and your `sum` calculation silently overflows. I ran<br>into this when teaching - I had to cast some image arrays to floating<br>point to get sensible answers (a short illustration of this kind of<br>overflow is appended after the thread below).<br><br>Solution<br>-----------<br><br>I think that the default behavior of nibabel should be to do the thing<br>least likely to trip you up by accident, so I think that, in due course,<br>nibabel should always return a floating point array from `get_data()`<br>by default.<br><br>I propose to add a keyword-only argument to `get_data()` - `to_float`, as in:<br><br>data = img.get_data(to_float=False) # The current default behavior<br>data = img.get_data(to_float=True) # Integer arrays automatically<br>cast to float64<br><br>For this cycle (the nibabel 2.0 series), I propose to issue a warning<br>if you don't pass an explicit True or False, saying that the<br></div></div>default behavior for nibabel 3.0 will change from `to_float=False` to<span class=""><br>`to_float=True` (a sketch of this transition is also appended below).<br><br>The other, more fancy ways of getting the image data would continue as<br>they are, such as:<br><br>data = np.array(img.dataobj)<br>data = img.dataobj[:]<br><br>These will both return ints or floats depending on the raw data dtype<br>and the scalefactors. This is on the basis that people using these<br>will be more advanced, and therefore more likely to want memory<br>efficiency at the expense of having to be careful about the returned<br>data dtype.<br><br>Does this seem reasonable to y'all? Thoughts, suggestions?<br><br>Cheers,<br><br>Matthew<br><hr><br></span><span class="">Neuroimaging mailing list<br><a href="mailto:Neuroimaging@python.org" target="_blank">Neuroimaging@python.org</a><br><a href="https://mail.python.org/mailman/listinfo/neuroimaging" target="_blank">https://mail.python.org/mailman/listinfo/neuroimaging</a><br></span></pre></blockquote></div></div><br>_______________________________________________<br>
Neuroimaging mailing list<br>
<a href="mailto:Neuroimaging@python.org">Neuroimaging@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/neuroimaging" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/neuroimaging</a><br>
<br></blockquote></div><br></div>
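<div><br></div><div>A minimal sketch of the overflow trap described in the quoted message, using a synthetic int16 array rather than data loaded from a real image (the shape and values are illustrative only). Whether a plain .sum() overflows depends on numpy's accumulator dtype, so the sketch uses an element-wise product to make the silent wrap-around easy to reproduce:</div><pre>
import numpy as np

# Stand-in for what img.get_data() can return for an int16 image with
# default scalefactors (slope=1, intercept=0): the raw on-disk dtype.
data = np.full((64, 64, 30), 1000, dtype=np.int16)

# Element-wise arithmetic stays in int16 and wraps around silently:
# 1000 * 1000 = 1000000 does not fit in int16, so each product becomes 16960.
squared = data * data
print(squared.max())                         # 16960, not 1000000

# A sum of squares is therefore badly wrong unless you cast first.
print((data * data).sum())                   # silently wrong
print((data.astype(np.float64) ** 2).sum())  # correct after casting to float
</pre>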
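<div><br></div><div>And a rough sketch of the transition proposed for get_data(). The to_float keyword is the proposal under discussion, not an existing nibabel argument, and the wrapper below is a stand-alone illustration rather than nibabel's implementation: warn when the caller does not choose explicitly, keep the current behaviour in that case, and cast integer data to float64 when asked.</div><pre>
import warnings

import numpy as np


def get_data(img, to_float=None):
    """Illustrative wrapper only: a sketch of the proposed 2.x to 3.0 path."""
    if to_float is None:
        warnings.warn(
            "The default return dtype of get_data() is planned to change "
            "to float64 in nibabel 3.0; pass to_float=True or "
            "to_float=False explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        to_float = False  # current (2.x) behaviour: return the scaled dtype
    data = np.array(img.dataobj)  # dtype depends on on-disk type and scalefactors
    if to_float and not np.issubdtype(data.dtype, np.floating):
        data = data.astype(np.float64)
    return data


# Usage: get_data(img) warns and behaves as today; get_data(img, to_float=True)
# always returns floating point, so accumulations happen in float64.
</pre>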