[Neuroimaging] Nibabel API change - always read as float

Mon Jul 6 18:03:24 CEST 2015

I agree with Gael.

I would add that it seems to be solving a problem that doesn't exist in
the wild. There has never been a report of anyone hitting this issue, right?
Users calling sum() or other raw operations are likely familiar with numpy,
and better equipped to debug their own errors.
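
For anyone following along, here is a minimal sketch of the kind of error in question and the explicit cast that avoids it. The arrays are made-up stand-ins for int16 image data, not real images. Whether a plain `.sum()` wraps around depends on the accumulator dtype numpy picks, so the sketch forces an int16 accumulator to make the effect visible.

```python
import numpy as np

# Made-up stand-in for int16 image data (on-disk int16, scalefactors 1, 0).
data = np.array([200, 300], dtype=np.int16)

# Elementwise arithmetic stays in int16 and wraps around silently.
squared_int = data * data

# Casting to float first gives the expected values.
squared_float = data.astype(np.float64) ** 2

# With an int16 accumulator the sum wraps too; a float accumulator does not.
big = np.array([30000, 30000], dtype=np.int16)
total_int16 = big.sum(dtype=np.int16)
total_float = big.sum(dtype=np.float64)

print(squared_int, squared_float, total_int16, total_float)
```

The fix on the user side is a single `astype` call (or a `dtype=` argument), which is the sort of thing a numpy user would reach for once they see a negative result.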

Adding complexity can potentially introduce confusion for all users,
including relatively numpy-naive ones. Then we potentially have to support
new questions, and I'm not sure there's any practical benefit.
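
To make the added surface area concrete, the transition described in the quoted proposal could be sketched roughly as below. This is only an illustration under stated assumptions: `get_data_sketch` is a hypothetical stand-in that takes a plain array, not nibabel code, and the warning class and message are my guesses rather than anything from the proposal.

```python
import warnings

import numpy as np


def get_data_sketch(raw, *, to_float=None):
    """Hypothetical stand-in for the proposed get_data(to_float=...)."""
    if to_float is None:
        # Transition period: keep the current behavior but warn about the change.
        warnings.warn(
            "to_float not given; the default will change from False to True",
            FutureWarning,
            stacklevel=2,
        )
        to_float = False
    arr = np.asarray(raw)
    if to_float and np.issubdtype(arr.dtype, np.integer):
        # Integer arrays are cast to float64, as in the proposal.
        arr = arr.astype(np.float64)
    return arr


labels = np.array([0, 1, 2], dtype=np.int16)
as_stored = get_data_sketch(labels, to_float=False)
as_float = get_data_sketch(labels, to_float=True)
print(as_stored.dtype, as_float.dtype)
```

Note that a mask or label image passed through with `to_float=True` comes back as float64, which is the loss of semantics Gael mentions.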

Ben

On Mon, Jul 6, 2015 at 8:55 AM, Gael Varoquaux <
gael.varoquaux at normalesup.org> wrote:

> I think that this is a very bad idea. First because it's "magic": changing
> things behind the back of the user. People will be surprised and it will
> lead to bugs in their code.  Second because it is a loss of semantics.
> Something like a mask or a label image is stored as integers for a good
> reason.
>
> The right solution to the problem is to teach people to use float data
> when relevant, but not to force this decision on them.
>
> Gaël
>
> Sent from my phone. Please forgive brevity and mis spelling
> On Jul 6, 2015, at 17:33, Matthew Brett <matthew.brett at gmail.com> wrote:
>>
>> Hi,
>>
>> I wanted to ask y'all about an API change that I want to make to nibabel.
>>
>> In summary, I want to default to returning floating point arrays from
>> nibabel images.
>>
>> Problem - different returned data types from img.get_data()
>> ------------------------------
>>
>>
>> At the moment, if you do this:
>>
>> img = nib.load('my_image.nii')
>> data = img.get_data()
>>
>> Then the data type (dtype) of the returned data array depends on the
>> values in the header of `my_image.nii`.   Specifically, if the raw
>> on-disk data type is 'np.int16' (it often is) and the header
>> scalefactor values are default (1 for slope, 0 for intercept) then you
>> will get back an array of the on disk data type - here - np.int16.
>>
>> This is very efficient on memory, but it's a real trap unless you are careful.
>>
>> For example, let's say you had a pipeline where you did this:
>>
>> sum = img.get_data().sum()
>>
>> That would work fine most of the time, when the data on disk is
>> floating point, or the scalefactors are not default (1, 0).   Then one
>> day, you get an image with int16 data type on disk and 1, 0
>> scalefactors, and your `sum` calculation silently overflows.    I ran
>> into this when teaching - I had to cast some image arrays to floating
>> point to get sensible answers.
>>
>> Solution
>> -----------
>>
>> I think that the default behavior of nibabel should be to do the thing
>> least likely to trip you up by accident, so - I think in due course,
>> nibabel should always return a floating point array from `get_data()`
>> by default.
>>
>> I propose to add a keyword-only argument to `get_data()` - `to_float`, as in:
>>
>> data = img.get_data(to_float=False)  # The current default behavior
>> data = img.get_data(to_float=True)  # Integer arrays automatically cast to float64
>>
>> For this cycle (the nibabel 2.0 series), I propose to raise a warning
>> if you don't pass in an explicit True or False, warning that the
>> default behavior for nibabel 3.0 will change from `to_float=False` to
>> `to_float=True`.
>>
>> The other, more fancy ways of getting the image data would continue as
>> they are, such as:
>>
>> data = np.array(img.dataobj)
>> data = img.dataobj[:]
>>
>> These will both return ints or floats depending on the raw data dtype
>> and the scalefactors.  This is on the basis that people using these
>> will be more advanced and therefore more likely to want memory
>> efficiency at the expense of having to be careful about the returned
>> data dtype.
>>
>> Does this seem reasonable to y'all?    Thoughts, suggestions?
>>
>> Cheers,
>>
>> Matthew
>> ------------------------------
>>
>> Neuroimaging mailing list
>> Neuroimaging at python.org
>> https://mail.python.org/mailman/listinfo/neuroimaging
>>
>>