[Neuroimaging] Nibabel API change - always read as float
alexis.roche at gmail.com
Sun Jul 19 17:02:21 CEST 2015
Sorry for jumping late into this discussion.
A case where I believe a float array by default would be confusing is
resampling an image according to a spatial transform using cubic spline
or sinc interpolation (or spline interpolation of any order > 1). While
such interpolation methods are arguably more accurate than trilinear
interpolation, they have the potential to produce negative intensities
even though the input image intensities are all positive and are
expected to remain positive after resampling. In such cases, standard
practice is to threshold the interpolated values at zero from below,
but that is something the user has to bear in mind (and enforce
themselves) when dealing with a float array as opposed to unsigned int.
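The overshoot is easy to demonstrate with SciPy's spline interpolation (an illustrative sketch; the profile values and resampling coordinates are made up):

```python
import numpy as np
from scipy.ndimage import map_coordinates

# A 1D intensity profile with a sharp edge, like a mask boundary;
# every input value is non-negative.
profile = np.zeros(16, dtype=np.float64)
profile[8:] = 100.0

# Resample between the original grid points with cubic splines (order=3).
coords = np.linspace(2.0, 13.0, 45)[np.newaxis, :]
resampled = map_coordinates(profile, coords, order=3)

# Cubic spline interpolation rings near the edge, so some resampled
# values dip below zero; standard practice is to clip at zero.
clipped = np.clip(resampled, 0.0, None)
```

With an unsigned integer array this clipping is implicit; with floats the user has to remember it.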
In that case, using the native image dtype would only help if it is
unsigned (which is not necessarily the case), but the point is: the user
always has to think about their dtype; there is no "universal",
confusion-free dtype.
When people ask me whether there are drawbacks to switching from MATLAB
to Python for image processing, I usually say that the only drawback I
see is that you have to be aware of your array type in Python (as a
consequence of using NumPy), which also has great advantages...
Hence I am not very keen on getting float arrays by default.
On Jul 18, 2015 at 17:25, "Matthew Brett" <matthew.brett at gmail.com> wrote:
> Sorry to be late to reply to this one.
> On Mon, Jul 6, 2015 at 4:55 PM, Yaroslav Halchenko <lists at onerussian.com> wrote:
> > On Mon, 06 Jul 2015, Matthew Brett wrote:
> >> Hi,
> >> I wanted to ask y'all about an API change that I want to make to
> >> nibabel.
> >> In summary, I want to default to returning floating point arrays from
> >> nibabel images.
> >> Problem - different returned data types from img.get_data()
> >> At the moment, if you do this:
> >> img = nib.load('my_image.nii')
> >> data = img.get_data()
> >> Then the data type (dtype) of the returned data array depends on the
> >> values in the header of `my_image.nii`. Specifically, if the raw
> >> on-disk data type is 'np.int16' (it often is) and the header
> >> scalefactor values are the defaults (1 for slope, 0 for intercept), then you
> >> will get back an array of the on disk data type - here - np.int16.
> >> This is very efficient on memory, but it's a real trap unless you
> >> keep the on-disk dtype in mind.
> >> For example, let's say you had a pipeline where you did this:
> >> sum = img.get_data().sum()
> >> That would work fine most of the time, when the data on disk is
> >> floating point, or the scalefactors are not default (1, 0). Then one
> >> day, you get an image with int16 data type on disk and 1, 0
> >> scalefactors, and your `sum` calculation silently overflows. I ran
> >> into this when teaching - I had to cast some image arrays to floating
> >> point to get sensible answers.
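The wraparound is easy to reproduce with plain NumPy (a sketch with made-up values; note that `ndarray.sum()` upcasts small integer dtypes to the platform integer, so the int16 range typically bites in elementwise intermediate steps, or via the platform-integer accumulator on 32-bit systems with large images):

```python
import numpy as np

# Values near the top of the int16 range (max 32767), as might come
# straight off disk from an unscaled int16 image.
data = np.full((10, 10), 30000, dtype=np.int16)

# Elementwise arithmetic stays in int16: 60000 wraps around to -5536,
# with no warning or error.
doubled = data + data

# Casting to float first (the proposed default) gives the right answer.
safe = data.astype(np.float64) * 2
```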
> >> Solution
> >> --------
> >> I think that the default behavior of nibabel should be to do the thing
> >> least likely to trip you up by accident, so - I think in due course,
> >> nibabel should always return a floating point array from `get_data()`
> >> by default.
> >> I propose to add a keyword-only argument to `get_data()` - `to_float`,
> >> as in:
> >> data = img.get_data(to_float=False) # The current default behavior
> >> data = img.get_data(to_float=True) # Integer arrays automatically
> >> cast to float64
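A minimal sketch of the proposed semantics (a hypothetical standalone helper for illustration, not the actual nibabel implementation):

```python
import numpy as np

def get_data(dataobj, to_float=True):
    # Hypothetical sketch: non-floating-point arrays are cast to
    # float64; floating point arrays are returned unchanged.
    data = np.asanyarray(dataobj)
    if to_float and not np.issubdtype(data.dtype, np.floating):
        data = data.astype(np.float64)
    return data
```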
> >> For this cycle (the nibabel 2.0 series), I propose to raise a warning
> >> if you don't pass in an explicit True or False, warning that the
> >> default behavior for nibabel 3.0 will change from `to_float=False` to
> >> `to_float=True`.
> >> The other, more fancy ways of getting the image data would continue as
> >> they are, such as:
> >> data = np.array(img.dataobj)
> >> data = img.dataobj[:]
> >> These will both return ints or floats depending on the raw data dtype
> >> and the scalefactors. This is on the basis that people using these
> >> will be more advanced and so therefore more likely to want memory
> >> efficiency at the expense of having to be careful about the returned
> >> data dtype.
> >> Does this seem reasonable to y'all? Thoughts, suggestions?
> > Overall I am all for reducing the possibility of users shooting
> > themselves in the foot. But this API change would still conflate two
> > (related) issues: memory mapping and casting. Maybe there is a
> > way to clear it up? Sorry for not providing an answer, but at
> > least let me share the use case(s):
> > With to_float=True it would then keep memory-mapping float .nii
> > files, while int .nii files would get cast. So it would remain somewhat
> > obscure when data gets memmapped and when not. In our (PyMVPA) case we
> > don't really care about the correct offset/scale in many cases (the
> > data gets z-scored per voxel later on anyway, with appropriate casting
> > if necessary).
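That per-voxel z-scoring is a few lines of NumPy once the data is floating point (an illustrative sketch with made-up 4D data, time as the last axis):

```python
import numpy as np

# Fake 4D data: (x, y, z, time); cast to float before any statistics.
rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=(4, 4, 4, 20)).astype(np.float64)

# Z-score each voxel's time course: zero mean, unit variance per voxel.
mean = data.mean(axis=-1, keepdims=True)
std = data.std(axis=-1, keepdims=True)
zscored = (data - mean) / std
```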
> Yes, that's true - you would have to know the algorithm to know whether
> nibabel was going to memmap or not. That's true now though - for
> example, it never memory-maps compressed files. The difference is
> that, for the int case, memory mapping no longer depends on the
> scalefactors.
> You can always specify no memory mapping with ``mmap=False``, if you
> need it to be predictable.
> > Now, even with "explicit" to_float=False I would then get a warning.
> No, my idea was that you only get a warning if you do not give a value
> for "to_float". It will be a bit annoying though, because you either
> have to live with the warning or depend on nibabel > 2.1.
> > I
> > guess we would just need to switch to your explicit img.dataobj way to
> > access the data?
> If you like the current behavior and you don't want a warning, that's
> the best way, I think.
> Neuroimaging mailing list
> Neuroimaging at python.org