[Neuroimaging] Nibabel API change - always read as float

Wed Jul 8 16:05:06 CEST 2015

Hi,

On Wed, Jul 8, 2015 at 3:29 AM, Satrajit Ghosh <satra at mit.edu> wrote:
> hi matthew,
>
>>
>> We have to address ourselves to the standard as it is actually used.
>> As the standard is used, there is almost never a reason to assume that
>> an image with slope = 1, intercept = 0 is really intended to be used
>> as integers in memory.
>
>
> to me this would always be float in memory. since slope == 1 and != 0.

Sorry - I should have been more specific.  Nibabel treats a slope of
NaN or 0 as equivalent to slope == 1, and an intercept of NaN as
equivalent to intercept == 0. For slope equivalent to 1 and intercept
equivalent to 0, nibabel currently returns the data with the on-disk
data type, whatever that is.
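
To make that rule concrete, here is a rough numpy-only sketch. It is
not nibabel's actual code; the function names are made up for
illustration, and the float64 output for the scaled case is an
assumption of the sketch.

```python
import numpy as np


def effective_scaling(slope, inter):
    """Map null header values to the (1, 0) they are treated as."""
    # NaN or 0 slope counts as slope == 1
    slope = 1.0 if (np.isnan(slope) or slope == 0) else float(slope)
    # NaN intercept counts as intercept == 0
    inter = 0.0 if np.isnan(inter) else float(inter)
    return slope, inter


def current_default(raw, slope, inter):
    """Sketch of the current default as described in this message."""
    slope, inter = effective_scaling(slope, inter)
    if (slope, inter) == (1.0, 0.0):
        return raw  # null scaling: on-disk dtype is preserved
    # Real scaling: result is floating point (float64 assumed here)
    return raw.astype(np.float64) * slope + inter


raw = np.arange(6, dtype=np.int16)
print(current_default(raw, np.nan, np.nan).dtype)  # null scaling
print(current_default(raw, 0.0, 0.0).dtype)        # also null scaling
print(current_default(raw, 2.0, 0.0).dtype)        # real scaling
```

So the same int16 file comes back as int16 or as float depending only
on what happens to be in the scaling fields of the header.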

I think Brendan quoted the only part of the NIfTI standard that is
relevant, and I don't think that has any bearing on the in-memory data
type, so even if people were using the standard exactly as written, I
don't think that would help us.

>> To emphasize, there is currently no guarantee that an image will be
>> identical if round tripped, and in general, it will not be identical
>> now, if slope != 1 and intercept != 0.
>
>
> what if scl_slope == 0, shouldn't we expect roundtrip identity?

You will get roundtrip identity for slope, intercept equivalent to 1,
0, but the fact that we are having this discussion means that it
hasn't thus far been obvious when or whether to expect round-trip
identity.

>> I realize that the default change will use more memory, but I don't
>> think we should be increasing the risk of silent generation of
>> entirely wrong results in order to optimize memory, in the default
>> case.
>
>
> i agree, but is there a way to allow for keeping the datatype intact? are we
> agreeing that a keyword is necessary, and dtype=None will keep the original
> datatype.

There is currently no way of returning data with the on-disk datatype
when the slope, intercept are not equivalent to (1, 0).

I don't much like the dtype keyword, because it involves deciding how
to cast the data to all possible numpy types, which I think the user
should do explicitly if they want something other than the proposed
default float, or the int-if-on-disk-dtype-is-int-and-scaling-allows
default that we have now.
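
As an illustration of the kind of decisions an explicit cast involves,
here is a minimal sketch of a user-side helper. `cast_explicitly` is a
hypothetical name, and the choices it makes (round to nearest, refuse
non-finite or out-of-range values) are only one possible policy, which
is the point: someone has to pick one.

```python
import numpy as np


def cast_explicitly(data, dtype):
    """Cast `data` to `dtype`, refusing silently lossy integer casts.

    Hypothetical helper; rounding and range checks are policy choices.
    """
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.integer):
        return data.astype(dtype)
    if not np.all(np.isfinite(data)):
        raise ValueError("cannot cast NaN or inf to an integer type")
    rounded = np.rint(data)  # round to nearest rather than truncate
    info = np.iinfo(dtype)
    if rounded.size and (rounded.min() < info.min or rounded.max() > info.max):
        raise ValueError("values out of range for %s" % dtype)
    return rounded.astype(dtype)


floats = np.array([1.0, 2.6, -3.2])
ints = cast_explicitly(floats, np.int16)
```

A plain `floats.astype(np.int16)` would instead truncate towards zero
and would not complain about values outside the int16 range.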

> at least the common scripts we use won't give a wrong answer. but many of
> our workflows will now crash because they would require additional memory
> for specific pieces. now that's several layers embedded from a user point of
> view.

Don't forget that, at the moment, your workflows will work sometimes
(when your data sources have null scaling in the headers) and crash at
other times (when your data sources use slope, inter).
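
To put a number on that, here is a small numpy-only sketch of the
memory difference between the two cases. The array shape and scaling
values are invented for illustration, and float64 output for the
scaled case is an assumption.

```python
import numpy as np

# Hypothetical small 4D int16 volume standing in for on-disk data
shape = (64, 64, 30, 10)
on_disk = np.zeros(shape, dtype=np.int16)

# Null scaling in the header: data comes back as int16
unscaled = on_disk

# Real slope / intercept in the header: data comes back as float
scaled = on_disk.astype(np.float64) * 0.5 + 10.0

ratio = scaled.nbytes / unscaled.nbytes
print(unscaled.nbytes, scaled.nbytes, ratio)
```

The same pipeline therefore needs several times more memory for one
input than for another, depending only on the header scaling.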

With the proposed change, if you want the current default behavior,
you can either use whatever keyword we agree on, or
`data = np.array(img.dataobj)` for full backwards compatibility.

> is the proposed change the augmented proposal to include a dtype keyword?

The options, in my order of preference so far, are:

* as_float={True, False}, where False results in the current
memory-saving algorithm;
* dtype={None or dtype specifier} with None giving the current
memory-saving algorithm;
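
For comparison, the two options could be sketched like this. These are
stand-in functions working on a raw array plus header values, not a
proposed nibabel signature; the function names and the float64 default
are assumptions for illustration.

```python
import numpy as np


def _current_algorithm(raw, slope, inter):
    """Current memory-saving behavior, as described in this thread."""
    slope = 1.0 if (np.isnan(slope) or slope == 0) else float(slope)
    inter = 0.0 if np.isnan(inter) else float(inter)
    if (slope, inter) == (1.0, 0.0):
        return raw  # keep on-disk dtype
    return raw.astype(np.float64) * slope + inter


def get_data_as_float(raw, slope, inter, as_float=True):
    """Option 1: as_float=False gives the current algorithm."""
    data = _current_algorithm(raw, slope, inter)
    return data.astype(np.float64) if as_float else data


def get_data_dtype(raw, slope, inter, dtype=np.float64):
    """Option 2: dtype=None gives the current algorithm."""
    data = _current_algorithm(raw, slope, inter)
    if dtype is None:
        return data
    # Plain astype here; the casting policy is exactly the open question
    return data.astype(dtype)


raw = np.arange(6, dtype=np.int16)
always_float = get_data_as_float(raw, np.nan, np.nan)
memory_saving = get_data_as_float(raw, np.nan, np.nan, as_float=False)
```

With either option the default call returns floats regardless of the
header, and the memory-saving result needs an explicit opt-in.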

I don't think there's any benefit to adding an extra keyword argument
if we aren't changing the default.

Cheers,

Matthew