[Neuroimaging] iteraxis API - we need feedback

Sat Sep 5 01:48:27 CEST 2015

Hi Matthew,

On Fri, Sep 4, 2015 at 4:06 PM, Matthew Brett <matthew.brett at gmail.com>
wrote:

> Hi,
>
> Over at nibabel gh-344 [1], we found ourselves discussing how to write
> an iterator that will allow you to efficiently iterate over slices
> from the image array.   We'd love some feedback on where we got to.
>
> As some of you may know, images now have a `dataobj` attribute, that
> can contain one of two things:
>
> * an array proxy (if you loaded the image from a file);
> * a numpy array (if you created the image with data from an array);
>
> The array proxy object has some fancy slicing syntax that means that
> something like ``arr.dataobj[..., 0]`` will only read the data for the
> first slice on the last axis.  This can be a lot more efficient than
> loading all the data at once with `get_data` [2].
>
> We're currently thinking of a good iterator syntax, something like this:
>
> for vol in img.iteraxis(3):  # iterate over 4th axis
>     # do something with vol
>

Cool! Is it possible to also accept "x", "y", "z", "t" as the axis?
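
Here is a rough sketch of what I have in mind. It is not nibabel code:
the standalone `iteraxis` function and the `AXIS_NAMES` mapping are both
hypothetical, and I am assuming the object passed in has a `.shape` and
accepts a tuple of slices and integers as an index, as a numpy array does.

```python
import numpy as np

# Hypothetical mapping from axis names to axis indices (an assumption,
# not something nibabel defines).
AXIS_NAMES = {"x": 0, "y": 1, "z": 2, "t": 3}


def iteraxis(dataobj, axis):
    """Yield successive slices of `dataobj` along `axis`.

    `axis` may be an integer index or one of the names in AXIS_NAMES.
    """
    if isinstance(axis, str):
        axis = AXIS_NAMES[axis]
    ndim = len(dataobj.shape)
    if not 0 <= axis < ndim:
        raise ValueError("axis %d is out of range for %d dimensions"
                         % (axis, ndim))
    for i in range(dataobj.shape[axis]):
        # Full slices everywhere except an integer index on `axis`,
        # so only one slice is requested per iteration.
        index = [slice(None)] * ndim
        index[axis] = i
        yield dataobj[tuple(index)]


# Usage with a plain numpy array standing in for img.dataobj
arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
for vol in iteraxis(arr, "t"):
    pass  # do something with vol
```

Integer axes would keep working as in your example; the names are just a
convenience on top.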

> where `iteraxis` would use `dataobj` slicing under the hood.
>
> The questions are:
>
> * should this be a method on the image (`img.iteraxis`), the dataobj
> (`img.dataobj.iteraxis`) or should it be a standalone function that
> knows about arrays and array proxies? (`nibabel.iteraxis`);
> * how should the iterator optimize speed or memory?   Should this be
> configurable?  For example, if you are iterating over the first axis
> of a Nifti, then it will probably be most efficient to read all the
> data into memory and return the slices from the numpy array.   This
> will be very expensive in memory.   If a file is compressed, it may be
> most efficient to uncompress the file and use the uncompressed version
> with `dataobj` file slicing - but this will involve a temporary file
> that may be very large.   Options are:
>
>     * find some heuristic to choose joint optimization for memory and speed;
>     * always optimize for memory;
>     * always optimize for speed, saving memory where possible;
>     * have a tuning kwarg selecting between these options.
>

I think I would lean towards optimizing memory. You can always just wait
longer if things are running slow, but if your RAM fills up, you're stuck.
I do like the idea of a tuning parameter, though.
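
To make the tuning parameter concrete, here is a minimal sketch with
memory as the default. The function name `iteraxis_tuned` and the
`optimize` keyword with its two values are my own placeholders, not an
existing nibabel interface, and I have left out the heuristic and
temporary-file options. It assumes `np.asarray` on the data object loads
the whole array, and that slicing the object directly reads only the
requested slice.

```python
import numpy as np


def iteraxis_tuned(dataobj, axis, optimize="memory"):
    """Yield slices along integer `axis`, trading speed against memory.

    optimize="memory": slice `dataobj` on every iteration, so at most
        one slice needs to be held at a time.
    optimize="speed": load everything into a numpy array once, then
        slice the in-memory array.
    """
    if optimize not in ("memory", "speed"):
        raise ValueError("optimize must be 'memory' or 'speed'")
    if optimize == "speed":
        # One full read up front; costs memory for the whole array.
        dataobj = np.asarray(dataobj)
    ndim = len(dataobj.shape)
    for i in range(dataobj.shape[axis]):
        index = [slice(None)] * ndim
        index[axis] = i
        yield dataobj[tuple(index)]


# Usage with a numpy array standing in for img.dataobj
arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
for vol in iteraxis_tuned(arr, 3, optimize="speed"):
    pass  # do something with vol
```

Both settings yield the same slices; only where the data lives while you
iterate changes.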

> The upside of image.iteraxis would be to embed knowledge we've gained
> on these objects and simplify the interface for users. The downside is
> it's more work for us and the right choice is system-dependent. To
> address this, Ben C proposed a benchmark method, which reports which
> optimization method is best for the given image on the current system.
>
> Any thoughts?   Use-cases?
>
> Cheers,
>
> Matthew
>
>
> [1] https://github.com/nipy/nibabel/issues/344
> [2] http://nipy.org/nibabel/images_and_memory.html#saving-time-and-memory