lazy loading ndarrays

I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. That way the class can be used to quickly set and manipulate header values, and won't load data unless necessary. What is the best way to do this? Are there any hooks I can use to load the data when an array's values are first accessed or manipulated? I tried some trickery with __array_interface__ but couldn't get it to work very well. Should I just use a memmapped array, and give up on a purely 'lazy' approach? Thanks, and cheers! -Craig

Hi,

On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka <craigyk@me.com> wrote:
> I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. [...] Should I just use a memmapped array, and give up on a purely 'lazy' approach?

What kind of images are you loading? We do lazy loading in nibabel, for medical image type formats:

http://nipy.sourceforge.net/nibabel/

- but our images _have_ arrays and headers, rather than (appearing to be) arrays. Thus something like:

    import nibabel as nib
    img = nib.load('my_image.img')
    # data not loaded at this point
    data = img.get_data()
    # data loaded now. Maybe memmapped if the format allows

If you think you might have similar needs, I'd be very happy to help you get going in nibabel...

Best,

Matthew

Similar to what Matthew said, I often find that it's cleaner to make a separate class with a "data" (or somesuch) property that lazily loads the numpy array.

For example, something like:

    class DataFormat(object):
        # _read_header() and _read_data() are format-specific and not shown
        def __init__(self, filename):
            self.filename = filename
            # Expose each header field as an attribute
            for key, value in self._read_header().items():
                setattr(self, key, value)

        @property
        def data(self):
            # Read the array on first access, then reuse the cached copy
            try:
                return self._data
            except AttributeError:
                self._data = self._read_data()
                return self._data

Hope that helps,
-Joe

On Tue, Jul 26, 2011 at 4:15 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> We do lazy loading in nibabel, for medical image type formats [...] but our images _have_ arrays and headers, rather than (appearing to be) arrays.
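
The caching `data` property described above can also be written with `functools.cached_property` from the standard library (Python 3.8 and later, so not available when this thread was written). The following is only a minimal sketch under assumptions: the `.npy` file, the `header` dict argument, and the `voxel_size` field are hypothetical stand-ins for a real image format and its header parser.

```python
import os
import tempfile
from functools import cached_property

import numpy as np


class DataFormat:
    """Hypothetical container: header attributes now, array data on first use."""

    def __init__(self, filename, header):
        self.filename = filename
        # Stand-in for parsing a real header; `header` is a plain dict here.
        for key, value in header.items():
            setattr(self, key, value)

    @cached_property
    def data(self):
        # Runs once; the result is then stored in the instance __dict__
        # under the name "data", so later accesses skip this method.
        return np.load(self.filename)


# Usage: write a small array to a temporary .npy file, then open it lazily.
path = os.path.join(tempfile.mkdtemp(), "volume.npy")
np.save(path, np.arange(6).reshape(2, 3))

img = DataFormat(path, {"voxel_size": 1.5})
print("data" in vars(img))   # False: nothing has been read yet
print(img.data.shape)        # (2, 3)
print("data" in vars(img))   # True: the array is now cached on the instance
```

Header fields can be read and changed without touching the file contents; the array is read only when `img.data` is first used.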

ok, that was an alternative strategy I was going to try... but not my favorite, as I'd have to explicitly perform all operations on the data portion of the object, and given numpy's mechanics, assignment would also have to be explicit, and creating new image objects implicitly would be trickier:

    image3 = Image(image1)
    image3.data = ( image1.data + 19.0 ) * image2.data

vs.

    image3 = ( image1 + 19 ) * image2

I suppose option A isn't that bad though, and getting lazy loading would be very straightforward....

--

On a side note, I prefer this construct for lazy operations... curious to see what people's reactions are, ie: that's horrible!

    class lazy_property(object):
        '''
        meant to be used for lazy evaluation of object attributes.
        should represent non-mutable return value, as whatever is
        returned replaces itself permanently.
        '''
        def __init__(self, fget):
            self.fget = fget

        def __get__(self, obj, cls):
            if obj is None:
                # Accessed on the class itself, not on an instance
                return self
            value = self.fget(obj)
            # Store the result on the instance under the same name, so it
            # shadows this descriptor on every later access
            setattr(obj, self.fget.__name__, value)
            return value

    class DataFormat(object):
        def __init__(self, loader):
            self.loadData = loader

        @lazy_property
        def data(self):
            return self.loadData()

On Jul 26, 2011, at 5:45 PM, Joe Kington wrote:
> Similar to what Matthew said, I often find that it's cleaner to make a separate class with a "data" (or somesuch) property that lazily loads the numpy array. [...]
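
One possible way to keep the `image3 = ( image1 + 19 ) * image2` spelling without subclassing ndarray is to let a wrapper object take part in NumPy arithmetic through `__array_ufunc__` and `numpy.lib.mixins.NDArrayOperatorsMixin`. Both arrived in NumPy 1.13, well after this thread, so this is a sketch of a later option rather than anything proposed here. The names `LazyImage`, `loader`, `header`, and `loaded` are hypothetical, and copying the header from the first image operand is just one arbitrary policy.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class LazyImage(NDArrayOperatorsMixin):
    """Hypothetical wrapper: header available at once, pixels loaded on demand."""

    def __init__(self, loader, header=None):
        self._loader = loader            # zero-argument callable returning an array
        self._data = None                # filled in on first access
        self.header = dict(header or {})

    @property
    def loaded(self):
        return self._data is not None

    @property
    def data(self):
        if self._data is None:
            self._data = np.asarray(self._loader())
        return self._data

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(image) see the pixel data (loading it if needed).
        arr = self.data if dtype is None else self.data.astype(dtype, copy=False)
        return arr.copy() if copy else arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if kwargs.get("out") is not None:
            return NotImplemented        # in-place output is not handled here
        # Unwrapping the operands is the point where data actually gets loaded.
        arrays = [x.data if isinstance(x, LazyImage) else x for x in inputs]
        result = getattr(ufunc, method)(*arrays, **kwargs)
        if isinstance(result, np.ndarray):
            # Assumed policy: the result inherits the first image's header.
            header = next(x.header for x in inputs if isinstance(x, LazyImage))
            return LazyImage(lambda: result, header)
        return result


# Usage: the loaders stand in for reading pixel data from a file.
image1 = LazyImage(lambda: np.ones((2, 2)), {"name": "a"})
image2 = LazyImage(lambda: np.full((2, 2), 2.0))
image1.header["name"] = "renamed"        # header edits do not trigger a load
print(image1.loaded, image2.loaded)      # False False
image3 = (image1 + 19) * image2          # operators dispatch to __array_ufunc__
print(image1.loaded, image2.loaded)      # True True
print(float(image3.data[0, 0]))          # 40.0
```

The trade-off is that only ufunc-based operators are covered; indexing, slicing, and functions such as `np.concatenate` would need their own forwarding (for example via `__getitem__` or `__array_function__`).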

For lazy data loading I use a memory-mapped array (numpy.memmap): I use it to process multi-image files that are much larger than the available RAM.

Nadav.

________________________________
From: numpy-discussion-bounces@scipy.org [numpy-discussion-bounces@scipy.org] On Behalf Of Craig Yoshioka [craigyk@me.com]
Sent: 27 July 2011 05:41
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] lazy loading ndarrays

> ok, that was an alternative strategy I was going to try... but not my favorite as I'd have to explicitly perform all operations on the data portion of the object [...]
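
As a rough illustration of the memory-mapped approach, the sketch below writes a small multi-image file and then maps it with `numpy.memmap`, so that only the frame being indexed is read. The file layout (a 512-byte header followed by raw float32 frames), the file name, and the sizes are assumptions made up for this example, not a real image format.

```python
import os
import tempfile

import numpy as np

HEADER_BYTES = 512                      # assumed fixed-size header
shape = (100, 64, 64)                   # 100 frames of 64 x 64 pixels

# Write a stand-in file: a zero-filled header followed by the raw pixel data.
path = os.path.join(tempfile.mkdtemp(), "stack.raw")
frames = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)
with open(path, "wb") as f:
    f.write(b"\0" * HEADER_BYTES)
    frames.tofile(f)

# Map the data region instead of reading it; the OS fetches pages when touched.
stack = np.memmap(path, dtype=np.float32, mode="r",
                  offset=HEADER_BYTES, shape=shape)

# Indexing reads just this frame; np.array(...) copies it into ordinary memory.
frame = np.array(stack[42])
print(frame.shape, float(frame[0, 0]))  # (64, 64) 172032.0
```

With `mode="r"` the mapping is read-only; `mode="r+"` would allow writing changes back to the file.
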
participants (4)
- Craig Yoshioka
- Joe Kington
- Matthew Brett
- Nadav Horesh