
Is a memory-mapped file a viable solution to your problem?

  Nadav

-----Original Message-----
From: numpy-discussion-bounces@scipy.org on behalf of Jean-Baptiste Rudant
Sent: Mon 17-May-10 14:03
To: Numpy Discussion
Subject: [Numpy-discussion] Saving an array on disk to free memory - Pickling

Hello,

I tried to create an object:
- which behaves just like a numpy array;
- which can be saved on disk in an efficient way (numpy.save in my example, but pytables in my real program);
- which can be "unloaded" (if it is saved) to free memory: it can exist as an empty shell which knows how to retrieve the real values; it will be loaded only when we need to work with it;
- which unloads itself before being pickled (the values are already saved and don't have to be pickled).

It can't, at least I think so, inherit from ndarray, because sometimes (for example just after being unpickled and before being used) it is just an empty shell.

I don't think memmap can be helpful (I want to use pytables to save it on disk and I want it to be flexible: if I use it in a temporary way, I just need it in memory and I will never save it on disk).

My problems are:
- this code is ugly;
- I have to define explicitly all the special methods (__add__, __mul__...) of ndarray because:
  * __getattr__ doesn't retrieve them;
  * even if it did, I would have to define explicitly the type of the return value (if I understand correctly, when a class inherits from ndarray, __array_wrap__ does all of that).

Thank you for the help.

Regards.

import numpy


class PersistentArray(object):
    def __init__(self, values):
        '''
        values is a numpy array
        '''
        self.values = values
        self.filename = None
        self.is_loaded = True
        self.is_saved = False

    def save(self, filename):
        self.filename = filename
        numpy.save(self.filename, self.values)
        self.is_saved = True

    def load(self):
        self.values = numpy.load(self.filename)
        self.is_loaded = True

    def unload(self):
        if not self.is_saved:
            raise Exception("PersistentArray must be saved before being unloaded")
        del self.values
        self.is_loaded = False

    def __getitem__(self, index):
        return self.values[index]

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key == 'values':
            # 'values' is missing after unload(): reload it on demand.
            if not self.is_loaded:
                self.load()
            return self.values
        elif key == '__array_interface__':
            # I can't remember why I wrote this code, but I think it's
            # necessary to make pickling work properly
            raise AttributeError(key)
        else:
            try:
                # to emulate ndarray inheritance
                return getattr(self.values, key)
            except AttributeError:
                raise AttributeError(key)

    def __setstate__(self, state):
        self.__dict__.update(state)
        if self.is_loaded and self.is_saved:
            self.load()

    def __getstate__(self):
        if not self.is_saved:
            raise Exception("persistent array must be saved before being pickled")
        odict = self.__dict__.copy()
        if self.is_saved:
            if self.is_loaded:
                # The values are already on disk: pickle only the empty shell.
                odict['is_loaded'] = False
                del odict['values']
        return odict


filename = 'persistent_test.npy'

a = PersistentArray(numpy.arange(10e6))
a.save(filename)
a.sum()
a.unload()
# a still exists and knows how to retrieve its values if needed,
# but it doesn't use space in memory
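
On the special-methods problem raised in the message: for new-style classes Python looks up operators such as __add__ on the type, not on the instance, so __getattr__ is never consulted for them. One possible way to avoid writing every method by hand is to generate them in a loop. The following is only a rough sketch under stated assumptions: ArrayProxy is a hypothetical, stripped-down stand-in for PersistentArray (no saving or loading), and the helper names _unwrap, _wrap and _install are made up for this illustration. Only four operators are installed; the list would have to be extended for the others.

```python
import operator

import numpy


class ArrayProxy(object):
    """Hypothetical stand-in for PersistentArray: it only holds `values`."""

    def __init__(self, values):
        self.values = numpy.asarray(values)

    def __getitem__(self, index):
        return self.values[index]


def _unwrap(obj):
    # Use the underlying ndarray when the operand is itself a proxy.
    return obj.values if isinstance(obj, ArrayProxy) else obj


def _wrap(result):
    # Re-wrap array results so that chained expressions keep returning proxies.
    return ArrayProxy(result) if isinstance(result, numpy.ndarray) else result


def _install(name, op):
    # Forward operator: proxy <op> other.
    def forward(self, other):
        return _wrap(op(self.values, _unwrap(other)))

    # Reflected operator: other <op> proxy.
    def reflected(self, other):
        return _wrap(op(_unwrap(other), self.values))

    # The methods are set on the class, where Python looks for them.
    setattr(ArrayProxy, "__%s__" % name, forward)
    setattr(ArrayProxy, "__r%s__" % name, reflected)


for _name, _op in [("add", operator.add), ("sub", operator.sub),
                   ("mul", operator.mul), ("truediv", operator.truediv)]:
    _install(_name, _op)


a = ArrayProxy([1, 2, 3])
b = a * 2 + a      # both operations return an ArrayProxy
c = 10 - a         # handled by the generated __rsub__
```

Because every generated method reads self.values, the same loop applied to a class with lazy loading would presumably trigger the reload on first use, and _wrap is the single place where the return type is decided.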