Saving an array on disk to free memory - Pickling
Hello,

I tried to create an object:
- which behaves just like a numpy array;
- which can be saved on disk in an efficient way (numpy.save in my example, but pytables in my real program);
- which can be "unloaded" (if it has been saved) to free memory: it can exist as an empty shell that knows how to retrieve the real values, and it is loaded only when we need to work with it;
- which unloads itself before being pickled (the values are already saved and don't have to be pickled).

It can't, at least I think so, inherit from ndarray, because sometimes (for example just after being unpickled and before being used) it is just an empty shell.

I don't think memmap can be helpful (I want to use pytables to save it on disk, and I want it to be flexible: if I use it in a temporary way, I just need it in memory and I will never save it on disk).

My problems are:
- this code is ugly;
- I have to explicitly define all the special methods (__add__, __mul__...) of ndarrays because:
  * __getattr__ doesn't retrieve them;
  * even if it did, I would have to explicitly define the type of the return value (if I understand correctly, when a class inherits from ndarray, __array_wrap__ takes care of this).

Thank you for the help.

Regards.
import numpy


class PersistentArray(object):
    def __init__(self, values):
        '''
        values is a numpy array
        '''
        self.values = values
        self.filename = None
        self.is_loaded = True
        self.is_saved = False

    def save(self, filename):
        self.filename = filename
        numpy.save(self.filename, self.values)
        self.is_saved = True

    def load(self):
        self.values = numpy.load(self.filename)
        self.is_loaded = True

    def unload(self):
        if not self.is_saved:
            raise Exception("PersistentArray must be saved before being unloaded")
        del self.values
        self.is_loaded = False

    def __getitem__(self, index):
        return self.values[index]

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key == 'values':
            if not self.is_loaded:
                self.load()
            return self.values
        elif key == '__array_interface__':
            # I can't remember why I wrote this code, but I think it's
            # necessary to make pickling work properly
            raise AttributeError(key)
        else:
            try:
                # to emulate ndarray inheritance
                return self.values.__getattribute__(key)
            except AttributeError:
                raise AttributeError(key)

    def __setstate__(self, state):
        self.__dict__.update(state)
        if self.is_loaded and self.is_saved:
            self.load()

    def __getstate__(self):
        if not self.is_saved:
            raise Exception("persistent array must be saved before being pickled")
        odict = self.__dict__.copy()
        if self.is_saved:
            if self.is_loaded:
                # Pickle only the empty shell, not the values.
                odict['is_loaded'] = False
                del odict['values']
        return odict


filename = 'persistent_test.npy'
a = PersistentArray(numpy.arange(10e6))
a.save(filename)
a.sum()
a.unload()
# a still exists and knows how to retrieve its values if needed,
# but it doesn't use space in memory
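Editorial sketch on the special-methods problem raised above: Python looks up operators such as __add__ on the type, not on the instance, which is why __getattr__ never sees them. In NumPy versions much newer than this thread (1.13 and later), one possible way around writing each operator by hand is numpy.lib.mixins.NDArrayOperatorsMixin combined with __array_ufunc__. The class name LazyArray and its attributes are hypothetical, and the sketch uses numpy.save rather than pytables; it is an illustration under those assumptions, not the poster's code.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class LazyArray(NDArrayOperatorsMixin):
    """Hypothetical wrapper holding either an in-memory array or only a filename."""

    def __init__(self, values):
        self._values = np.asarray(values)
        self.filename = None

    @property
    def values(self):
        # Reload from disk on first access after unload() or unpickling.
        if self._values is None:
            self._values = np.load(self.filename)
        return self._values

    def save(self, filename):
        np.save(filename, self.values)
        self.filename = filename

    def unload(self):
        if self.filename is None:
            raise ValueError("LazyArray must be saved before being unloaded")
        self._values = None

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) see the real data (the copy hint is ignored here).
        return np.asarray(self.values, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap LazyArray operands and defer to the ufunc itself.
        # Results come back as plain ndarrays; `out=` arguments are not handled.
        unwrapped = [x.values if isinstance(x, LazyArray) else x for x in inputs]
        return getattr(ufunc, method)(*unwrapped, **kwargs)

    def __getitem__(self, index):
        return self.values[index]

    def __getstate__(self):
        # Pickle only the empty shell: the filename, never the values.
        if self.filename is None:
            raise ValueError("LazyArray must be saved before being pickled")
        return {"filename": self.filename, "_values": None}
```

With this, an expression like `a + 1` or `a * b` goes through the mixin's operators to `__array_ufunc__`, which reloads the data if needed. Returning plain ndarrays from arithmetic is a design choice of the sketch; wrapping results back into LazyArray would be an equally plausible variant.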
Is a memory-mapped file a viable solution to your problem?

Nadav

-----Original Message-----
From: numpy-discussion-bounces@scipy.org on behalf of Jean-Baptiste Rudant
Sent: Mon 17-May-10 14:03
To: Numpy Discussion
Subject: [Numpy-discussion] Saving an array on disk to free memory - Pickling

[clip]
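Editorial sketch of the memory-mapped option Nadav mentions, for readers comparing approaches. It uses numpy.save followed by numpy.load with mmap_mode="r"; the file name and array size are arbitrary choices for the example, and it does not address the pytables requirement from the original post.

```python
import os
import tempfile

import numpy as np

# Arbitrary scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "memmap_test.npy")
np.save(path, np.arange(1_000_000, dtype=np.float64))

# mmap_mode="r" maps the file read-only: data is paged in from disk
# on demand instead of being read into memory up front.
mapped = np.load(path, mmap_mode="r")

total = mapped.sum()             # reductions work as on a regular array
chunk = np.array(mapped[10:15])  # copy a small slice into ordinary memory

# Dropping the last reference releases the mapping.
del mapped
```

The mapped object is an ndarray subclass (numpy.memmap), so it supports arithmetic and slicing without any wrapper code.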
On Monday 17 May 2010 13:03:19, Jean-Baptiste Rudant wrote:
Hello,
I tried to create an object:
- which behaves just like a numpy array;
- which can be saved on disk in an efficient way (numpy.save in my example, but pytables in my real program);
- which can be "unloaded" (if it has been saved) to free memory: it can exist as an empty shell that knows how to retrieve the real values, and it is loaded only when we need to work with it;
- which unloads itself before being pickled (the values are already saved and don't have to be pickled).
[clip]
Well, if you are using Linux, you can use /dev/shm to keep your files in memory instead of on disk. There are some considerations to keep in mind when doing this, though:

http://superuser.com/questions/45342/when-should-i-use-dev-shm-and-when-should-i-use-tmp

However, I don't know of an equivalent on Windows, Mac OS X, or other Unices.

--
Francesc Alted
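Editorial sketch of the /dev/shm suggestion: write the .npy file to /dev/shm when that directory exists and is writable, and fall back to the regular temporary directory elsewhere. The helper name scratch_dir is hypothetical, and the fallback is an assumption of this sketch, not something proposed in the thread.

```python
import os
import tempfile

import numpy as np


def scratch_dir():
    """Return /dev/shm when usable (typically Linux), else the system temp dir."""
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return tempfile.gettempdir()


# Create a uniquely named file in the chosen directory.
fd, path = tempfile.mkstemp(suffix=".npy", dir=scratch_dir())
os.close(fd)

np.save(path, np.arange(1000))
restored = np.load(path)

# Files in /dev/shm occupy RAM until deleted, so clean up explicitly.
os.remove(path)
```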
Thank you very much for the help.

But I was looking more for a coding solution (furthermore, I'm not using Linux). My point is not to make real arrays look as if they were saved in files (using in-memory files for that), but on the contrary to make "fake" arrays, saved on disk, look like real arrays. In other words, to make them behave as if they inherited from ndarray.

In pytables, if you use my_node[:], it returns an array. That is part of what I want to do. Plus:
- if I call my_node[:] ten times, only the first call will read from disk (it matters, even if pytables is very, very fast);
- the same class can represent a node or a numpy array.

Jean-Baptiste Rudant

________________________________
From: Francesc Alted <faltet@pytables.org>
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Sent: Mon 17 May 2010, 20:13:35
Subject: Re: [Numpy-discussion] Saving an array on disk to free memory - Pickling

[clip]
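Editorial sketch of the two requirements listed in the message above: only the first my_node[:]-style access reads from disk, and one class stands for either an on-disk node or a plain numpy array. The class CachedNode and its zero-argument loader callable are hypothetical; with pytables the loader might be something like `lambda: my_node[:]`, but that wiring is an assumption and is not shown here.

```python
import numpy as np


class CachedNode:
    """Hypothetical cache around either an ndarray or a loader callable."""

    def __init__(self, source):
        if callable(source):
            # source() is expected to return the full array (e.g. read from disk).
            self._loader = source
            self._cache = None
        else:
            # Purely in-memory use: no loader, the data is always resident.
            self._loader = None
            self._cache = np.asarray(source)

    def __getitem__(self, index):
        # Read through the loader only when nothing is cached yet.
        if self._cache is None:
            self._cache = np.asarray(self._loader())
        return self._cache[index]

    def unload(self):
        if self._loader is None:
            raise ValueError("in-memory data has no loader and cannot be unloaded")
        self._cache = None
```

Repeated `node[:]` calls hit the cache until `unload()` is called, after which the next access goes back to the loader.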
On Tuesday 18 May 2010 08:57:47, Jean-Baptiste Rudant wrote:
Thank you very much for the help.
But I was looking more for a coding solution (furthermore, I'm not using Linux). My point is not to make real arrays look as if they were saved in files (using in-memory files for that), but on the contrary to make "fake" arrays, saved on disk, look like real arrays. In other words, to make them behave as if they inherited from ndarray.
In pytables, if you use my_node[:], it returns an array. That is part of what I want to do. Plus:
- if I call my_node[:] ten times, only the first call will read from disk (it matters, even if pytables is very, very fast);
Well, the second time you read from disk, pytables will read from memory instead. Indeed, it is the OS filesystem cache that does the trick; the speed may not be exactly the same as with a pure numpy array, but hey, it should be very close, and with zero implementation cost. Moreover, if you have to do arithmetic computations with your arrays, you can use the `tables.Expr` class, which can generally perform them faster than pure numpy (for example, see http://pytables.org/moin/ComputingKernel).

--
Francesc Alted
participants (3)
-
Francesc Alted
-
Jean-Baptiste Rudant
-
Nadav Horesh