[Numpy-discussion] saving incrementally numpy arrays

David Warde-Farley dwf at cs.toronto.edu
Wed Aug 12 19:32:17 EDT 2009


On 12-Aug-09, at 7:11 PM, Juan Fiol wrote:

> Hi, I finally decided on the PyTables approach because it will be
> easier to work with the data later. I know this is not the right
> place, but maybe I can get some quick pointers. Each time, I calculate
> a NumPy array of about 20 columns and a few thousand rows. I'd like
> to append all the rows without iterating over the NumPy array. Does
> anyone know what the "right" approach would be? I am looking for
> something simple; I do not need to keep the piece of table after I
> put it into the h5file. Thanks in advance and regards, Juan


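You'll probably want an EArray (extendable array). Call createEArray()
on a new h5file, with a 0 in the shape for the dimension that should
grow, then pass each block of rows to its append() method.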

http://www.pytables.org/docs/manual/ch04.html#EArrayMethodsDescr
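
A minimal sketch of what I mean, assuming a recent PyTables where
createEArray() is spelled create_earray() and the file is opened with
open_file(). The node name "data", the float64 atom, and the helper
append_chunks() are made up for illustration; adjust them to your data.

```python
import numpy as np
import tables

N_COLS = 20  # assumed number of columns, per the question


def append_chunks(path, chunks):
    """Write each (n_rows, N_COLS) array in `chunks` to an EArray in `path`.

    Returns the total number of rows stored.
    """
    with tables.open_file(path, mode="w") as h5:
        # A 0 in the shape marks the dimension that is allowed to grow.
        earr = h5.create_earray(
            h5.root, "data", atom=tables.Float64Atom(), shape=(0, N_COLS)
        )
        for chunk in chunks:
            # Appends all rows of the block at once, no Python-level row loop.
            earr.append(chunk)
        return int(earr.nrows)
```

Each append() call takes a whole block, so there is no need to iterate
over the rows yourself.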

If your chunks are always the same size, it might be best to do your
work in place rather than allocating a new NumPy array each time. In
theory, del-ing the object when you're done with it should work, but
the memory may not be released as quickly as you'd like, or the
repeated allocation step may start slowing you down.

What do I mean? Well, you could clear the array when you're done with
it using foo[:] = 0 (or nan, or whatever), and when you're "building it
up" use the in-place augmented assignment operators as much as possible
(+=, /=, -=, *=, %=, etc.).
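
As a rough illustration, assuming a fixed chunk shape: the function
name rebuild_in_place() and the arithmetic inside it are placeholders
for whatever your real calculation is. The point is only that the same
buffer gets reused on every pass.

```python
import numpy as np


def rebuild_in_place(buf, offset, scale):
    """Reset `buf` and refill it without allocating a new array."""
    buf[:] = 0      # clear the previous contents, same memory
    buf += offset   # in-place add
    buf *= scale    # in-place multiply
    return buf      # the same object that was passed in


# Allocate once, outside the loop, then reuse on every pass.
foo = np.empty((1000, 20))
for step in range(3):
    rebuild_in_place(foo, offset=step, scale=2.0)
```

Writing foo = foo + offset instead would create a fresh array on every
pass, which is exactly the allocation this is meant to avoid.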

David


