Hi all,

I've made the pip/conda package npy-append-array for exactly this purpose, see

https://github.com/xor2k/npy-append-array

It works with one-dimensional arrays too, of course. The key challenge is to properly initialize the header and then keep updating it as the array grows, which my module takes care of. I'd like to integrate this functionality directly into NumPy, see PR

https://github.com/numpy/numpy/pull/20321/


but I have been busy and have not received any feedback recently. A more direct integration into NumPy could skip or simplify the header update, e.g. by introducing a new file format version. This could turn .npy into a sort of binary equivalent of CSV, where the size of the array is determined by the file size.
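The header bookkeeping described above can be sketched in plain NumPy. The filename, chunk sizes, and the `append_chunks` helper below are illustrative, not the module's actual API; the sketch also assumes the padded header length stays constant as the shape grows (true for modest shapes thanks to the format's 64-byte alignment padding; the module handles the general case robustly):

```python
import numpy as np

def append_chunks(filename, chunks, dtype):
    """Write chunks to a growing .npy file, rewriting the header each time.

    Simplified sketch: all chunks must be 2-D with the same number of
    columns, and the header's padded length must not change as the
    shape's digits grow.
    """
    with open(filename, "wb") as fp:
        rows, cols = 0, chunks[0].shape[1]
        header = {"descr": np.lib.format.dtype_to_descr(np.dtype(dtype)),
                  "fortran_order": False,
                  "shape": (rows, cols)}
        np.lib.format.write_array_header_1_0(fp, header)
        for chunk in chunks:
            # append the raw C-order bytes of this chunk
            np.ascontiguousarray(chunk, dtype=dtype).tofile(fp)
            rows += chunk.shape[0]
            end = fp.tell()
            fp.seek(0)                       # rewrite header in place
            header["shape"] = (rows, cols)
            np.lib.format.write_array_header_1_0(fp, header)
            fp.seek(end)

chunks = [np.arange(15, dtype="<f8").reshape(5, 3) + 100 * i
          for i in range(4)]
append_chunks("growing.npy", chunks, "<f8")
print(np.load("growing.npy").shape)  # (20, 3)
```

After every append the file is a valid .npy file, which is what makes the approach crash-safe in practice.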


Best, Michael


On 24. Aug 2022, at 03:04, Robert Kern <robert.kern@gmail.com> wrote:


On Tue, Aug 23, 2022 at 8:47 PM <bross_phobrain@sonic.net> wrote:
I want to compute multiple ndarrays at once and lack memory, so I want to write in chunks (here sized to GPU batch capacity). It seems there should be an interface to write the header, then write a number of elements cyclically, then add any closing rubric and close the file.

Is it as simple as `np.lib.format.write_array_header_2_0(fp, d)`, then writing multiple shape-(N,) float arrays via `fp.write(item.tobytes())`?
 
`item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There is no footer after the data.
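Concretely, with the final shape known up front, that scheme looks like this (the filename and sizes are illustrative):

```python
import numpy as np

filename = "out.npy"               # illustrative name
n_rows, n_cols, batch = 20, 4, 5   # illustrative sizes
dtype = np.dtype("<f8")

with open(filename, "wb") as fp:
    # write the header once, with the final shape
    np.lib.format.write_array_header_2_0(
        fp, {"descr": np.lib.format.dtype_to_descr(dtype),
             "fortran_order": False,
             "shape": (n_rows, n_cols)})
    for start in range(0, n_rows, batch):
        # one "GPU batch" result; tofile appends the raw C-order bytes
        item = np.full((batch, n_cols), float(start), dtype=dtype)
        item.tofile(fp)

print(np.load(filename).shape)  # (20, 4)
```

Since there is no footer, the file is complete as soon as the last batch has been written.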

The alternative is to use `np.lib.format.open_memmap(filename, mode='w+', dtype=dtype, shape=shape)`, then assign slices sequentially to the returned memory-mapped array. A memory-mapped array is usually going to be friendlier to whatever memory limits you are running into than a nominally "in-memory" array.
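For example (again with illustrative sizes), the memory-mapped variant is just slice assignment:

```python
import numpy as np

shape, batch = (20, 4), 5          # illustrative sizes
mm = np.lib.format.open_memmap("out_mm.npy", mode="w+",
                               dtype=np.float64, shape=shape)
for start in range(0, shape[0], batch):
    # compute one batch and assign it into the mapped file
    mm[start:start + batch] = np.full((batch, shape[1]), float(start))
mm.flush()                         # make sure the data hits disk
del mm                             # close the map

print(np.load("out_mm.npy")[5, 0])  # 5.0
```

The OS pages the data in and out as needed, so the working set stays small regardless of the array's total size.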

--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: michael.siebert2k@gmail.com