Hi all,

I've made the Pip/Conda module npy-append-array for exactly this purpose, see https://github.com/xor2k/npy-append-array. It works with one-dimensional arrays too, of course. The key challenge is to properly initialize the header and then update it as the array grows, which my module takes care of.

I'd like to integrate this functionality directly into NumPy, see PR https://github.com/numpy/numpy/pull/20321/, but I have been busy and have not received any feedback recently. A more direct integration into NumPy would make it possible to skip or simplify the header update, e.g. by introducing a new file format version. This could turn .npy into a sort of binary CSV equivalent, where the size of the array is determined by the file size.

Best, Michael
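To make the header-update challenge concrete, here is a minimal sketch of the idea (not npy-append-array's actual implementation; the filename is illustrative). It uses NumPy's public `np.lib.format` helpers and relies on headers being padded to 64-byte multiples, so rewriting a same-sized header in place is safe:

```python
import numpy as np
import numpy.lib.format as fmt

path = "grow.npy"  # illustrative filename
a = np.arange(4, dtype=np.float64)

# Write an initial .npy file. Headers are padded to a multiple of 64 bytes,
# so a later in-place rewrite with a similar-length shape fits exactly.
with open(path, "wb") as fp:
    fmt.write_array_header_2_0(fp, {"descr": fmt.dtype_to_descr(a.dtype),
                                    "fortran_order": False,
                                    "shape": (a.size,)})
    a.tofile(fp)

# Append more data, then rewrite the header so the shape matches the file.
b = np.arange(4, 8, dtype=np.float64)
with open(path, "r+b") as fp:
    fp.seek(0, 2)  # jump to the end of the file and append raw data
    b.tofile(fp)
    fp.seek(0)     # rewrite the header with the grown shape
    fmt.write_array_header_2_0(fp, {"descr": fmt.dtype_to_descr(a.dtype),
                                    "fortran_order": False,
                                    "shape": (a.size + b.size,)})

assert np.array_equal(np.load(path), np.arange(8, dtype=np.float64))
```

A new format version, as proposed in the PR, would remove the need for this rewrite step entirely, since the shape could be inferred from the file size.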
On 24. Aug 2022, at 03:04, Robert Kern wrote:

On Tue, Aug 23, 2022 at 8:47 PM wrote: I want to calculate multiple ndarrays at once and lack memory, so I want to write in chunks (here sized to GPU batch capacity). It seems there should be an interface to write the header, then write a number of elements cyclically, then add any closing rubric and close the file.
Is it as simple as lib.format.write_array_header_2_0(fp, d) then writing multiple shape(N,) arrays of float by fp.write(item.tobytes())?
`item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There is no footer after the data.
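That basic scheme might look like the following sketch, assuming the final shape is known up front (the filename, dtype, and chunk layout are illustrative stand-ins for the GPU batches):

```python
import numpy as np
import numpy.lib.format as fmt

n_chunks, chunk_len = 4, 3
dtype = np.dtype(np.float64)

with open("chunked.npy", "wb") as fp:  # illustrative filename
    # Write the header once with the final shape; no footer follows the data.
    fmt.write_array_header_2_0(fp, {"descr": fmt.dtype_to_descr(dtype),
                                    "fortran_order": False,
                                    "shape": (n_chunks * chunk_len,)})
    for i in range(n_chunks):
        chunk = np.full(chunk_len, i, dtype=dtype)  # stand-in for one batch
        chunk.tofile(fp)  # avoids the extra copy made by fp.write(chunk.tobytes())

result = np.load("chunked.npy")
assert result.shape == (12,)
```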
The alternative is to use `np.lib.format.open_memmap(filename, mode='w+', dtype=dtype, shape=shape)`, then assign slices sequentially to the returned memory-mapped array. A memory-mapped array is usually going to be friendlier to whatever memory limits you are running into than a nominally "in-memory" array.
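A sketch of that memory-mapped alternative, filling the array one slice at a time (shape and filename are illustrative):

```python
import numpy as np

shape = (4, 3)  # illustrative: 4 batches of 3 elements each
mm = np.lib.format.open_memmap("mapped.npy", mode="w+",
                               dtype=np.float64, shape=shape)
for i in range(shape[0]):
    mm[i] = np.full(shape[1], i, dtype=np.float64)  # assign slices sequentially
mm.flush()  # ensure everything is written to disk
del mm      # close the memory map

out = np.load("mapped.npy")
assert out.shape == (4, 3)
```

Because the OS pages data in and out as needed, only the slice currently being written has to be resident in memory.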
--
Robert Kern

NumPy-Discussion mailing list -- numpy-discussion@python.org