np.savez not multi-processing safe, alternatives?

I have a process that stores a number of sets of three output arrays, which can be stored either as a few .npy files or as a single .npz file with the same keys in each file (say, writing roughly 10,000 npz files, all containing the same keys 'a', 'b', 'c'). If I run multiple processes on the same machine (desirable, since they are heavily database-I/O-bound), over a period of hours some of the npz writes will collide and fail due to the use of tempfile and tempfile.gettempdir(): either one of the .npy subfiles will be locked for writing, or it will get os.remove'd while the zip file is being written. So my question: any recommendations for a way around this, or is it possible to change the savez function to make these collisions less likely? (I am on Win32)

Thanks,
Wes
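[Editorial note: to make the failure mode above concrete, here is a minimal sketch of the kind of staging pattern being described. `racy_savez` is a hypothetical illustration of the problem, not the actual numpy source: every process stages its .npy members under the shared system temp directory using the array key as the file name, so concurrent calls step on each other's files.]

```python
import os
import tempfile
import zipfile
import numpy as np

def racy_savez(path, **arrays):
    """Illustrative only: stages .npy members in the SHARED temp dir."""
    tmpdir = tempfile.gettempdir()  # same directory for every process
    with zipfile.ZipFile(path, "w") as zf:
        for name, arr in arrays.items():
            # Same key -> same path in every concurrent process!
            member = os.path.join(tmpdir, name + ".npy")
            np.save(member, np.asarray(arr))
            zf.write(member, arcname=name + ".npy")
            # May delete (or fail to delete) a file another process
            # is still writing -- the collision described above.
            os.remove(member)
```

With a single process this works fine; the failures only appear when two processes hit the same member path at the same time, which is why the collisions show up sporadically over hours.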

Mon, 30 Mar 2009 09:03:56 -0400, Wes McKinney wrote:
> I have a process that stores a number of sets of three output arrays, which can be stored either as a few .npy files or as a single .npz file with the same keys in each file (say, writing roughly 10,000 npz files, all containing the same keys 'a', 'b', 'c'). If I run multiple processes on the same machine (desirable, since they are heavily database-I/O-bound), over a period of hours some of the npz writes will collide and fail due to the use of tempfile and tempfile.gettempdir(): either one of the .npy subfiles will be locked for writing, or it will get os.remove'd while the zip file is being written.
This is bug #852; it's fixed in trunk. As a workaround for the present, you may want to grab the `savez` function from http://projects.scipy.org/numpy/browser/trunk/numpy/lib/io.py#L243 and use a copy of it in your code temporarily. The function is fairly small.

-- Pauli Virtanen
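[Editorial note: for readers hitting this on a modern Python/NumPy stack, an alternative to copying the fixed `savez` is to avoid on-disk staging entirely: serialize each array into an in-memory buffer, write the zip to a uniquely named sibling file, and atomically rename it into place. `safe_savez` below is a hypothetical sketch of that approach, not numpy's implementation; it assumes Python 3.3+ for `os.replace`.]

```python
import io
import os
import zipfile
import numpy as np

def safe_savez(path, **arrays):
    """Hypothetical collision-free alternative to np.savez."""
    # Unique per process, and created next to the destination so the
    # final rename stays on one filesystem.
    tmp = "%s.%d.tmp" % (path, os.getpid())
    with zipfile.ZipFile(tmp, "w") as zf:
        for name, arr in arrays.items():
            # Serialize to memory -- no shared temp directory involved.
            buf = io.BytesIO()
            np.save(buf, np.asarray(arr))
            zf.writestr(name + ".npy", buf.getvalue())
    # Atomic rename (os.replace also overwrites on Windows).
    os.replace(tmp, path)
```

Because each process writes only to its own uniquely named .tmp file, concurrent writers to different .npz paths can never lock or delete each other's staging files; the trade-off is holding each serialized array in memory briefly.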