Sorry for the top-post, but should we add this as an issue on the
GitHub tracker? I'd like to revisit it this summer.
V-
* Julian Taylor
On 18.04.2014 18:29, Valentin Haenel wrote:
Hi,
* Valentin Haenel [2014-04-17]:
* Valentin Haenel [2014-04-17]:
* Julian Taylor [2014-04-17]:
On 17.04.2014 21:30, onefire wrote:
Thanks for the suggestion. I did profile the program before, just not using Python.
One problem with npz is that the zipfile module does not support streaming data in (or, if it does now, we aren't using it). So numpy writes the file uncompressed to disk and then zips it, which is horrible for performance and disk usage.
As a workaround, it may also be possible to write the temporary NPY files to cStringIO instances and then use ``ZipFile.writestr`` with the ``getvalue()`` of the cStringIO object. However, that approach may require some memory. In Python 2.7, for each array: one copy inside the cStringIO instance and then another copy when calling ``getvalue()`` on the cStringIO object, I believe.
There is a proof-of-concept implementation here:
https://github.com/esc/numpy/compare/feature;npz_no_temp_file
Anybody interested in me fixing this up (unit tests, API, etc.) for inclusion?
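For readers following along, the in-memory workaround described above might look roughly like the sketch below. It is not the linked proof of concept. It assumes Python 3, where ``io.BytesIO`` stands in for ``cStringIO``, and the names ``add_member_from_memory``, ``write_payload`` and ``demo_in_memory`` are hypothetical. The demo uses placeholder bytes instead of a real array so that it runs without NumPy.

```python
import io
import zipfile


def add_member_from_memory(zipf, arcname, write_payload):
    """Serialize one archive member into RAM, then store it in the zip.

    write_payload is a hypothetical callback that writes the member's
    bytes to a binary file-like object.
    """
    buf = io.BytesIO()                 # in-memory stand-in for the temp file
    write_payload(buf)                 # the payload now lives in the buffer
    # getvalue() hands the whole payload over as one bytes object; the
    # thread suspects this amounts to a second copy on Python 2.7.
    zipf.writestr(arcname, buf.getvalue())


def demo_in_memory():
    payload = b"placeholder for NPY bytes " * 1000   # made-up test data
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zipf:
        add_member_from_memory(zipf, "a.npy", lambda f: f.write(payload))
    # Reopen the finished archive and check the round trip.
    with zipfile.ZipFile(io.BytesIO(archive.getvalue())) as zipf:
        return zipf.namelist(), zipf.read("a.npy") == payload
```

With NumPy available, the callback could presumably be something like ``lambda f: numpy.lib.format.write_array(f, numpy.asanyarray(val))``, which is the call the existing ``_savez`` code makes against its temporary file.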
I wonder if it would be better to use a FIFO instead, to avoid the memory doubling. Windows probably hasn't got them (exposed via Python), but one can slap a platform check in front. Attached is a proof of concept without proper error handling (which is unfortunately the tricky part).
From 472b4c0a44804b65d0774147010ec7a931a1c52d Mon Sep 17 00:00:00 2001
From: Julian Taylor
Date: Thu, 17 Apr 2014 23:01:47 +0200
Subject: [PATCH] use a pipe for savez

---
 numpy/lib/npyio.py | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py
index 98b4b6e..baafa9d 100644
--- a/numpy/lib/npyio.py
+++ b/numpy/lib/npyio.py
@@ -585,22 +585,19 @@ def _savez(file, args, kwds, compress):
     zipf = zipfile_factory(file, mode="w", compression=compression)

     # Stage arrays in a temporary file on disk, before writing to zip.
-    fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy')
-    os.close(fd)
-    try:
+    import threading
+    with tempfile.TemporaryDirectory() as td:
+        fifoname = os.path.join(td, "fifo")
+        os.mkfifo(fifoname)
         for key, val in namedict.items():
             fname = key + '.npy'
-            fid = open(tmpfile, 'wb')
-            try:
-                format.write_array(fid, np.asanyarray(val))
-                fid.close()
-                fid = None
-                zipf.write(tmpfile, arcname=fname)
-            finally:
-                if fid:
-                    fid.close()
-    finally:
-        os.remove(tmpfile)
+            def mywrite(pipe, val):
+                with open(pipe, "wb") as wpipe:
+                    format.write_array(wpipe, np.asanyarray(val))
+            t = threading.Thread(target=mywrite, args=(fifoname, val))
+            t.start()
+            zipf.write(fifoname, arcname=fname)
+            t.join()

     zipf.close()
--
1.9.1
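The FIFO idea in the patch above can also be tried outside NumPy. The following is a minimal standalone sketch, not the patch itself, and it still omits the tricky error handling: if the payload callback fails midway, the reader simply sees end-of-file and a truncated member is stored. It assumes a POSIX system and Python 3.6 or later. Unlike the patch, it reads the pipe explicitly and copies into ``ZipFile.open(..., mode="w")`` rather than passing the FIFO path to ``ZipFile.write``, so that the member's metadata is not taken from a stat of the FIFO. The names ``add_member_via_fifo``, ``write_payload`` and ``demo_fifo`` are hypothetical.

```python
import io
import os
import shutil
import tempfile
import threading
import zipfile


def add_member_via_fifo(zipf, arcname, write_payload, fifoname):
    """Stream one member into the archive through a named pipe."""

    def producer():
        # Opening a FIFO for writing blocks until a reader opens it too.
        with open(fifoname, "wb") as wpipe:
            write_payload(wpipe)

    t = threading.Thread(target=producer)
    t.start()
    try:
        # The consumer copies in small chunks, so the payload is not
        # duplicated in memory and no temporary data file is written.
        with open(fifoname, "rb") as rpipe, \
                zipf.open(arcname, mode="w") as member:
            shutil.copyfileobj(rpipe, member)
    finally:
        t.join()


def demo_fifo():
    if not hasattr(os, "mkfifo"):      # platform check, e.g. for Windows
        raise NotImplementedError("os.mkfifo is not available here")
    payload = bytes(range(256)) * 4096  # 1 MiB of made-up test data
    archive = io.BytesIO()
    with tempfile.TemporaryDirectory() as td:
        fifoname = os.path.join(td, "fifo")
        os.mkfifo(fifoname)
        with zipfile.ZipFile(archive, mode="w",
                             compression=zipfile.ZIP_DEFLATED) as zipf:
            add_member_via_fifo(zipf, "a.npy",
                                lambda f: f.write(payload), fifoname)
    # Reopen the finished archive and check the round trip.
    with zipfile.ZipFile(io.BytesIO(archive.getvalue())) as zipf:
        return zipf.read("a.npy") == payload
```

On Python versions where ``ZipFile.open(..., mode="w")`` exists, writing to that handle directly may make the pipe unnecessary; it is kept here only because it is the technique under discussion.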
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion