* Valentin Haenel
[2014-04-17]: Hi,
* Julian Taylor
[2014-04-17]: On 17.04.2014 21:30, onefire wrote:
Hi Nathaniel,
Thanks for the suggestion. I did profile the program before, just not using Python.
One problem with npz is that the zipfile module does not support streaming data in (or, if it does now, we aren't using it). So numpy writes the file uncompressed to disk and then zips it, which is horrible for performance and disk usage.
As a workaround, it may also be possible to write the temporary NPY files to cStringIO instances and then use ``ZipFile.writestr`` with the ``getvalue()`` of the cStringIO object. However, that approach may require some memory. In Python 2.7, for each array: one copy inside the cStringIO instance and then another copy when calling ``getvalue()`` on the cStringIO instance, I believe.
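To make the idea concrete, here is a minimal sketch of that workaround for current Python, where ``io.BytesIO`` stands in for cStringIO. It is not the code from the proof-of-concept branch: the function name ``savez_in_memory`` is made up for illustration, and it only covers the uncompressed case.

```python
import io
import os
import tempfile
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def savez_in_memory(path, **arrays):
    """Hypothetical helper: write an uncompressed .npz without temp files."""
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, arr in arrays.items():
            buf = io.BytesIO()
            # Serialize the array in NPY format into memory (one copy).
            npy_format.write_array(buf, np.asanyarray(arr), allow_pickle=False)
            # Hand the serialized bytes to the zip archive; depending on the
            # Python version, getvalue() may involve a second copy.
            zf.writestr(name + ".npy", buf.getvalue())


# Round-trip a small array through the sketch.
x = np.linspace(1, 10, 1000)
out_path = os.path.join(tempfile.mkdtemp(), "x_sketch.npz")
savez_in_memory(out_path, x=x)
with np.load(out_path) as data:
    roundtrip = data["x"]
```

The result is a regular npz archive, so ``np.load`` reads it back as usual. The trade-off is the one described above: each array is held in memory in serialized form while it is being written.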
There is a proof-of-concept implementation here:
https://github.com/esc/numpy/compare/feature;npz_no_temp_file
Here are the timings, again using ``sync()`` from bloscpack (but it's just an ``os.system('sync')``, in case you want to run your own benchmarks):
In [1]: import numpy as np
In [2]: import bloscpack.sysutil as bps
In [3]: x = np.linspace(1, 10, 50000000)
In [4]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 1.93 s per loop
In [5]: %timeit np.savez("x.npz", x) ; bps.sync()
1 loops, best of 3: 7.88 s per loop
In [6]: %timeit np._savez_no_temp("x.npy", [x], {}, False) ; bps.sync()
1 loops, best of 3: 3.22 s per loop
Not too bad, but still slower than plain NPY; memory copies would be my guess.
PS: Running Python 2.7.6 :: Anaconda 1.9.2 (64-bit) and Numpy master
Also, in case you were wondering, here is the profiler output:
In [2]: %prun -l 10 np._savez_no_temp("x.npy", [x], {}, False)
         943 function calls (917 primitive calls) in 1.139 seconds

   Ordered by: internal time
   List reduced from 99 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.386    0.386    0.386    0.386 {zlib.crc32}
        8    0.234    0.029    0.234    0.029 {method 'write' of 'file' objects}
       27    0.162    0.006    0.162    0.006 {method 'write' of 'cStringIO.StringO' objects}
        1    0.158    0.158    0.158    0.158 {method 'getvalue' of 'cStringIO.StringO' objects}
        1    0.091    0.091    0.091    0.091 {method 'close' of 'file' objects}
       24    0.064    0.003    0.064    0.003 {method 'tobytes' of 'numpy.ndarray' objects}
        1    0.022    0.022    1.119    1.119 npyio.py:608(_savez_no_temp)
        1    0.019    0.019    1.139    1.139 <string>:1(<module>)
        1    0.002    0.002    0.227    0.227 format.py:362(write_array)
        1    0.001    0.001    0.001    0.001 zipfile.py:433(_GenerateCRCTable)

V-