[Numpy-discussion] About the npz format

David Palao dpalao.python at gmail.com
Thu Apr 17 05:17:37 EDT 2014


2014-04-16 20:26 GMT+02:00 onefire <onefire.myself at gmail.com>:
> Hi all,
>
> I have been playing with the idea of using Numpy's binary format as a
> lightweight alternative to HDF5 (which I believe is the "right" way to do it if
> one does not have a problem with the dependency).
>
> I am pretty happy with the npy format, but the npz format seems to be broken
> as far as performance is concerned (or I am missing something obvious!). The following
> ipython session illustrates the issue:
>
> In [1]: import numpy as np
>
> In [2]: x = np.linspace(1, 10, 50000000)
>
> In [3]: %time np.save("x.npy", x)
> CPU times: user 40 ms, sys: 230 ms, total: 270 ms
> Wall time: 488 ms
>
> In [4]: %time np.savez("x.npz", data = x)
> CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
> Wall time: 7.7 s
>

Hi,
In my case (python-2.7.3, numpy-1.6.1):

In [23]: %time save("xx.npy", x)
CPU times: user 0.00 s, sys: 0.23 s, total: 0.23 s
Wall time: 4.07 s

In [24]: %time savez("xx.npz", data = x)
CPU times: user 0.42 s, sys: 0.61 s, total: 1.02 s
Wall time: 4.26 s

So on my machine I don't see the "unbelievable amount of overhead" from the
npz format: save and savez take about the same wall time (4.07 s vs 4.26 s).
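
For anyone who wants to repeat the comparison outside IPython, here is a
minimal sketch. It is not what either of us ran: the function name
time_saves, the temporary directory and the default array size are my own
assumptions, and it needs Python 3.6+ (time.perf_counter and f-strings),
unlike the python-2.7.3 session above. A single run is noisy because wall
time depends heavily on the disk and on OS caching, so treat the numbers as
a rough indication only.

```python
import os
import tempfile
import time

import numpy as np


def time_saves(n=1_000_000):
    """Return wall-clock seconds for np.save and np.savez on the same array.

    Hypothetical helper for illustration; n is the number of float64 values.
    """
    x = np.linspace(1, 10, n)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        npy_path = os.path.join(tmp, "x.npy")
        npz_path = os.path.join(tmp, "x.npz")

        # Time the plain .npy write.
        start = time.perf_counter()
        np.save(npy_path, x)
        results["save"] = time.perf_counter() - start

        # Time the .npz write of the same array.
        start = time.perf_counter()
        np.savez(npz_path, data=x)
        results["savez"] = time.perf_counter() - start

        # Sanity check: both files hold the same data.
        with np.load(npz_path) as archive:
            assert np.array_equal(np.load(npy_path), archive["data"])
    return results


if __name__ == "__main__":
    for name, seconds in time_saves().items():
        print(f"{name}: {seconds:.3f} s")
```

Calling time_saves(50000000) matches the array size in the quoted session,
but it writes roughly 400 MB twice, so check free disk space first.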

Best


