[Numpy-discussion] About the npz format

onefire onefire.myself at gmail.com
Thu Apr 17 15:30:59 EDT 2014


Hi Nathaniel,

Thanks for the suggestion. I did profile the program before, just not using
Python.

But following your suggestion, I used %prun. Here's (part of) the output
(when I use savez):

 195503 function calls in 4.466 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    2.284    1.142    2.284    1.142 {method 'close' of '_io.BufferedWriter' objects}
        1    0.918    0.918    0.918    0.918 {built-in method remove}
    48841    0.568    0.000    0.568    0.000 {method 'write' of '_io.BufferedWriter' objects}
    48829    0.379    0.000    0.379    0.000 {built-in method crc32}
    48830    0.148    0.000    0.148    0.000 {method 'read' of '_io.BufferedReader' objects}
        1    0.090    0.090    0.993    0.993 zipfile.py:1315(write)
        1    0.072    0.072    0.072    0.072 {method 'tostring' of 'numpy.ndarray' objects}
    48848    0.005    0.000    0.005    0.000 {built-in method len}
        1    0.001    0.001    0.270    0.270 format.py:362(write_array)
        3    0.000    0.000    0.000    0.000 {built-in method open}
        1    0.000    0.000    4.466    4.466 npyio.py:560(_savez)
        2    0.000    0.000    0.000    0.000 zipfile.py:1459(close)
        1    0.000    0.000    4.466    4.466 {built-in method exec}

Here's the output when I use save to save to a npy file:

 39 function calls in 0.266 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.196    0.049    0.196    0.049 {method 'write' of '_io.BufferedWriter' objects}
        1    0.069    0.069    0.069    0.069 {method 'tostring' of 'numpy.ndarray' objects}
        1    0.001    0.001    0.266    0.266 format.py:362(write_array)
        1    0.000    0.000    0.000    0.000 {built-in method open}
        1    0.000    0.000    0.266    0.266 npyio.py:406(save)
        1    0.000    0.000    0.000    0.000 format.py:261(write_array_header_1_0)
        1    0.000    0.000    0.000    0.000 {method 'close' of '_io.BufferedWriter' objects}
        1    0.000    0.000    0.266    0.266 {built-in method exec}
        1    0.000    0.000    0.000    0.000 format.py:154(magic)
        1    0.000    0.000    0.000    0.000 format.py:233(header_data_from_array_1_0)
        1    0.000    0.000    0.266    0.266 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 numeric.py:462(asanyarray)
        1    0.000    0.000    0.000    0.000 py3k.py:28(asbytes)
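
For anyone who wants to reproduce tables like the two above outside IPython, here is a minimal sketch using the standard-library cProfile and pstats modules. The helper name profile_call and the busy_work stand-in are hypothetical, introduced only for illustration; in practice one would pass something like np.savez and its arguments instead. Sorting by "tottime" is what pstats labels "internal time".

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return the text report, sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is reported by pstats as "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(15)
    return stream.getvalue()


def busy_work(n):
    # Hypothetical stand-in workload; replace with e.g. np.savez(path, arr).
    return sum(len(str(i)) for i in range(n))


report = profile_call(busy_work, 10000)
print(report)
```

The report has the same columns (ncalls, tottime, percall, cumtime) as the %prun output, so the numbers can be compared directly.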

The calls to close and the built-in method remove seem to be responsible
for the inefficiency of the NumPy implementation (compared to the Julia
package that I mentioned before). This was tested using Python 3.4 and
NumPy 1.8.1.
However, if I run the tests with Python 3.3.5 and NumPy 1.8.0, savez becomes
much faster, so I think there is something wrong with the Python 3.4/NumPy
1.8.1 combination.
Also, if I use Python 2.4 and NumPy 1.2 (from my school's cluster), np.save
takes about 3.5 seconds and np.savez takes about 7 seconds, so all these
timings seem to depend heavily on the system/version (maybe this explains
David Palao's results?).
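
The write/read/crc32/remove pattern in the savez profile is consistent with the array being staged in a temporary file and then copied into the zip archive. That is an inference from the profile, not something verified against the NumPy source here. The sketch below uses only the standard library to compare a direct write with such a staged write; the function names and the random placeholder payload are assumptions made for illustration, and the timings will vary by system.

```python
import os
import tempfile
import time
import zipfile


def write_direct(path, payload):
    """Write the payload straight to a file (roughly the np.save pattern)."""
    with open(path, "wb") as fh:
        fh.write(payload)


def write_staged_zip(path, payload, arcname="arr.npy"):
    """Stage the payload in a temp file, then copy it into an uncompressed zip."""
    fd, tmp = tempfile.mkstemp(suffix=".npy")
    os.close(fd)
    try:
        with open(tmp, "wb") as fh:
            fh.write(payload)       # first full write to disk
        with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as zf:
            zf.write(tmp, arcname)  # reads the file back, checksums it, writes it again
    finally:
        os.remove(tmp)              # corresponds to the 'remove' call in the profile


def timed(func, *args):
    """Return the wall-clock time of one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


payload = os.urandom(1 << 20) * 16  # 16 MiB of placeholder bytes, not a real array
with tempfile.TemporaryDirectory() as workdir:
    t_direct = timed(write_direct, os.path.join(workdir, "a.bin"), payload)
    zip_path = os.path.join(workdir, "a.zip")
    t_staged = timed(write_staged_zip, zip_path, payload)
    with zipfile.ZipFile(zip_path) as zf:
        roundtrip_ok = zf.read("arr.npy") == payload

print(f"direct: {t_direct:.3f}s  staged zip: {t_staged:.3f}s  intact: {roundtrip_ok}")
```

The staged path touches the data three times (write, read back, write again), which may explain part of the gap even before any checksum cost is counted.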

However, they all indicate that a significant amount of time is spent
computing the crc32. Notice that prun reports that it takes 0.379 seconds to
compute the crc32 of an array that takes 0.2 seconds to save to a npy file.
I believe this is too much! And it gets worse if you try to save bigger
arrays.
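
To check the crc32 cost in isolation, here is a minimal sketch that times zlib.crc32 on a buffer, once in a single call and once in small chunks. The 8 KiB chunk size is an assumption, chosen because the roughly 48829 crc32 calls in the savez profile suggest the data was checksummed in small pieces. The zero-filled buffer is a placeholder, not the array from the tests above.

```python
import time
import zlib

CHUNK = 8 * 1024  # assumed chunk size, not taken from the NumPy or zipfile source


def crc32_chunked(data, chunk=CHUNK):
    """Compute a CRC-32 incrementally, one chunk at a time."""
    crc = 0
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    for start in range(0, len(view), chunk):
        crc = zlib.crc32(view[start:start + chunk], crc)
    return crc & 0xFFFFFFFF


data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros as a placeholder payload

start = time.perf_counter()
whole = zlib.crc32(data) & 0xFFFFFFFF
t_whole = time.perf_counter() - start

start = time.perf_counter()
chunked = crc32_chunked(data)
t_chunked = time.perf_counter() - start

print(f"one call: {t_whole:.3f}s  chunked: {t_chunked:.3f}s  equal: {whole == chunked}")
```

Both variants give the same checksum, so any difference between the two timings is per-call overhead rather than extra checksum work.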


On Thu, Apr 17, 2014 at 5:23 AM, Nathaniel Smith <njs at pobox.com> wrote:

> On 17 Apr 2014 01:57, "onefire" <onefire.myself at gmail.com> wrote:
> >
> > What I cannot understand is why savez takes more than 10 times longer
> than saving the data to a npy file. The only reason that I could come up
> with was the computation of the crc32.
>
> We can all make guesses but the solution is just to profile it :-). %prun
> in ipython (and then if you need more granularity installing line_profiler
> is useful).
>
> -n
>