[Numpy-discussion] performance of numpy.array()

Ryan Nelson rnelsonchem at gmail.com
Thu Apr 30 10:24:40 EDT 2015


I have had good luck with Continuum's Miniconda Python distributions on
Linux.
http://conda.pydata.org/miniconda.html
The `conda` command makes it very easy to create specific testing
environments for Python 2 and 3 with many different packages. Everything is
precompiled, so you won't have to worry about system library differences
between the two clusters.
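
Inside each environment, a short script along these lines could confirm
which interpreter and numpy version are actually in use and give a rough
timing for numpy.array. This is only a sketch: the function names, array
sizes and call count are illustrative choices of mine, not taken from your
code, so adjust them to resemble your real data.

```python
import sys
import timeit

import numpy as np


def describe_environment():
    """Return the Python and NumPy versions of the running interpreter."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "numpy": np.__version__,
    }


def time_array_calls(n_calls=1000, n_arrays=3, length=100):
    """Time n_calls of np.array on a list of equal-length 1-D arrays."""
    # Hypothetical workload: a small list of float arrays, rebuilt into
    # one 2-D array on every call.
    arrays = [np.arange(length, dtype=float) for _ in range(n_arrays)]
    return timeit.timeit(lambda: np.array(arrays), number=n_calls)


if __name__ == "__main__":
    print(describe_environment())
    print("np.array total time: %.4f s" % time_array_calls())
```

Running the same file under each environment should make it easier to
separate version effects from hardware effects.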

Hope that helps.

Ryan

On Thu, Apr 30, 2015 at 10:03 AM, simona bellavista <afylot at gmail.com>
wrote:

> I have seen a big improvement in performance with numpy 1.9.2 and python
> 2.7.8: numpy.array takes 5 s instead of 300 s.
>
> On the other hand, I have also tried numpy 1.9.2 and 1.9.0 with python 3.4
> and the results are terrible: numpy.array takes 20 s, and the other
> routines are slowed down as well, for example concatenate, astype, copy
> and uniform. Most of all, the sort function of numpy.ndarray is slowed
> down by a factor of at least 10.
>
> On the other cluster I am using python 3.3 with numpy 1.9.0 and it is
> working very well (though I think that is partly because of the hardware).
> I was trying to install python 3.3 on this cluster, but because of other
> issues (a compile-time error in the h5py library and a runtime bug in the
> dill library) I cannot test it right now.
>
> 2015-04-29 17:47 GMT+02:00 Sebastian Berg <sebastian at sipsolutions.net>:
>
>> There was a major improvement to np.array in some cases.
>>
>> You can probably work around this by using np.concatenate instead of
>> np.array in your case. This depends on the use case, but I would guess
>> you have code doing:
>>
>> np.array([arr1, arr2, arr3])
>>
>> or similar. If your use case is different, you may be out of luck and
>> only an upgrade would help.
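
For reference, the np.concatenate workaround described above might look
roughly like the following. It is a minimal sketch that assumes every
input is already an ndarray with the same shape and dtype; the helper name
stack_with_concatenate and the sample arrays are hypothetical.

```python
import numpy as np


def stack_with_concatenate(arrays):
    """Stack equal-shaped arrays along a new first axis via np.concatenate."""
    # Give each array a leading axis of length 1, then join along that axis.
    return np.concatenate([a[np.newaxis, ...] for a in arrays], axis=0)


arr1 = np.arange(4.0)
arr2 = np.arange(4.0) + 10
arr3 = np.arange(4.0) + 20

stacked = stack_with_concatenate([arr1, arr2, arr3])

# For matching shapes and dtypes this equals np.array([arr1, arr2, arr3]).
assert stacked.shape == (3, 4)
assert np.array_equal(stacked, np.array([arr1, arr2, arr3]))
```

Whether this is actually faster will depend on the numpy version and the
array sizes, so it is worth timing on the slow cluster before changing
real code.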
>>
>>
>> On Wed, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote:
>> > You could try and install your own numpy to check whether that
>> > resolves the problem.
>> >
>> > 2015-04-29 17:40 GMT+02:00 simona bellavista <afylot at gmail.com>:
>> >         on cluster A 1.9.0 and on cluster B 1.8.2
>> >
>> >         2015-04-29 17:18 GMT+02:00 Nick Papior Andersen
>> >         <nickpapior at gmail.com>:
>> >                 Compile it yourself to know the limitations/benefits
>> >                 of the dependency libraries.
>> >
>> >
>> >                 Otherwise, have you checked which versions of numpy
>> >                 they are, i.e. are they the same version?
>> >
>> >
>> >                 2015-04-29 17:05 GMT+02:00 simona bellavista
>> >                 <afylot at gmail.com>:
>> >
>> >                         I work on two distinct scientific clusters. I
>> >                         have run the same python code on the two
>> >                         clusters and I have noticed that one is faster
>> >                         by an order of magnitude than the other (1min
>> >                         vs 10min, this is important because I run this
>> >                         function many times).
>> >
>> >
>> >                         I have investigated with a profiler and found
>> >                         that the cause (same code and same data) is
>> >                         the function numpy.array, which is called
>> >                         10^5 times. On cluster A it takes 2 s in
>> >                         total, whereas on cluster B it takes ~6 min.
>> >                         The other functions are generally faster on
>> >                         cluster A. I understand that the clusters
>> >                         are quite different, both in hardware and in
>> >                         installed libraries. It strikes me that the
>> >                         performance of this particular function is so
>> >                         different. I would have thought that this is
>> >                         due to a difference in the available memory,
>> >                         but looking at `top`, memory usage seems to be
>> >                         only 0.1% on cluster B. In theory numpy is
>> >                         compiled with atlas on cluster B; on cluster A
>> >                         it is not clear, because
>> >                         numpy.__config__.show() returns NOT AVAILABLE
>> >                         for everything.
>> >
>> >
>> >                         Does anybody have any insight on this, and on
>> >                         whether I can improve the performance on
>> >                         cluster B?
>> >
>> >
>> >                         _______________________________________________
>> >                         NumPy-Discussion mailing list
>> >                         NumPy-Discussion at scipy.org
>> >                         http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> >
>> >
>> >
>> >                 --
>> >                 Kind regards Nick
>> >
>> > --
>> > Kind regards Nick