On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista wrote:
I work on two distinct scientific clusters. I have run the same Python
code on the two clusters, and I have noticed that one is faster than the other by an order of magnitude (1 min vs 10 min; this is important because I run this function many times).
I have investigated with a profiler and found that the cause (same code
and same data) is the function numpy.array, which is called 10^5 times. On cluster A it takes 2 s in total, whereas on cluster B it takes ~6 min. The other functions are generally faster on cluster A as well. I understand that the clusters are quite different, both in hardware and in installed libraries, but it strikes me that the performance of this particular function differs so much. I would have thought it was due to a difference in available memory, but according to `top` only about 0.1% of the memory is in use on cluster B. In theory numpy is compiled against ATLAS on cluster B; on cluster A it is not clear, because numpy.__config__.show() returns NOT AVAILABLE for everything.
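(For reference, the call pattern can be reproduced in isolation with something like the sketch below; the input list is illustrative, not my actual data:)

    import timeit
    import numpy as np

    # Illustrative input; the real data and its size will differ.
    row = [0.0] * 100

    # Time 10^5 calls to numpy.array, mirroring the profiled call count.
    t = timeit.timeit(lambda: np.array(row), number=10**5)
    print("10^5 numpy.array calls: %.2f s" % t)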
Does anybody have any insight into this, and into whether I can improve the
performance on cluster B?

Check to see if you have the "Transparent Hugepages" (THP) Linux kernel feature enabled on each cluster. You may want to try turning it off. I have recently run into a problem with a large-memory multicore machine with THP for programs that had many large numpy.array() memory allocations. Usually, THP helps memory-hungry applications (you can Google for the reasons), but it does require defragmenting the memory space to get contiguous hugepages. The system can get into a state where the memory space is so fragmented that getting each new hugepage requires a lot of extra work to create the contiguous memory regions. In my case, a perfectly well-performing program would suddenly slow down immensely during its memory-allocation-intensive actions. When I turned THP off, it started working normally again.

If you have root, try using `perf top` to see what C functions in user space and kernel space are taking up the most time in your process. If you see anything like `do_page_fault()`, this, or a similar issue, is your problem.
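To check the current THP state, a minimal sketch (assuming the usual mainline sysfs paths; some distros, e.g. RHEL 6, put these files elsewhere):

    # Print the Transparent Hugepages settings. The active value is the
    # one shown in brackets, e.g. "always [madvise] never".
    for name in ("enabled", "defrag"):
        path = "/sys/kernel/mm/transparent_hugepage/" + name
        try:
            with open(path) as f:
                print("%s: %s" % (name, f.read().strip()))
        except IOError:
            print("%s: no THP sysfs entry at %s" % (name, path))

Turning it off requires root, e.g. by writing "never" to the "enabled" file.

-- Robert Kern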