[Numpy-discussion] Array vectorization in numpy

Pauli Virtanen pav at iki.fi
Wed Jul 20 12:08:18 EDT 2011


Wed, 20 Jul 2011 11:31:41 +0000, Pauli Virtanen wrote:
[clip]
> There is a sharp order-of-magnitude change of speed in malloc+memset of
> an array, which is not present in memset itself. (This is then also
> reflected in the Numpy performance -- floating point operations probably
> don't cost much compared to memory access speed.) It seems that either
> the kernel or the C library changes the way it handles allocation at
> that point.

The explanation seems to be the following:

(a) When the process adjusts the size of its heap, the kernel must zero
    new pages it gives to the process (because they might contain
    sensitive information from other processes) [1]

(b) GNU libc hangs onto some memory even after free() is called,
    so that the heap size doesn't need to be adjusted continuously.
    This is controlled by parameters that can be tuned with the
    mallopt() function. [2]

Because of (a), there is a performance hit, probably equivalent
to `memset(buf, 0, size)` or more (kernel overheads?), the first
time newly allocated memory is touched. Because of (b), this hit
mainly applies to buffers larger than some threshold: smaller
ones are recycled by libc without going back to the kernel.
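
For illustration, a minimal benchmark sketch of the effect
(assuming Linux + glibc; the exact threshold and the numbers
will vary by system):

    /* Time malloc + memset + free for a given buffer size.
       Large buffers come from fresh, kernel-zeroed pages each
       time; small ones are recycled from the heap by libc. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static void bench(size_t size, int reps)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < reps; i++) {
            char *buf = malloc(size);
            memset(buf, 0, size);   /* first touch of every page */
            free(buf);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double t = (t1.tv_sec - t0.tv_sec)
                 + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
        printf("%8zu kB: %6.2f GB/s\n", size / 1024,
               size * (double)reps / t / 1e9);
    }

    int main(void)
    {
        bench(64 * 1024, 20000);        /* below the default thresholds */
        bench(16 * 1024 * 1024, 200);   /* above: fresh pages every time */
        return 0;
    }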

Preallocating gets rid of this overhead, but it probably only
matters where the same memory is reused many times and the
operations performed are not much more expensive than the page
zeroing the kernel does anyway.
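
To illustrate, the two allocation patterns in C (a sketch;
compute() is a hypothetical stand-in for the real work; in Numpy
the preallocated variant corresponds to passing an existing array
via the out= argument of a ufunc):

    #include <stdlib.h>

    /* Hypothetical workload standing in for an array operation. */
    static void compute(double *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = 0.5 * i;
    }

    /* Allocate per iteration: each large malloc can hand back
       fresh kernel-zeroed pages, so every first touch is slow. */
    static void per_iteration(size_t n, int reps)
    {
        for (int i = 0; i < reps; i++) {
            double *buf = malloc(n * sizeof(double));
            compute(buf, n);
            free(buf);
        }
    }

    /* Preallocate and reuse: the first-touch cost is paid once. */
    static void preallocated(size_t n, int reps)
    {
        double *buf = malloc(n * sizeof(double));
        for (int i = 0; i < reps; i++)
            compute(buf, n);
        free(buf);
    }

    int main(void)
    {
        per_iteration(1 << 21, 100);   /* 16 MB of doubles each time */
        preallocated(1 << 21, 100);
        return 0;
    }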

Alternatively, you can call

    mallopt(M_TRIM_THRESHOLD, N);
    mallopt(M_TOP_PAD, N);
    mallopt(M_MMAP_MAX, 0);

with large enough `N`, and let libc manage the memory reuse for you.
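
Put together, a minimal complete sketch (assuming glibc; the
value N = 64 MB is an arbitrary choice, it just needs to exceed
the largest buffer you want recycled):

    #include <malloc.h>   /* mallopt() and the M_* constants (glibc) */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        int N = 64 * 1024 * 1024;     /* arbitrary; > largest reused buffer */

        mallopt(M_TRIM_THRESHOLD, N); /* trim the heap only above N free bytes */
        mallopt(M_TOP_PAD, N);        /* grab N extra bytes whenever the heap grows */
        mallopt(M_MMAP_MAX, 0);       /* no mmap: large blocks stay on the reusable heap */

        /* After the first iteration, malloc hands back the same
           already-touched pages instead of asking the kernel. */
        for (int i = 0; i < 1000; i++) {
            char *buf = malloc(16 * 1024 * 1024);
            memset(buf, 1, 16 * 1024 * 1024);
            free(buf);
        }
        return 0;
    }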

.. [1] http://stackoverflow.com/questions/1327261

.. [2] http://www.gnu.org/s/hello/manual/libc/Malloc-Tunable-Parameters.html

-- 
Pauli Virtanen



