[Numpy-discussion] Memory allocation cleanup

Frédéric Bastien nouiz at nouiz.org
Fri Jan 10 09:52:23 EST 2014

On Fri, Jan 10, 2014 at 4:18 AM, Julian Taylor
<jtaylor.debian at googlemail.com> wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> > [...]
>> After a bit more research, some further points to keep in mind:
>> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free,
>> and PyDataMem_* is an alias for malloc/free with some extra tracing
>> hooks wrapped around it. (AFAIK, these tracing hooks are not used by
>> anyone anywhere -- at least, if they are I haven't heard about it, and
>> there is no code on github that uses them.)
>> There is one substantial difference between the PyMem_* and PyObject_*
>> interfaces as compared to malloc(), which is that the Py* interfaces
>> require that the GIL be held when they are called. (@Julian -- I think
>> your PR we just merged fulfills this requirement, is that right?)
> I only replaced object allocation, which should always be called under the
> GIL. I'm not sure about nditer construction, but it does use Python
> exceptions for errors, which I think also requires the GIL.
>  [...]
>> Also, none of the Py* interfaces implement calloc(), which is annoying
>> because it messes up our new optimization of using calloc() for
>> np.zeros. [...]
> Another thing that is not directly implemented in Python is aligned
> allocation. This is going to get increasingly important with the advent of
> heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
> malloc being optimized for the oldish SSE (16 bytes). I want to change the
> array buffer allocation to make use of posix_memalign and C11 aligned_alloc
> if available to avoid some penalties when loading from non-32-byte-aligned
> buffers. I could imagine it might also help coprocessors and GPUs to have
> higher alignments, but I'm not very familiar with that type of hardware.
> The allocator used by Python 3.4 is pluggable, so we could implement our
> special allocators with the new API, but only when 3.4 is more widespread.

About co-processors and GPUs: it could help, but since NumPy is CPU-only
and there are other problems with using it directly on such hardware, I
doubt that this change would help code targeting co-processors/GPUs.
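
To make the alignment point in the quoted text concrete, here is a minimal
sketch of getting a 32-byte-aligned buffer from Python today, without any
change to NumPy's allocator. It uses the common over-allocate-and-offset
trick rather than posix_memalign; the helper name aligned_empty and its
parameters are hypothetical, chosen only for this illustration, and this is
not the implementation discussed in the thread.

```python
import numpy as np


def aligned_empty(n, dtype=np.float64, align=32):
    """Return an uninitialized 1-D array whose data pointer is a multiple
    of `align` bytes. Hypothetical helper, for illustration only."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits.
    raw = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip to reach the next multiple of `align`.
    offset = (-raw.ctypes.data) % align
    # The slice keeps `raw` alive through its .base reference.
    return raw[offset:offset + nbytes].view(dtype)


a = aligned_empty(1000)
# The address of the first element is now a multiple of 32.
assert a.ctypes.data % 32 == 0
```

The price is up to `align` wasted bytes per array, which is one reason an
allocator-level solution as proposed in the quoted text would be nicer.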

