[Numpy-discussion] Memory allocation cleanup
jtaylor.debian at googlemail.com
Fri Jan 10 04:18:05 EST 2014
On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> > [...]
> After a bit more research, some further points to keep in mind:
> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free,
> and PyDataMem_* is an alias for malloc/free with some extra tracing
> hooks wrapped around it. (AFAIK, these tracing hooks are not used by
> anyone anywhere -- at least, if they are I haven't heard about it, and
> there is no code on github that uses them.)
> There is one substantial difference between the PyMem_* and PyObject_*
> interfaces as compared to malloc(), which is that the Py* interfaces
> require that the GIL be held when they are called. (@Julian -- I think
> your PR we just merged fulfills this requirement, is that right?)
I only replaced object allocation, which should always be called under the
GIL. I am not sure about nditer construction, but it does use Python
exceptions for errors, which I think also requires the GIL.
> Also, none of the Py* interfaces implement calloc(), which is annoying
> because it messes up our new optimization of using calloc() for
> np.zeros. [...]
Another thing that is not directly implemented in Python is aligned
allocation. This is going to get increasingly important with the advent of
heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
malloc being optimized for the oldish SSE (16 bytes). I want to change the
array buffer allocation to make use of posix_memalign and C11
aligned_alloc if available, to avoid some penalties when loading from
buffers that are not 32-byte aligned. I could imagine higher alignments
might also help coprocessors and GPUs, but I'm not very familiar with that
type of hardware.
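As a rough illustration of the aligned allocation described above, here is
a minimal sketch using posix_memalign. The helper names aligned_buffer_new
and is_aligned are made up for this example and are not NumPy functions;
the 32-byte figure is simply the alignment mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of NumPy): allocate `size` bytes whose
 * address is a multiple of `alignment`. posix_memalign requires the
 * alignment to be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free(). */
static void *aligned_buffer_new(size_t alignment, size_t size)
{
    void *ptr = NULL;

    assert(alignment >= sizeof(void *));
    assert((alignment & (alignment - 1)) == 0);

    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
}

/* Hypothetical helper: check whether `ptr` sits on an `alignment` boundary. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

A caller would request e.g. aligned_buffer_new(32, nbytes) for an array
data buffer. C11 aligned_alloc could be substituted where it is available,
keeping in mind that C11 asks for the size to be a multiple of the
alignment.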
The allocator used by Python 3.4 is pluggable, so we could implement our
special allocators with the new API, but only once 3.4 is more widespread.
For this reason, and because calloc is missing, I don't think we should use
the Python API for data buffers just yet. Any benefits are relatively small
anyway.
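To make the calloc point concrete, here is a small sketch comparing a
zeroed allocation via calloc with what an allocator lacking calloc has to
do instead. The names zeros_calloc, zeros_malloc and zeros_demo are
hypothetical and only for illustration. The usual argument for calloc is
that the C library can often return memory the OS has already zeroed, so
no explicit memset pass is needed; whether that pays off depends on the
platform and the allocation size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: zeroed buffer straight from calloc, which also
 * checks nmemb * size for overflow. */
static void *zeros_calloc(size_t nmemb, size_t size)
{
    return calloc(nmemb, size);
}

/* Hypothetical helper: the same result through a malloc-only interface.
 * The overflow check and the memset have to be done by hand. */
static void *zeros_malloc(size_t nmemb, size_t size)
{
    void *ptr;

    if (size != 0 && nmemb > (size_t)-1 / size) {
        return NULL;  /* nmemb * size would overflow size_t */
    }
    ptr = malloc(nmemb * size);
    if (ptr != NULL) {
        memset(ptr, 0, nmemb * size);  /* touches every byte up front */
    }
    return ptr;
}

/* Small self-check: both paths yield buffers that read back as zeros. */
static void zeros_demo(void)
{
    size_t i, n = 1024;
    double *a = zeros_calloc(n, sizeof *a);
    double *b = zeros_malloc(n, sizeof *b);

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        assert(a[i] == 0.0 && b[i] == 0.0);
    }
    free(a);
    free(b);
}
```

Both helpers return equivalent buffers; the difference is only in who does
the zeroing and when.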
> I'm pretty sure that the vast majority of our allocations do occur
> with GIL protection, so we might want to switch to using PyObject_*
> for most cases to take advantage of the small-object optimizations,
> and use PyMem_Raw* for any non-GIL cases (like possibly ufunc
> buffers), with a compatibility wrapper to replace PyMem_Raw* with
> malloc() on pre-3.4 pythons. Of course this will need some profiling
> to see if PyObject_* is actually better than malloc() in practice.
I don't think it's required to replace everything with PyObject_* just
because it can be faster. We should do it only in places where it really
makes a difference, and there are not that many of them.
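For reference, the compatibility wrapper mentioned in the quoted text
could look roughly like the sketch below. The COMPAT_RAW_* macro names and
the scratch_new helper are invented for this example. It assumes that a
real extension includes Python.h before this point, so that PY_VERSION_HEX
is defined; PyMem_RawMalloc, PyMem_RawRealloc and PyMem_RawFree are the
GIL-free allocator functions added in Python 3.4. Without Python.h the
sketch simply falls back to the C allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical compatibility macros (not NumPy API). With Python >= 3.4
 * headers included beforehand, use the raw allocators, which do not need
 * the GIL. Otherwise fall back to plain malloc/realloc/free. */
#if defined(PY_VERSION_HEX) && PY_VERSION_HEX >= 0x03040000
#define COMPAT_RAW_MALLOC(n)     PyMem_RawMalloc(n)
#define COMPAT_RAW_REALLOC(p, n) PyMem_RawRealloc((p), (n))
#define COMPAT_RAW_FREE(p)       PyMem_RawFree(p)
#else
#define COMPAT_RAW_MALLOC(n)     malloc(n)
#define COMPAT_RAW_REALLOC(p, n) realloc((p), (n))
#define COMPAT_RAW_FREE(p)       free(p)
#endif

/* Hypothetical helper: scratch buffer for code that may run without the
 * GIL. Returns NULL on overflow or allocation failure. */
static double *scratch_new(size_t count)
{
    if (count > (size_t)-1 / sizeof(double)) {
        return NULL;
    }
    return (double *)COMPAT_RAW_MALLOC(count * sizeof(double));
}

/* Small self-check: allocate, use and release a buffer via the macros. */
static void scratch_demo(void)
{
    double *buf = scratch_new(8);

    assert(buf != NULL);
    buf[7] = 1.5;
    assert(buf[7] == 1.5);
    COMPAT_RAW_FREE(buf);
}
```

Memory obtained through one family must be released through the same
family, which is why the free call goes through the macro as well.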