[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

Nathaniel Smith njs at pobox.com
Thu Jul 18 09:36:50 EDT 2013

On Wed, Jul 17, 2013 at 5:57 PM, Frédéric Bastien <nouiz at nouiz.org> wrote:
> On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> >
>> > On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> It's entirely possible I misunderstood, so let's see if we can work it
>> out. I know that you want to assign to the ->data pointer in a
>> PyArrayObject, right? That's what caused some trouble with the 1.7 API
>> deprecations, which were trying to prevent direct access to this
>> field? Creating a new array given a pointer to a memory region is no
>> problem, and obviously will be supported regardless of any
>> optimizations. But if that's all you were doing then you shouldn't
>> have run into the deprecation problem. Or maybe I'm misremembering!
> What we currently do, in only one place, is create a new PyArrayObject with
> a given ptr, so NumPy doesn't do the allocation. We later change that ptr to
> another one.

Hmm, OK, so that would still work. If the array has the OWNDATA flag
set (or you otherwise know where the data came from), then swapping
the data pointer would still work.

The change would be that in most cases when asking numpy to allocate a
new array from scratch, the OWNDATA flag would not be set. That's
because the OWNDATA flag really means "when this object is
deallocated, call free(self->data)", but if we allocate the array
struct and the data buffer together in a single memory region, then
deallocating the object will automatically cause the data buffer to be
deallocated as well, without the array destructor having to take any
special effort.
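
As a rough illustration of the difference, here is a minimal sketch in plain
C. The names (toy_array, toy_new_separate, toy_new_combined, toy_free) and
the 16-byte alignment are made up for this example; this is not NumPy's
actual struct layout or allocation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for PyArrayObject (not the real layout). */
typedef struct {
    char  *data;     /* points at the element buffer */
    size_t nbytes;   /* size of the element buffer in bytes */
    int    owndata;  /* 1: the destructor must free(data) separately */
} toy_array;

/* Header size rounded up to 16 bytes (an assumed alignment for the data). */
static size_t toy_header_size(void)
{
    const size_t align = 16;
    size_t padded = (sizeof(toy_array) + align - 1) / align * align;
    assert(padded >= sizeof(toy_array));
    return padded;
}

/* Two allocations: the header and the data live in separate regions. */
static toy_array *toy_new_separate(size_t nbytes)
{
    toy_array *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->data = malloc(nbytes ? nbytes : 1);
    if (a->data == NULL) { free(a); return NULL; }
    a->nbytes = nbytes;
    a->owndata = 1;  /* data must be freed on its own */
    return a;
}

/* One allocation: the data sits directly after the (padded) header. */
static toy_array *toy_new_combined(size_t nbytes)
{
    size_t header = toy_header_size();
    if (nbytes > SIZE_MAX - header) return NULL;  /* size overflow */
    toy_array *a = malloc(header + nbytes);
    if (a == NULL) return NULL;
    a->data = (char *)a + header;
    a->nbytes = nbytes;
    a->owndata = 0;  /* freeing the header releases the data too */
    return a;
}

static void toy_free(toy_array *a)
{
    if (a == NULL) return;
    if (a->owndata) free(a->data);
    free(a);
}
```

In this sketch, replacing a->data on an array from toy_new_separate() is
fine as long as the old buffer is freed or handed off. Doing the same on an
array from toy_new_combined() would leave data pointing outside the block
that toy_free() releases.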

> It is the change to the ptr of the just-created PyArrayObject that caused
> problems with the interface deprecation. I fixed all the other problems related
> to the deprecation (mostly just renames of functions/macros), but I haven't
> fixed this one yet. I would need to change the logic to compute the final
> ptr before creating the PyArrayObject, and then create it with the final
> data ptr. But in all cases, NumPy didn't allocate the data memory for this
> object, so this case doesn't block your optimization.


> One thing on our optimization "wish list" is to reuse allocated
> PyArrayObjects between Theano function calls for intermediate results (so
> completely under Theano's control). This could be useful in particular for
> reshape/transpose/subtensor. Those functions are pretty fast and, from
> memory, I already found that the allocation time was significant. But in those
> cases, the PyArrayObjects are views, so the metadata and the data
> would be in different memory regions in all cases.
> The other item on the optimization "wish list" is to reuse the
> PyArrayObject when the shape isn't the right one (but the number of
> dimensions is the same). If we do that for operations like addition, we will
> need to use PyArray_Resize(). This will be done on PyArrayObjects whose data
> memory was allocated by NumPy. So if you do one memory allocation for
> metadata and data, just make sure that PyArray_Resize() will handle that
> correctly.

I'm not sure I follow the details here, but it does turn out that a
really surprising amount of time in PyArray_NewFromDescr is spent in
just calculating and writing out the shape and strides buffers, so for
programs that e.g. use hundreds of small 3-element arrays to represent
points in space, re-using even these buffers might be a big win...
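
To make the per-array bookkeeping concrete, here is a minimal sketch in
plain C of filling the strides buffer for a C-contiguous array. The function
name toy_fill_strides and its signature are assumptions for illustration;
the real code in NumPy also handles other memory orders and corner cases
that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill strides (in bytes) for a C-contiguous array with the given shape
 * and item size. Returns the product of itemsize and all dimensions,
 * i.e. the total buffer size in bytes.
 */
static ptrdiff_t toy_fill_strides(int ndim, const ptrdiff_t *shape,
                                  ptrdiff_t itemsize, ptrdiff_t *strides)
{
    assert(ndim >= 0 && itemsize > 0);
    ptrdiff_t stride = itemsize;
    /* The last axis varies fastest, so walk the axes from last to first. */
    for (int i = ndim - 1; i >= 0; i--) {
        strides[i] = stride;
        stride *= shape[i];
    }
    return stride;
}
```

For a 3-element array of 8-byte doubles this writes a single stride of 8 and
returns 24. A program that creates many arrays of the same shape and dtype
repeats exactly this computation each time, which is the work that re-using
the buffers would avoid.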

> On the usefulness of doing only 1 memory allocation: on our old GPU ndarray,
> we were doing 2 allocs on the GPU, one for metadata and one for data. I
> removed this, as it was a bottleneck. Allocations on the CPU are faster than
> on the GPU, but this is still something that is slow unless you reuse
> memory. Does PyMem_Malloc reuse previous small allocations?

Yes, at least in theory PyMem_Malloc is highly-optimized for small
buffer re-use. (For requests >256 bytes it just calls malloc().) And
it's possible to define type-specific freelists; not sure if there's
any value in doing that for PyArrayObjects. See Objects/obmalloc.c in
the Python source tree.
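
For reference, a type-specific freelist can be sketched in plain C roughly as
follows. All names here (toy_header, toy_header_alloc, toy_header_release,
TOY_FREELIST_MAX) are hypothetical, and the code is not thread-safe; it only
illustrates the general idea of keeping a few released objects around for
reuse instead of returning them to the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FREELIST_MAX 8  /* assumed cap on cached headers */

/* Hypothetical fixed-size object header; next_free is used only while cached. */
typedef struct toy_header {
    struct toy_header *next_free;
    int ndim;
} toy_header;

static toy_header *toy_freelist = NULL;
static int toy_freelist_len = 0;

/* Take a header from the freelist if one is cached, else call malloc(). */
static toy_header *toy_header_alloc(void)
{
    toy_header *h = toy_freelist;
    if (h != NULL) {
        assert(toy_freelist_len > 0);
        toy_freelist = h->next_free;
        toy_freelist_len--;
    } else {
        h = malloc(sizeof *h);
        if (h == NULL) return NULL;
    }
    h->next_free = NULL;
    h->ndim = 0;
    return h;
}

/* Cache the header for reuse, or free() it once the freelist is full. */
static void toy_header_release(toy_header *h)
{
    if (h == NULL) return;
    if (toy_freelist_len < TOY_FREELIST_MAX) {
        h->next_free = toy_freelist;
        toy_freelist = h;
        toy_freelist_len++;
    } else {
        free(h);
    }
}
```

An alloc/release/alloc sequence then gets the same block back without
touching malloc(). Whether this beats the general small-object allocator for
array headers would have to be measured.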

