[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array
nouiz at nouiz.org
Tue Jul 16 14:53:30 EDT 2013
On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma <arinkverma at gmail.com> wrote:
>>> Each ndarray does two mallocs, for the obj and buffer. These could be
>>> combined into 1 - just allocate the total size and do some pointer
>>> arithmetic, then set OWNDATA to false.
>> So, the two mallocs were mentioned in the project introduction. I got
>> that wrong.
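The "one allocation plus pointer arithmetic" idea quoted above can be sketched in C. This is a minimal, hypothetical sketch: `toy_array`, `toy_array_new_inline`, and `toy_array_free` are illustrative names, not NumPy API, and a real `PyArrayObject` is allocated through Python's type machinery rather than a bare `malloc`. The header and the data buffer share one block, the buffer offset is padded to `max_align_t` alignment, and OWNDATA-style ownership is cleared because the buffer is never freed on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified array header (NOT NumPy's PyArrayObject). */
typedef struct {
    size_t nbytes;  /* size of the data buffer in bytes */
    int owndata;    /* 1 if data is a separate malloc that we must free */
    char *data;     /* start of the element buffer */
} toy_array;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* One malloc for header + buffer: the buffer starts right after the
 * header, padded so it is suitably aligned for any scalar type. */
static toy_array *toy_array_new_inline(size_t nbytes) {
    size_t offset = round_up(sizeof(toy_array), _Alignof(max_align_t));
    toy_array *arr = malloc(offset + nbytes);
    if (arr == NULL) {
        return NULL;
    }
    arr->nbytes = nbytes;
    arr->data = (char *)arr + offset;  /* pointer arithmetic into the same block */
    arr->owndata = 0;                  /* buffer is not a separate allocation */
    memset(arr->data, 0, nbytes);
    return arr;
}

/* For an inline array a single free() releases header and buffer together. */
static void toy_array_free(toy_array *arr) {
    if (arr != NULL && arr->owndata) {
        free(arr->data);
    }
    free(arr);
}
```

The trade-off is that the buffer's lifetime is now welded to the header's, which is exactly what the rest of the thread turns on.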
> On further thought/reading the code, it appears to be more complicated
> than that, actually.
> It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: 1
> for the array object itself, and one for the shapes + strides. And, one
> call to regular-old malloc: for the data buffer.
> (Mysteriously, shapes + strides together have 2*ndim elements, but to hold
> them we allocate a memory region sized to hold 3*ndim elements. I'm not
> sure why.)
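The shapes + strides bookkeeping described above can be illustrated with one small allocation that holds both arrays back to back. This hypothetical sketch allocates exactly 2*ndim entries and does not reproduce the 3*ndim sizing, whose purpose the thread leaves open; `toy_dims`, `toy_dims_init`, and `toy_dims_free` are illustrative names, and `ptrdiff_t` stands in for NumPy's `npy_intp`. Because this block is separate from the data buffer, it can be reallocated when ndim changes without touching the (possibly much larger) data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dimension bookkeeping; ptrdiff_t plays the role of npy_intp. */
typedef struct {
    int ndim;
    ptrdiff_t *shape;    /* ndim entries */
    ptrdiff_t *strides;  /* ndim entries, stored right after shape */
} toy_dims;

/* Allocate shape and strides with ONE malloc of 2*ndim entries, point
 * strides at the second half, and fill C-contiguous (row-major) strides. */
static int toy_dims_init(toy_dims *d, int ndim, const ptrdiff_t *shape,
                         ptrdiff_t itemsize) {
    assert(ndim > 0);
    d->shape = malloc(2 * (size_t)ndim * sizeof(ptrdiff_t));
    if (d->shape == NULL) {
        return -1;
    }
    d->ndim = ndim;
    d->strides = d->shape + ndim;
    ptrdiff_t stride = itemsize;
    for (int i = ndim - 1; i >= 0; --i) {
        d->shape[i] = shape[i];
        d->strides[i] = stride;
        stride *= shape[i];
    }
    return 0;
}

/* One free() releases both arrays, since strides lives in the same block. */
static void toy_dims_free(toy_dims *d) {
    free(d->shape);
    d->shape = NULL;
    d->strides = NULL;
}
```

For a 2x3x4 array of 8-byte items this yields strides of 96, 32 and 8 bytes.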
> And contrary to what I said earlier, this is about as optimized as it can
> be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc,
> because the shapes+strides may need to be resized without affecting the
> much larger data area. But it's tempting to allocate the array object and
> the data buffer in a single memory region, like I suggested earlier. And
> this would ALMOST work. But, it turns out there is code out there which
> assumes (whether wisely or not) that you can swap around which data buffer
> a given PyArrayObject refers to (hi Theano!). And supporting this means
> that data buffers and PyArrayObjects need to be in separate memory regions.
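A hypothetical sketch of the conflict described above: if one header can hand its data buffer over to another, the buffer must be freeable independently of the header that first allocated it. With an inline buffer that is impossible, because the bytes are part of the original header's allocation. None of these names (`toy_array`, `toy_steal_data`, and so on) are NumPy or Theano code, and the inline constructor skips the alignment padding a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified array header (NOT NumPy's PyArrayObject). */
typedef struct {
    bool owndata;      /* true if data is a separate malloc owned by us */
    bool inline_data;  /* true if data lives inside this header's block */
    char *data;
} toy_array;

/* Header and buffer in two separate allocations. */
static toy_array *toy_array_new_separate(size_t nbytes) {
    toy_array *arr = malloc(sizeof *arr);
    if (arr == NULL) return NULL;
    arr->data = malloc(nbytes);
    if (arr->data == NULL) { free(arr); return NULL; }
    arr->owndata = true;
    arr->inline_data = false;
    return arr;
}

/* Header and buffer in one allocation (alignment padding omitted). */
static toy_array *toy_array_new_inline(size_t nbytes) {
    toy_array *arr = malloc(sizeof *arr + nbytes);
    if (arr == NULL) return NULL;
    arr->data = (char *)(arr + 1);
    arr->owndata = false;  /* freed together with the header */
    arr->inline_data = true;
    return arr;
}

/* A header with no buffer yet. */
static toy_array *toy_array_new_empty(void) {
    toy_array *arr = malloc(sizeof *arr);
    if (arr == NULL) return NULL;
    arr->data = NULL;
    arr->owndata = false;
    arr->inline_data = false;
    return arr;
}

/* Move ownership of src's buffer to dst (dst must own nothing yet).
 * Works for a separate buffer; refused for an inline one, since that
 * memory disappears as soon as src's header is freed. */
static bool toy_steal_data(toy_array *dst, toy_array *src) {
    assert(!dst->owndata);
    if (src->inline_data || !src->owndata) return false;
    dst->data = src->data;
    dst->owndata = true;
    dst->inline_data = false;
    src->data = NULL;
    src->owndata = false;
    return true;
}

static void toy_array_free(toy_array *arr) {
    if (arr->owndata) free(arr->data);
    free(arr);
}
```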
Are you sure that Theano "swaps" the data ptr of an ndarray? When we play
with that, it is on a newly created ndarray. So a node in our graph won't
change the input ndarray structure. It will create a new ndarray structure
with new shape/strides, pass it a data ptr, and, to my knowledge, set the
own_data flag on the new ndarray correctly.
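The pattern described here, a fresh header with its own shape/strides pointing at an existing buffer, can be sketched as below, assuming the new header does not own the buffer. Names are hypothetical, not Theano or NumPy code. The input array's header is never modified, and the view records its `base`; unlike NumPy, this toy has no reference counting, so the caller must free the view before the base. Since only the owner frees the buffer, this pattern does not by itself require the data to live in a separate allocation from the owner's header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array header; fixed capacity keeps the sketch short. */
typedef struct toy_array {
    int ndim;
    ptrdiff_t shape[4];
    ptrdiff_t strides[4];      /* in bytes */
    char *data;
    bool owndata;
    struct toy_array *base;    /* owner of data when owndata is false */
} toy_array;

/* A C-contiguous 2-D array of doubles that owns its buffer. */
static toy_array *toy_new_2d(ptrdiff_t rows, ptrdiff_t cols) {
    toy_array *a = calloc(1, sizeof *a);
    if (a == NULL) return NULL;
    a->data = calloc((size_t)(rows * cols), sizeof(double));
    if (a->data == NULL) { free(a); return NULL; }
    a->ndim = 2;
    a->shape[0] = rows;
    a->shape[1] = cols;
    a->strides[0] = cols * (ptrdiff_t)sizeof(double);
    a->strides[1] = (ptrdiff_t)sizeof(double);
    a->owndata = true;
    a->base = NULL;
    return a;
}

/* New header with new shape/strides over base's buffer; base is untouched. */
static toy_array *toy_make_view(toy_array *base, int ndim,
                                const ptrdiff_t *shape, const ptrdiff_t *strides) {
    assert(ndim >= 0 && ndim <= 4);
    toy_array *v = calloc(1, sizeof *v);
    if (v == NULL) return NULL;
    v->ndim = ndim;
    for (int i = 0; i < ndim; ++i) {
        v->shape[i] = shape[i];
        v->strides[i] = strides[i];
    }
    v->data = base->data;
    v->owndata = false;  /* the view must never free the shared buffer */
    v->base = base;
    return v;
}

/* Element (i, j) of a 2-D array of doubles, located through its strides. */
static double toy_get2(const toy_array *a, ptrdiff_t i, ptrdiff_t j) {
    return *(const double *)(a->data + i * a->strides[0] + j * a->strides[1]);
}

static void toy_free(toy_array *a) {
    if (a->owndata) free(a->data);
    free(a);
}
```

For example, a transpose is just a view with the shape and strides swapped.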
If Theano poses a problem here, I'd suggest that I fix Theano. But
currently I don't see the problem. So if this is what made you change your
mind about this optimization, tell me. I don't want Theano to prevent
optimizations in NumPy.