On Tue, Jul 16, 2013 at 7:53 PM, Frédéric Bastien <nouiz@nouiz.org> wrote:
Hi,
On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma <arinkverma@gmail.com> wrote:
Each ndarray does two mallocs, for the obj and buffer. These could be combined into 1 - just allocate the total size and do some pointer arithmetic, then set OWNDATA to false. So, those two mallocs have been mentioned in the project introduction. I got that wrong.
On further thought/reading the code, it appears to be more complicated than that, actually.
It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: one for the array object itself, and one for the shapes + strides. And one call to regular-old malloc: for the data buffer.
(Mysteriously, shapes + strides together have 2*ndim elements, but to hold them we allocate a memory region sized to hold 3*ndim elements. I'm not sure why.)
And contrary to what I said earlier, this is about as optimized as it can be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc, because the shapes + strides may need to be resized without affecting the much larger data area.

But it's tempting to allocate the array object and the data buffer in a single memory region, like I suggested earlier. And this would ALMOST work. But it turns out there is code out there which assumes (whether wisely or not) that you can swap around which data buffer a given PyArrayObject refers to (hi Theano!). And supporting this means that data buffers and PyArrayObjects need to be in separate memory regions.
Are you sure that Theano "swaps" the data ptr of an ndarray? When we play with that, it is on a newly created ndarray. So a node in our graph won't change the input ndarray structure. It will create a new ndarray structure with new shape/strides and pass a data ptr, and we flag the new ndarray with own_data correctly, to my knowledge.
If Theano poses a problem here, I'll suggest that I fix Theano. But currently I don't see the problem. So if this makes you change your mind about this optimization, tell me. I don't want Theano to prevent optimizations in NumPy.
It's entirely possible I misunderstood, so let's see if we can work it out. I know that you want to assign to the ->data pointer in a PyArrayObject, right? That's what caused some trouble with the 1.7 API deprecations, which were trying to prevent direct access to this field?

Creating a new array given a pointer to a memory region is no problem, and obviously will be supported regardless of any optimizations. But if that's all you were doing then you shouldn't have run into the deprecation problem. Or maybe I'm misremembering!

The problem is if one wants to (a) create a PyArrayObject, which will by default allocate a new memory region and assign a pointer to it to the ->data field, and *then* (b) "steal" that memory region and replace it with another one, while keeping the same PyArrayObject. This is technically possible right now (though I wouldn't say it was necessarily a good idea!), but it would become impossible if we allocated the PyArrayObject and data into a single region.

The profiles suggest that this would only make allocation of arrays maybe 15% faster, with probably a similar effect on deallocation. And I'm not sure how often array allocation per se is actually a bottleneck -- usually you also do things with the arrays, which is more expensive :-). But hey, 15% is nothing to sneeze at.

-n