On 16 Jul 2013 11:35, "Arink Verma" <arinkverma@gmail.com> wrote:
>
> Hi,
>
> I am working on performance parity between numpy scalar/small array and python array as a GSoC project mentored by Charles.
>
> Currently I am looking at PyArray_Return, which allocates separate memory just for the scalar return. Unlike python, which allocates memory once when returning the result of a scalar operation, numpy calls malloc twice: once for the array object itself, and a second time for the array data.
>
> These memory allocations happen in PyArray_NewFromDescr and PyArray_Scalar. Stashing both within a single allocation would be more efficient.
> In PyArray_Scalar, a new struct (PyLongScalarObject) needs to be allocated in the case of scalar arrays. Instead, can we just somehow convert/cast the PyArrayObject to a PyLongScalarObject?

I think there are more than 2 mallocs you're talking about here?

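Each ndarray does two mallocs, one for the object and one for the data buffer. These could be combined into one: just allocate the total size, do some pointer arithmetic to find where the buffer starts, then set OWNDATA to false so that deallocation doesn't try to free the buffer separately.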
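
To make the single-allocation idea concrete, here is a minimal sketch on a toy struct. This is not NumPy's actual code: toy_array, TOY_OWNDATA, toy_array_new_inline and toy_array_free are hypothetical names made up for illustration, and the real PyArrayObject has more fields and goes through Python's object allocator rather than plain malloc.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag, standing in for the idea behind NPY_ARRAY_OWNDATA. */
#define TOY_OWNDATA 0x1u
/* Strictest fundamental alignment; malloc results satisfy it (C11). */
#define TOY_ALIGN alignof(max_align_t)

/* Toy stand-in for an array object; NOT the real PyArrayObject layout. */
typedef struct {
    size_t   refcnt;
    char    *data;    /* points into the same block, just past the header */
    size_t   nbytes;
    unsigned flags;
} toy_array;

/* Header size rounded up so the data that follows stays aligned. */
size_t toy_header_size(void)
{
    return (sizeof(toy_array) + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
}

/* One malloc for header + buffer instead of one malloc for each. */
toy_array *toy_array_new_inline(size_t nbytes)
{
    size_t header = toy_header_size();
    if (nbytes > SIZE_MAX - header) {
        return NULL;                      /* size computation would overflow */
    }
    toy_array *arr = malloc(header + nbytes);
    if (arr == NULL) {
        return NULL;
    }
    arr->refcnt = 1;
    arr->data = (char *)arr + header;     /* the pointer arithmetic */
    arr->nbytes = nbytes;
    arr->flags = 0;                       /* OWNDATA cleared: data is not a separate block */
    assert((uintptr_t)arr->data % TOY_ALIGN == 0);
    return arr;
}

/* Free the object; the buffer is freed separately only if it is owned. */
void toy_array_free(toy_array *arr)
{
    if (arr == NULL) {
        return;
    }
    if (arr->flags & TOY_OWNDATA) {
        free(arr->data);                  /* separately malloc'ed buffer */
    }
    free(arr);                            /* inline buffer goes away with the header */
}
```

With OWNDATA cleared, toy_array_free releases everything with a single free; calling free on the interior data pointer would be undefined behaviour, which is why the flag has to be false for the combined layout.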

Converting an array to a scalar does more allocations. I doubt there's a way to avoid these, but can't say for sure (on my phone now). In any case the idea of the project is to make scalars obsolete by making arrays competitive, right? So no need to go optimizing the competition ;-). (And more seriously, this slowdown *only* exists because of the array/scalar split, so ignoring it is fair.)

In the bigger picture, these are pretty tiny optimizations, aren't they? In the quick profiling I did a while ago, it looked like there was a lot of much bigger low-hanging fruit, and fiddling around with one malloc versus two isn't going to do much if we're still wasting an order of magnitude more time in inefficient loop selection and unnecessary writes to the FP control word?

-n