Hi,

I am working on performance parity between NumPy scalars/small arrays and Python arrays as a GSoC project, mentored by Charles.

Currently I am looking at PyArray_Return, which allocates separate memory just for the scalar return. Unlike Python, which allocates memory once when returning the result of a scalar operation, NumPy calls malloc twice: once for the array object itself, and a second time for the array data.

These memory allocations happen in PyArray_NewFromDescr and PyArray_Scalar. Stashing both within a single allocation would be more efficient.
In PyArray_Scalar, a new struct (PyLongScalarObject) has to be allocated in the case of scalar arrays. Instead, can we just somehow convert/cast the PyArrayObject to a
PyLongScalarObject?
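
A rough sketch of the single-allocation idea, in plain C. It does not use the real PyArrayObject or any NumPy API: toy_array, TOY_HEADER_SIZE and the toy_* functions are made-up names for illustration only, and the real layout would need more care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an array header; NOT the real PyArrayObject. */
typedef struct {
    size_t nbytes;    /* size of the data buffer in bytes */
    char  *data;      /* points at the element storage */
    int    owns_data; /* 1 if data came from its own malloc call */
} toy_array;

/* Union used only to find a size that is safe for any element alignment. */
typedef union { long long i; long double d; void *p; } toy_max_align;

/* Header size rounded up so data placed right after it is well aligned. */
#define TOY_HEADER_SIZE \
    (((sizeof(toy_array) + sizeof(toy_max_align) - 1) \
      / sizeof(toy_max_align)) * sizeof(toy_max_align))

/* Two allocations: one for the header, one for the data. */
toy_array *toy_new_two_allocs(size_t nbytes)
{
    toy_array *arr;
    assert(nbytes > 0);
    arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    arr->data = malloc(nbytes);
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    memset(arr->data, 0, nbytes);
    arr->nbytes = nbytes;
    arr->owns_data = 1;
    return arr;
}

/* One allocation: the data lives in the same block, after the header. */
toy_array *toy_new_one_alloc(size_t nbytes)
{
    toy_array *arr;
    assert(nbytes > 0);
    arr = malloc(TOY_HEADER_SIZE + nbytes);
    if (arr == NULL) {
        return NULL;
    }
    arr->data = (char *)arr + TOY_HEADER_SIZE;
    memset(arr->data, 0, nbytes);
    arr->nbytes = nbytes;
    arr->owns_data = 0; /* data is freed together with the header */
    return arr;
}

/* Frees either kind; the flag says whether data needs its own free. */
void toy_free(toy_array *arr)
{
    if (arr == NULL) {
        return;
    }
    if (arr->owns_data) {
        free(arr->data);
    }
    free(arr);
}
```

The owns_data flag is there because the deallocation path has to know whether the data pointer was allocated separately or is part of the header block. Something similar would presumably be needed in the real code.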

--

Arink Verma
www.arinkverma.in