I am working on performance parity between NumPy scalars/small arrays and Python arrays as a GSoC project mentored by Charles.
Currently I am looking at PyArray_Return, which allocates separate memory just for the scalar return value. Unlike Python, which allocates memory once when returning the result of a scalar operation, NumPy calls malloc twice: once for the array object itself, and a second time for the array data.
These memory allocations happen in PyArray_NewFromDescr and PyArray_Scalar. Stashing both within a single allocation would be more efficient. In PyArray_Scalar, a new struct (PyLongScalarObject) has to be allocated when the array is a scalar array. Instead, can we somehow just convert/cast the PyArrayObject to a PyLongScalarObject?
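
To make the "single allocation" idea concrete, here is a minimal sketch in plain C. It is not NumPy code: toy_array and the toy_* functions are hypothetical names invented for illustration, and the struct is far simpler than the real PyArrayObject. It only contrasts the two-malloc layout with one block that holds both the header and the data.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an array object; NOT NumPy's PyArrayObject. */
typedef struct {
    size_t nbytes;  /* size of the element storage in bytes */
    char *data;     /* pointer to the element storage */
} toy_array;

/* Two allocations: one for the header, a second one for the data. */
static toy_array *toy_new_two_allocs(size_t nbytes)
{
    toy_array *arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    arr->data = malloc(nbytes);
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    arr->nbytes = nbytes;
    return arr;
}

static void toy_free_two_allocs(toy_array *arr)
{
    if (arr != NULL) {
        free(arr->data);
        free(arr);
    }
}

/* One allocation: the data lives directly behind the header. */
static toy_array *toy_new_one_alloc(size_t nbytes)
{
    /* Round the header size up so the data that follows is aligned
       for any object type (C11 max_align_t). */
    size_t align = _Alignof(max_align_t);
    size_t header = (sizeof(toy_array) + align - 1) / align * align;
    char *block = malloc(header + nbytes);
    if (block == NULL) {
        return NULL;
    }
    toy_array *arr = (toy_array *)block;
    arr->nbytes = nbytes;
    arr->data = block + header;  /* points into the same block */
    return arr;
}

static void toy_free_one_alloc(toy_array *arr)
{
    free(arr);  /* a single free releases header and data together */
}
```

With this layout, creating a one-element result costs one malloc and one free instead of two of each. The sketch says nothing about whether a PyArrayObject could safely be cast to a PyLongScalarObject; that would depend on the two structs having compatible layouts, which is exactly the open question above.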