[Numpy-discussion] NumPy re-factoring project

Sebastian Walter sebastian.walter at gmail.com
Sat Jun 12 16:12:50 EDT 2010


On Sat, Jun 12, 2010 at 3:57 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Sat, Jun 12, 2010 at 10:27 PM, Sebastian Walter
> <sebastian.walter at gmail.com> wrote:
>> On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden <sturla at molden.no> wrote:
>>>
>>> I have a few radical suggestions:
>>>
>>> 1. Use ctypes as glue to the core DLL, so we can completely forget about
>>> refcounts and similar mess. Why put manual reference counting and error
>>> handling in the core? It's stupid.
>>
>> I totally agree; I thought that the refactoring was supposed to provide
>> simple data structures and simple algorithms to perform the C equivalents of
>> sin, cos, exp, +, -, *, /, dot, inv, ...
>>
>> Let me explain with an example what I expected:
>>
>> In the core C numpy library there would be a new "numpy_array" struct
>> with attributes
>>
>>  numpy_array->buffer
>>  numpy_array->dtype
>>  numpy_array->ndim
>>  numpy_array->shape
>>  numpy_array->strides
>>  numpy_array->owndata
>> etc.
>>
>> that replaces the current PyArrayObject, which contains Python C API stuff:
>>
>> typedef struct PyArrayObject {
>>        PyObject_HEAD
>>        char *data;             /* pointer to raw data buffer */
>>        int nd;                 /* number of dimensions, also called ndim */
>>        npy_intp *dimensions;       /* size in each dimension */
>>        npy_intp *strides;          /* bytes to jump to get to the
>>                                   next element in each dimension */
>>        PyObject *base;         /* This object should be decref'd
>>                                   upon deletion of array */
>>                                /* For views it points to the original array */
>>                                /* For creation from buffer object it points
>>                                   to an object that should be decref'd on
>>                                   deletion */
>>                                /* For UPDATEIFCOPY flag this is an array
>>                                   to-be-updated upon deletion of this one */
>>        PyArray_Descr *descr;   /* Pointer to type structure */
>>        int flags;              /* Flags describing array -- see below*/
>>        PyObject *weakreflist;  /* For weakreferences */
>>        void *buffer_info;      /* Data used by the buffer interface */
>> } PyArrayObject;
>>
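The numpy_array struct proposed above could look roughly like the following C sketch. The field names come from the list above; the numpy_dtype enum, the use of ptrdiff_t for shape and strides, and the helpers numpy_array_size and numpy_array_ptr are assumptions made for illustration, not an existing libnumpy API. Strides are kept in bytes, as in PyArrayObject, so one struct can describe views and transposes without copying data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-type codes; the proposal only names a "dtype" field. */
typedef enum { NUMPY_FLOAT64, NUMPY_INT64 } numpy_dtype;

/* Python-free array descriptor with the fields listed in the proposal. */
typedef struct {
    char *buffer;        /* raw data; allocated by whoever owns the array */
    numpy_dtype dtype;   /* element type */
    int ndim;            /* number of dimensions */
    ptrdiff_t *shape;    /* extent of each dimension */
    ptrdiff_t *strides;  /* bytes to jump to reach the next element in each dimension */
    int owndata;         /* nonzero if buffer must be freed together with the array */
} numpy_array;

/* Total number of elements: the product of all extents (1 for a 0-d array). */
ptrdiff_t numpy_array_size(const numpy_array *a)
{
    ptrdiff_t n = 1;
    for (int d = 0; d < a->ndim; ++d)
        n *= a->shape[d];
    return n;
}

/* Address of the element at a multi-index: buffer + sum(index[d] * strides[d]). */
char *numpy_array_ptr(const numpy_array *a, const ptrdiff_t *index)
{
    char *p = a->buffer;
    for (int d = 0; d < a->ndim; ++d) {
        assert(index[d] >= 0 && index[d] < a->shape[d]);  /* debug bounds check */
        p += index[d] * a->strides[d];
    }
    return p;
}
```

For a C-contiguous 2x3 array of 8-byte doubles the strides are {24, 8}, so element (1, 2) sits 40 bytes past buffer; swapping the shape and stride entries yields the transpose without touching the data.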
>>
>>
>> Example:
>> --------------
>>
>> If one calls the following Python code
>> x = numpy.zeros((N,M,K), dtype=float)
>> the memory allocation would be done on the Python side.
>>
>> Calling a ufunc like
>> y = numpy.sin(x)
>> would first allocate the memory for y on the Python side
>> and then call a C function a la
>> numpy_unary_ufunc( double (*fcn_ptr)(double), numpy_array *x, numpy_array *y)
>>
>> If y is already allocated, one would call
>> y = numpy.sin(x, out = y)
>>
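A minimal sketch of the strided loop that could sit behind the proposed numpy_unary_ufunc, assuming float64 data only and a trimmed-down version of the numpy_array struct proposed above. NUMPY_MAXDIMS and the square kernel are illustrative names, not part of any real API. The caller allocates y beforehand, exactly as in the out = y case; sin from <math.h> could be passed the same way, since its type is double (*)(double).

```c
#include <assert.h>
#include <stddef.h>

#define NUMPY_MAXDIMS 32  /* assumed upper bound on ndim */

/* Trimmed version of the hypothetical struct: float64 data only. */
typedef struct {
    char *buffer;
    int ndim;
    ptrdiff_t *shape;
    ptrdiff_t *strides;  /* in bytes */
} numpy_array;

/* Example kernel; any double (*)(double) works, e.g. sin from <math.h>. */
double square(double v) { return v * v; }

/* y[i] = fcn_ptr(x[i]) for every multi-index i; y must already be allocated. */
void numpy_unary_ufunc(double (*fcn_ptr)(double), const numpy_array *x, numpy_array *y)
{
    int nd = x->ndim;
    assert(nd == y->ndim && nd <= NUMPY_MAXDIMS);
    for (int d = 0; d < nd; ++d) {
        assert(x->shape[d] == y->shape[d]);
        if (x->shape[d] == 0)
            return;  /* empty array: nothing to compute */
    }
    ptrdiff_t index[NUMPY_MAXDIMS] = {0};
    const char *px = x->buffer;
    char *py = y->buffer;
    for (;;) {
        *(double *)py = fcn_ptr(*(const double *)px);
        /* Odometer step: bump the last dimension, carrying into earlier ones. */
        int d = nd - 1;
        for (; d >= 0; --d) {
            px += x->strides[d];
            py += y->strides[d];
            if (++index[d] < x->shape[d])
                break;
            px -= x->strides[d] * x->shape[d];  /* wrap this dimension back to 0 */
            py -= y->strides[d] * y->shape[d];
            index[d] = 0;
        }
        if (d < 0)
            return;  /* carried past the first dimension: every element visited */
    }
}
```

Because the multi-index advances like an odometer, arbitrary strides (views, transposed outputs) work without copying; a real implementation would presumably add a fast path for contiguous data.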
>> Similarly, z = x*y
>> would first allocate the memory for z and then call a C function a la
>> numpy_binary_ufunc( double (*fcn_ptr)(double, double), numpy_array *x,
>> numpy_array *y, numpy_array *z)
>>
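The binary case is the same loop driven by three pointers. Since * is an operator rather than a function in C, passing it as fcn_ptr needs a small wrapper; numpy_mul below is a hypothetical name for that wrapper, and the struct is again a trimmed, float64-only assumption. Broadcasting is deliberately left out of this sketch: all three arrays must have the same shape.

```c
#include <assert.h>
#include <stddef.h>

#define NUMPY_MAXDIMS 32  /* assumed upper bound on ndim */

/* Trimmed version of the hypothetical struct: float64 data only. */
typedef struct {
    char *buffer;
    int ndim;
    ptrdiff_t *shape;
    ptrdiff_t *strides;  /* in bytes */
} numpy_array;

/* '*' is not a function in C, so it needs a wrapper to be passed as fcn_ptr. */
double numpy_mul(double a, double b) { return a * b; }

/* z[i] = fcn_ptr(x[i], y[i]) for every multi-index i; all arrays share one shape. */
void numpy_binary_ufunc(double (*fcn_ptr)(double, double),
                        const numpy_array *x, const numpy_array *y, numpy_array *z)
{
    int nd = x->ndim;
    assert(nd == y->ndim && nd == z->ndim && nd <= NUMPY_MAXDIMS);
    for (int d = 0; d < nd; ++d) {
        assert(x->shape[d] == y->shape[d] && x->shape[d] == z->shape[d]);
        if (x->shape[d] == 0)
            return;  /* empty: nothing to compute */
    }
    ptrdiff_t index[NUMPY_MAXDIMS] = {0};
    const char *px = x->buffer, *py = y->buffer;
    char *pz = z->buffer;
    for (;;) {
        *(double *)pz = fcn_ptr(*(const double *)px, *(const double *)py);
        int d = nd - 1;  /* odometer step: last dimension varies fastest */
        for (; d >= 0; --d) {
            px += x->strides[d];
            py += y->strides[d];
            pz += z->strides[d];
            if (++index[d] < x->shape[d])
                break;
            px -= x->strides[d] * x->shape[d];  /* wrap back to index 0 */
            py -= y->strides[d] * y->shape[d];
            pz -= z->strides[d] * z->shape[d];
            index[d] = 0;
        }
        if (d < 0)
            return;
    }
}
```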
>>
>> Similarly, other functions like dot:
>> z = dot(x,y, out = z)
>>
>> would simply call a C function a la
>> numpy_dot( numpy_array *x, numpy_array *y, numpy_array *z)
>>
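For dot, a naive sketch restricted to 2-D float64 arrays, again assuming a trimmed version of the proposed struct. The triple loop only illustrates the calling convention (inputs and a preallocated output are passed in, nothing is allocated inside); a production version would more likely hand contiguous cases to an optimized BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed version of the hypothetical struct: float64 data only. */
typedef struct {
    char *buffer;
    int ndim;
    ptrdiff_t *shape;
    ptrdiff_t *strides;  /* in bytes */
} numpy_array;

/* Element (i, j) of a 2-D float64 array, honoring arbitrary byte strides. */
static double *at2(const numpy_array *a, ptrdiff_t i, ptrdiff_t j)
{
    return (double *)(a->buffer + i * a->strides[0] + j * a->strides[1]);
}

/* z = x . y for 2-D arrays: x is (n, k), y is (k, m), z is (n, m) and preallocated. */
void numpy_dot(const numpy_array *x, const numpy_array *y, numpy_array *z)
{
    assert(x->ndim == 2 && y->ndim == 2 && z->ndim == 2);
    ptrdiff_t n = x->shape[0], k = x->shape[1], m = y->shape[1];
    assert(y->shape[0] == k && z->shape[0] == n && z->shape[1] == m);
    for (ptrdiff_t i = 0; i < n; ++i) {
        for (ptrdiff_t j = 0; j < m; ++j) {
            double acc = 0.0;
            for (ptrdiff_t p = 0; p < k; ++p)
                acc += *at2(x, i, p) * *at2(y, p, j);
            *at2(z, i, j) = acc;  /* z must not alias x or y */
        }
    }
}
```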
>>
>> If one wants to use numpy functions on the C side only, one would fill in
>> the numpy_array struct manually,
>> i.e. one has to do the memory management oneself in C, which is
>> perfectly OK since one is only interested in using
>> the algorithms.
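What "doing the memory management oneself" might look like on the C side for something like numpy.zeros((N,M,K), dtype=float): the caller allocates the buffer and the descriptor merely points at it. make_float64_array, free_float64_array and numpy_c_strides are hypothetical helper names; the C-order (row-major) stride rule itself is the standard one NumPy uses by default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Trimmed hypothetical struct, keeping owndata to record who frees buffer. */
typedef struct {
    char *buffer;
    int ndim;
    ptrdiff_t *shape;
    ptrdiff_t *strides;  /* in bytes */
    int owndata;
} numpy_array;

/* C-order (row-major) byte strides: the last dimension is contiguous. */
void numpy_c_strides(int ndim, const ptrdiff_t *shape, ptrdiff_t itemsize, ptrdiff_t *strides)
{
    ptrdiff_t step = itemsize;
    for (int d = ndim - 1; d >= 0; --d) {
        strides[d] = step;
        step *= shape[d];
    }
}

/* Caller-side equivalent of numpy.zeros(shape, dtype=float): the C program
   allocates the zeroed buffer and keeps ownership of shape[] and strides[].
   Returns 0 on success, -1 if the allocation fails. */
int make_float64_array(numpy_array *a, int ndim, ptrdiff_t *shape, ptrdiff_t *strides)
{
    ptrdiff_t n = 1;
    for (int d = 0; d < ndim; ++d) {
        assert(shape[d] >= 0);
        n *= shape[d];
    }
    /* Ask for at least one element so a zero-size request still yields a valid pointer. */
    a->buffer = calloc(n > 0 ? (size_t)n : 1, sizeof(double));
    if (a->buffer == NULL)
        return -1;
    a->ndim = ndim;
    a->shape = shape;
    a->strides = strides;
    a->owndata = 1;
    numpy_c_strides(ndim, shape, (ptrdiff_t)sizeof(double), strides);
    return 0;
}

/* Release the buffer only if this descriptor owns it (a view would have owndata == 0). */
void free_float64_array(numpy_array *a)
{
    if (a->owndata)
        free(a->buffer);
    a->buffer = NULL;
    a->owndata = 0;
}
```

The library functions themselves never see malloc or free here; they only read the descriptor, which is the split of responsibilities the paragraph above describes.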
>
> Anything non-trivial will require memory allocation and object
> ownership conventions. If the goal is interoperation with other
> languages and VMs, you may want to use something other than plain
> malloc, to interact better with the allocation strategies of the host
> platform (reference counting, garbage collection).
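A minimal sketch of what "something other than plain malloc" could mean in C: the library never calls malloc directly, and each host passes in a table of allocation hooks backed by its own memory manager. The numpy_allocator struct, its ctx pointer and numpy_scratch are hypothetical; counting_alloc and counting_release merely stand in for a host allocator by counting live blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface supplied by the host (CPython, an R or
   Ruby binding, a JVM wrapper, ...). ctx carries whatever state the host needs. */
typedef struct {
    void *(*alloc)(size_t nbytes, void *ctx);
    void (*release)(void *ptr, void *ctx);
    void *ctx;
} numpy_allocator;

static void *default_alloc(size_t nbytes, void *ctx) { (void)ctx; return malloc(nbytes); }
static void default_release(void *ptr, void *ctx) { (void)ctx; free(ptr); }

/* Fallback for hosts that are happy with plain malloc/free. */
const numpy_allocator numpy_default_allocator = {default_alloc, default_release, NULL};

/* Library code that needs temporary storage goes through the hooks, never malloc. */
double *numpy_scratch(const numpy_allocator *a, size_t count)
{
    assert(a != NULL && a->alloc != NULL);
    if (count > SIZE_MAX / sizeof(double))
        return NULL;  /* byte count would overflow size_t */
    return a->alloc(count * sizeof(double), a->ctx);
}

/* Stand-in for a host allocator: ctx points to a counter of live blocks. */
void *counting_alloc(size_t nbytes, void *ctx)
{
    void *p = malloc(nbytes);
    if (p != NULL)
        ++*(long *)ctx;
    return p;
}

void counting_release(void *ptr, void *ctx)
{
    if (ptr != NULL)
        --*(long *)ctx;
    free(ptr);
}
```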

I'm just saying that the "host platform" could do the memory
management instead of libnumpy,
i.e. libnumpy could be just a collection of algorithms.
Reimplementing half of the Python C API somehow doesn't feel right to me.

Those users who like to use C++ could write a class with methods that
internally call the libnumpy functions:

-------------- example code -----------------
class Array {
    numpy_array *_array;  // Python-free array struct from the core library

public:
    // C++ allocates the result; libnumpy only fills it in.
    Array operator+(const Array &rhs) const {
        Array retval( ... arguments for the right type and dimensions of
                      the output ...);
        numpy_add(this->_array, rhs._array, retval._array);
        return retval;
    }
};
-------------- end code -----------------

I.e. let C++ do all the memory management and type inference while the
numpy core C API does the number crunching.
In other languages (Python, Ruby, R, whatever) one would implement a
similar class.

I cannot speak for others, but something along these lines is what I'd
love to see, since it would make it relatively easy to use numpy
functionality even in existing C/C++/R/Ruby code.


Sebastian



>
>
>> The only reason I see for C++ is the possibility to use metaprogramming,
>> which is very ill-designed. I'd rather see some simple preprocessing of
>> C code than C++ template metaprogramming.
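For comparison, a sketch of "simple code preprocessing on C code" using only the standard C preprocessor: one macro stamps out a typed, strided addition loop per element type, which is the job C++ templates would otherwise do. The macro and function names are made up for illustration; NumPy's own sources do something similar with .c.src template files that are expanded at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One macro body, instantiated once per element type: z[i] = x[i] + y[i]
   over n elements, with an independent byte stride for each operand. */
#define DEFINE_ADD_LOOP(NAME, TYPE)                                         \
    void NAME(const char *x, ptrdiff_t sx, const char *y, ptrdiff_t sy,    \
              char *z, ptrdiff_t sz, ptrdiff_t n)                           \
    {                                                                       \
        assert(n >= 0);                                                     \
        for (ptrdiff_t i = 0; i < n; ++i)                                   \
            *(TYPE *)(z + i * sz) =                                         \
                *(const TYPE *)(x + i * sx) + *(const TYPE *)(y + i * sy);  \
    }

DEFINE_ADD_LOOP(add_float64, double)
DEFINE_ADD_LOOP(add_int32, int32_t)
DEFINE_ADD_LOOP(add_int64, int64_t)

#undef DEFINE_ADD_LOOP
```

Each instantiation is an ordinary C function with a plain C ABI, so it stays callable from ctypes or any other FFI, which is harder to guarantee for template instantiations.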
>
> I don't think anyone is seriously considering changing languages.
> Especially if interoperation is desired, C is miles ahead of C++
> anyway.
>
> David
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


