[Numpy-discussion] NumPy re-factoring project

Sat Jun 12 15:35:44 EDT 2010

On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn <
dagss at student.matnat.uio.no> wrote:

> Christopher Barker wrote:
> > David Cournapeau wrote:
> >>> In the core C numpy library there would be new  "numpy_array" struct
> >>> with attributes
> >>>
> >>>  numpy_array->buffer
> >
> >> Anything non trivial will require memory allocation and object
> >> ownership conventions.
> >
> > I totally agree -- I've been thinking for a while about a core array
> > data structure that you could use from C/C++, and would interact well
> > with numpy and Python -- it would be even better if it WAS numpy.
> >
> > I was thinking that at the root of it would be a "data_block" object
> > (the buffer in the above), that would have a reference counting system.
> > It would be its own system, but hopefully be able to link to Python's
> > easily when used with Python.
>
> I think taking PEP 3118, stripping out the Python-specific things, and then
> adding memory management conventions, would be a good starting point.
>
> Just a simple header file/struct definition and specification, which
> could in time become a de facto way of exporting multidimensional array
> data between C libraries, between Fortran and C and so on (Kurt Smith's
> fwrap could easily be adapted to support it). The C-NumPy would then be a
> library on top of this spec (mainly ufuncs operating on such structs).
>
> The memory management conventions need some thought, as you say, because
> of slices -- but a central memory allocator is not good enough because one
> would often be accessing memory that's allocated with other purposes in
> mind (and should not be deallocated, or deallocated in a special way). So
> refcounts + deallocator callback seems reasonable.
>
> (Not that I'm involved in this, just my 2 cents.)
>
>
This is more the way I see things, except I would divide the bottom layer
into two parts, views and memory. The memory can come from many places --
memmaps, user supplied buffers, etc. -- but we should provide a simple
reference counted allocator for the default. The views correspond more to
PEP 3118 and simply provide data types, dimensions, and strides, much as
arrays do now. However, I would confine the data types to those available in
C, with a bit of extra information as to precision.  Object arrays
would be a special case of pointer arrays (void pointer arrays?) and
structured arrays/Unicode might be a special case of char arrays. The more
complicated dtypes would then be built on top of those. Some things just
won't be portable, pointers in particular, but such is life.
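
To make the memory/view split concrete, here is a rough C sketch of how the
two layers might look. Every name in it (mem_block, array_view, elem_kind,
mem_block_wrap, view_slice_1d, and so on) is made up for illustration and is
not an existing NumPy or PEP 3118 API. It assumes a plain non-atomic
refcount, a deallocator callback where NULL means "borrowed memory, never
freed here", byte strides, and a fixed maximum number of dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; none of this is an existing NumPy API. */

/* Memory layer: a reference-counted block with a deallocator callback. */
typedef struct mem_block {
    void  *data;                            /* start of the raw memory */
    size_t nbytes;                          /* size of the block in bytes */
    long   refcount;                        /* live references (not thread-safe) */
    void (*dealloc)(void *data, void *ctx); /* NULL: borrowed, never freed here */
    void  *ctx;                             /* handed back to dealloc */
} mem_block;

/* View layer: a C element type plus dimensions and byte strides. */
typedef enum { ELEM_INT32, ELEM_FLOAT64, ELEM_POINTER } elem_kind;
enum { VIEW_MAX_NDIM = 8 };

typedef struct array_view {
    mem_block *mem;                     /* block this view keeps alive */
    char      *first;                   /* address of element 0 */
    elem_kind  kind;
    size_t     itemsize;
    int        ndim;
    ptrdiff_t  shape[VIEW_MAX_NDIM];
    ptrdiff_t  strides[VIEW_MAX_NDIM];  /* in bytes */
} array_view;

static void free_dealloc(void *data, void *ctx) { (void)ctx; free(data); }

/* Wrap memory that came from elsewhere (memmap, user buffer, ...). */
mem_block *mem_block_wrap(void *data, size_t nbytes,
                          void (*dealloc)(void *, void *), void *ctx)
{
    mem_block *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->data = data; m->nbytes = nbytes; m->refcount = 1;
    m->dealloc = dealloc; m->ctx = ctx;
    return m;
}

/* The simple default allocator: zeroed memory, released with free(). */
mem_block *mem_block_new(size_t nbytes)
{
    void *data = calloc(1, nbytes ? nbytes : 1);
    mem_block *m;
    if (data == NULL) return NULL;
    m = mem_block_wrap(data, nbytes, free_dealloc, NULL);
    if (m == NULL) free(data);
    return m;
}

/* Drop one reference; the last one out runs the deallocator, if any. */
void mem_block_decref(mem_block *m)
{
    if (--m->refcount == 0) {
        if (m->dealloc) m->dealloc(m->data, m->ctx);
        free(m);
    }
}

/* 1-D float64 view over the first n doubles of a block; takes a reference. */
void view_init_f64(array_view *v, mem_block *m, ptrdiff_t n)
{
    assert(n >= 0 && (size_t)n * sizeof(double) <= m->nbytes);
    m->refcount++;
    v->mem = m; v->first = m->data;
    v->kind = ELEM_FLOAT64; v->itemsize = sizeof(double);
    v->ndim = 1; v->shape[0] = n; v->strides[0] = (ptrdiff_t)sizeof(double);
}

/* Slice of a 1-D view: count elements from start, every step-th one.
   The slice shares the block and holds its own reference. */
array_view view_slice_1d(const array_view *v, ptrdiff_t start,
                         ptrdiff_t step, ptrdiff_t count)
{
    array_view s = *v;
    assert(v->ndim == 1 && start >= 0 && count >= 0);
    assert(count == 0 || (step > 0 && start + (count - 1) * step < v->shape[0]));
    s.mem->refcount++;
    s.first = v->first + start * v->strides[0];
    s.shape[0] = count;
    s.strides[0] = v->strides[0] * step;
    return s;
}

/* Give up the view's reference to its block. */
void view_release(array_view *v) { mem_block_decref(v->mem); v->mem = NULL; }

/* Address of element i of a 1-D float64 view. */
double *view_at_f64(const array_view *v, ptrdiff_t i)
{
    return (double *)(v->first + i * v->strides[0]);
}
```

The point of the split is that a slice never copies or owns data: it only
bumps the block's refcount, so the memory stays alive until the last view is
released, however it was allocated. A real design would also need atomic
refcounts (or a locking rule) and N-dimensional slicing, which this sketch
leaves out.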

As to languages, I think we should stay with C. C++ has much to offer for
this sort of thing but would be quite a big jump and maybe not as universal
as C. FORTRAN is harder to come by than C and older versions didn't have
such things as unsigned integers. I really haven't used FORTRAN since the 77
version, so haven't much idea what the modern version looks like, but I do
suspect we have more C programmers than FORTRAN programmers, and adding a
language translation on top of a design refactoring is just going to
complicate things.

Chuck