[Numpy-discussion] NumPy re-factoring project

Sat Jun 12 15:41:37 EDT 2010

On Sat, Jun 12, 2010 at 1:35 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn <
> dagss at student.matnat.uio.no> wrote:
>
>> Christopher Barker wrote:
>> > David Cournapeau wrote:
>> >>> In the core C numpy library there would be new  "numpy_array" struct
>> >>> with attributes
>> >>>
>> >>>  numpy_array->buffer
>> >
>> >> Anything non trivial will require memory allocation and object
>> >> ownership conventions.
>> >
>> > I totally agree -- I've been thinking for a while about a core array
>> > data structure that you could use from C/C++, and would interact well
>> > with numpy and Python -- it would be even better if it WAS numpy.
>> >
>> > I was thinking that at the root of it would be a "data_block" object
>> > (the buffer in the above), that would have a reference counting system.
>> > It would be its own system, but hopefully be able to link to Python's
>> > easily when used with Python.
>>
>> I think taking PEP 3118, stripping out the Python-specific things, and then
>> adding memory management conventions would be a good starting point.
>>
>> Just a simple header file/struct definition and specification, which
>> could in time become a de facto way of exporting multidimensional array
>> data between C libraries, between Fortran and C and so on (Kurt Smith's
>> fwrap could easily be adapted to support it). The C-NumPy would then be a
>> library on top of this spec (mainly ufuncs operating on such structs).
>>
>> The memory management conventions need some thought, as you say, because
>> of slices -- but a central memory allocator is not good enough because one
>> would often be accessing memory that's allocated with other purposes in
>> mind (and should not be deallocated, or deallocated in a special way). So
>> refcounts + deallocator callback seems reasonable.
>>
>> (Not that I'm involved in this, just my 2 cents.)
>>
>>
> This is more the way I see things, except I would divide the bottom layer
> into two parts, views and memory. The memory can come from many places --
> memmaps, user supplied buffers, etc. -- but we should provide a simple
> reference counted allocator for the default. The views correspond more to
> PEP 3118 and simply provide data types, dimensions, and strides, much as
> arrays do now. However, I would confine the data types to those available in
> C with a bit of extra information as to precision.  Object arrays
> would be a special case of pointer arrays (void pointer arrays?) and
> structured arrays/Unicode might be a special case of char arrays. The more
> complicated dtypes would then be built on top of those. Some things just
> won't be portable, pointers in particular, but such is life.
>
> As to languages, I think we should stay with C. C++ has much to offer for
> this sort of thing but would be quite a big jump and maybe not as universal
> as C. FORTRAN is harder to come by than C and older versions didn't have
> such things as unsigned integers. I really haven't used FORTRAN since the 77
> version, so haven't much idea what the modern version looks like, but I do
> suspect we have more C programmers than FORTRAN programmers, and adding a
> language translation on top of a design refactoring is just going to
> complicate things.
>
>
Oh, and we should have iterators for the views. So the base would be memory
+ views + iterators.
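
To make that concrete, here is a rough sketch of how the three layers might
fit together in C. It is only an illustration under stated assumptions: every
name in it (np_mem, np_view, np_iter, np_demo, NP_MAXDIMS) is hypothetical and
not an existing NumPy API, the dtype description is left out and elements are
assumed to be C doubles, and views are capped at two dimensions to keep it
short. The memory layer is a reference-counted block with the deallocator
callback suggested above, the view layer carries shape and byte strides in the
spirit of PEP 3118, and the iterator walks a view in C order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Memory layer (hypothetical): refcounted block + deallocator callback. */
typedef struct np_mem {
    void *data;
    size_t refcount;
    void (*dealloc)(void *data);  /* NULL: memory is not ours to release */
} np_mem;

static np_mem *np_mem_new(void *data, void (*dealloc)(void *))
{
    np_mem *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->data = data;
    m->refcount = 1;              /* the caller holds the first reference */
    m->dealloc = dealloc;
    return m;
}

static void np_mem_incref(np_mem *m) { m->refcount++; }

static void np_mem_decref(np_mem *m)
{
    assert(m->refcount > 0);
    if (--m->refcount == 0) {
        if (m->dealloc) m->dealloc(m->data);
        free(m);
    }
}

/* View layer (hypothetical): shape and byte strides over a memory block. */
#define NP_MAXDIMS 2
typedef struct np_view {
    np_mem *mem;                    /* block this view keeps alive */
    char *start;                    /* address of the first element */
    int ndim;
    ptrdiff_t shape[NP_MAXDIMS];
    ptrdiff_t strides[NP_MAXDIMS];  /* in bytes, as in PEP 3118 */
} np_view;

/* Iterator layer (hypothetical): visits elements in C (row-major) order. */
typedef struct np_iter {
    const np_view *view;
    ptrdiff_t index[NP_MAXDIMS];
    int done;
} np_iter;

static void np_iter_init(np_iter *it, const np_view *v)
{
    it->view = v;
    it->done = 0;
    for (int i = 0; i < v->ndim; i++) {
        it->index[i] = 0;
        if (v->shape[i] == 0) it->done = 1;  /* empty view */
    }
}

static void *np_iter_ptr(const np_iter *it)
{
    char *p = it->view->start;
    for (int i = 0; i < it->view->ndim; i++)
        p += it->index[i] * it->view->strides[i];
    return p;
}

static void np_iter_next(np_iter *it)
{
    /* Bump the last index first; carry into earlier ones on overflow. */
    for (int i = it->view->ndim - 1; i >= 0; i--) {
        if (++it->index[i] < it->view->shape[i]) return;
        it->index[i] = 0;
    }
    it->done = 1;
}

/* Sum a view whose elements are assumed to be doubles. */
static double np_view_sum(const np_view *v)
{
    double total = 0.0;
    np_iter it;
    for (np_iter_init(&it, v); !it.done; np_iter_next(&it))
        total += *(double *)np_iter_ptr(&it);
    return total;
}

/* Usage: a 2x3 array holding 0..5 and a view on its second column. */
static int np_demo(double *full_sum, double *col_sum)
{
    const ptrdiff_t item = (ptrdiff_t)sizeof(double);
    double *buf = malloc(6 * sizeof *buf);
    if (buf == NULL) return -1;
    for (int i = 0; i < 6; i++) buf[i] = (double)i;

    np_mem *mem = np_mem_new(buf, free);
    if (mem == NULL) { free(buf); return -1; }

    np_view full = { mem, (char *)buf, 2, {2, 3}, {3 * item, item} };
    np_view col = { mem, (char *)(buf + 1), 1, {2, 0}, {3 * item, 0} };
    np_mem_incref(mem);           /* the column view holds its own reference */

    *full_sum = np_view_sum(&full);
    *col_sum = np_view_sum(&col);

    np_mem_decref(col.mem);
    np_mem_decref(full.mem);      /* last reference: free(buf) runs here */
    return 0;
}
```

Passing NULL as the deallocator would cover memmaps or user-supplied buffers
that must not be freed by the library, and the column view shows why the
refcount lives on the memory block rather than on the view: either view can
be dropped first without invalidating the other.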

Chuck