[Numpy-discussion] NumPy re-factoring project
Dag Sverre Seljebotn
dagss at student.matnat.uio.no
Sat Jun 12 16:56:02 EDT 2010
Charles Harris wrote:
> On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn <
> dagss at student.matnat.uio.no> wrote:
>
>> Christopher Barker wrote:
>> > David Cournapeau wrote:
>> >>> In the core C numpy library there would be new "numpy_array" struct
>> >>> with attributes
>> >>>
>> >>> numpy_array->buffer
>> >
>> >> Anything non trivial will require memory allocation and object
>> >> ownership conventions.
>> >
>> > I totally agree -- I've been thinking for a while about a core array
>> > data structure that you could use from C/C++, and would interact well
>> > with numpy and Python -- it would be even better if it WAS numpy.
>> >
>> > I was thinking that at the root of it would be a "data_block" object
>> > (the buffer in the above), that would have a reference counting
>> system.
>> > It would be its own system, but hopefully be able to link to Python's
>> > easily when used with Python.
>>
>> I think taking PEP 3118, strip out the Python-specific things, and then
>> add memory management conventions, would be a good starting point.
>>
>> Simply a simple header file/struct definition and specification, which
>> could in time become a de facto way of exporting multidimensional array
>> data between C libraries, between Fortran and C and so on (Kurt Smith's
>> fwrap could easily be adapted to support it). The C-NumPy would then be
>> a
>> library on top of this spec (mainly ufuncs operating on such structs).
>>
>> The memory management conventions needs some thought, as you say,
>> because
>> of slices -- but a central memory allocator is not good enough because
>> one
>> would often be accessing memory that's allocated with other purposes in
>> mind (and should not be deallocated, or deallocated in a special way).
>> So
>> refcounts + deallocator callback seems reasonable.
>>
>> (Not that I'm involved in this, just my 2 cents.)
>>
>>
> This is more the way I see things, except I would divide the bottom layer
> into two parts, views and memory. The memory can come from many places --
> memmaps, user supplied buffers, etc. -- but we should provide a simple
> reference counted allocator for the default. The views correspond more to
> PEP 3118 and simply provide data types, dimensions, and strides, much as
> arrays do now. However, I would confine the data types to those available
> in
> C with a bit extra information as to precision, because. Object arrays
> would be a special case of pointer arrays (void pointer arrays?) and
> structured arrays/Unicode might be a special case of char arrays. The more
> complicated dtypes would then be built on top of those. Some things just
> won't be portable, pointers in particular, but such is life.
>
> As to languages, I think we should stay with C. C++ has much to offer for
> this sort of thing but would be quite a big jump and maybe not as
> universal
> as C. FORTRAN is harder to come by than C and older versions didn't have
> such things as unsigned integers. I really haven't used FORTRAN since the
> 77
> version, so haven't much idea what the modern version looks like, but I do
> suspect we have more C programmers than FORTRAN programmers, and adding a
> language translation on top of a design refactoring is just going to
> complicate things.
I'm not sure how Fortran got into this, but if it was from what I wrote: I
certainly didn't suggest any part of NumPy is written in Fortran. Sorry
for causing confusion.
What I meant: If there's an "ndarray.h" which basically just contains the
minimum C version of PEP 3118 for passing around and accessing arrays.
Without any runtime dependencies -- of course, writing code using it will
be much easier when using the C-NumPy runtime library, but for simply
exporting data from an existing library one could do without the runtime
dependency.
(To be more precise about fwrap: In fwrap there's a need to pass
multidimensional, flexible-size array data back and forth between
(autogenerated) Fortran and (autogenerated) C. It currently defines its
own structs (or extra arguments, but boils down to the same thing...) for
this purpose, but if an "ndarray.h" could create a standard for passing
array data then that would be a natural choice instead.)
Dag Sverre
More information about the NumPy-Discussion
mailing list