Torgil Svensson wrote:
On 1/11/07, Charles R Harris wrote:
On 1/11/07, Torgil Svensson wrote:
Sure. I'm not objecting to the memory model; what I mean is that data access between modules has a wider scope than just a memory model. Maybe I'm completely out of scope here, but I thought this was worth considering for the inter-module data-sharing scope.
This is where separating the memory block from the API starts to show advantages. OTOH, we should try to keep this all as simple and basic as possible. Trying to design for every potential use will lead to over-design; it is a fine line to walk.
I agree. I'm trying to look after a use case of my own here, where I have a huge array (won't fit in memory) with data that is very easy to compress (the compressed form easily fits in memory). OTOH, I have no need yet to share this between modules, but a simple data access API opens up a variety of options.
I think this is a good idea generally. I think the PIL would be much more open to this kind of API because the memory model of the PIL is different from ours. On the other hand, I think it would be a shame not to provide a basic N-d array memory model like NumPy has, because it is used so often.
In my mindset, I can slice and dice my huge array and the implementation behind the data access API will choose between having the views represented internally as intervals or lists of indexes.
So I'm +1 for having all information concerning nd-array access on a logical level (shapes) in one API and letting the memory-layout details (strides, FORTRAN, C etc) live in another API. A module that wants to skip the API overhead (numpy) can always do something like:
I had originally thought to separate these out into multiple calls anyway. Perhaps we could propose the same thing. Have a full struct interface as one option and a multiple-call interface like you propose as another.
memory_interface = array_interface->get_memory_layout()
if (memory_interface)
{
    ... use memory_interface->strides ... etc
}
else
{
    ... use array_interface->get_item_from_index() ... etc
}
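To make the fallback pattern concrete, here is a minimal C sketch of how such a two-level interface might look. Everything in it is an assumption made for illustration: the struct layouts, the function-pointer signatures, the double-only element type, and the two toy providers (`dense_*` and `constant_*`) are hypothetical and are not part of any existing NumPy or Python API. The consumer `sum_2d` takes the strided fast path when a memory layout is offered and falls back to per-item calls otherwise.

```c
#include <stddef.h>

/* Hypothetical memory-level view: only offered by providers whose
   data is directly addressable. */
typedef struct {
    char *data;               /* start of the memory block */
    const ptrdiff_t *strides; /* byte step per dimension */
} memory_layout;

/* Hypothetical logical-level interface: shape plus item access. */
typedef struct array_interface array_interface;
struct array_interface {
    int ndim;
    const ptrdiff_t *shape;
    /* Returns NULL when there is no plain memory block to expose. */
    const memory_layout *(*get_memory_layout)(const array_interface *self);
    /* Always available: fetch one element by logical index. */
    double (*get_item_from_index)(const array_interface *self,
                                  const ptrdiff_t *index);
    void *impl; /* provider-private state */
};

/* Provider 1: a dense strided array; impl points at its memory_layout. */
static const memory_layout *dense_layout(const array_interface *self) {
    return (const memory_layout *)self->impl;
}
static double dense_item(const array_interface *self, const ptrdiff_t *index) {
    const memory_layout *m = (const memory_layout *)self->impl;
    const char *p = m->data;
    for (int d = 0; d < self->ndim; d++)
        p += index[d] * m->strides[d];
    return *(const double *)p;
}

/* Provider 2: every element has the same value, so nothing is stored
   per element and there is no memory layout to hand out. */
static const memory_layout *no_layout(const array_interface *self) {
    (void)self;
    return NULL;
}
static double constant_item(const array_interface *self, const ptrdiff_t *index) {
    (void)index;
    return *(const double *)self->impl;
}

/* Consumer: sum a 2-d array through whichever path is available. */
static double sum_2d(const array_interface *a) {
    const memory_layout *m = a->get_memory_layout(a);
    double total = 0.0;
    for (ptrdiff_t i = 0; i < a->shape[0]; i++) {
        for (ptrdiff_t j = 0; j < a->shape[1]; j++) {
            if (m) {
                /* Fast path: address the element directly via strides. */
                total += *(const double *)(m->data + i * m->strides[0]
                                                   + j * m->strides[1]);
            } else {
                /* Slow path: one interface call per element. */
                ptrdiff_t index[2] = {i, j};
                total += a->get_item_from_index(a, index);
            }
        }
    }
    return total;
}
```

The point of the split is that the consumer never needs to know which kind of provider it was handed; only the cost of access differs.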
I'm guessing that most of the modules trying to access an array will choose to go through numpy for fast operations.
Another use of an API is to do things like give an "RGB" view of an image, regardless of which weird image format lies underneath, without having to convert the whole image in memory and lose precision or memory.

This is true. So at what level do we propose the API? Single-item access for sure, but what about

array_interface->get_block_from_slice() ?

Such a thing would be very useful for all kinds of large data sets, from images and videos to scientific data sets.
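As a rough illustration of what block access could buy for the compressible-array use case mentioned earlier in the thread, the sketch below decodes only the requested slice of a 1-d run-length-encoded array into a caller-supplied buffer. The run-length encoding, the `rle_run` and `rle_array` types, and the `rle_get_block_from_slice` signature are all assumptions chosen for the example; the thread does not specify any of them.

```c
#include <stddef.h>

/* One run of identical values: `length` copies of `value`. */
typedef struct {
    double value;
    size_t length;
} rle_run;

/* Hypothetical compressed 1-d array: a sequence of runs. */
typedef struct {
    const rle_run *runs;
    size_t nruns;
} rle_array;

/* Copy the logical elements start, start+step, ... (all < stop) into out
   and return how many were written. Assumes step >= 1 and that out has
   room for every selected element. Elements beyond the end of the encoded
   data are silently skipped. */
static size_t rle_get_block_from_slice(const rle_array *a,
                                       size_t start, size_t stop,
                                       size_t step, double *out) {
    size_t written = 0;
    size_t run_begin = 0; /* logical index where the current run starts */
    size_t next = start;  /* next logical index the caller wants */
    for (size_t r = 0; r < a->nruns && next < stop; r++) {
        size_t run_end = run_begin + a->runs[r].length;
        /* Emit every wanted index that falls inside this run. */
        while (next < run_end && next < stop) {
            out[written++] = a->runs[r].value;
            next += step;
        }
        run_begin = run_end;
    }
    return written;
}
```

A single call walks the runs once, whereas fetching the same elements through single-item access would have to locate the right run for every index separately.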
If we want the whole in-memory-RGB-copy we could just take the RGB-view, pass it to numpy and force numpy to do a copy. The module can then, in either case, operate on the image through numpy or return a numpy object to the user. (numpy is of course integrated in python by then)
Getting this array_interface into Python goes a long way toward making that happen, I think.

-Travis