On 1/11/07, Travis Oliphant wrote:
Torgil Svensson wrote:
Example1: We have a very large amount of data with a compressed internal representation
Example2: We might want to generate data "on the fly" as it's needed
Example3: If module creators have to deal with different byte alignments, contiguousness, etc., it'll lead to lots of code duplication and a lot of unnecessary work
Is it possible to add a data access API to this PEP?

> Could you give an example of what you mean? I have no problem with such a concept. I'm mainly interested in getting the NumPy memory model into Python somehow. I know it's not the "only" way to think about memory, but it is a widely-used and useful way.
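For readers unfamiliar with the memory model referred to above, here is a minimal Python sketch, assuming a C-contiguous 3x4 array of unsigned bytes. It uses Python's built-in memoryview (nothing defined in this thread) to show how a shape and a set of strides locate an element inside a flat block of memory. The helper name element_offset is made up for illustration.

```python
# Flat block of 12 bytes, viewed as a 3x4 array of unsigned bytes.
raw = bytes(range(12))
view = memoryview(raw).cast("B", shape=[3, 4])

def element_offset(strides, index):
    """Byte offset of an element: the sum of index[k] * strides[k].

    Hypothetical helper; valid here because the view starts at offset 0.
    """
    return sum(i * s for i, s in zip(index, strides))

# Locate element (2, 1) by hand and read it straight from the flat buffer.
offset = element_offset(view.strides, (2, 1))
value = raw[offset]
```

A consumer that does this arithmetic itself also has to handle alignment, byte order and non-contiguous layouts, which is the duplication Example3 is concerned about.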
Sure. I'm not objecting to the memory model; what I mean is that data access between modules has a wider scope than just a memory model. Maybe I'm completely out of scope here, but I thought this was worth considering for the inter-module data-sharing scope.

Say we want to access a huge array of 1 million text strings, held by another module in a compressed in-memory representation. Here's a pseudo-code example with most of the details completely made up:

    buffer = AnotherModule_GetBigArrayAsBuffer()
    aview = buffer->bf_getarrayview()
    indexes = NewList()
    for (i = 0; i < aview->shape[0]; ++i)
        for (j = 0; j < aview->shape[1]; ++j) {
            /* item represents the data described by the PyDataFormatObject */
            item = aview->get_from_index(i, j)
            if (is_interesting_item(item))
                ListAdd(indexes, NewList(i, j))
        }
    indexarr = Numpy_ArrayFromLists(indexes)

Here, we don't have to care about any data-layout issues; the called module could even produce data on the fly. If I want direct memory access, we could use a function that returns data, strides and flags.
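The same idea can be sketched in runnable Python. This is only an illustration under stated assumptions, not a proposed API: the class CompressedStringView, the function find_interesting, and a get_from_index that takes both indices are all hypothetical, and zlib stands in for whatever compression the other module might use. Strings are assumed not to contain newlines.

```python
import zlib

class CompressedStringView:
    """Hypothetical array view: rows stay compressed in memory and a row is
    decompressed only when one of its items is requested."""

    def __init__(self, rows):
        # rows: list of equal-length lists of str (no embedded newlines).
        self.shape = (len(rows), len(rows[0]) if rows else 0)
        self._blobs = [zlib.compress("\n".join(r).encode("utf-8")) for r in rows]

    def get_from_index(self, i, j):
        # The caller never sees the internal layout, only the item.
        row = zlib.decompress(self._blobs[i]).decode("utf-8").split("\n")
        return row[j]


def find_interesting(view, is_interesting_item):
    """Collect the (i, j) index pairs of items that pass the predicate."""
    indexes = []
    for i in range(view.shape[0]):
        for j in range(view.shape[1]):
            if is_interesting_item(view.get_from_index(i, j)):
                indexes.append((i, j))
    return indexes


view = CompressedStringView([["spam", "eggs"], ["ham", "spam"]])
hits = find_interesting(view, lambda item: item == "spam")
```

The consumer only touches shape and get_from_index, so the provider is free to swap the compressed storage for data generated on demand without changing the calling code.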