Martin v. Löwis wrote:
Travis E. Oliphant wrote:
What if we look at this from the angle of trying to communicate data-formats between different libraries (not change the way anybody internally deals with data-formats).
ISTM that this is not the right approach. If the purpose of the datatype object is just to communicate the layout in the extended buffer interface, then it should be specified in that PEP, rather than being stand-alone, and it should not pretend to serve any other purpose.
I'm actually quite fine with that. If that is the consensus, then I will just go in that direction. ISTM though that since we are going to the trouble inside the extended buffer protocol, we might as well be as complete as we know how to be.
Or, if it does have uses independent of the buffer extension: what are those uses?
So that NumPy and ctypes and audio libraries and video libraries and database libraries and image-file format libraries can communicate about data-formats using the same expressions (in Python).
Maybe we decide that ctypes-based expressions are a very good way to communicate about those things in Python for all other packages. If that is the case, then I argue that we ought to change the array module and the struct module to conform (of course keeping the old ways for backward compatibility) and set the standard for other packages to follow.
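As a rough illustration of what a ctypes-based expression of a data-format could look like, here is a minimal sketch. The RGB class and the describe() helper are hypothetical names made up for this example; only the ctypes calls themselves come from the standard library.

```python
import ctypes


class RGB(ctypes.Structure):
    # Hypothetical pixel layout: three unsigned 8-bit channels.
    _fields_ = [
        ("r", ctypes.c_uint8),
        ("g", ctypes.c_uint8),
        ("b", ctypes.c_uint8),
    ]


def describe(struct_type):
    """Return (name, offset, size) for each field of a ctypes Structure."""
    layout = []
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)  # field descriptor on the class
        layout.append((name, field.offset, field.size))
    return layout


layout = describe(RGB)
total_size = ctypes.sizeof(RGB)
```

A consumer that only receives the RGB class can recover field names, offsets, and sizes without knowing anything else about the producer.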
What problem do you have in defining a standard way to communicate about binary data-formats (not just images)? I still can't figure out why you are so resistant to the idea. MPI had to do it.
- We could define a special string-syntax (or list syntax) that covers every special case. The array interface specification goes this direction and it requires no new Python types. This could also be seen as an extension of the "struct" module to allow for nested structures, etc.
- We could define a Python object that specifically carries data-format
To distinguish between these, convenience of usage (and of construction) should be taken into account. At least for the preferred alternative, but ideally for the runners-up too, there should be a demonstration of how existing modules have to be changed to support it (e.g. for the struct and array modules as producers; not sure what good consumer code would be).
Absolutely --- if something is to be made useful across packages and from Python. This is where the discussion should take place. The struct and array modules would both be consumers also, so that in the struct module you could specify your structure in terms of the standard data-representation and in the array module you could specify your array in terms of the standard representation instead of using "character codes".
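For context, this is roughly how the two modules express a data-format today, each with its own character codes. It is only a sketch using the standard library; the sample values are arbitrary.

```python
import array
import struct

values = [1, 2, 3]

# struct: the layout is a format string of character codes.
# "<3H" means little-endian, three unsigned 16-bit integers.
packed = struct.pack("<3H", *values)

# array: the element type is a single type code.
# "H" is an unsigned short in native byte order.
arr = array.array("H", values)

# Reading the packed bytes back requires knowing the same format string.
unpacked = struct.unpack("<3H", packed)
```

The two spellings describe closely related data, but neither module can consume the other's description directly.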
Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), what would the C code look like that does that? (it seems now that the primary purpose of this machinery is image manipulation)
For me it is definitely not image manipulation that is the only purpose (or even the primary purpose). It's just an easy one to explain --- most people understand images. But I think this question is actually irrelevant (IMHO). To me, how you change all RGB values to gray would depend on the library you are using, not on how data-formats are expressed.
Maybe we are still misunderstanding each other.
If you really want to know. In NumPy it might look like this:
    img['r'] = img['g']
    img['b'] = img['g']
In C, either use the Python C-API to do essentially the same thing as above or, to do img['r'] = img['g'] directly:
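    /* img is assumed to be a PyArrayObject * with record fields 'r' and 'g' */
    PyArray_Descr *dtype = PyArray_DESCR(img);
    /* dtype->fields maps each field name to a (dtype, offset) tuple */
    PyObject *r_field = PyDict_GetItemString(dtype->fields, "r");
    PyObject *g_field = PyDict_GetItemString(dtype->fields, "g");
    PyArray_Descr *r_field_dtype = (PyArray_Descr *)PyTuple_GET_ITEM(r_field, 0);
    int r_field_offset = (int)PyLong_AsLong(PyTuple_GET_ITEM(r_field, 1));
    PyArray_Descr *g_field_dtype = (PyArray_Descr *)PyTuple_GET_ITEM(g_field, 0);
    int g_field_offset = (int)PyLong_AsLong(PyTuple_GET_ITEM(g_field, 1));
    /* PyArray_GetField and PyArray_SetField each steal a reference to the dtype */
    Py_INCREF(g_field_dtype);
    PyObject *obj = PyArray_GetField(img, g_field_dtype, g_field_offset);
    Py_INCREF(r_field_dtype);
    PyArray_SetField(img, r_field_dtype, r_field_offset, obj);
    Py_DECREF(obj);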
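To make the Python form concrete, here is a small runnable sketch. The record layout (three uint8 fields named r, g, b), the 2x2 image size, and the sample values are assumptions chosen for illustration only.

```python
import numpy as np

# Assumed layout: one record per pixel with three unsigned 8-bit fields.
rgb = np.dtype([("r", np.uint8), ("g", np.uint8), ("b", np.uint8)])

img = np.zeros((2, 2), dtype=rgb)
img["r"] = 10
img["g"] = 20
img["b"] = 30

# Copy the green channel into red and blue so that R = G = B.
img["r"] = img["g"]
img["b"] = img["g"]

# dtype.fields maps each field name to a (dtype, offset) pair, which is
# the same information the C version looks up by hand.
g_dtype, g_offset = rgb.fields["g"][:2]
```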
But, I still don't see how that is relevant to the question of how to represent the data-format to share that information across two extensions.
The problem with 2b is that what works inside an extension module may not be the best option when it comes to communicating across multiple extension modules. Certainly none of the extension modules have argued that case effectively.
I think there are two ways in which one option could be "better" than the other: it might be more expressive, and it might be easier to use. For the second aspect (ease of use), there are two sub-aspects: it might be easier to produce, or it might be easier to consume.
I like this as a means to judge a data-format representation. Let me summarize to see if I understand:
1) Expressive (does it express every data-format you might want or need)
2) Ease of use
   a) Production: How easy is it to create the representation.
   b) Consumption: How easy is it to interpret the representation.
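To see how production and consumption might be weighed in practice, here is a sketch built around a purely hypothetical list-based description. The (name, code) tuple format and the helper names to_struct() and unpack_record() are invented for this example and are not part of any proposal in this thread.

```python
import struct

# Production: writing the description is just building a list.
# Hypothetical format: (field name, struct character code).
description = [("r", "B"), ("g", "B"), ("b", "B")]


def to_struct(desc, byte_order="<"):
    """Consume the description by building a struct.Struct from its codes."""
    return struct.Struct(byte_order + "".join(code for _name, code in desc))


def unpack_record(desc, data):
    """Return a dict mapping each field name to its unpacked value."""
    values = to_struct(desc).unpack(data)
    return {name: value for (name, _code), value in zip(desc, values)}


record = unpack_record(description, bytes([10, 20, 30]))
```

Here production is trivial and consumption takes a few lines, but expressiveness is limited: this toy format cannot describe nested structures or explicit offsets.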