[Python-Dev] idea for data-type (data-format) PEP
oliphant at ee.byu.edu
Wed Nov 1 22:41:38 CET 2006
Martin v. Löwis wrote:
>Travis Oliphant schrieb:
>>>>r_field = PyDict_GetItemString(dtype,'r');
>>Actually it should read PyDict_GetItemString(dtype->fields). The
>>r_field is a tuple (data-type object, offset). The fields attribute is
>>(currently) a Python dictionary.
>Ok. This seems to be missing in the PEP.
Yeah, actually quite a bit is missing. Because I wanted to float the
idea for discussion before "getting the details perfect" (which of
course they wouldn't be if it was just my input producing them).
>In this code, where is PyArray_GetField coming from?
This is a NumPy Specific C-API. That's why I was confused about why
you wanted me to show how I would do it.
But, what you are actually asking is how would another application use
the data-type information to do the same thing using the data-type
object and a pointer to memory. Is that correct?
This is a reasonable thing to request. And your example is a good one.
I will use the PEP to explain it.
Ultimately, the code you are asking for will have to have some kind of
dispatch table for different binary code depending on the actual
data-types being shared (unless all that is needed is a copy in which
case just the size of the element area can be used). In my experience,
the dispatch table must be present for at least the "simple"
data-types. The data-types built up from there can depend on those.
In NumPy, the data-type objects have function pointers to accomplish all
the things NumPy does quickly. So, each data-type object in NumPy
points to a function-pointer table and the NumPy code defers to it to
actually accomplish the task (much like Python really).
Not all libraries will support working with all data-types. If they
don't support it, they just raise an error indicating that it's not
possible to share that kind of data.
> What does
>it do? If I wanted to write this code from scratch, what
>should I write instead? Since this is all about a flat
>memory block, I'm surprised I need "true" Python objects
>for the field values in there.
Well, actually, the block could be "strided" as well.
So, you would write something that gets the pointer to the memory and
then gets the extended information (dimensionality, shape, and strides,
and data-format object). Then, you would get the offset of the field
you are interested in from the start of the element (it's stored in the
Then do a memory copy from the right place (using the array iterator in
NumPy you can actually do it without getting the shape and strides
information first but I'm holding off on that PEP until an N-d array is
proposed for Python). I'll write something like that as an example and
put it in the PEP for the extended buffer protocol.
>>But, the other option (especially for code already written) would be to
>>just convert the data-format specification into it's own internal
>Ok, so your assumption is that consumers already have their own
>machinery, in which case ease-of-use would be the question how
>difficult it is to convert datatype objects into the internal
More information about the Python-Dev