[Python-Dev] PEP: Adding data-type objects to Python
oliphant.travis at ieee.org
Wed Nov 1 19:50:16 CET 2006
Jim Jewett wrote:
> I'm still not sure exactly what is missing from ctypes. To make this
I think the only thing missing from ctypes' "expressiveness," as far as
I can tell in terms of what you *can* express, is the byte-order
representation.
What is missing is ease of use for producers and consumers in
interpreting the data-type. When I speak of producers and consumers,
I'm largely talking about writers of C (or Java or .NET) code.
Producers must basically use Python code to create classes of various
types. This is going to be slow from C, probably slower than the
array interface (which is what we now have informally).
Consumers are going to have a hard time interpreting the result. I'm
not even sure how to do that, in fact. I'd like NumPy to be able to
understand ctypes as a means to specify data. Would I have to check
against all the sub-types of CDataType, pull out the fields, check the
tp_name of the type object? I'm not sure.
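To make that concrete, here is a sketch of what a consumer would have to
do today to interpret a ctypes description: test against each ctypes
category in turn and pull apart attributes like `_fields_`, `_type_`,
and `_length_`. The `describe` helper is purely illustrative, not an
existing API:

```python
import ctypes

def describe(ctype, indent=0):
    # A consumer must dispatch on each ctypes category and dig out
    # the layout from _fields_, _type_, and _length_.
    pad = "  " * indent
    if issubclass(ctype, ctypes.Array):
        print(f"{pad}array[{ctype._length_}] of")
        describe(ctype._type_, indent + 1)
    elif issubclass(ctype, ctypes.Structure):
        print(f"{pad}struct ({ctypes.sizeof(ctype)} bytes)")
        for name, ftype in ctype._fields_:
            print(f"{pad}  .{name}:")
            describe(ftype, indent + 2)
    else:
        print(f"{pad}{ctype.__name__} ({ctypes.sizeof(ctype)} bytes)")

class Record(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30), ("value", ctypes.c_double)]

describe(Record * 500)
```

Doing the equivalent from C means calling back into the Python type
machinery for every one of those checks.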
It seems like a string describing the C structure would be better as a
data-representation, but then every third-party library would want to
parse it, so Python might as well have its own parser for data-types.
So, Python might as well have its own way to describe data. My claim
is that this default way should *not* be overloaded by using Python
type objects (the ctypes way). I'm claiming that the NumPy way of
using a separate Python object to describe data-types is the right
approach. I'm not saying the NumPy object itself should be used. I'm
saying we should come up with a single DataFormatType whose instances
express the data formats in ways that other packages can produce and
consume (or even use internally).
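As an illustration of the split such an object would heal, the same
two-double layout must currently be described in two unconnected ways,
as a class in ctypes and as a format string in the struct module (a
shared DataFormatType instance would be a third, common form):

```python
import ctypes
import struct

# ctypes: the data-format description *is* a freshly created Python class
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

# struct: the same layout is described by an opaque format string
fmt = "dd"

# Both describe the same bytes, but nothing connects the two descriptions
assert ctypes.sizeof(Point) == struct.calcsize(fmt)
```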
It would be easy for NumPy to "use" the default Python object in its
PyArray_Descr * structure. It would also be easy for ctypes to "use"
the default Python object in its StgDict object that is the tp_dict of
every ctypes type object.
It would be easy for the struct module to allow for this data-format
object (instead of just strings) in its methods.
It would be easy for the array module to accept this data-format object
(instead of just typecodes) in its constructor.
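Today each of those modules speaks only its own mini-language:

```python
import array
import struct

# struct today: the data format is a bare string
packed = struct.pack("<if", 42, 1.5)
assert struct.unpack("<if", packed) == (42, 1.5)

# array today: the element format is a one-character typecode
a = array.array("d", [1.0, 2.0, 3.0])
assert a.typecode == "d" and a.itemsize == 8
```

Under the proposal, both would additionally accept the shared
data-format object in place of these strings and typecodes.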
Lots of things would suddenly be more consistent throughout both the
Python and C-Python user space.
Perhaps after discussion, it becomes clear that the ctypes approach is
sufficient to be "that thing" that all modules use to share data-format
information. It's definitely expressive enough. But my argument is
that NumPy data-type objects are also "pretty close," so why should they
be rejected? We could also make a "string syntax" do it.
> You have said that creating whole classes is too much overhead, and
> the description should only be an instance. To me, that particular
> class (arrays of 500 structs) still looks pretty lightweight. So
> please clarify when it starts to be a problem.
> (1) For simple types -- mapping
> char name; ==> ("name", c_char*30)
> Do you object to using the c_char type?
> Do you object to the array-of-length-30 class, instead of just having
> a repeat or shape attribute?
> Do you object to naming the field?
> (2) For the complex types, nested and struct
> Do you object to creating these two classes even once? For example,
> are you expecting to need different classes for each buffer, and to
> have many buffers created quickly?
I object to the way I "consume" and "produce" the ctypes interface.
It's much too slow to be used at the C level for sharing many small
buffers.
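For reference, here is what the `("name", c_char*30)` mapping Jim
describes looks like when actually produced; every such description
requires executing Python class creation:

```python
import ctypes

# Producing a data-format description in ctypes means building a new class
class Record(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30)]

r = Record()
r.name = b"hello"
assert r.name == b"hello"
assert ctypes.sizeof(Record) == 30  # char[30], no padding
```

From C, that class creation means calling back into the Python type
machinery for each description produced.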
> Is creating that new class a royal pain, but frequent (and slow)
> enough that you can't just make a call into python (or ctypes)?
> (3) Given that you will describe X, is X*500 (==> a type describing
> an array of 500 Xs) a royal pain in C? If so, are you expecting to
> have to do it dynamically for many sizes, and quickly enough that you
> can't just let ctypes do it for you?
That pretty much sums it up (plus the pain of having to basically write
Python code from "C").
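Concretely, `X * 500` in ctypes manufactures yet another Python class,
which is exactly what makes doing this repeatedly from C painful:

```python
import ctypes

X = ctypes.c_int * 4        # a brand-new class describing int[4]
assert isinstance(X, type)
assert issubclass(X, ctypes.Array)
assert X._length_ == 4 and X._type_ is ctypes.c_int

Matrix = X * 500            # and another class for the 500-element array
assert ctypes.sizeof(Matrix) == 500 * ctypes.sizeof(X)
```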