Jim Jewett wrote:
I'm still not sure exactly what is missing from ctypes. To make this concrete:
As far as I can tell, the only thing missing from ctypes' "expressiveness" -- in terms of what you *can* do -- is byte-order representation.
What is missing is ease of use for producers and consumers in interpreting the data-type. When I speak of producers and consumers, I'm largely talking about writers of C (or Java, or .NET) code.
Producers must basically use Python code to create classes of various types. This is going to be slow in C -- probably slower than the array interface (which is what we have now, informally).
Consumers are going to have a hard time interpreting the result. I'm not even sure how to do that, in fact. I'd like NumPy to be able to understand ctypes as a means of specifying data. Would I have to check against all the sub-types of CDataType, pull out the fields, and check the tp_name of the type object? I'm not sure.
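To make the consumer's problem concrete, here is a rough sketch of the kind of introspection code a consumer would need. The helper `describe` and the example `Point` struct are mine; `_fields_`, `_length_`, `_type_`, and `ctypes.sizeof` are the real ctypes attributes such code would have to rely on:

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

def describe(tp):
    """Walk a ctypes type and return a nested description of it."""
    if issubclass(tp, ctypes.Structure):
        # Recurse into each (name, type) pair of the struct.
        return [(name, describe(ftype)) for name, ftype in tp._fields_]
    if issubclass(tp, ctypes.Array):
        # Array classes carry their length and element type as attributes.
        return ("array", tp._length_, describe(tp._type_))
    # Simple types expose a struct-style format character in _type_.
    return (tp._type_, ctypes.sizeof(tp))

print(describe(Point))                 # [('x', ('d', 8)), ('y', ('d', 8))]
print(describe(ctypes.c_char * 30))    # ('array', 30, ('c', 1))
```

Even this sketch has to special-case each kind of ctypes type; a consumer that also handled unions, pointers, and bit-fields would grow considerably.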
It seems like a string with the C structure would be better as a data representation, but then every third-party library would want to parse it, so Python might as well have its own parser for data-types.
So, Python might as well have its own way to describe data. My claim is that this default way should *not* be overloaded onto Python type objects (the ctypes way). I'm claiming that the NumPy way -- using a different kind of Python object to describe data-types -- is the better approach. I'm not saying the NumPy object itself should be used. I'm saying we should come up with a single DataFormatType whose instances express data formats in ways that other packages can produce and consume (or even use internally).
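For comparison, this is roughly what the instance-based approach looks like in NumPy today. `np.dtype` is the real constructor; the particular field layout is just an example:

```python
import numpy as np

# The description is an *instance* (np.dtype), not a new type object.
# One lightweight object captures field names, element types,
# byte order, and total size.
dt = np.dtype([("name", "S30"), ("value", "<f8")])

print(dt.names)          # ('name', 'value')
print(dt.itemsize)       # 38  (30-byte string + 8-byte double)
print(dt["value"].str)   # '<f8'  (little-endian 8-byte float)
```

A consumer can read these attributes directly instead of dispatching on a family of type-object subclasses.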
It would be easy for NumPy to "use" the default Python object in its PyArray_Descr * structure. It would also be easy for ctypes to "use" the default Python object in its StgDict object, which is the tp_dict of every ctypes type object.
It would be easy for the struct module to allow this data-format object (instead of just strings) in its methods.
It would be easy for the array module to accept this data-format object (instead of just typecodes) in its constructor.
Lots of things would suddenly be more consistent throughout both the Python and CPython user space.
Perhaps after discussion it will become clear that the ctypes approach is sufficient to be "that thing" that all modules use to share data-format information. It's definitely expressive enough. But my argument is that NumPy data-type objects are also "pretty close," so why should they be rejected? We could also make a "string syntax" do it.
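The struct module already has a string syntax of roughly this kind. For example, a record holding a 30-byte string followed by a little-endian double (the layout is illustrative; the format characters are standard struct syntax):

```python
import struct

# '<' = little-endian, no padding; '30s' = 30-byte string; 'd' = double.
fmt = "<30sd"
print(struct.calcsize(fmt))         # 38

packed = struct.pack(fmt, b"spam", 3.14)
name, value = struct.unpack(fmt, packed)
print(name.rstrip(b"\x00"), value)  # b'spam' 3.14
```

The limitation is that such strings are flat: they have no field names and no nesting, which is exactly what a shared data-format object would need to add.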
You have said that creating whole classes is too much overhead, and the description should only be an instance. To me, that particular class (arrays of 500 structs) still looks pretty lightweight. So please clarify when it starts to be a problem.
(1) For simple types -- mapping char name[30]; ==> ("name", c_char*30)
Do you object to using the c_char type? Do you object to the array-of-length-30 class, instead of just having a repeat or shape attribute? Do you object to naming the field?
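For reference, here is the mapping in (1) spelled out in ctypes. The struct name Record is made up; `c_char`, the `*` repeat, and `sizeof` are the real ctypes machinery in question:

```python
import ctypes

# The C member ``char name[30];`` becomes a brand-new array *class*,
# created by multiplying the element type by the repeat count.
CharArray30 = ctypes.c_char * 30

class Record(ctypes.Structure):
    _fields_ = [("name", CharArray30)]

print(CharArray30._length_)    # 30
print(ctypes.sizeof(Record))   # 30
```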
(2) For the complex types, nested and struct
Do you object to creating these two classes even once? For example, are you expecting to need different classes for each buffer, and to have many buffers created quickly?
I object to the way I "consume" and "produce" the ctypes interface. It's much too slow to be used at the C level for sharing many small buffers quickly.
Is creating that new class a royal pain, but frequent (and slow) enough that you can't just make a call into python (or ctypes)?
(3) Given that you will describe X, is X*500 (==> a type describing an array of 500 Xs) a royal pain in C? If so, are you expecting to have to do it dynamically for many sizes, and quickly enough that you can't just let ctypes do it for you?
That pretty much sums it up (plus the pain of having to basically write Python code from "C").
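From Python, at least, the X*500 case in (3) is a one-liner -- though each distinct length manufactures a new array class behind the scenes, which is the overhead at issue when it has to happen from C. X and its fields below are illustrative:

```python
import ctypes

class X(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_double)]

# Multiplying a ctypes type by an integer creates a new array class
# describing that many contiguous Xs.
ArrayOf500X = X * 500
print(ctypes.sizeof(ArrayOf500X) == 500 * ctypes.sizeof(X))  # True

# Describing many different sizes means manufacturing many classes:
for n in (1, 10, 500):
    assert ctypes.sizeof(X * n) == n * ctypes.sizeof(X)
```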