[Python-Dev] PEP: Adding data-type objects to Python

Travis Oliphant oliphant.travis at ieee.org
Wed Nov 1 19:50:16 CET 2006


Jim Jewett wrote:
> I'm still not sure exactly what is missing from ctypes.  To make this 
> concrete:

I think the only thing missing from ctypes "expressiveness" as far as I 
can tell in terms of what you "can" do is the byte-order representation.

What is missing is ease-of use for producers and consumers in 
interpreting the data-type.   When I speak of Producers and consumers, 
I'm largely talking about C-code (or Java or .NET) code writers.

Producers must basically use Python code to create classes of various 
types.   This is going to be slow in 'C'.  Probably slower than the 
array interface (which is what we have no informally).

Consumers are going to have a hard time interpreting the result.  I'm 
not even sure how to do that, in fact.  I'd like NumPy to be able to 
understand ctypes as a means to specify data.  Would I have to check 
against all the sub-types of CDataType, pull out the fields, check the 
tp_name of the type object?  I'm not sure.

It seems like a string with the C-structure would be better as a 
data-representation, but then a third-party library would want to parse 
that so that Python might as well have it's own parser for data-types. 

So, Python might as well have it's own way to describe data.  My claim 
is this default way should *not* be overloaded by using Python 
type-objects (the ctypes way).  I'm making a claim that the NumPy way of 
using a different Python object to describe data-types.  I'm not saying 
the NumPy object should be used.  I'm saying we should come up with a 
singe DataFormatType whose instances express the data formats in ways 
that other packages can produce and consume (or even use internally).  

It would be easy for NumPy to "use" the default Python object in it's 
PyArray_Descr * structure.  It would also be easy for ctypes to "use" 
the default Python object in its StgDict object that is the tp_dict of 
every ctypes type object.

It would be easy for the struct module to allow for this data-format 
object (instead of just strings) in it's methods. 

It would be easy for the array module to accept this data-format object 
(instead of just typecodes) in it's constructor.

Lot's of things would suddenly be more consistent throughout both the 
Python and C-Python user space.

Perhaps after discussion, it becomes clear that the ctypes approach is 
sufficient to be "that thing" that all modules use to share data-format 
information.  It's definitely expressive enough.   But, my argument is 
that NumPy data-type objects are also "pretty close." so why should they 
be rejected.  We could also make a "string-syntax" do it.

>
> You have said that creating whole classes is too much overhead, and
> the description should only be an instance.  To me, that particular
> class (arrays of 500 structs) still looks pretty lightweight.  So
> please clarify when it starts to be a problem.
>

> (1)  For simple types -- mapping
>           char name[30];  ==> ("name", c_char*30)
>
> Do you object to using the c_char type?
> Do you object to the array-of-length-30 class, instead of just having
> a repeat or shape attribute?
> Do you object to naming the field?
>
> (2)  For the complex types, nested and struct
>
> Do you object to creating these two classes even once?   For example,
> are you expecting to need different classes for each buffer, and to
> have many buffers created quickly?
I object to the way I "consume" and "produce" the ctypes interface.  
It's much to slow to be used on the C-level for sharing many small 
buffers quickly.
>
> Is creating that new class a royal pain, but frequent (and slow)
> enough that you can't just make a call into python (or ctypes)?
>
> (3)  Given that you will describe X, is X*500 (==> a type describing
> an array of 500 Xs) a royal pain in C?  If so, are you expecting to
> have to do it dynamically for many sizes, and quickly enough that you
> can't just let ctypes do it for you?

That pretty much sums it up (plus the pain of having to basically write 
Python code from "C").

-Travis



More information about the Python-Dev mailing list