[Python-Dev] PEP: Adding data-type objects to Python

Sun Oct 29 11:04:04 CET 2006

Travis E. Oliphant schrieb:
> I'm proposing to add this object to Python so that the buffer protcol 
> has a fast and efficient way to share #3.   That's really all I'm after.

I admit that I don't understand this objective. Why is it desirable to
support such an extended buffer protocol? What specific application
would be made possible if it was available and implemented in the
relevant modules and data types? What are the relevant modules and data
types that should implement it?

> It also bothers me that so many ways to describe binary data are being 
> used out there.  This is a problem that deserves being solved.  And, no, 
> ctypes hasn't solved it (we can't directly use the ctypes solution). 
> Perhaps this PEP doesn't hit all the corners, but a data-format object 
> *is* a useful thing to consider.

IMO, it is only useful if it realistically can support all the use cases
that it intends to support. If this PEP is about defining the elements
of arrays, I doubt it can realistically support everything you can
express in ctypes. There is no support for pointers (except for
PyObject*), no support for incomplete (recursive) types, no support
for function pointers, etc.

Vice versa: why exactly can't you use the data type system of ctypes?
If I want to say "int[10]", I do

py> ctypes.c_long * 10
<class '__main__.c_long_Array_10'>

To rewrite the examples from the PEP:

datatype(float) => ctypes.c_double
datatype(int)   => ctypes.c_long
datatype((int, 5)) => ctypes.c_long * 5
datatype((float, (3,2)) => (ctypes.c_double * 3) * 2

struct {
      int  simple;
      struct nested {
           char name[30];
           char addr[45];
           int  amount;
      }
=>
py> from ctypes import *
py> class nested(Structure):
...  _fields_ = [("name", c_char*30), ("addr", c_char*45), ("amount",
c_long)]
...
py> class struct(Structure):
...   _fields_ = [("simple", c_int), ("nested", nested)]
...

> Guido seemed to think the data-type objects were nice when he saw them 
> at SciPy 2006, and so I'm presenting a PEP.

I have no objection to including NumArray as-is into Python. I just
wonder were the rationale for this PEP comes from, i.e. why do you
need to exchange this information across different modules?

> Without the data-format object, I'm don't know how to extend the buffer 
> protocol to communicate data-format information.  Do you have a better 
> idea?

See above: I can't understand where the need for an extended buffer
protocol comes from. I can see why NumArray needs reflection, and
needs to keep information to interpret the bytes in the array.
But why is it important that the same information is exposed by
other data types?

>> Is it the intent of this PEP to support such data structures,
>> and allow the user to fill in a Unicode object, and then the
>> processing is automatic? (i.e. in ID3v1, the string gets
>> automatically Latin-1-encoded and zero-padded, in ID3v2, it
>> gets automatically UTF-8 encoded, and null-terminated)
>>
> 
> No, the point of the data-format object is to communicate information 
> about data-formats not to encode or decode anything.   Users of the 
> data-format object could decide what they wanted to do with that 
> information.   We just need a standard way to communicate it through the 
> buffer protocol.

This was actually a different sub-thread: why do you need to support
the 'U' code (or the 'S' code, for that matter)? In what application
do you have fixed size Unicode arrays, as opposed to Unicode strings?

Regards,
Martin