Re: [Python-Dev] PEP: Adding data-type objects to Python

29 Oct 2006

Martin v. Löwis wrote:
...
> Travis E. Oliphant wrote:
> ...
>> How to handle unicode data-formats could definitely be improved.
> As before, I'm doubtful what the actual needs are. For example, is
> it desired to support generation of ID3v2 tags with such a data
> format? The tag is specified here:

Perhaps I was not clear enough about what I'm trying to do.  For a long 
time a lot of people have wanted something like Numeric in Python 
itself.  There have been many hurdles to that goal.

After discussions at SciPy 2006 with Guido, we decided that the best way 
to proceed at this point was to extend the buffer protocol to allow 
packages to share array-like information with each other.

There are several things missing from the buffer protocol that NumPy 
needs in order to be able to really understand the (fixed-size) memory 
another package has allocated and is sharing.

The most important of these are

1) Shape information
2) Striding information
3) Data-format information (how each element is to be interpreted).

Shape and striding information can be shared with a C-array of integers.
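
As a rough illustration of what shape and strides buy a consumer, here 
is a minimal sketch using only the standard struct module.  The 2x3 
layout, the helper names element_offset and get_item, and the "<i" 
element format are assumptions made up for this example; they are not 
part of any existing protocol.

```python
import struct

# Hypothetical example: a 2x3 array of little-endian 32-bit ints,
# stored row-major (C order) in one flat chunk of memory.
shape = (2, 3)
itemsize = struct.calcsize("<i")            # 4 bytes per element
strides = (shape[1] * itemsize, itemsize)   # bytes to step per dimension
buf = struct.pack("<6i", 0, 1, 2, 10, 11, 12)


def element_offset(index, strides):
    """Byte offset of the element at `index`, given per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))


def get_item(buf, index, strides, fmt="<i"):
    """Read one element; `fmt` has to be known from somewhere else."""
    return struct.unpack_from(fmt, buf, element_offset(index, strides))[0]


value = get_item(buf, (1, 2), strides)      # row 1, column 2
```

Note that the two integer tuples are enough to locate an element, but 
not to interpret it: the "<i" had to be supplied separately, which is 
exactly the gap item 3 is meant to fill.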

How is data-format information supposed to be shared?

We've come up with a very flexible way to do this in NumPy using a 
single Python object.  This Python object supports describing the layout 
of any fixed-size chunk of memory (right now in units of bytes --- bit 
fields could be added, though).
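
For readers who have not seen the NumPy object, a small sketch of the 
kind of description meant here follows.  The record layout (a float64 
followed by an int32) and the variable names are invented for this 
example; only numpy.dtype and its itemsize, names and fields attributes 
are NumPy's own.

```python
import numpy as np

# Hypothetical record layout: a little-endian float64 followed by a
# little-endian int32, packed with no padding.
record = np.dtype([("x", "<f8"), ("n", "<i4")])

print(record.itemsize)     # total number of bytes in one element
print(record.names)        # field names, in order

# Each field carries its own data-type object and byte offset.
for name in record.names:
    field_dtype, offset = record.fields[name][:2]
    print(name, field_dtype.str, offset)

# The same object can describe raw memory owned by someone else.
raw = bytes(record.itemsize * 2)
view = np.frombuffer(raw, dtype=record)
```

The single object answers the questions a consumer has about one 
element: how many bytes it occupies, which named pieces it contains, 
and where each piece starts.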

I'm proposing to add this object to Python so that the buffer protocol 
has a fast and efficient way to share item 3 (the data-format 
information).  That's really all I'm after.

It also bothers me that so many ways to describe binary data are being 
used out there.  This is a problem that deserves being solved.  And, no, 
ctypes hasn't solved it (we can't directly use the ctypes solution). 
Perhaps this PEP doesn't hit all the corners, but a data-format object 
*is* a useful thing to consider.

The array object in Python already has a PyArray_Descr * structure that 
is a watered-down version of what I'm talking about.   In fact, this is 
what Numeric built from (or vice-versa actually).  And NumPy has greatly 
enhanced this object for any conceivable structure.

Guido seemed to think the data-type objects were nice when he saw them 
at SciPy 2006, and so I'm presenting a PEP.

Without the data-format object, I don't know how to extend the buffer 
protocol to communicate data-format information.  Do you have a better 
idea?

I have no trouble limiting the data-type object to the buffer protocol 
extension PEP, but I do think it could gain wider use.
...
> Is it the intent of this PEP to support such data structures,
> and allow the user to fill in a Unicode object, and then the
> processing is automatic? (i.e. in ID3v1, the string gets
> automatically Latin-1-encoded and zero-padded, in ID3v2, it
> gets automatically UTF-8 encoded, and null-terminated)

No, the point of the data-format object is to communicate information 
about data-formats, not to encode or decode anything.   Users of the 
data-format object could decide what they wanted to do with that 
information.   We just need a standard way to communicate it through the 
buffer protocol.
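
To make the distinction concrete, here is a minimal sketch using the 
standard struct module.  The description dictionary, its keys, and the 
30-byte field at offset 0 are hypothetical; they are loosely inspired 
by the ID3v1 question above and are not the real tag layout or a 
proposed API.

```python
import struct

# Hypothetical description of one field: 30 raw bytes at offset 0.
# It says how the memory is laid out, nothing about text encodings.
description = {"name": "title", "format": "30s", "offset": 0}

# Memory shared by some producer: Latin-1 bytes, zero-padded to 30.
raw = "Café".encode("latin-1").ljust(30, b"\x00")

# The description is enough to pull the field out as bytes.
(field_bytes,) = struct.unpack_from(
    description["format"], raw, description["offset"]
)

# Decoding is a separate decision made by the consumer, not something
# the description performs automatically.
title = field_bytes.rstrip(b"\x00").decode("latin-1")
```

A different consumer could keep field_bytes as raw bytes, or decode 
them some other way, using exactly the same description.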

-Travis