[Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant
oliphant.travis at ieee.org
Sun Oct 29 09:53:09 CET 2006
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
>> How to handle unicode data-formats could definitely be improved.
>
> As before, I'm doubtful what the actual needs are. For example, is
> it desired to support generation of ID3v2 tags with such a data
> format? The tag is specified here:
>
Perhaps I was not clear enough about what I'm trying to do. For a long
time a lot of people have wanted something like Numeric in Python
itself. There have been many hurdles to that goal.
After discussions at SciPy 2006 with Guido, we decided that the best way
to proceed at this point was to extend the buffer protocol to allow
packages to share array-like information with each other.
There are several things missing from the buffer protocol that NumPy
needs in order to be able to really understand the (fixed-size) memory
another package has allocated and is sharing.
The most important of these are
1) Shape information
2) Striding information
3) Data-format information (how is each element perceived).
Shape and striding information can be shared with a C-array of integers.
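
To make the three items concrete, here is a minimal sketch in plain Python. It is only an illustration, not the buffer protocol itself: the helper names (c_contiguous_strides, byte_offset) are hypothetical, and a struct-module code ("<d", a little-endian C double) is assumed as a stand-in for the data-format description.

```python
import struct

def c_contiguous_strides(shape, itemsize):
    """Return byte strides for a row-major (C-ordered) array (hypothetical helper)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))

def byte_offset(index, strides):
    """Return the byte offset of the element at a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))

# Assumption: a struct code stands in for the data-format information.
element_format = "<d"
itemsize = struct.calcsize(element_format)

# 1) shape and 2) strides are just small sequences of integers.
shape = (3, 4)
strides = c_contiguous_strides(shape, itemsize)

# A flat block of memory holding 12 doubles, as another package might share it.
memory = struct.pack("<12d", *[float(i) for i in range(12)])

# With shape, strides and format known, any element can be located and read.
offset = byte_offset((2, 1), strides)
value = struct.unpack_from(element_format, memory, offset)[0]
```

Without the third item, a consumer could find where element (2, 1) starts but would not know how to interpret the bytes stored there.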
How is data-format information supposed to be shared?
We've come up with a very flexible way to do this in NumPy using a
single Python object. This Python object supports describing the layout
of any fixed-size chunk of memory (right now in units of bytes --- bit
fields could be added, though).
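
As a rough idea of what "describing the layout of a fixed-size chunk of memory" means, here is a toy sketch using only the standard library. It is not the NumPy object and not the API proposed in the PEP: the Field tuple and describe_record function are hypothetical, and packed (unaligned) little-endian fields are assumed.

```python
import struct
from collections import namedtuple

# Hypothetical description of one named field inside a record.
Field = namedtuple("Field", "name code offset size")

def describe_record(fields):
    """Build a packed layout description from (name, struct code) pairs."""
    layout = []
    offset = 0
    for name, code in fields:
        size = struct.calcsize("<" + code)  # "<" means standard sizes, no padding
        layout.append(Field(name, code, offset, size))
        offset += size
    return layout, offset

# A 16-byte record: a 4-byte int, an 8-byte double, and 4 raw bytes.
layout, itemsize = describe_record([("id", "i"), ("value", "d"), ("tag", "4s")])

# A consumer can use the description to pull one field out of shared memory.
record = struct.pack("<id4s", 7, 2.5, b"abcd")
value_field = layout[1]
value = struct.unpack_from("<" + value_field.code, record, value_field.offset)[0]
```

The description carries names, offsets and sizes in units of bytes, so one object is enough to tell a consumer how each element of a buffer is laid out.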
I'm proposing to add this object to Python so that the buffer protocol
has a fast and efficient way to share #3. That's really all I'm after.
It also bothers me that so many ways to describe binary data are being
used out there. This is a problem that deserves being solved. And, no,
ctypes hasn't solved it (we can't directly use the ctypes solution).
Perhaps this PEP doesn't hit all the corners, but a data-format object
*is* a useful thing to consider.
The array object in Python already has a PyArray_Descr * structure that
is a watered-down version of what I'm talking about. In fact, this is
what Numeric built from (or vice-versa actually). And NumPy has greatly
enhanced this object for any conceivable structure.
Guido seemed to think the data-type objects were nice when he saw them
at SciPy 2006, and so I'm presenting a PEP.
Without the data-format object, I don't know how to extend the buffer
protocol to communicate data-format information. Do you have a better
idea?
I have no trouble limiting the data-type object to the buffer protocol
extension PEP, but I do think it could gain wider use.
>
> Is it the intent of this PEP to support such data structures,
> and allow the user to fill in a Unicode object, and then the
> processing is automatic? (i.e. in ID3v1, the string gets
> automatically Latin-1-encoded and zero-padded, in ID3v2, it
> gets automatically UTF-8 encoded, and null-terminated)
>
No, the point of the data-format object is to communicate information
about data-formats, not to encode or decode anything. Users of the
data-format object could decide what they wanted to do with that
information. We just need a standard way to communicate it through the
buffer protocol.
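
A small sketch of that division of labor, using the struct module as a stand-in for the format description. The 30-byte field width and the Latin-1 choice are assumptions made for this example only.

```python
import struct

# Producer side: shares raw bytes plus a description of their layout.
# Assumption: "30s" (a fixed-width 30-byte field) is the whole description.
field_format = "30s"
raw = struct.pack(field_format, "Caf\u00e9".encode("latin-1"))

# The description only says "30 bytes"; unpacking does no text decoding.
(field_bytes,) = struct.unpack(field_format, raw)

# Consumer side: decides on its own how to interpret those bytes.
as_text = field_bytes.rstrip(b"\x00").decode("latin-1")
```

The format description tells the consumer where the field is and how wide it is. Stripping the padding and choosing an encoding remain the consumer's decision.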
-Travis