PEP: <unassigned>
Title: Adding data-type objects to the standard library
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant
Status: Draft
Type: Standards Track
Created: 05-Sep-2006
Python-Version: 2.6
Abstract
This PEP proposes adapting the data-type objects from NumPy for
inclusion in standard Python, to provide a consistent and standard
way to describe the format of binary data.
Rationale
Many applications, across several domains, need to interpret
binary data in terms of fundamental data-types such as integers,
floating-point values, and complex floating-point values. A common
object that carries information about binary data would benefit
many of them. NumPy's data-type objects, which describe what each
element of an array contains, evolved from the PyArray_Descr
structure in Python's own array object. These data-type objects
can represent arbitrary byte data. Currently such information is
usually constructed using strings and character codes, which is
unwieldy when a data-type consists of nested structures.
Proposal
Add a PyDatatypeObject in Python (adapted from NumPy's dtype
object which evolved from the PyArray_Descr structure in Python's
array module) that holds information about a data-type. This object
will allow packages to exchange information about binary data in
a uniform way (see the extended buffer protocol PEP for an application
to exchanging information about array data).
Specification
The datatype is an object that specifies how a certain block of
memory should be interpreted as a basic data-type. In addition to
being able to describe basic data-types, the data-type object can
describe a data-type that is itself an array of other data-types
as well as a data-type that contains arbitrary "fields" (structure
members) which are located at specific offsets. In its most basic
form, however, a data-type is of a particular kind (bit, bool,
int, uint, float, complex, object, string, unicode, void) and size.
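As a rough analogy only (ctypes is not part of this proposal), the
standard ctypes module can already describe a block of memory with
named fields at specific offsets and with array-valued members. The
sketch below uses a hypothetical Record layout to illustrate the kind
of information a data-type object would carry.

```python
import ctypes

# Hypothetical record: one 32-bit integer followed by an array of 3 doubles.
class Record(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),
        ("values", ctypes.c_double * 3),
    ]

# Each field has a name, a byte offset and a size in bytes. The exact
# offset of "values" depends on the platform's alignment rules.
for name, _ctype in Record._fields_:
    field = getattr(Record, name)
    print(name, field.offset, field.size)

print(ctypes.sizeof(Record))
```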
Datatype objects can be created from a type-object, a string, a
tuple, a list, or a dictionary, according to the following
constructors:
Type-object:
For a select set of type-objects, a data-type object describing the
corresponding basic type can be created:
Examples:
>>> datatype(float)
datatype('float64')
>>> datatype(int)
datatype('int32') # on a 32-bit platform ('int64' if a C long is 64 bits)
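The platform dependence noted above can be checked with the standard
struct module. This is only a sketch of how the expected names could
be derived; it assumes, as the examples suggest, that int maps to a C
long and float to a C double.

```python
import struct

# Native sizes, in bits, of the C types assumed to back int and float.
long_bits = struct.calcsize("l") * 8
double_bits = struct.calcsize("d") * 8

print("int%d" % long_bits)      # name expected for datatype(int)
print("float%d" % double_bits)  # name expected for datatype(float)
```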
Tuple-object:
A tuple of length 2 can be used to specify a data-type that is
an array of another kind of basic data-type (this array always
describes a C-contiguous array).
Examples:
>>> datatype((int, 5))
datatype(('int32', (5,)))
# describes a 5*4=20-byte block of memory laid out as
# a[0], a[1], a[2], a[3], a[4]
>>> datatype((float, (3,2)))
datatype(('float64', (3,2)))
# describes a 3*2*8=48-byte block of memory that should be
# interpreted as 6 doubles laid out as arr[0,0], arr[0,1],
# ... arr[2,0], arr[2,1]
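The C-contiguous layout described in the comments above can be
sketched in plain Python. The helper name c_contiguous_offsets is
hypothetical and not part of the proposal; it lists each index with
its byte offset, with the last index varying fastest.

```python
import itertools
import struct

def c_contiguous_offsets(shape, itemsize):
    """Return (index, byte_offset) pairs in C (row-major) order."""
    # itertools.product varies the last index fastest, which is C order.
    indices = itertools.product(*(range(n) for n in shape))
    return [(idx, pos * itemsize) for pos, idx in enumerate(indices)]

# A (3, 2) array of doubles, as in the example above.
layout = c_contiguous_offsets((3, 2), struct.calcsize("d"))
total_bytes = len(layout) * struct.calcsize("d")

for idx, offset in layout:
    print(idx, offset)
print(total_bytes)
```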
String-object:
The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize)
kind : one of the basic array kinds given below.
itemsize : the number of bytes (or bits for 't' kind) for
this data-type.
endian : either '', '=' (native), '|' (doesn't matter),
'>' (big-endian) or '<' (little-endian).
shape : either '', or a shape-tuple describing a data-type that
is an array of the given shape.
A string can also be a comma-separated sequence of basic
formats. The result will be a data-type with default field
names: 'f0', 'f1', ..., 'fn'.
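To make the string grammar concrete, here is a minimal parsing sketch.
The function names parse_basic_format and parse_format are
hypothetical, the kind is accepted as any single letter because the
list of kinds is given elsewhere, and the result is a plain tuple
rather than a real data-type object.

```python
import re

# endian, optional shape tuple, one-letter kind, itemsize
_BASIC = re.compile(r"\s*([=|<>]?)(\([0-9,\s]*\))?([A-Za-z])([0-9]+)\s*")

def parse_basic_format(text):
    """Split one basic format into (endian, shape, kind, itemsize)."""
    match = _BASIC.fullmatch(text)
    if match is None:
        raise ValueError("not a basic format: %r" % (text,))
    endian, shape, kind, itemsize = match.groups()
    dims = tuple(int(d) for d in re.findall(r"[0-9]+", shape or ""))
    return endian, dims, kind, int(itemsize)

def parse_format(text):
    """Parse a comma-separated sequence, assigning names 'f0', 'f1', ..."""
    # Split on commas that are not inside a parenthesised shape.
    parts = re.findall(r"(?:\([^)]*\)|[^,(])+", text)
    return [("f%d" % i, parse_basic_format(p)) for i, p in enumerate(parts)]

print(parse_basic_format("(3,2)f4"))
print(parse_format("(5,)i4, (3,2)f4, S5"))
```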
Examples:
>>> datatype('u4')
datatype('uint32')
>>> datatype('f4')
datatype('float32')
>>> datatype('(3,2)f4')
datatype(('float32', (3,2)))
>>> datatype('(5,)i4, (3,2)f4, S5')
datatype([('f0', '