PEP:
Title: Adding data-type objects to the standard library
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant
Status: Draft
Type: Standards Track
Created: 05-Sep-2006
Python-Version: 2.6


Abstract

    This PEP proposes adapting the data-type objects from NumPy for
    inclusion in standard Python, to provide a consistent and standard
    way to describe the format of binary data.


Rationale

    There are many situations, across multiple areas, where binary
    data must be interpreted in terms of fundamental data-types such
    as integers, floating-point values, and complex floating-point
    values.  Having a common object that carries information about
    binary data would benefit many users.

    The data-type objects in NumPy, which describe what each element
    of an array contains, are the result of an evolution that began
    with the PyArray_Descr structure in Python's own array object.
    These data-type objects can represent arbitrary byte data.
    Currently, such information is usually constructed from strings
    and character codes, which is unwieldy when a data-type consists
    of nested structures.


Proposal

    Add a PyDatatypeObject to Python that holds information about a
    data-type.  It would be adapted from NumPy's dtype object, which
    evolved from the PyArray_Descr structure in Python's array module.
    This object will allow packages to exchange information about
    binary data in a uniform way (see the extended buffer protocol PEP
    for an application to exchanging information about array data).


Specification

    The data-type is an object that specifies how a certain block of
    memory should be interpreted as a basic data-type.  In addition to
    describing basic data-types, a data-type object can describe a
    data-type that is itself an array of other data-types, as well as
    a data-type that contains arbitrary "fields" (structure members)
    located at specific offsets.
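    The proposed data-type object does not yet exist in the standard
    library, so the following is only a rough sketch, built on the
    existing struct module, of what it means to interpret a block of
    memory through fields located at specific offsets, one of which is
    itself an array of a basic type.  The names FIELDS, RECORD_SIZE and
    decode_record are hypothetical, invented for illustration; they are
    not the API this PEP proposes.  The sketch assumes little-endian
    data with no padding between fields.

```python
import struct

# Hypothetical field table, for illustration only (not the proposed API).
# Each entry is (name, struct format code, element count, byte offset) and
# mimics what a data-type object would record: a basic kind and size for
# each field, an optional array length, and an offset inside the record.
# All data is assumed to be little-endian with no padding between fields.
FIELDS = [
    ("id", "i", 1, 0),     # one 32-bit signed int at offset 0
    ("pos", "d", 3, 4),    # array of three float64 values at offset 4
    ("tag", "5s", 1, 28),  # one 5-byte string at offset 28
]
RECORD_SIZE = 4 + 3 * 8 + 5  # 33 bytes in total


def decode_record(buf, fields=FIELDS):
    """Interpret a raw block of memory according to a field table."""
    record = {}
    for name, code, count, offset in fields:
        if count == 1:
            fmt = "<" + code
        else:
            fmt = "<%d%s" % (count, code)
        values = struct.unpack_from(fmt, buf, offset)
        # A single element becomes a scalar, an array field stays a tuple.
        record[name] = values[0] if count == 1 else values
    return record


if __name__ == "__main__":
    raw = struct.pack("<i3d5s", 7, 1.0, 2.0, 3.0, b"hello")
    print(len(raw) == RECORD_SIZE)  # prints: True
    print(decode_record(raw))
    # prints: {'id': 7, 'pos': (1.0, 2.0, 3.0), 'tag': b'hello'}
```

    Here the layout is hand-written as format codes and offsets, which
    is exactly the kind of ad hoc description the Rationale calls
    unwieldy; a shared data-type object would carry this information
    itself.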
    In its most basic form, however, a data-type is of a particular
    kind (bit, bool, int, uint, float, complex, object, string,
    unicode, void) and size.  Data-type objects can be created from a
    type-object, a string, a tuple, a list, or a dictionary, according
    to the following constructors:

    Type-object:

        For a select set of type-objects, a data-type object
        describing the corresponding basic type can be constructed.

        Examples:

        >>> datatype(float)
        datatype('float64')
        >>> datatype(int)
        datatype('int32')   # where a C long is 32 bits ('int64' where it is 64 bits)

    Tuple-object:

        A tuple of length 2 can be used to specify a data-type that is
        an array of another basic data-type (this array always
        describes a C-contiguous array).

        Examples:

        >>> datatype((int, 5))
        datatype(('int32', (5,)))
        # describes a 5*4=20-byte block of memory laid out as
        # a[0], a[1], a[2], a[3], a[4]

        >>> datatype((float, (3,2)))
        datatype(('float64', (3,2)))
        # describes a 3*2*8=48-byte block of memory that should be
        # interpreted as 6 doubles laid out as arr[0,0], arr[0,1],
        # arr[1,0], arr[1,1], arr[2,0], arr[2,1]

    String-object:

        The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize)

            kind : one of the basic array kinds given below.

            itemsize : the number of bytes (or bits, for the 't' kind)
                       for this data-type.

            endian : either '', '=' (native), '|' (doesn't matter),
                     '>' (big-endian), or '<' (little-endian).

            shape : either '' or a shape-tuple describing a data-type
                    that is an array of the given shape.

        A string can also be a comma-separated sequence of basic
        formats.  The result will be a data-type with default field
        names 'f0', 'f1', 'f2', and so on, one per format.

        Examples:

        >>> datatype('u4')
        datatype('uint32')
        >>> datatype('f4')
        datatype('float32')
        >>> datatype('(3,2)f4')
        datatype(('float32', (3,2)))
        >>> datatype('(5,)i4, (3,2)f4, S5')
        datatype([('f0', '