[Python-Dev] PEP: Adding data-type objects to Python

27 Oct 2006


PEP: <unassigned>
Title: Adding data-type objects to the standard library
Version: $Revision: $
Last-Modified: $Date:  $
Author: Travis Oliphant 
Status: Draft
Type: Standards Track
Created: 05-Sep-2006
Python-Version: 2.6

Abstract

    This PEP proposes adapting the data-type objects from NumPy for
    inclusion in standard Python, to provide a consistent and standard
    way to discuss the format of binary data. 

Rationale

    Many applications, across different domains, need to interpret
    binary data in terms of fundamental data-types such as integers,
    floating-point values, and complex floating-point values.  A
    common object that carries this information would benefit many
    packages.  NumPy's data-type objects, which describe what each
    element of an array contains, evolved from the PyArray_Descr
    structure in Python's own array object.  These data-type objects
    can represent arbitrary byte data.  Currently such information is
    usually constructed using strings and character codes, which is
    unwieldy when a data-type consists of nested structures.

Proposal

    Add a PyDatatypeObject in Python (adapted from NumPy's dtype
    object which evolved from the PyArray_Descr structure in Python's
    array module) that holds information about a data-type.  This object
    will allow packages to exchange information about binary data in
    a uniform way (see the extended buffer protocol PEP for an application
    to exchanging information about array data). 

Specification

    The datatype is an object that specifies how a certain block of
    memory should be interpreted as a basic data-type. In addition to
    being able to describe basic data-types, the data-type object can
    describe a data-type that is itself an array of other data-types
    as well as a data-type that contains arbitrary "fields" (structure
    members) which are located at specific offsets. In its most basic
    form, however, a data-type is of a particular kind (bit, bool,
    int, uint, float, complex, object, string, unicode, void) and size.
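
    As a rough illustration of "fields" located at specific offsets,
    the sketch below uses the standard-library ctypes module as a
    stand-in.  It is not the proposed datatype object; the Record
    class and its field names are hypothetical and exist only for
    this example.

```python
import ctypes

# Hypothetical record, used only for illustration: one C double
# followed by one 32-bit integer.
class Record(ctypes.Structure):
    _fields_ = [
        ("value", ctypes.c_double),
        ("count", ctypes.c_int32),
    ]

# Each field descriptor reports the byte offset and size of its member.
layout = {
    name: (getattr(Record, name).offset, getattr(Record, name).size)
    for name, _ctype in Record._fields_
}
record_size = ctypes.sizeof(Record)  # may include trailing padding

print(layout)
print(record_size)
```

    The (offset, size) pair per named member is the kind of
    information a data-type with fields has to carry.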

    Datatype objects can be created using either a type-object, a
    string, a tuple, a list, or a dictionary according to the following
    constructors:

    Type-object: 

      For a select set of type-objects, a data-type object describing
      that basic type can be created:

      Examples: 

      >>> datatype(float)
      datatype('float64')
      
      >>> datatype(int)
      datatype('int32')  # on a 32-bit platform ('int64' if C long is 64 bits)
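
      The platform dependence noted above can be checked with the
      standard library.  This is only a sketch of where the sizes come
      from, assuming (as in Python 2.x) that a Python float is stored
      as a C double and a Python int as a C long; it does not use the
      proposed datatype object.

```python
import ctypes
import struct

# A Python float is a C double; its size fixes the 'float64' name.
double_bits = 8 * struct.calcsize("d")

# In Python 2.x an int is a C long, whose size varies by platform.
long_bits = 8 * ctypes.sizeof(ctypes.c_long)

print("float -> float%d" % double_bits)
print("int   -> int%d" % long_bits)
```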

    Tuple-object:
   
      A tuple of length 2 can be used to specify a data-type that is
      an array of another kind of basic data-type (this array always
      describes a C-contiguous array).

      Examples: 

      >>> datatype((int, 5))
      datatype(('int32', (5,)))
      # describes a 5*4=20-byte block of memory laid out as 
      #  a[0], a[1], a[2], a[3], a[4]

      >>> datatype((float, (3,2)))
      datatype(('float64', (3,2)))
      # describes a 3*2*8=48-byte block of memory that should be
      # interpreted as 6 doubles laid out as arr[0,0], arr[0,1],
      # ... arr[2,0], arr[2,1]
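
      The C-contiguous layout of the (float, (3,2)) example above can
      be reproduced with ctypes arrays.  This is a sketch using an
      existing standard-library analogue, not the proposed datatype
      object; the names Row, Block, and flat_values are illustrative.

```python
import ctypes

# Analogue of (float, (3,2)): 3 rows of 2 C doubles, row-major.
Row = ctypes.c_double * 2
Block = Row * 3

block = Block()
for i in range(3):
    for j in range(2):
        block[i][j] = 10 * i + j  # encode the index pair in the value

total_bytes = ctypes.sizeof(Block)

# View the same memory as a flat array of 6 doubles to see the order
# in which the elements are stored.
flat_values = list((ctypes.c_double * 6).from_buffer(block))

print(total_bytes)
print(flat_values)
```

      The flat view lists the elements with the last index varying
      fastest, which is what "C-contiguous" means here.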


    String-object:
 
      The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize) 

         kind     : one of the basic array kinds given below. 
         
         itemsize : the number of bytes (or bits for 't' kind) for 
                     this data-type.  

         endian   : either '', '=' (native), '|' (doesn't matter),
                     '>' (big-endian) or '<' (little-endian).

         shape    : either '', or a shape-tuple describing a data-type that
                     is an array of the given shape.

      A string can also be a comma-separated sequence of basic
      formats. The result will be a data-type with default field
      names: 'f0', 'f1', ..., 'fn'.
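
      The string format described above can be parsed along the
      following lines.  This is a minimal sketch, not the PEP's
      implementation: the function names parse_basic_format and
      split_fields are hypothetical, the kind is accepted as any
      single letter without validation, and every basic format is
      assumed to carry an explicit itemsize.

```python
import re

# endian, optional shape tuple, one-letter kind, itemsize
_BASIC = re.compile(r"""
    (?P<endian>[=|<>]?)
    (?P<shape>\([^)]*\))?
    (?P<kind>[A-Za-z])
    (?P<itemsize>\d+)
""", re.VERBOSE)


def parse_basic_format(text):
    """Split one basic format string into its four components."""
    match = _BASIC.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a basic format: %r" % text)
    shape = ()
    if match.group("shape"):
        inner = match.group("shape")[1:-1]
        shape = tuple(int(p) for p in inner.split(",") if p.strip())
    return {
        "endian": match.group("endian"),
        "shape": shape,
        "kind": match.group("kind"),
        "itemsize": int(match.group("itemsize")),
    }


def split_fields(text):
    """Parse a comma-separated sequence, naming fields 'f0', 'f1', ..."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside parentheses separates two fields.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [("f%d" % i, parse_basic_format(p)) for i, p in enumerate(parts)]


print(parse_basic_format("(3,2)f4"))
print(split_fields("(5,)i4, (3,2)f4, S5"))
```

      Commas inside a shape tuple are tracked by parenthesis depth, so
      they are not mistaken for field separators.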

      Examples: 

      >>> datatype('u4')
      datatype('uint32')

      >>> datatype('f4')
      datatype('float32')

      >>> datatype('(3,2)f4')
      datatype(('float32', (3,2)))

      >>> datatype('(5,)i4, (3,2)f4, S5')
      datatype([('f0', '

Travis E. Oliphant