[Python-Dev] PEP: Adding data-type objects to Python

Nick Coghlan ncoghlan at gmail.com
Sat Oct 28 06:31:29 CEST 2006

Greg Ewing wrote:
> Travis E. Oliphant wrote:
>> PEP: <unassigned>
>> Title: Adding data-type objects to the standard library

I've used 'datatype' below for consistency, but can we please call them 
something other than data types? Data layouts? Data formats? Binary layouts? 
Binary formats? 'type' is already a meaningful term in Python, and having to 
check whether 'data type' meant a type definition or a data format definition 
could get annoying.

> Not sure about having 3 different ways to specify
> the structure -- it smacks of Too Many Ways To Do
> It to me.

There are actually 5 ways, but the different mechanisms all have different use 
cases (and I'm going to suggest getting rid of the dictionary form).

   Simple conversion of the builtin types (would be good for instances to be 
able to hook this as with other type conversion functions).

   Makes it easy to specify a contiguous C-style array of a given data type. 
However, rather than doing type-based dispatch here, I would prefer to see 
this version handled via an optional 'shape' argument, so that all sequences 
can be handled consistently (more on that below).
       >>> datatype(int, 5) # short for datatype([(int, 5)])
       datatype('int32', (5,))
       # describes a 5*4=20-byte block of memory laid out as
       #  a[0], a[1], a[2], a[3], a[4]
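
To make the shape idea concrete with something that exists today, here is a rough sketch using the struct module. block_format is a made-up helper, not part of the PEP or of struct, and I'm assuming a 4-byte little-endian signed int as a stand-in for 'int32':

```python
import struct

# Assumption: struct code "i" with "<" byte order stands in for 'int32'.
ITEM_FORMAT = "i"
SHAPE = (5,)


def block_format(item_format, shape, byte_order="<"):
    """Build a struct format for a contiguous C-style array (hypothetical helper)."""
    count = 1
    for dim in shape:
        count *= dim  # total number of items across all dimensions
    return "%s%d%s" % (byte_order, count, item_format)


fmt = block_format(ITEM_FORMAT, SHAPE)       # "<5i"
block_size = struct.calcsize(fmt)            # 5 items * 4 bytes each
values = struct.unpack(fmt, struct.pack(fmt, 10, 11, 12, 13, 14))
```

The point is only that the item format and the shape are independent inputs, which is what an optional 'shape' argument would express directly.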

   The basic formatting definition (I'd be interested in the differences 
between this definition scheme and the struct definition scheme - one definite 
goal for an implementation would be an update to the struct module to accept 
datatype objects, or at least a conversion mechanism for creating a struct 
layout description from a datatype definition)

List object:
   As for string object, but permits naming of each of the fields. I don't 
like treating tuples differently from lists, so I'd prefer for this handling 
to be applied to all iterables that don't meet one of the other special cases 
(direct conversion, string, dictionary).

   I'd also prefer the meta-information to come *after* the name, and for the 
name to be completely optional (leaving the corresponding field unnamed). So 
the possible sequence entries would be:
     (name, datatype)
     (name, datatype, shape)
   where name must be a string or 2-tuple, datatype must be acceptable as a 
constructor argument, and the shape must be an integer or tuple.
    For example:
       datatype([(('coords', [1,2]), 'f4'),
                 ('address', 'S30')])

       datatype([('simple', 'i4'),
                 ('nested', [('name', 'S30'),
                             ('addr', 'S45'),
                             ('amount', 'i4')])])

       >>> datatype(['V8', ('var2', 'i1'), 'V3', ('var3', 'f8')])
       datatype([('', '|V8'), ('var2', '|i1'), ('', '|V3'), ('var3', '<f8')])
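
For comparison, the struct module can already describe fields you don't care about without naming them, using pad bytes. This is only an approximate equivalent of the unnamed-field example, assuming little-endian layout with no alignment padding; the format string is my own translation, not something taken from the PEP:

```python
import struct

# Rough struct counterpart of ['V8', ('var2', 'i1'), 'V3', ('var3', 'f8')]:
# 8 pad bytes, a signed 1-byte int, 3 pad bytes, an 8-byte float.
RECORD_FORMAT = "<8xb3xd"

record_size = struct.calcsize(RECORD_FORMAT)  # 8 + 1 + 3 + 8 bytes

# Pad bytes take no arguments when packing and yield no values when unpacking.
packed = struct.pack(RECORD_FORMAT, 7, 2.5)
var2, var3 = struct.unpack(RECORD_FORMAT, packed)
```

What struct lacks here is the names: the caller has to remember that the two unpacked values are var2 and var3.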

Dictionary object:

   This allows a tailored object where the information you have (e.g. from a 
file format specification) provides offsets and data types. Instead of having 
to define them manually the constructor will insert the necessary padding 
fields for you.
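
As a sketch of what "insert the necessary padding for you" could mean, the following turns a name-to-(format, offset) mapping into a struct format string. format_from_offsets and the shape of the mapping are assumptions of mine for illustration; the PEP's dictionary form may well differ in detail:

```python
import struct


def format_from_offsets(fields, total_size=None):
    """Turn {name: (struct_code, offset)} into (format string, ordered names).

    Hypothetical helper: gaps between fields are filled with pad bytes.
    """
    parts = ["<"]  # little-endian, no implicit alignment
    names = []
    position = 0
    # Walk the fields in order of increasing offset.
    for name, (code, offset) in sorted(fields.items(), key=lambda kv: kv[1][1]):
        if offset < position:
            raise ValueError("field %r overlaps the previous field" % name)
        if offset > position:
            parts.append("%dx" % (offset - position))  # padding for the gap
        parts.append(code)
        names.append(name)
        position = offset + struct.calcsize("<" + code)
    if total_size is not None and total_size > position:
        parts.append("%dx" % (total_size - position))  # trailing padding
    return "".join(parts), names


# Offsets as they might be read from a file format specification.
spec = {"var3": ("d", 12), "var2": ("b", 8)}
fmt, names = format_from_offsets(spec)
```

With the spec above this produces the same layout as the list-form example with explicit 'V8' and 'V3' entries.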

   Given an existing datatype object, you can create a new datatype which only 
names a few of the original fields by doing:
     from operator import itemgetter
     wanted = 'field1', 'field10', 'field15'
     new_names = 'attr1', 'attr2', 'attr3'
     field_defs = itemgetter(*wanted)(orig_fmt.fields)
     new_fmt = datatype(dict(zip(new_names, field_defs)))
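
The renaming step can be tried out with a plain dictionary standing in for the fields mapping. The contents of orig_fields below are invented for illustration (I'm assuming each entry is a (format, offset) pair), and the final datatype() call is left out since no such constructor exists yet:

```python
from operator import itemgetter

# Hypothetical stand-in for orig_fmt.fields: name -> (format, offset).
orig_fields = {
    "field1": ("i4", 0),
    "field2": ("i4", 4),
    "field10": ("f8", 8),
    "field15": ("S30", 16),
}

wanted = "field1", "field10", "field15"
new_names = "attr1", "attr2", "attr3"

# itemgetter takes the keys as separate arguments, hence the unpacking.
field_defs = itemgetter(*wanted)(orig_fields)

# The definitions keep their offsets; only the names change.
renamed = dict(zip(new_names, field_defs))
```

Because the offsets travel with the definitions, the skipped fields (field2 here) simply become padding in the new layout.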

> Also, what if I want to refer to fields by name
> but don't want to have to work out all the offsets
> (which is tedious, error-prone and hostile to
> modification)?

Use the list definition form. In the current PEP, you would need to define 
names for all of the uninteresting fields. With the changes I've suggested 
above, you wouldn't even have to name the fields you don't care about - just 
describe them.


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
