[Python-Dev] PEP: Adding data-type objects to Python
ncoghlan at gmail.com
Sat Oct 28 06:31:29 CEST 2006
Greg Ewing wrote:
> Travis E. Oliphant wrote:
>> PEP: <unassigned>
>> Title: Adding data-type objects to the standard library
I've used 'datatype' below for consistency, but can we please call them
something other than data types? Data layouts? Data formats? Binary layouts?
Binary formats? 'type' is already a meaningful term in Python, and having to
check whether 'data type' meant a type definition or a data format definition
could get annoying.
> Not sure about having 3 different ways to specify
> the structure -- it smacks of Too Many Ways To Do
> It to me.
There are actually 5 ways, but the different mechanisms all have different use
cases (and I'm going to suggest getting rid of the dictionary form).
Simple conversion of the builtin types (would be good for instances to be
able to hook this as with other type conversion functions).
Makes it easy to specify a contiguous C-style array of a given data type.
However, rather than doing type-based dispatch here, I would prefer to see
this version handled via an optional 'shape' argument, so that all sequences
can be handled consistently (more on that below).
>>> datatype(int, 5) # short for datatype([(int, 5)])
# describes a 5*4=20-byte block of memory laid out as
# a, a, a, a, a
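For comparison, the existing struct module can already describe the same block. This is only a minimal sketch, assuming a 4-byte int as in the example above; the '<' prefix selects struct's standard sizes, so 'i' is 4 bytes regardless of platform. The name FIVE_INTS is made up for illustration.

```python
import struct

# Five consecutive 4-byte integers, no alignment padding ('<' = standard sizes).
FIVE_INTS = '<5i'

block_size = struct.calcsize(FIVE_INTS)        # 5 fields of 4 bytes each
packed = struct.pack(FIVE_INTS, 1, 2, 3, 4, 5)  # bytes laid out as a, a, a, a, a
values = struct.unpack(FIVE_INTS, packed)       # round-trips to a 5-tuple
```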
The basic formatting definition (I'd be interested in the differences
between this definition scheme and the struct definition scheme - one definite
goal for an implementation would be an update to the struct module to accept
datatype objects, or at least a conversion mechanism for creating a struct
layout description from a datatype definition).
As for a string object, but permits naming of each of the fields. I don't
like treating tuples differently from lists, so I'd prefer for this handling
to be applied to all iterables that don't meet one of the other
special cases (direct conversion, string, dictionary).
I'd also prefer the meta-information to come *after* the name, and for the
name to be completely optional (leaving the corresponding field unnamed). So
the possible sequence entries would be:
(name, datatype, shape)
where name must be a string or 2-tuple, datatype must be acceptable as a
constructor argument, and the shape must be an integer or tuple.
datatype(([(('coords', [1,2]), 'f4')),
('nested', [('name', 'S30'),
>>> datatype(['V8', ('var2', 'i1'), 'V3', ('var3', 'f8')])
datatype([('', '|V8'), ('var2', '|i1'), ('', '|V3'), ('var3', '<f8')])
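As a rough illustration of what the unnamed 'V8' and 'V3' entries mean, the sketch below translates the list form into a struct format string, turning each void entry into pad bytes. The helper to_struct_format and its small code table are hypothetical (they are not part of the PEP or of the struct module), and only the type codes used in the example above are covered.

```python
import struct

# Hypothetical mapping from the type codes used above to struct codes.
_CODES = {'i1': 'b', 'f8': 'd'}

def to_struct_format(entries, byteorder='<'):
    """Build a struct format string from a list-form layout description.

    Each entry is either a bare type string (an unnamed field) or a
    (name, type string) pair. 'V<n>' entries become n pad bytes.
    """
    parts = [byteorder]
    names = []
    for entry in entries:
        name, code = ('', entry) if isinstance(entry, str) else entry
        if code.startswith('V'):
            parts.append('%dx' % int(code[1:]))  # padding carries no value
        else:
            parts.append(_CODES[code])
            names.append(name)
    return ''.join(parts), names

fmt, names = to_struct_format(['V8', ('var2', 'i1'), 'V3', ('var3', 'f8')])
total_size = struct.calcsize(fmt)  # 8 + 1 + 3 + 8 bytes
```

Only the named fields produce values when unpacking with the resulting format; the padding is skipped, which is the behaviour the unnamed entries are meant to express.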
This allows a tailored object where the information you have (e.g. from a
file format specification) provides offsets and data types. Instead of having
to define them manually the constructor will insert the necessary padding
fields for you.
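The padding insertion described here could look roughly like the following. This is a sketch under stated assumptions: the {name: (type code, offset)} input shape, the fields_from_offsets helper and the explicit sizes table are all invented for illustration and are not the PEP's actual dictionary format.

```python
def fields_from_offsets(spec, sizes):
    """Turn {name: (type_code, offset)} into a list-form description.

    Gaps between fields are filled with unnamed 'V<n>' padding entries.
    `sizes` maps each type code to its size in bytes.
    """
    entries = []
    position = 0
    # Walk the fields in order of increasing offset.
    for name, (code, offset) in sorted(spec.items(), key=lambda kv: kv[1][1]):
        if offset < position:
            raise ValueError('field %r overlaps the previous field' % name)
        if offset > position:
            entries.append('V%d' % (offset - position))
        entries.append((name, code))
        position = offset + sizes[code]
    return entries

layout = fields_from_offsets(
    {'var2': ('i1', 8), 'var3': ('f8', 12)},
    sizes={'i1': 1, 'f8': 8},
)
```

With these inputs the result is the same list as in the 'V8' / 'V3' example: 8 pad bytes before var2 and 3 pad bytes between var2 and var3.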
Given an existing datatype object, you can create a new datatype which only
names a few of the original fields by doing:
from operator import itemgetter
wanted = 'field1', 'field10', 'field15'
new_names = 'attr1', 'attr2', 'attr3'
field_defs = itemgetter(*wanted)(orig_fmt.fields)
new_fmt = datatype(dict(zip(new_names, field_defs)))
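The renaming step can be tried without a datatype implementation by using a plain dict in place of orig_fmt.fields. In the sketch below the field names, type codes and offsets are made up purely for illustration; only the itemgetter/zip mechanics are the point.

```python
from operator import itemgetter

# Stand-in for orig_fmt.fields: field name -> (type code, offset).
# These values are invented for the example.
orig_fields = {
    'field1': ('i4', 0),
    'field2': ('i4', 4),
    'field10': ('f8', 36),
    'field15': ('S30', 72),
}

wanted = 'field1', 'field10', 'field15'
new_names = 'attr1', 'attr2', 'attr3'

# itemgetter(*wanted) returns a tuple of the looked-up values, in order.
field_defs = itemgetter(*wanted)(orig_fields)
new_fields = dict(zip(new_names, field_defs))
```

Note that the names must be unpacked with `*wanted`; passing the tuple itself would look up the whole tuple as a single key.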
> Also, what if I want to refer to fields by name
> but don't want to have to work out all the offsets
> (which is tedious, error-prone and hostile to
Use the list definition form. In the current PEP, you would need to define
names for all of the uninteresting fields. With the changes I've suggested
above, you wouldn't even have to name the fields you don't care about - just
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia