Greg Ewing wrote:
Travis E. Oliphant wrote:
PEP: <unassigned> Title: Adding data-type objects to the standard library
I've used 'datatype' below for consistency, but can we please call them something other than data types? Data layouts? Data formats? Binary layouts? Binary formats? 'type' is already a meaningful term in Python, and having to check whether 'data type' meant a type definition or a data format definition could get annoying.
Not sure about having 3 different ways to specify the structure -- it smacks of Too Many Ways To Do It to me.
There are actually 5 ways, but the different mechanisms all have different use cases (and I'm going to suggest getting rid of the dictionary form).
Type-object: Simple conversion of the builtin types (would be good for instances to be able to hook this as with other type conversion functions).
2-tuple: Makes it easy to specify a contiguous C-style array of a given data type. However, rather than doing type-based dispatch here, I would prefer to see this version handled via an optional 'shape' argument, so that all sequences can be handled consistently (more on that below).

    >>> datatype(int, 5)  # short for datatype([(int, 5)])
    datatype('int32', (5,))
    # describes a 5*4=20-byte block of memory laid out as
    # a, a, a, a, a
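To make the size arithmetic concrete, here is a minimal sketch using only the struct module. The helper name block_size is made up for illustration and is not part of the PEP; it accepts either an integer or a tuple as the shape, which is the consistent handling argued for above.

```python
import struct
from functools import reduce
from operator import mul

def block_size(code, shape):
    """Bytes needed for a C-contiguous array of `shape` items.

    `code` is a struct format character; '<' selects standard sizes,
    so 'i' is always 4 bytes regardless of platform.
    """
    if isinstance(shape, int):
        shape = (shape,)            # treat 5 the same as (5,)
    count = reduce(mul, shape, 1)   # total number of items
    return count * struct.calcsize('<' + code)

print(block_size('i', 5))       # 20
print(block_size('i', (5,)))    # 20
print(block_size('d', (2, 3)))  # 48
```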
String-object: The basic formatting definition (I'd be interested in the differences between this definition scheme and the struct definition scheme - one definite goal for an implementation would be an update to the struct module to accept datatype objects, or at least a conversion mechanism for creating a struct layout description from a datatype definition)
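As a rough idea of what such a conversion mechanism might look like, the sketch below turns a flat list of field entries into a struct format string. Everything in it is an assumption for illustration: the function name to_struct_format, the small table of type codes, and the choice to ignore nested definitions and per-field byte order. It is not the PEP's API.

```python
import struct

# Hypothetical mapping from fixed-size type codes to struct format
# characters (standard sizes, so 'i' is always 4 bytes).
_FIXED_CODES = {
    'i1': 'b', 'i2': 'h', 'i4': 'i', 'i8': 'q',
    'f4': 'f', 'f8': 'd',
}

def to_struct_format(fields, byteorder='<'):
    """Build a struct format string from a flat list of field entries.

    Each entry is either a bare type code (an unnamed field) or a
    (name, code) pair. Nested definitions are not handled here.
    """
    parts = [byteorder]
    for entry in fields:
        code = entry if isinstance(entry, str) else entry[1]
        if code in _FIXED_CODES:
            parts.append(_FIXED_CODES[code])
        elif code[0] == 'S':        # 'S30' -> 30-byte string
            parts.append(code[1:] + 's')
        elif code[0] == 'V':        # 'V8' -> 8 pad bytes, never unpacked
            parts.append(code[1:] + 'x')
        else:
            raise ValueError('unsupported type code: %r' % (code,))
    return ''.join(parts)

fmt = to_struct_format([('name', 'S30'), ('addr', 'S45'), ('amount', 'i4')])
print(fmt)                   # <30s45si
print(struct.calcsize(fmt))  # 79
```

The names are dropped here because struct itself has no notion of field names; a real conversion would presumably need to carry them alongside the format string.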
List object: As for string object, but permits naming of each of the fields. I don't like treating tuples differently from lists, so I'd prefer this handling to be applied to all iterables that don't meet one of the other special cases (direct conversion, string, dictionary).
I'd also prefer the meta-information to come *after* the name, and for the name to be completely optional (leaving the corresponding field unnamed). So the possible sequence entries would be:

    datatype
    (name, datatype)
    (name, datatype, shape)

where name must be a string or 2-tuple, datatype must be acceptable as a constructor argument, and the shape must be an integer or tuple. For example:

    datatype([(('coords', [1,2]), 'f4'),
              ('address', 'S30'),
             ])

    datatype([('simple', 'i4'),
              ('nested', [('name', 'S30'),
                          ('addr', 'S45'),
                          ('amount', 'i4')]),
             ])

    >>> datatype(['V8', ('var2', 'i1'), 'V3', ('var3', 'f8')])
    datatype([('', '|V8'), ('var2', '|i1'), ('', '|V3'), ('var3', '<f8')])
This allows a tailored object where the information you have (e.g. from a file format specification) provides offsets and data types. Rather than making you define the padding fields manually, the constructor inserts them for you.
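The padding step can be sketched with the struct module alone. The helper below is hypothetical (its name, its {name: (code, offset)} input shape, and its use of struct format characters rather than datatype codes are all assumptions); it only shows how gaps between known offsets could be filled automatically.

```python
import struct

def format_from_offsets(fields):
    """Return (struct format, field names) for {name: (code, offset)}.

    `code` is a struct format character such as 'b' or 'd'. Any gap
    between consecutive fields is filled with 'x' pad bytes.
    """
    parts, names, pos = ['<'], [], 0
    by_offset = sorted(fields.items(), key=lambda kv: kv[1][1])
    for name, (code, offset) in by_offset:
        if offset < pos:
            raise ValueError('field %r overlaps the previous field' % name)
        if offset > pos:
            parts.append('%dx' % (offset - pos))   # padding for the gap
        parts.append(code)
        names.append(name)
        pos = offset + struct.calcsize('<' + code)
    return ''.join(parts), names

fmt, names = format_from_offsets({'var2': ('b', 8), 'var3': ('d', 12)})
print(fmt)  # <8xb3xd

# Round-trip some data to read the fields back by name.
data = struct.pack(fmt, 7, 1.5)
record = dict(zip(names, struct.unpack(fmt, data)))
print(record)  # {'var2': 7, 'var3': 1.5}
```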
Given an existing datatype object, you can create a new datatype which only names a few of the original fields by doing:

    from operator import itemgetter
    wanted = 'field1', 'field10', 'field15'
    new_names = 'attr1', 'attr2', 'attr3'
    field_defs = itemgetter(*wanted)(orig_fmt.fields)
    new_fmt = datatype(dict(zip(new_names, field_defs)))
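The renaming step can be tried today with a plain dict standing in for the proposed fields attribute. The field names, type codes and offsets below are invented for the example. Note that itemgetter needs the wanted names as separate arguments; handed a single tuple, it would look that tuple up as one key.

```python
from operator import itemgetter

# Stand-in for orig_fmt.fields: a plain dict mapping field names to
# (type code, offset) pairs. The real attribute is only proposed.
orig_fields = {
    'field1': ('i4', 0),
    'field10': ('f8', 36),
    'field15': ('S30', 72),
    'field16': ('i1', 102),
}

wanted = 'field1', 'field10', 'field15'
new_names = 'attr1', 'attr2', 'attr3'

# With several arguments, itemgetter returns a tuple of looked-up values.
field_defs = itemgetter(*wanted)(orig_fields)
renamed = dict(zip(new_names, field_defs))
print(renamed)
# {'attr1': ('i4', 0), 'attr2': ('f8', 36), 'attr3': ('S30', 72)}
```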
Also, what if I want to refer to fields by name but don't want to have to work out all the offsets (which is tedious, error-prone and hostile to modification)?
Use the list definition form. In the current PEP, you would need to define names for all of the uninteresting fields. With the changes I've suggested above, you wouldn't even have to name the fields you don't care about - just describe them.