[Numpy-discussion] Adding the ability to "clone" a few fields from a data-type

Thu Oct 30 05:33:00 EDT 2008

On Thursday 30 October 2008, Robert Kern wrote:
> On Wed, Oct 29, 2008 at 19:05, Travis E. Oliphant
> <oliphant at enthought.com> wrote:
> > Hi all,
> >
> > I'd like to add to NumPy the ability to clone a data-type object so
> > that only a few fields are copied over but that it retains the
> > same total size.
> >
> > This would allow, for example, the ability to "select out a few
> > records" from a structured array using
> >
> > subarr = arr.view(cloned_dtype)
> >
> > Right now, it is hard to do this because you have to at least add a
> > "dummy" field at the end.  A simple method on the dtype class
> > (fromfields or something) would be easy to add.
>
> I'm not sure what this accomplishes. Would the dummy fields that fill
> in the space be inaccessible? E.g. tuple(subarr[i,j,k]) gives a tuple
> with no numpy.void scalars? That would be a novel feature, but I'm
> not sure it fits the problem. On the contrary:
> > It was thought in the past to do this with indexing
> >
> > arr['field1', 'field2']
> >
> > And that would still be possible (and mostly implemented) if this
> > feature is added.
>
> This appears more like the interface that people want. Except that I
> think people were thinking that it would follow fancy indexing
> syntax:
>
>   arr[['field1', 'field2']]

I've thought about that too.  That would be a great thing to have, IMO.

>
> I guess there are two ways to implement this. One is to make a new
> array that just contains the desired fields. Another is to make a
> view that just points to the desired fields in the original array
> provided that we have a new feature for inaccessible dummy fields.
> One point for the former approach is that it is closer to fancy
> indexing which must always make a copy. The latter approach breaks
> that connection.

Yeah.  I'd vote for avoiding the copy.
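
To make the copy-versus-view distinction concrete, here is a small
sketch using plain integer indexing rather than field selection (the
array and variable names are only for illustration):

```python
import numpy

a = numpy.arange(6)

fancy = a[[0, 2, 4]]    # fancy indexing: a new array with its own data
sliced = a[0:6:2]       # basic slicing: a view on the same buffer

sliced[0] = 99          # writes through to `a`
fancy[1] = -1           # only touches the copy

print(a[0], a[2])       # 99 2
```

A field-selecting view would behave like `sliced` here: writes to the
selected fields would land in the original array.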

> OTOH, now that I think about it, I don't think there is really any
> coherent way to mix field selection with any other indexing
> operations. At least, not within the same brackets. Hmm. So maybe the
> link to fancy indexing can be ignored as, ahem, fanciful.

Well, one can always check that the entries in the fancy list are either
all strings (mapping to field names) or all integers (mapping to field
positions).  However, I'm not sure whether this check would be too expensive.
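
A minimal sketch of such a check in pure Python, just to show its
shape.  The function name classify_field_keys is made up for this
example and is not part of NumPy; the cost is linear in the length of
the list, which I would guess is small next to the indexing itself:

```python
def classify_field_keys(keys):
    """Hypothetical helper: decide how a list used as an index should be read.

    Returns 'names' if every entry is a string, 'positions' if every
    entry is an integer, and raises TypeError for anything else.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("empty field list")
    if all(isinstance(k, str) for k in keys):
        return 'names'
    # bool is a subclass of int, so exclude it explicitly
    if all(isinstance(k, int) and not isinstance(k, bool) for k in keys):
        return 'positions'
    raise TypeError("field list must be all strings or all integers")
```

With this, ['f2', 'f0'] would be read as field names and [2, 0] as
field positions, while a mixed list such as ['f2', 0] is rejected.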

> Overall, I guess, I would present the feature slightly differently.
> Provide a kind of inaccessible and invisible dtype for implementing
> dummy fields. This is useful in other places like file parsing. At
> the same time, implement a function that uses this capability to make
> views with a subset of the fields of a structured array. I'm not sure
> that people need an API for replacing the fields of a dtype like
> this.

Mmh, I'm not sure what you are proposing there.  Do you mean something like:

In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')])

In [22]: nt = t.astype(['f2', 'f0'])

In [23]: ra = numpy.zeros(10, dtype=t)

In [24]: nra = ra.view(nt)

In [25]: ra
Out[25]:
array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
       (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
       (0, 0.0, ''), (0, 0.0, '')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S20')])

In [26]: nra
Out[26]:
array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0),
       ('', 0), ('', 0), ('', 0)],
      dtype=[('f2', '|S20'), ('f0', '<i4')])

?

In that case, that would be a great feature to add.
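
For what it's worth, here is a rough sketch of how such a dtype could
be built by hand.  It assumes a NumPy that accepts the dictionary form
of dtype with explicit 'offsets' and 'itemsize' keys, and the helper
name subset_dtype is made up for this example (it stands in for the
t.astype(...) spelling above, which is only a proposal):

```python
import numpy

def subset_dtype(dtype, names):
    """Hypothetical helper: keep only `names`, at their original byte
    offsets, while retaining the total item size of `dtype`."""
    formats = [dtype.fields[n][0] for n in names]
    offsets = [dtype.fields[n][1] for n in names]
    return numpy.dtype({'names': list(names),
                        'formats': formats,
                        'offsets': offsets,
                        'itemsize': dtype.itemsize})

t = numpy.dtype([('f0', 'i4'), ('f1', 'f8'), ('f2', 'S20')])
nt = subset_dtype(t, ['f2', 'f0'])

ra = numpy.zeros(10, dtype=t)
nra = ra.view(nt)       # same buffer, only 'f2' and 'f0' visible

nra['f0'][:] = 7        # the write shows up in ra['f0'] as well
```

Because the item size is unchanged, the view needs no trailing dummy
field, and the bytes of 'f1' are simply not reachable through nra.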

-- 
Francesc Alted