[Numpy-discussion] [Suggestion] Labelled Array

Allan Haldane allanhaldane at gmail.com
Sat Feb 13 12:11:07 EST 2016


I've had a similar idea for a new indexing function, 'split_classes', 
which would help in your case. It essentially does

     import numpy as np

     def split_classes(c, v):
         # one group of v per unique label in c (one boolean mask per label)
         return [v[c == u] for u in np.unique(c)]

Your example could be coded as

     >>> [sum(c) for c in split_classes(label, data)]
     [9, 12, 15]
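For a reduction like sum, the per-label results can also be computed without a Python loop over the labels. The sketch below applies two standard NumPy idioms to the example from the quoted message. The variable names are illustrative only, and neither idiom is the proposed split_classes. Version (a) follows the "collection of masks" framing by putting the labels on a new leading axis, at a memory cost of one boolean array per label. Version (b) uses np.bincount with weights, which needs no per-label masks but returns floats.

```python
import numpy as np

# Example data and labels from the quoted message.
data = np.arange(9).reshape(3, 3)
label = np.tile(np.arange(3), (3, 1))  # every row is [0, 1, 2]

# (a) One boolean mask per label, stacked along a new leading axis.
labels = np.unique(label)                   # sorted unique labels
masks = label == labels[:, None, None]      # shape (n_labels, 3, 3)
sums_masks = np.where(masks, data, 0).sum(axis=(1, 2))

# (b) Map each label to a dense index 0..n_labels-1, then let
#     np.bincount accumulate the data values as weights.
uniq, inverse = np.unique(label.ravel(), return_inverse=True)
sums_bincount = np.bincount(inverse, weights=data.ravel())

print(sums_masks)     # [ 9 12 15]
print(sums_bincount)  # [ 9. 12. 15.]
```

In both versions the i-th result belongs to the i-th entry of the sorted unique labels (labels or uniq).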

I feel I've come across the need for such a function often enough that 
it might be generally useful to people as part of numpy. The 
implementation of split_classes above performs poorly because it 
creates a temporary boolean array (the size of c) for every unique 
label, so my plan for a PR was to write a fast version that makes a 
single pass through v. (I have often wanted to use this function on 
large datasets.)
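As a rough illustration of avoiding the per-label masks, here is a hypothetical sort-based variant, split_classes_sorted. It is not the single-pass implementation planned for the PR: it relies on np.unique and a stable argsort, so it costs O(n log n). It also assumes c and v have the same shape, and it returns the groups in the same order, with the same element order, as the mask-based split_classes.

```python
import numpy as np

def split_classes_sorted(c, v):
    """Hypothetical sort-based variant of split_classes.

    Assumes c and v have the same shape. Sorts the labels once and
    splits the reordered values at the group boundaries, instead of
    building one boolean mask per unique label.
    """
    c = np.asarray(c).ravel()
    v = np.asarray(v).ravel()
    if c.shape != v.shape:
        raise ValueError("c and v must have the same number of elements")
    if c.size == 0:
        return []  # no labels, no groups (matches the mask version)
    # inverse[i] is the position of c[i] among the sorted unique labels.
    _, inverse = np.unique(c, return_inverse=True)
    # A stable sort keeps each group's elements in their original
    # row-major order, which is what v[c == u] returns.
    order = np.argsort(inverse, kind="stable")
    counts = np.bincount(inverse)
    # Split points are the running group sizes, excluding the total.
    return np.split(v[order], np.cumsum(counts)[:-1])
```

With the example data, `[g.sum() for g in split_classes_sorted(label, data)]` gives the same per-label sums as the list comprehension above.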

If anyone has comments on the idea (good idea? bad idea?), I'd love 
to hear them.

I have some further notes and examples here: 
https://gist.github.com/ahaldane/1e673d2fe6ffe0be4f21

Allan

On 02/12/2016 09:40 AM, Sérgio wrote:
> Hello,
>
> This is my first e-mail, so I will try to keep the idea simple.
>
> Similar to masked arrays, it would be interesting to use a label array
> to guide operations.
>
> Ex.:
>  >>> x
> labelled_array(data =
>   [[0 1 2]
>   [3 4 5]
>   [6 7 8]],
>                          label =
>   [[0 1 2]
>   [0 1 2]
>   [0 1 2]])
>
>  >>> sum(x)
> array([9, 12, 15])
>
> The operations would create a new axis for label indexing.
>
> You could think of it as a collection of masks, one for each label.
>
> I don't know of a way to do something like this efficiently without a
> loop. Just wondering...
>
> Sérgio.



