[Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Sun Jan 26 12:36:21 EST 2014


An object of type GroupBy.

So a call to group_by does not return any consumable output directly. If
you want, for instance, the unique keys (or groups, if you will), you can
call GroupBy.unique. In this case, for a tuple of input keys, you'd get a
tuple of unique keys back. If you want to compute several reductions over
the same set of keys, you can hang on to the GroupBy object and the
precomputations it encapsulates.
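
To make the "hang on to the object" idea concrete, here is a minimal sketch of how such an object might cache its precomputations. It is not the proposed implementation: the class body, the attribute names (sorter, start) and the restriction to a single non-empty 1-D key array are all assumptions made for illustration.

```python
import numpy as np

class GroupBy(object):
    """Hypothetical sketch: sort the keys once, reuse the result per reduction."""

    def __init__(self, keys):
        keys = np.asarray(keys)  # assumed: one non-empty 1-D key array
        # Stable argsort makes equal keys contiguous.
        self.sorter = np.argsort(keys, kind='mergesort')
        sorted_keys = keys[self.sorter]
        # True wherever a new group starts in sorted order.
        flag = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
        self.start = np.flatnonzero(flag)
        self.unique = sorted_keys[self.start]

    def sum(self, values):
        values = np.asarray(values)[self.sorter]
        # reduceat sums each contiguous run of rows belonging to one group.
        return self.unique, np.add.reduceat(values, self.start, axis=0)

    def mean(self, values):
        unique, sums = self.sum(values)
        counts = np.diff(np.append(self.start, len(self.sorter)))
        # Reshape so the counts broadcast over any trailing value axes.
        counts = counts.reshape((-1,) + (1,) * (sums.ndim - 1))
        return unique, sums / counts.astype(float)

# The sort and group boundaries are computed once and shared by both reductions.
g = GroupBy(list('abaabb'))
values = np.arange(6.0)
unique, sums = g.sum(values)
unique, means = g.mean(values)
```

Both reductions reuse the same argsort and group boundaries, which is the saving you get from keeping the object around instead of calling group_by again.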

To expand on that example: reduction operations also return the unique keys
which the reduced elements belong to:


(unique1, unique2), median = group_by((key1, key2)).median(values)
print unique1
print unique2
print median


yields something like:


['a' 'a' 'b' 'b' 'a']
[[0 0]
 [0 1]
 [0 1]
 [1 0]
 [1 1]]
[[ 0.34041782  0.78579254  0.91494441]
 [ 0.59422888  0.67915262  0.04327812]
 [ 0.45045529  0.45049761  0.49633574]
 [ 0.71623235  0.95760152  0.85137696]
 [ 0.96299801  0.27639574  0.70519413]]

Note that the elements of unique1 and unique2 are not unique on their own;
rather, the pairs formed by zipping them together are unique.
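
As a quick check of that statement, the snippet below copies the unique1 and unique2 values from the output printed above into plain Python lists (the variable name pairs is just for illustration) and compares uniqueness per key with uniqueness of the zipped pairs:

```python
# Values copied from the printed output above.
unique1 = ['a', 'a', 'b', 'b', 'a']
unique2 = [[0, 0], [0, 1], [0, 1], [1, 0], [1, 1]]

# Zip the two keys into one hashable composite key per group.
pairs = [(k1, tuple(k2)) for k1, k2 in zip(unique1, unique2)]

# Each key on its own contains repeats...
repeats_in_unique1 = len(set(unique1)) < len(unique1)
repeats_in_unique2 = len(set(map(tuple, unique2))) < len(unique2)

# ...but every zipped pair occurs exactly once.
pairs_are_unique = len(set(pairs)) == len(pairs)
```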



On Sun, Jan 26, 2014 at 6:02 PM, Stéfan van der Walt <stefan at sun.ac.za> wrote:

> Hi Eelco
>
> On Sun, 26 Jan 2014 12:20:04 +0100, Eelco Hoogendoorn wrote:
> > key1 = list('abaabb')
> > key2 = np.random.randint(0,2,(6,2))
> > values = np.random.rand(6,3)
> > print group_by((key1, key2)).median(values)
>
> I agree that group_by functionality could be handy in numpy.
> In the above example, what would the output of
>
> ``group_by((key1, key2))``
>
> be?
>
> Stéfan