[Numpy-discussion] Picking rows with the first (or last) occurrence of each key

Mon Jul 4 19:31:11 EDT 2016

This is trivial in pandas: a simple groupby does it.

In [5]: from pandas import DataFrame

In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3],
   ...:         ['b', 12, -329.0]]

In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0

In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0
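
For comparison, here is a minimal NumPy-only sketch of the np.diff /
np.flatnonzero idea quoted below. It is only an illustration under some
assumptions: the key is a single non-empty 1-D column, and the names keys,
vals and first_last_rows are hypothetical, not from the original thread.
String keys are turned into integer codes with np.unique(return_inverse=True),
so no hashing is needed and collisions are not a concern.

```python
import numpy as np

# Hypothetical sample data mirroring the DataFrame above:
# one string key column and two value columns.
keys = np.array(['a', 'b', 'a', 'b'])
vals = np.array([[27, 14.5], [12, 99.0], [17, 100.3], [12, -329.0]])


def first_last_rows(keys):
    """Return (unique_keys, first_idx, last_idx) for a non-empty 1-D key array."""
    # Map each distinct key to an integer code (works for strings too).
    uniq, codes = np.unique(keys, return_inverse=True)
    # Stable sort groups equal keys together while preserving row order.
    order = np.argsort(codes, kind='stable')
    sorted_codes = codes[order]
    # Nonzero diffs mark the last element of each run; +1 gives the next run's start.
    change = np.flatnonzero(np.diff(sorted_codes)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change - 1, [len(keys) - 1]))
    # Translate positions in the sorted order back to original row indices.
    return uniq, order[starts], order[ends]


uniq, first_idx, last_idx = first_last_rows(keys)
first_rows = vals[first_idx]   # rows 0 and 1, matching groupby('A').first()
last_rows = vals[last_idx]     # rows 2 and 3, matching groupby('A').last()
```

If the rows are already sorted by key, the argsort step can be skipped and
np.diff applied to the integer codes directly.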

On Mon, Jul 4, 2016 at 7:27 PM, Skip Montanaro <skip.montanaro at gmail.com>
wrote:

> > Any way that you can make your keys numeric? Then you can run np.diff on
> > that first column, and use the indices of nonzero entries (np.flatnonzero)
> > to know where values change. With a +1/-1 offset (that I am too lazy to
> > figure out right now ;) you can then index into the original rows to get
> > either the first or last occurrence of each run.
>
> I'll give it some thought, but one of the elements of the key is definitely
> a (short, < six characters) string.  Hashing it probably wouldn't work, too
> great a chance for collisions.
>
> S