[Numpy-discussion] Picking rows with the first (or last) occurrence of each key
Jeff Reback
jeffreback at gmail.com
Mon Jul 4 19:31:11 EDT 2016
This is trivial in pandas: a simple groupby does it.
In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3], ['b', 12, -329.0]]
In [7]: df = DataFrame(data, columns=list('ABC'))
In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0
In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0
In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0
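
If the rows should keep their original order and index rather than being re-indexed by key, drop_duplicates is one possible alternative to the groupby above. The following is only a sketch on the same toy data. Treating columns A and B together as a composite key is an assumption made here for illustration, not something taken from the original problem.

```python
import pandas as pd

data = [["a", 27, 14.5], ["b", 12, 99.0], ["a", 17, 100.3], ["b", 12, -329.0]]
df = pd.DataFrame(data, columns=list("ABC"))

# Keep the first (or last) row seen for each value of the key column A.
# Unlike groupby(...).first(), the original row index is preserved.
first_rows = df.drop_duplicates(subset=["A"], keep="first")
last_rows = df.drop_duplicates(subset=["A"], keep="last")

# Assumed composite key: columns A and B together identify a group.
first_by_ab = df.drop_duplicates(subset=["A", "B"], keep="first")
last_by_ab = df.drop_duplicates(subset=["A", "B"], keep="last")
```

Because the key columns are compared directly, a short string in the key needs no hashing or numeric encoding.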
On Mon, Jul 4, 2016 at 7:27 PM, Skip Montanaro <skip.montanaro at gmail.com>
wrote:
> > Any way that you can make your keys numeric? Then you can run np.diff on
> > that first column, and use the indices of nonzero entries (np.flatnonzero)
> > to know where values change. With a +1/-1 offset (that I am too lazy to
> > figure out right now ;) you can then index into the original rows to get
> > either the first or last occurrence of each run.
>
> I'll give it some thought, but one of the elements of the key is definitely
> a (short, < six characters) string. Hashing it probably wouldn't work, too
> great a chance for collisions.
>
> S
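
The np.diff / np.flatnonzero idea quoted above can be sketched as follows. This is a rough illustration, not a tested solution to the original problem: the keys and values arrays are made up, the rows are assumed to be non-empty and already ordered so that equal keys are adjacent, and np.unique with return_inverse is used here as one way to get collision-free integer codes for string keys instead of hashing.

```python
import numpy as np

# Hypothetical data: equal keys are assumed to sit in contiguous runs.
keys = np.array(["a", "a", "b", "b", "b", "c"])
values = np.array([27, 17, 12, 12, 5, 8])

# One integer code per distinct key, so no two keys can collide.
_, codes = np.unique(keys, return_inverse=True)

# Positions i where codes[i] != codes[i + 1]: the last row of every run
# except the final run.
change = np.flatnonzero(np.diff(codes))

# A run starts one row after each change point (plus row 0),
# and ends at each change point (plus the final row).
first_idx = np.concatenate(([0], change + 1))
last_idx = np.concatenate((change, [len(codes) - 1]))

first_values = values[first_idx]
last_values = values[last_idx]
```

The +1 offset applies to the start of each run and no offset to the end of each run. If the rows are not grouped by key, they would need a stable sort on the codes first (for example np.argsort(codes, kind="stable")).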