<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap:break-word"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Hey Skip,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Is there any way you can make your keys numeric? Then you can run np.diff on that key column and use the indices of the nonzero entries (np.flatnonzero) to find where the value changes. Those indices are the last row of each run (the final row of the array closes the last run), and those indices + 1 are the first row of each run (row 0 opens the first run), so you can index into the original rows to get either the first or last occurrence of each run. Note that np.diff only sees changes between adjacent rows, so if equal keys are not adjacent, as in your example, you would need a stable sort on the key first.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">Juan.</div> <br> <div id="bloop_sign_1467511990317112832" class="bloop_sign"> <br></div> <br><p class="airmail_on">On 2 July 2016 at 10:10:16 PM, Skip Montanaro (<a href="mailto:skip.montanaro@gmail.com">skip.montanaro@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>(I'm probably going to botch the description...)
<br>
<br>Suppose I have a 2D array of Python objects in which the first n elements of
<br>each row form a key and the remaining elements form the value. Each key can (and
<br>generally does) occur multiple times. I'd like to generate a new array
<br>consisting of just the first (or last) row for each distinct key. Rows
<br>retain their relative order on output.
<br>
<br>For example, suppose I have this array with key length 2:
<br>
<br>[ 'a', 27, 14.5 ]
<br>[ 'b', 12, 99.0 ]
<br>[ 'a', 27, 15.7 ]
<br>[ 'a', 17, 100.3 ]
<br>[ 'b', 12, -329.0 ]
<br>
<br>Selecting the first occurrence of each key would return this array:
<br>
<br>[ 'a', 27, 14.5 ]
<br>[ 'b', 12, 99.0 ]
<br>[ 'a', 17, 100.3 ]
<br>
<br>while selecting the last occurrence would return this array:
<br>
<br>[ 'a', 27, 15.7 ]
<br>[ 'a', 17, 100.3 ]
<br>[ 'b', 12, -329.0 ]
<br>
<br>In real life, my array is a bit larger than this example, with the input
<br>being on the order of a million rows, and the output being around 5000
<br>rows. Avoiding processing all those extra rows at the Python level would
<br>speed things up.
<br>
<br>I don't know what this filter might be called (though I'm sure I haven't
<br>thought of something new), so searching Google or Bing for it would seem to
<br>be fruitless. It strikes me as something which numpy or Pandas might already
<br>have in their bag(s) of tricks.
<br>
<br>Pointers appreciated,
<br>
<br>Skip
<br>
<br>
<br>_______________________________________________
<br>NumPy-Discussion mailing list
<br><a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a>
<br><a href="https://mail.scipy.org/mailman/listinfo/numpy-discussion">https://mail.scipy.org/mailman/listinfo/numpy-discussion</a>
<br></div></div></span></blockquote></body></html>