Determine slices in a sorted array
Given an array with two axes, sorted by a column 'SLICE_BY', how can I extract slice indexes for rows with the same 'SLICE_BY' value?

Here is an example program, demonstrating the problem:

    from numpy import *

    a = random.randint(0,100,(20,4))
    SLICE_BY = 0

    # Make slices of array 'a' by column SLICE_BY
    a.sort(SLICE_BY)
    slices = []
    prev_val = None
    sidx = -1
    for rowidx,row in enumerate(a):
        val = row[SLICE_BY]
        if val!=prev_val:
            if prev_val is None:
                prev_val = val
                sidx = rowidx
            else:
                slices.append((prev_val,sidx,rowidx))
                sidx = rowidx
                prev_val = val
    if sidx<a.shape[0]-1:
        slices.append((val,sidx,a.shape[0]))

    print(a)
    print(slices)

This program would print:

    [[ 1  0  8  1]
     [ 4  5 17  9]
     [ 4 11 19 23]
     [11 12 24 23]
     [13 16 28 23]
     [14 26 29 36]
     [15 33 32 37]
     [20 38 38 40]
     [28 47 47 45]
     [33 50 50 57]
     [45 55 52 65]
     [47 67 60 65]
     [56 76 71 68]
     [61 76 71 78]
     [70 83 82 83]
     [89 83 84 85]
     [91 84 85 87]
     [95 96 86 88]
     [98 96 89 88]
     [99 98 92 88]]
    [(1, 0, 1), (4, 1, 3), (11, 3, 4), (13, 4, 5), (14, 5, 6), (15, 6, 7),
     (20, 7, 8), (28, 8, 9), (33, 9, 10), (45, 10, 11), (47, 11, 12),
     (56, 12, 13), (61, 13, 14), (70, 14, 15), (89, 15, 16), (91, 16, 17),
     (95, 17, 18), (98, 18, 19)]

Although my demonstration program is functionally correct, it is not efficient. I need to do this with 10 million rows. The number of slices is relatively small (10 to 10000).

Is it possible to construct my "slices" with pure numpy functions? That is, anything that does not involve a large number of Python bytecode instructions, constructing Python objects, referencing/dereferencing 10 million times, etc.

Thanks,

Laszlo
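One possible vectorized approach, offered as a sketch rather than a definitive answer: compare the key column with itself shifted by one row, and treat every position where the value changes as the start of a new slice. The helper name `slices_by_column` is hypothetical (it is not a numpy function), and the sketch assumes the rows are already sorted by the chosen column. Two differences from the loop in the question are worth noting: the sketch also emits the final group when it consists of a single row, and the sample data is sorted with `argsort` so that whole rows stay together, whereas `a.sort(SLICE_BY)` sorts along an axis and reorders each column independently.

```python
import numpy as np

def slices_by_column(a, col):
    """Return a list of (value, start, stop) for each run of equal values
    in a[:, col]. Hypothetical helper; assumes rows are sorted by `col`."""
    keys = a[:, col]
    if keys.size == 0:
        return []
    # Positions where a row's key differs from the previous row's key;
    # each such position is the first row of a new slice.
    change = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [keys.size]))
    # Only the (small) number of slices is converted to Python objects.
    return list(zip(keys[starts].tolist(), starts.tolist(), stops.tolist()))

# Sample data in the spirit of the question (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(20, 4))
a = a[np.argsort(a[:, 0], kind="stable")]  # sort whole rows by column 0
slices = slices_by_column(a, 0)
```

The per-row work (one comparison and one `flatnonzero`) happens inside numpy, so the Python-level cost scales with the number of slices rather than the number of rows. If only the distinct values and their first positions are needed, `np.unique(keys, return_index=True)` should give the same starts for sorted input, at the cost of an extra internal sort.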
Laszlo Nagy