[Numpy-discussion] return unique combinations of stacked arrays - slow

Tue Oct 21 18:18:41 EDT 2014

I'm trying to create an output array of integers where each value is a 
code identifying the combination of values found at that position across 
n input arrays.  As a simple example, given these three arrays:

import numpy as np

a = np.array([0, 1, 2, 3, 0, 1, 2, 3])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = np.array([0, 1, 1, 0, 0, 1, 0, 1])

I want an output array that holds 'codes' for the unique combinations 
and a dictionary that holds the unique combinations as keys and codes as 
values.

out = np.array([0, 1, 2, 3, 0, 1, 4, 5])
out_dict = {
   (0, 0, 0): 0,
   (1, 1, 1): 1,
   (2, 0, 1): 2,
   (3, 1, 0): 3,
   (2, 0, 0): 4,
   (3, 1, 1): 5,
}

An additional constraint is that I'm bringing in the (a, b, c) arrays a 
chunk at a time due to memory limits (i.e., very large rasters), so I 
need to retain the mapping between chunks.

My current (very naive and pretty slow) implementation in loop form is:

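out_dict = {}                    # unique (a, b, c) tuple -> code
out = np.zeros_like(a)           # one code per input position
count = 0                        # next unused code
stack = np.vstack((a, b, c)).T   # shape (len(a), 3): one row per position
for i, arr in enumerate(stack):
    t = tuple(arr)
    if t not in out_dict:
        # first time this combination is seen: assign the next code
        out_dict[t] = count
        count += 1
    out[i] = out_dict[t]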
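
For comparison, here is a rough sketch of how most of that loop might be 
pushed into NumPy.  It assumes a NumPy version whose np.unique accepts the 
axis argument (1.13 or later), and the helper name encode_chunk is made up 
for illustration; it is not an existing NumPy function.  The idea is to 
find the unique rows of each chunk in one call and only loop in Python 
over those unique rows, reusing the same dictionary for every chunk:

```python
import numpy as np


def encode_chunk(arrays, mapping):
    """Return one integer code per position for equal-length 1-D arrays.

    `mapping` (tuple -> code) is updated in place, so passing the same
    dict for every chunk keeps codes consistent between chunks.
    Hypothetical helper, assuming np.unique supports `axis`.
    """
    stack = np.column_stack(arrays)  # shape (n_positions, n_arrays)
    uniq, first_idx, inverse = np.unique(
        stack, axis=0, return_index=True, return_inverse=True
    )
    inverse = inverse.reshape(-1)  # guard in case the inverse is not 1-D
    codes = np.empty(len(uniq), dtype=np.intp)
    # Visit unique rows in order of first appearance so that new codes
    # are handed out in the same order as the row-by-row loop.
    for j in np.argsort(first_idx):
        key = tuple(uniq[j].tolist())
        # Existing combinations keep their code; new ones get the next
        # integer (assumes codes in `mapping` are 0..len(mapping)-1).
        codes[j] = mapping.setdefault(key, len(mapping))
    return codes[inverse]


a = np.array([0, 1, 2, 3, 0, 1, 2, 3])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = np.array([0, 1, 1, 0, 0, 1, 0, 1])

out_dict = {}
out = encode_chunk((a, b, c), out_dict)
```

The Python-level loop now runs once per unique combination in the chunk 
rather than once per element, which should help when the number of 
distinct combinations is small relative to the raster size.  Whether it 
is fast enough for very large chunks would need to be measured, since 
np.unique sorts the rows.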

Thanks for help,
matt