[Numpy-discussion] return unique combinations of stacked arrays - slow
Matt Gregory
matt.gregory at oregonstate.edu
Tue Oct 21 18:18:41 EDT 2014
I'm trying to create an output array of integers where each value
represents a unique combination of values from (1..n) input arrays. As
a simple example, given these three arrays:
a = np.array([0, 1, 2, 3, 0, 1, 2, 3])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = np.array([0, 1, 1, 0, 0, 1, 0, 1])
I want an output array that holds 'codes' for the unique combinations
and a dictionary that holds the unique combinations as keys and codes as
values.
out = np.array([0, 1, 2, 3, 0, 1, 4, 5])
out_dict = {
    (0, 0, 0): 0,
    (1, 1, 1): 1,
    (2, 0, 1): 2,
    (3, 1, 0): 3,
    (2, 0, 0): 4,
    (3, 1, 1): 5,
}
An additional constraint is that I'm bringing in the (a, b, c) arrays a
chunk at a time due to memory limits (i.e., very large rasters), so I
need to retain the mapping between chunks.
My current (very naive and pretty slow) implementation in loop form is:
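out_dict = {}
out = np.zeros_like(a)
count = 0
# One row per element, one column per input array
stack = np.vstack((a, b, c)).T
for (i, arr) in enumerate(stack):
    t = tuple(arr)
    # Assign the next unused code to a combination not seen before
    if t not in out_dict:
        out_dict[t] = count
        count += 1
    out[i] = out_dict[t]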
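One possible way to shrink the Python-level loop, given as a sketch and not a tested solution for large rasters: find the unique rows of each chunk with np.unique, then loop only over those unique rows when updating the dictionary. This assumes a NumPy version where np.unique accepts the axis argument (1.13 or later). The function name encode_chunk is hypothetical, and the dictionary is passed in and updated in place so that codes carry over between chunks.

```python
import numpy as np

def encode_chunk(arrays, out_dict):
    """Return one integer code per element for a chunk of input arrays.

    arrays   : sequence of equal-length 1-D integer arrays (hypothetical API)
    out_dict : dict mapping tuples to codes, updated in place so the
               mapping stays consistent from one chunk to the next
    """
    # One row per element, one column per input array
    stack = np.column_stack(arrays)
    # Unique rows (sorted), index of each unique row's first occurrence,
    # and for every input row the position of its unique row
    uniq, first, inverse = np.unique(
        stack, axis=0, return_index=True, return_inverse=True
    )
    # Flatten in case the NumPy version returns a non-1-D inverse
    inverse = inverse.ravel()
    codes = np.empty(len(uniq), dtype=np.intp)
    # Visit unique rows in order of first appearance so that new codes
    # are numbered the same way as in the row-by-row loop
    for j in np.argsort(first):
        key = tuple(uniq[j].tolist())
        # Assumes existing codes are exactly 0..len(out_dict)-1
        codes[j] = out_dict.setdefault(key, len(out_dict))
    return codes[inverse]
```

With the example arrays above, calling encode_chunk((a, b, c), out_dict) on an empty out_dict should reproduce the out and out_dict shown earlier. The Python loop then runs once per unique combination in the chunk instead of once per element, which only helps if the number of unique combinations is small relative to the chunk size.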
Thanks for help,
matt