[Numpy-discussion] return unique combinations of stacked arrays - slow
Charles R Harris
charlesr.harris at gmail.com
Tue Oct 21 18:40:24 EDT 2014
On Tue, Oct 21, 2014 at 4:18 PM, Matt Gregory <matt.gregory at oregonstate.edu>
wrote:
> I'm trying to create an output array of integers where each value
> represents a unique combination of values from (1..n) input arrays. As
> a simple example, given these three arrays:
>
> a = np.array([0, 1, 2, 3, 0, 1, 2, 3])
> b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
> c = np.array([0, 1, 1, 0, 0, 1, 0, 1])
>
> I want an output array that holds 'codes' for the unique combinations
> and a dictionary that holds the unique combinations as keys and codes as
> values.
>
> out = np.array([0, 1, 2, 3, 0, 1, 4, 5])
> out_dict = {
> (0, 0, 0): 0,
> (1, 1, 1): 1,
> (2, 0, 1): 2,
> (3, 1, 0): 3,
> (2, 0, 0): 4,
> (3, 1, 1): 5,
> }
>
> An additional constraint is that I'm bringing in the (a, b, c) arrays a
> chunk at a time due to memory limits (i.e., very large rasters), so I
> need to retain the mapping across chunks.
>
> My current (very naive and pretty slow) implementation in loop form is:
>
> out_dict = {}
> out = np.zeros_like(a)
> count = 0
> stack = np.vstack((a, b, c)).T
> for (i, arr) in enumerate(stack):
>     t = tuple(arr)
>     if t not in out_dict:
>         out_dict[t] = count
>         count += 1
>     out[i] = out_dict[t]
>
> Thanks for any help,
> matt
>
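As a minimal sketch of vectorizing the quoted loop (assuming NumPy >= 1.13 for `np.unique(..., axis=0)`; the helper name `encode_chunk` is hypothetical), the per-pixel Python loop can be replaced by `np.unique` on the stacked rows, leaving a Python loop only over the distinct combinations in each chunk. The dictionary is passed in and updated in place, so codes stay consistent from chunk to chunk, and new combinations are numbered in order of first appearance, as in the original loop.

```python
import numpy as np


def encode_chunk(arrays, mapping):
    """Assign integer codes to the value combinations in one chunk.

    `mapping` (combination tuple -> code) is updated in place, so codes stay
    consistent across chunks. New combinations get codes in order of first
    appearance within the chunk.
    """
    stack = np.column_stack(arrays)  # shape (n_pixels, n_arrays)
    # Unique rows, index of each unique row's first occurrence,
    # and for every pixel the index of its unique row.
    uniq, first, inverse = np.unique(
        stack, axis=0, return_index=True, return_inverse=True
    )
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    codes = np.empty(len(uniq), dtype=np.int64)
    # Visit unique rows in order of first appearance so new codes follow it.
    for k in np.argsort(first):
        key = tuple(uniq[k].tolist())  # plain Python ints as dict keys
        if key not in mapping:
            mapping[key] = len(mapping)
        codes[k] = mapping[key]
    # Broadcast the per-combination codes back to every pixel.
    return codes[inverse]
```

Calling it once on the full example arrays, or once per half (`a[:4], b[:4], c[:4]` followed by `a[4:], b[4:], c[4:]`) with the same dict, gives the `out` and `out_dict` from the question.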
See
http://stackoverflow.com/questions/23268605/grouping-indices-of-unique-elements-in-numpy
for some ideas. The main difference is that you can't fit everything in
memory, but if there are lots of duplicates you should be able to do it in
batches: find the unique combinations in each batch, then combine the
batches and repeat.
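The batch-then-combine idea might look like the following sketch, assuming each chunk arrives as a 2-D array with one column per input array (e.g. built with `np.column_stack`); `unique_combinations` is a hypothetical name. It only builds the table of distinct combinations. A code for a row could then be its position in that sorted table, which is a different numbering from the first-appearance codes in the original question.

```python
import numpy as np


def unique_combinations(chunks):
    """Distinct rows across an iterable of (n_rows, n_arrays) chunks.

    Each chunk is reduced to its unique rows, then merged with the rows seen
    so far and deduplicated again, so the working set grows with the number
    of distinct combinations rather than the number of pixels.
    """
    seen = None
    for chunk in chunks:
        part = np.unique(chunk, axis=0)  # drop duplicates within the chunk
        if seen is None:
            seen = part
        else:
            # Combine with the running table and deduplicate again.
            seen = np.unique(np.concatenate((seen, part)), axis=0)
    return seen  # sorted lexicographically by row; None if no chunks
```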
Another possibility, if the elements are bounded, is to treat them as digits
in some number system and evaluate that number, i.e., take the dot product
with something like array([1, 10, 100, ...]).
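A sketch of the digit encoding, assuming non-negative integer values below `base` (10 by default, following the `array([1, 10, 100, ...])` suggestion); `combine_as_digits` is a hypothetical name. Because each key depends only on the values themselves, the same combination gets the same key in every chunk without carrying a dict along; compacting the keys into dense codes (e.g. with `np.unique(keys, return_inverse=True)`) would again need a table shared across chunks. The weights grow as `base**n_arrays`, so int64 overflow limits how many arrays can be combined this way.

```python
import numpy as np


def combine_as_digits(arrays, base=10):
    """Collapse bounded integer arrays into one integer key per pixel.

    Each array supplies one digit: key = a*1 + b*base + c*base**2 + ...
    Keys are unique per combination only if every value lies in [0, base).
    """
    stack = np.column_stack(arrays)  # shape (n_pixels, n_arrays)
    if stack.min() < 0 or stack.max() >= base:
        raise ValueError("all values must lie in [0, base)")
    # Place values [1, base, base**2, ...], one per input array.
    weights = base ** np.arange(stack.shape[1], dtype=np.int64)
    # Row-wise dot product with the weights gives the key.
    return stack.astype(np.int64) @ weights
```

For the example arrays this gives keys `[0, 111, 102, 13, 0, 111, 2, 113]`.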
Chuck
