[Numpy-discussion] Finding unique rows in an array

Fri Aug 24 23:08:33 EDT 2007

Francesc Altet <faltet at carabos.com> writes:

> On Tuesday 21 August 2007, Mark.Miller wrote:
>> Is there a good loopless way to identify all of the unique rows in an
>> array?  Something like numpy.unique() is ideal, but capable of
>> extracting unique subarrays along an axis.
>
> You can always do a view of the rows as strings and then use unique().
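
A minimal sketch of that suggestion, as I read it: reinterpret each row
as a single opaque chunk of bytes and let unique() work on those. The
function name unique_rows_by_view is made up for illustration, and the
void dtype is used here in place of a string dtype on the assumption
that raw byte comparison is what is wanted.

```python
import numpy as np

def unique_rows_by_view(a):
    """Distinct rows of a 2-D array, in order of first appearance."""
    # A view needs each row to be one contiguous run of bytes.
    a = np.ascontiguousarray(a)
    row_bytes = a.dtype.itemsize * a.shape[1]
    # One void scalar per row: the row's raw bytes treated as a single item.
    as_void = a.view(np.dtype((np.void, row_bytes))).ravel()
    # unique() returns the index of the first occurrence of each item.
    _, first = np.unique(as_void, return_index=True)
    return a[np.sort(first)]
```

Because the comparison is on raw bytes, floating-point rows that are
numerically equal but stored differently (0.0 and -0.0, for instance)
count as different. On NumPy 1.13 or later, np.unique(a, axis=0) does
the same job directly.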

For large arrays it probably makes sense to hash the rows by taking a
dot product with a random vector. Then sort the hash values and identify
blocks of equal values (allowing for rounding errors). Rows whose hash
values differ by more than the rounding tolerance are guaranteed to be
different; within a block of rows with the same hash value you still
have to compare the rows themselves, but this will probably be much less
work than comparing every pair of rows, and (I hope) BLAS makes the
dot-product phase go fast.
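
The steps above could be sketched roughly as follows. This is untested
against large inputs and rests on several assumptions: the name
unique_rows_by_hash, the rtol and seed parameters and the tolerance rule
are all invented for illustration, the array is assumed to be 2-D with
finite numeric entries, and the final check inside each block is a plain
Python loop rather than anything clever.

```python
import numpy as np

def unique_rows_by_hash(a, rtol=1e-9, seed=0):
    """Distinct rows of a 2-D array, in order of first appearance."""
    a = np.asarray(a)
    if a.shape[0] == 0:
        return a
    # Hash every row with one matrix-vector product against a random vector.
    v = np.random.default_rng(seed).standard_normal(a.shape[1])
    h = a @ v
    # Sort the hashes; a new block starts wherever two neighbours differ
    # by more than the (assumed) rounding tolerance.
    order = np.argsort(h)
    hs = h[order]
    tol = rtol * max(1.0, float(np.abs(hs).max()))
    breaks = np.flatnonzero(np.diff(hs) > tol) + 1
    keep = []
    for block in np.split(order, breaks):
        # Equal hashes do not prove equal rows, so compare the rows
        # themselves, but only within this block.
        reps = []
        for i in np.sort(block):
            if not any(np.array_equal(a[i], a[j]) for j in reps):
                reps.append(i)
        keep.extend(reps)
    return a[np.sort(np.asarray(keep, dtype=np.intp))]
```

For example, unique_rows_by_hash(np.array([[1, 2], [3, 4], [1, 2]]))
keeps rows 0 and 1. The within-block check is quadratic in the block
size, so this only pays off when most blocks are small, which is what
the random vector is meant to achieve.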

-- 
Jouni K. Seppänen
http://www.iki.fi/jks