[Numpy-discussion] unique rows of array

Tue Aug 18 00:44:37 EDT 2009

On Tue, Aug 18, 2009 at 12:30 AM, Maria Liukis <liukis at usc.edu> wrote:
> Hello everybody,
> While re-implementing some Matlab code in Python, I ran into the problem of
> finding a NumPy function analogous to Matlab's "unique(array, 'rows')",
> which returns the unique rows of an array. Searching the web, I found a
> similar discussion from a couple of years ago with an example:
>
> ############## A SNIPPET FROM THE DISCUSSION
> [Numpy-discussion] Finding unique rows in an array [Was: Finding a row match
> within a numpy array]
> On Tuesday 21 August 2007, Mark.Miller wrote:
>> A slightly related question on this topic...
>>
>> Is there a good loopless way to identify all of the unique rows in an
>> array?  Something like numpy.unique() is ideal, but capable of
>> extracting unique subarrays along an axis.
> You can always do a view of the rows as strings and then use unique().
> Here is an example:
> In [1]: import numpy
> In [2]: a=numpy.arange(12).reshape(4,3)
> In [3]: a[2]=(3,4,5)
> In [4]: a
> Out[4]:
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
> now, create the view and select the unique rows:
> In [5]: b=numpy.unique(a.view('S%d'%a.itemsize*a.shape[0])).view('i4')
> and finally restore the shape:
> In [6]: b.reshape((len(b)/a.shape[1], a.shape[1]))
> Out[6]:
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
> If you want to find unique columns instead of rows, transpose the initial
> array first.
> ################END OF DISCUSSION
>
> The provided example works only because the array elements are row-sorted.
> After changing the tested array (called 'c' in my case):
> >>> c
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
> >>> c[0] = (11, 10, 0)
> >>> c
> array([[11, 10,  0],
>        [ 3,  4,  5],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
> >>> b = np.unique(c.view('S%s' %c.itemsize*c.shape[0]))
> >>> b
> array(['', '\x03', '\x04', '\x05', '\t', '\n', '\x0b'],
>       dtype='|S4')
> >>> b.view('i4')
> array([ 0,  3,  4,  5,  9, 10, 11])
> >>> b.reshape((len(b)/c.shape[1], c.shape[1])).view('i4')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: total size of new array must be unchanged
> >>>
> The reshape fails because len(b) = 7, which is not a multiple of 3.
> The suggested approach would work if each whole row were converted to a
> single string, I guess. But from what I could gather, numpy.ndarray.view()
> only changes the display element-wise.
> Before I start re-inventing the wheel, I was just wondering whether
> existing NumPy functionality can find the unique rows of an array.
>
> Many thanks in advance!
> Masha
> --------------------
> liukis at usc.edu
>
>

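On why the quoted snippet breaks: a small sketch, assuming 4-byte integers
as the 'i4' views above suggest. `%` and `*` have the same precedence and
group left to right, so the format expression as written repeats the
formatted string instead of multiplying the sizes, and it uses shape[0]
(rows) where the byte length of a row needs shape[1] (columns). Judging from
the dtype='|S4' in the output above, the view then ended up per element
rather than per row. The variable names below are mine, for illustration only.

```python
import numpy as np

itemsize, nrows, ncols = 4, 4, 3   # assumed: int32 elements in a 4x3 array

as_written = 'S%d' % itemsize * nrows    # evaluates as ('S%d' % 4) * 4
per_row = 'S%d' % (itemsize * ncols)     # presumably intended: 12 bytes per row
print(as_written)   # S4S4S4S4
print(per_row)      # S12

c = np.array([[11, 10, 0], [3, 4, 5], [3, 4, 5], [9, 10, 11]])
# Element-wise uniqueness finds 7 distinct values, and 7 values cannot be
# reshaped into rows of 3, which is the ValueError reported above.
print(np.unique(c).size)   # 7
```
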
One way is to view the array as a structured array, so that each row becomes a single record:

>>> c = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 3,  4,  5],
...               [ 9, 10, 11]])

>>> np.unique(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 9, 10, 11]])
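
The one-liner above can be unpacked step by step. This is only a sketch: the
intermediate names (row_dtype, records, uniq) are mine, the ascontiguousarray
call is an added precaution for non-contiguous input, and the test array is
the unsorted one from the question. The last line is a cross-check that
assumes a newer NumPy (1.13 or later), where np.unique accepts an axis
argument.

```python
import numpy as np

c = np.array([[11, 10, 0],
              [3, 4, 5],
              [3, 4, 5],
              [9, 10, 11]])

# One unnamed field per column; NumPy fills in default names f0, f1, f2.
row_dtype = [('', c.dtype)] * c.shape[1]

# Each row of the contiguous array becomes a single record: shape (4, 1).
records = np.ascontiguousarray(c).view(row_dtype)

# np.unique flattens, sorts the records field by field and drops repeats.
uniq = np.unique(records)

# Viewing the records as plain integers gives a flat array, so the reshape
# is what restores the column structure.
unique_rows = uniq.view(c.dtype).reshape(-1, c.shape[1])
print(unique_rows)

# Cross-check on NumPy >= 1.13 (assumption about the installed version).
print(np.array_equal(unique_rows, np.unique(c, axis=0)))
```

Note that the result comes back sorted by rows, not in the original row order.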

For an explanation, see the similar question I asked last December about "sortrows".
(I can never remember when the final reshape is needed and when it is not.)
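
For the "sortrows" part, a rough sketch that avoids the structured view
altogether uses np.lexsort. This is not necessarily what the earlier thread
arrived at, and the names (order, sorted_rows, keep) are only illustrative.
Once the rows are sorted, duplicates sit next to each other, so unique rows
fall out of a comparison with the previous row.

```python
import numpy as np

c = np.array([[9, 10, 11],
              [3, 4, 5],
              [0, 1, 2],
              [3, 4, 5]])

# np.lexsort uses the LAST key as the primary one, so the columns are passed
# in reverse order to sort by column 0 first, then column 1, then column 2.
order = np.lexsort(c.T[::-1])
sorted_rows = c[order]

# Keep the first row, plus every row that differs from the one before it.
keep = np.ones(len(sorted_rows), dtype=bool)
keep[1:] = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
unique_rows = sorted_rows[keep]
print(unique_rows)
```

No reshape is needed here because the array stays two-dimensional throughout.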

Josef