[Numpy-discussion] unique 2d arrays

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Sep 21 13:43:53 EDT 2010


On Tue, Sep 21, 2010 at 1:29 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
>
>
> On Tue, Sep 21, 2010 at 1:55 AM, Peter Schmidtke <pschmidtke at mmb.pcb.ub.es>
> wrote:
>>
>> Dear all,
>>
>> I'd like to know if there is a pythonic / numpy way of retrieving the
>> unique rows of a 2d numpy array.
>>
>> In a way I have this :
>>
>> [[409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [426 193]
>>  [431 129]]
>>
>> And I'd like to get this :
>>
>> [[409 152]
>>  [426 193]
>>  [431 129]]
>>
>>
>> How can I do this without workarounds like string concatenation or such
>> things? Numpy.unique flattens the whole array so it's not really of use
>> here.
>
> Here is one alternative:
> I[15]: a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])
> I[16]: np.array(list(set(tuple(i) for i in a.tolist())))
> O[16]:
> array([[409, 152],
>        [426, 193],
>        [431, 129]])
> I[6]: %timeit np.unique(a.view([('',a.dtype)]*a.shape[1])).view(a.dtype).reshape(-1,a.shape[1])
> 10000 loops, best of 3: 51 us per loop
> I[8]: %timeit np.array(list(set(tuple(i) for i in a.tolist())))
> 10000 loops, best of 3: 31.4 us per loop
> # Try with a bigger array
> I[9]: k = np.array((a.tolist()*50000))
> I[10]: %timeit np.array(list(set(tuple(i) for i in k.tolist())))
> 1 loops, best of 3: 324 ms per loop
> I[11]: %timeit np.unique(k.view([('',k.dtype)]*k.shape[1])).view(k.dtype).reshape(-1,k.shape[1])
> 1 loops, best of 3: 790 ms per loop
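
For reference, the view-based one-liner timed above works by reinterpreting
each row as a single structured element, so that np.unique compares whole
rows instead of flattened values. Below is a minimal sketch of that idea,
assuming a 2d array with a plain numeric dtype. The helper name
unique_rows_view is made up for illustration, and the np.ascontiguousarray
call is an added precaution that is not in the original one-liner. Newer
NumPy releases (1.13 and later) also accept np.unique(a, axis=0) for the same
purpose.

```python
import numpy as np

def unique_rows_view(a):
    # Hypothetical helper name; not part of NumPy.
    # The view below needs each row to be one contiguous block of memory.
    a = np.ascontiguousarray(a)
    # One unnamed field per column; NumPy auto-names them f0, f1, ...
    row_dtype = [('', a.dtype)] * a.shape[1]
    # Each row becomes a single structured element, shape (n, 1).
    rows = a.view(row_dtype)
    # np.unique flattens, sorts the records field by field, drops duplicates.
    uniq = np.unique(rows)
    # Convert the records back into a plain 2d array.
    return uniq.view(a.dtype).reshape(-1, a.shape[1])

a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])
print(unique_rows_view(a))
```

The result comes back sorted row by row, which is part of the cost of this
approach compared with the set-based version.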

I'm a bit surprised; I think np.unique does some extra work to
return the result in sorted order.
The tolist() might not be necessary if you iterate over the rows directly.
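
Something like the following is what iterating over the rows could look
like. It is only a sketch: the function name unique_rows is made up, and it
swaps the set for dict.fromkeys so that rows come back in first-seen order
(this relies on dicts keeping insertion order, Python 3.7 or later). It
accepts any iterable of rows, so a 2d array can be passed in directly
without calling tolist() first.

```python
def unique_rows(rows):
    # Hypothetical helper; works on a 2d NumPy array or a list of lists.
    # tuple(row) makes each row hashable; dict.fromkeys drops duplicates
    # while keeping the order in which rows first appear.
    return list(dict.fromkeys(tuple(row) for row in rows))

rows = [[431, 129], [409, 152], [409, 152], [426, 193]]
print(unique_rows(rows))
```

Wrapping the result in np.array(...) gives a 2d array back. A plain set, as
in the quoted version, works the same way but makes no promise about the
order of the rows.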

Josef


>
> On these tests this seems faster than the unique method, and it is also
> more readable. It is still not very Pythonic, though. Haskell has "nub" to
> remove duplicate list elements:
> http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Data-List.html#v%3Anub
> --
> Gökhan