Hello everybody,

While re-implementing some Matlab code in Python, I've run into the problem of finding a NumPy function analogous to Matlab's "unique(array, 'rows')" to get the unique rows of an array. Searching the web, I found a similar discussion from a couple of years ago with an example:

############## A SNIPPET FROM THE DISCUSSION
[Numpy-discussion] Finding unique rows in an array [Was: Finding a row match within a numpy array]

On Tuesday 21 August 2007, Mark.Miller wrote:
A slightly related question on this topic...
Is there a good loopless way to identify all of the unique rows in an array? Something like numpy.unique() is ideal, but capable of extracting unique subarrays along an axis.

You can always do a view of the rows as strings and then use unique(). Here is an example:

In [1]: import numpy
In [2]: a=numpy.arange(12).reshape(4,3)
In [3]: a[2]=(3,4,5)
In [4]: a
Out[4]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 9, 10, 11]])

now, create the view and select the unique rows:

In [5]: b=numpy.unique(a.view('S%d'%a.itemsize*a.shape[0])).view('i4')

and finally restore the shape:

In [6]: b.reshape((len(b)/a.shape[1], a.shape[1]))
Out[6]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 9, 10, 11]])

If you want to find unique columns instead of rows, do a transpose first on the initial array.
################ END OF DISCUSSION

The provided example works only because the array elements are row-sorted. Changing the tested array (in my case, it's 'c'):
>>> c
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 9, 10, 11]])
>>> c[0] = (11, 10, 0)
>>> c
array([[11, 10,  0],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 9, 10, 11]])
>>> b = np.unique(c.view('S%s' %c.itemsize*c.shape[0]))
>>> b
array(['', '\x03', '\x04', '\x05', '\t', '\n', '\x0b'], dtype='|S4')
>>> b.view('i4')
array([ 0,  3,  4,  5,  9, 10, 11])
>>> b.reshape((len(b)/c.shape[1], c.shape[1])).view('i4')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged
The reshape fails because len(b) = 7. I guess the suggested approach would work if each whole row were converted to a single string, but from what I could gather, numpy.array.view() only changes the display element-wise. Before I start re-inventing the wheel, I was just wondering whether one could find the unique rows of an array using existing numpy functionality.

Many thanks in advance!
Masha
--------------------
liukis@usc.edu
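A possible reading of the failure above, offered as a minimal sketch rather than a definitive diagnosis: in Python, % and * have the same precedence and group left to right, so 'S%d' % itemsize * n formats first and then repeats the resulting string. The dtype='|S4' in the output suggests each 4-byte element was then treated as its own string rather than each row. The helper name row_view_dtype is hypothetical, and sizing the item by the number of columns (shape[1]) is an assumption about what the snippet intended.

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(4, 3)

# As written in the snippet: the format is applied first, then the string is repeated.
as_written = 'S%d' % a.itemsize * a.shape[0]

# Presumably intended: one item wide enough to hold a whole row.
row_bytes = a.itemsize * a.shape[1]
intended = 'S%d' % row_bytes


def row_view_dtype(arr):
    """Hypothetical helper: an opaque (void) dtype spanning one full row of arr."""
    return np.dtype((np.void, arr.dtype.itemsize * arr.shape[1]))


# Each row of the contiguous array now appears as a single item.
rows_as_items = np.ascontiguousarray(a).view(row_view_dtype(a))

print(as_written)           # S4S4S4S4
print(intended)             # S12
print(rows_as_items.shape)  # (4, 1)
```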
On Tue, Aug 18, 2009 at 12:30 AM, Maria Liukis<liukis@usc.edu> wrote:
<snip>
One way is to convert to a structured array:

>>> c = np.array([[ 0, 1, 2], [ 3, 4, 5], [ 3, 4, 5], [ 9, 10, 11]])
>>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 9, 10, 11]])

For an explanation, see the similar question I asked last December about "sortrows". (I never remember when the final reshape is needed and when it is not.)

Josef
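For readers on a current NumPy, where np.unique1d is no longer available, the structured-view idea can be sketched with np.unique instead. The helper name unique_rows_structured and the explicit field names f0, f1, ... are assumptions made for illustration. Newer NumPy versions also accept an axis argument to np.unique, shown alongside for comparison.

```python
import numpy as np

c = np.array([[0, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])


def unique_rows_structured(arr):
    """Hypothetical helper: unique rows via a structured view of each row."""
    arr = np.ascontiguousarray(arr)
    # One field per column, so every row becomes a single structured item.
    row_dtype = [('f%d' % i, arr.dtype) for i in range(arr.shape[1])]
    unique_items = np.unique(arr.view(row_dtype))
    # View back as plain numbers and restore the column count.
    return unique_items.view(arr.dtype).reshape(-1, arr.shape[1])


via_view = unique_rows_structured(c)
via_axis = np.unique(c, axis=0)  # built-in alternative in newer NumPy

print(via_view)
print(via_axis)
```

Both variants return the rows sorted, not in their original order.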
Josef,

Thanks, I'll try that and will search for your question from last December :)

Masha
--------------------
liukis@usc.edu

On Aug 17, 2009, at 9:44 PM, josef.pktd@gmail.com wrote:
<snip>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis <liukis@usc.edu> wrote:
Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's "unique(array, 'rows')" to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
Just to be clear, do you mean finding all rows that only occur once in the array?

<snip>

Chuck
On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
Just to be clear, do you mean finding all rows that only occur once in the array?
Yes.
On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis<liukis@usc.edu> wrote:
Just to be clear, do you mean finding all rows that only occur once in the array?
Yes.
I interpreted your question as removing duplicates: the result keeps one copy of every row, including the rows that occur more than once. That's what my example is intended to do.

Josef
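The two readings discussed here give different results, which a small sketch can make concrete. It assumes a NumPy version whose np.unique accepts the axis and return_counts arguments; the variable names are illustrative only.

```python
import numpy as np

c = np.array([[0, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])

# Distinct rows together with how often each one appears.
rows, counts = np.unique(c, axis=0, return_counts=True)

# Reading 1: remove duplicates, keeping one copy of every distinct row.
deduplicated = rows

# Reading 2: keep only the rows that occur exactly once.
occur_once = rows[counts == 1]

print(deduplicated)
print(occur_once)
```

Here the first reading keeps [3, 4, 5] once, while the second drops it entirely.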
On Tue, Aug 18, 2009 at 1:03 AM, <josef.pktd@gmail.com> wrote:
<snip>
Just a reminder about views on views: I don't think the recommendation to take the transpose to get unique columns works. We had a discussion some time ago that views work on the original array data and not on the view, and in this case the transpose creates a view (example below). Also, unique does a sort and doesn't preserve order.

Josef
>>> c=np.array([[ 10, 1, 2], [ 3, 4, 5], [ 3, 4, 5], [ 9, 10, 11]])
>>> cc = c.copy() #backup
>>> c = cc.T
>>> cc
array([[10,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 9, 10, 11]])
>>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
Traceback (most recent call last):
  File "<pyshell#46>", line 1, in <module>
    np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
ValueError: new type not compatible with array.

>>> c = cc.T.copy()
>>> c
array([[10,  3,  3,  9],
       [ 1,  4,  4, 10],
       [ 2,  5,  5, 11]])
>>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
array([[ 1,  4,  4, 10],
       [ 2,  5,  5, 11],
       [10,  3,  3,  9]])
>>> c = np.ascontiguousarray(cc.T)
>>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
array([[ 1,  4,  4, 10],
       [ 2,  5,  5, 11],
       [10,  3,  3,  9]])
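The point about the transpose being a view can be checked through the array flags. The sketch below assumes a current NumPy, uses np.unique in place of np.unique1d, and introduces a hypothetical helper unique_columns that copies the transpose into contiguous memory before taking the structured view.

```python
import numpy as np

cc = np.array([[10, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])

# The transpose shares memory with cc and is not laid out row by row.
transpose_is_contiguous = cc.T.flags['C_CONTIGUOUS']
copy_is_contiguous = np.ascontiguousarray(cc.T).flags['C_CONTIGUOUS']


def unique_columns(arr):
    """Hypothetical helper: unique columns via a contiguous copy of the transpose."""
    cols = np.ascontiguousarray(arr.T)  # each original column is now a contiguous row
    row_dtype = [('f%d' % i, cols.dtype) for i in range(cols.shape[1])]
    unique_items = np.unique(cols.view(row_dtype))
    # Back to plain numbers, then transpose so the results are columns again.
    return unique_items.view(cols.dtype).reshape(-1, cols.shape[1]).T


print(transpose_is_contiguous)  # False
print(copy_is_contiguous)       # True
print(unique_columns(cc))
```

As with rows, the resulting columns come back sorted rather than in their original order.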
On Aug 17, 2009, at 10:03 PM, josef.pktd@gmail.com wrote:
Just to be clear, do you mean finding all rows that only occur once in the array?
Sorry, I think it shows that I should stop working past 10 pm :)
Yes.
I interpreted your question as removing duplicates. It keeps rows that occur more than once.
Yes, I meant keeping only unique (without duplicates) rows.
That's what my example is intended to do.
Josef,

Many thanks for the example! It should become an official NumPy recipe :)

Thanks again,
Masha
--------------------
liukis@usc.edu

On Aug 17, 2009, at 10:03 PM, josef.pktd@gmail.com wrote:
<snip>
On Tue, Aug 18, 2009 at 2:01 AM, Maria Liukis<liukis@usc.edu> wrote:
Josef, Many thanks for the example! It should become an official NumPy recipe :) Thanks again, Masha -------------------- liukis@usc.edu
Actually, there is also an implementation of unique rows in scipy.stats._support. It uses loops (and array concatenation in the loop), but it preserves the order of the rows in the array. In general, I don't recommend using scipy.stats._support, since many or most of its functions are not tested and only some are used in scipy.stats. These functions are waiting for a rewrite or removal. When I thought about a rewrite last year, I didn't know much about structured arrays and views.

Josef
>>> cc
array([[10,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 9, 10, 11]])
>>> scipy.stats._support.unique(cc)
array([[10,  1,  2],
       [ 3,  4,  5],
       [ 9, 10, 11]])

Unique columns using transpose:

>>> cct = cc.T.copy()
>>> cct
array([[10,  3,  3,  9],
       [ 1,  4,  4, 10],
       [ 2,  5,  5, 11]])
>>> scipy.stats._support.unique(cct.T).T
array([[10,  3,  9],
       [ 1,  4, 10],
       [ 2,  5, 11]])
Josef
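An order-preserving result can also be sketched without loops, assuming a NumPy version whose np.unique accepts axis and return_index. This is an alternative technique, not the scipy.stats._support implementation; the variable names are illustrative.

```python
import numpy as np

cc = np.array([[10, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])

# Index of the first occurrence of each distinct row (rows themselves come back sorted).
_, first_idx = np.unique(cc, axis=0, return_index=True)

# Sorting those indices restores the order in which rows first appear in cc.
in_original_order = cc[np.sort(first_idx)]

print(in_original_order)
```

The same idea applies to columns by passing axis=1 and indexing with cc[:, np.sort(first_idx)].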
participants (3)
- Charles R Harris
- josef.pktd@gmail.com
- Maria Liukis