[Numpy-discussion] strange behavior of numpy.unique
Charles R Harris
charlesr.harris at gmail.com
Wed Nov 7 16:48:05 EST 2012
On Tue, Nov 6, 2012 at 7:52 PM, Warren Weckesser <warren.weckesser at gmail.com> wrote:
> On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman <
> phillip.m.feldman at gmail.com> wrote:
>> numpy.unique behaves as I would expect for small inputs like the following:
>> In : x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
>> In : unique(x, return_index=True)
>> Out: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64))
>> But, when I give it something larger, the return index values do not
>> always correspond to the first occurrences in the input. The documentation
>> is silent on the question of how the return index values are chosen when a
>> given element of x appears more than once. Either the documentation should
>> be clarified, or better yet, the behavior should be changed.
> In fact, it was changed (in the master branch on github) several months
> ago, but there has not yet been a release with the changes. The sort
> method that np.unique passes to np.argsort is now 'mergesort', and the
> docstring states that the indices returned are for the first occurrences of
> the unique elements. The new docstring is here:
> https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749 for the commit, and
> https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for
> the latest version of the source.
That change was backported to 1.6.2, but doesn't work for record/object
arrays. That oversight is fixed in master.
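
For anyone wondering why the sort kind matters here, below is a minimal sketch of the idea. It is not the actual code in arraysetops.py, and the helper name first_occurrence_unique is made up for illustration. A stable sort (kind='mergesort') keeps equal elements in their original relative order, so the first element of each run of equal values in the sorted array is the earliest occurrence in the input. With an unstable sort that ordering within a run is not guaranteed.

```python
import numpy as np

def first_occurrence_unique(x):
    """Hypothetical helper: unique values of a 1-D input plus the index
    of the first occurrence of each value. Illustration only."""
    x = np.asarray(x).ravel()
    # Stable sort: ties keep their original order, so within each run of
    # equal values the smallest original index comes first.
    order = np.argsort(x, kind='mergesort')
    sorted_x = x[order]
    # Flag the first element of each run of equal values.
    is_start = np.ones(sorted_x.shape[0], dtype=bool)
    is_start[1:] = sorted_x[1:] != sorted_x[:-1]
    return sorted_x[is_start], order[is_start]

x = [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
values, first_idx = first_occurrence_unique(x)
print(values)     # [0 1 2 3]
print(first_idx)  # [0 2 5 9]
```

On a release that includes the change, the result of unique(x, return_index=True) should agree with this for plain numeric arrays.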