numpy.unique behaves as I would expect for small inputs like the following:

In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]

In [13]: unique(x, return_index=True)
Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64))

But, when I give it something larger, the return index values do not always correspond to the first occurrences in the input. The documentation is silent on the question of how the return index values are chosen when a given element of x appears more than once. Either the documentation should be clarified, or better yet, the behavior should be changed.
On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman wrote:

> numpy.unique behaves as I would expect for small inputs like the following:
>
> In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
>
> In [13]: unique(x, return_index=True)
> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64))
>
> But, when I give it something larger, the return index values do not always correspond to the first occurrences in the input. The documentation is silent on the question of how the return index values are chosen when a given element of x appears more than once. Either the documentation should be clarified, or better yet, the behavior should be changed.

In fact, it was changed (in the master branch on github) several months ago, but there has not yet been a release with the changes. The sort method that np.unique passes to np.argsort is now 'mergesort', and the docstring states that the indices returned are for the first occurrences of the unique elements. The new docstring is here:

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#nu...

See https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d67... for the commit, and https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for the latest version of the source.

Warren
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
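
To illustrate the point about 'mergesort' in the reply above: a stable argsort keeps equal elements in their original relative order, so within each run of equal values in the sorted array the first entry carries the smallest original index. The following is only a minimal sketch of that idea, not the actual np.unique source; the helper name unique_first_index is hypothetical.

```python
import numpy as np

def unique_first_index(values):
    """Return the sorted unique values and the index of each one's first occurrence.

    Hypothetical helper for illustration; not part of NumPy.
    """
    ar = np.asarray(values).ravel()
    # 'mergesort' is a stable sort kind: ties keep their original order,
    # so the earliest index of each value comes first within its run.
    perm = ar.argsort(kind='mergesort')
    sorted_ar = ar[perm]
    # Flag the first element of every run of equal values.
    is_first = np.ones(sorted_ar.shape, dtype=bool)
    is_first[1:] = sorted_ar[1:] != sorted_ar[:-1]
    return sorted_ar[is_first], perm[is_first]

x = [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
values, first_index = unique_first_index(x)
print(values)       # [0 1 2 3]
print(first_index)  # [0 2 5 9]
```

With a non-stable sort kind, the order of tied elements in perm is not guaranteed, which would explain indices that are not first occurrences on larger inputs.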
On Tue, Nov 6, 2012 at 9:52 PM, Warren Weckesser wrote:

> In fact, it was changed (in the master branch on github) several months ago, but there has not yet been a release with the changes. The sort method that np.unique passes to np.argsort is now 'mergesort', and the docstring states that the indices returned are for the first occurrences of the unique elements.

I think it's in 1.6.2 and it broke return_index for structured dtypes, IIRC.

Josef
On Wed, Nov 7, 2012 at 11:24 AM, Josef wrote:

> I think it's in 1.6.2 and it broke return_index for structured dtypes, IIRC.

You are correct, Josef, that change is in 1.6.2. Thanks.

Warren
On Tue, Nov 6, 2012 at 7:52 PM, Warren Weckesser wrote:

> In fact, it was changed (in the master branch on github) several months ago, but there has not yet been a release with the changes. The sort method that np.unique passes to np.argsort is now 'mergesort', and the docstring states that the indices returned are for the first occurrences of the unique elements.

That change was backported to 1.6.2, but doesn't work for record/object arrays. That oversight is fixed in master.

Chuck
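
Since the behavior discussed in this thread depends on the NumPy version, one way to check a given installation is to compare the return_index output against a brute-force scan. The sketch below assumes a plain integer array (not the record/object arrays mentioned above); the helper names first_occurrences and unique_returns_first_index are hypothetical. No expected output is shown because the result depends on the installed version.

```python
import numpy as np

def first_occurrences(values):
    """Map each distinct value to the index of its first occurrence (pure Python)."""
    first = {}
    for i, v in enumerate(values):
        # setdefault only stores the index the first time a value is seen.
        first.setdefault(v, i)
    return first

def unique_returns_first_index(values):
    """Return True if np.unique's return_index matches first occurrences."""
    ar = np.asarray(values)
    uniq, idx = np.unique(ar, return_index=True)
    expected = first_occurrences(ar.tolist())
    return all(expected[u] == i for u, i in zip(uniq.tolist(), idx.tolist()))

# A larger input with many repeated values, seeded for reproducibility.
rng = np.random.RandomState(0)
x = rng.randint(0, 50, size=10000)
print(unique_returns_first_index(x))
```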
participants (4)
- Charles R Harris
- josef.pktd@gmail.com
- Phillip Feldman
- Warren Weckesser