Hi,

I was trying to sort an (N, 3) array by rows, and first came up with this solution:

    N = 1000000
    arr = np.random.randint(-100, 100, size=(N, 3))
    dt = np.dtype([('x', int), ('y', int), ('z', int)])
    arr.view(dtype=dt).sort(axis=0)

Then I found another way, using the lexsort function:

    idx = np.lexsort([arr[:, 2], arr[:, 1], arr[:, 0]])
    arr = arr[idx]

which is 4 times faster than the previous solution. Now I have several questions:

Why is the first way so much slower?
What is the fastest way in numpy to sort an array by rows?
Why is the order of keys in the lexsort function reversed?

The last question was really the root of the problem for me with the lexsort function. I still cannot understand the idea behind such an order (the last key is the primary one); it seems confusing to me.

Thank you!!!
With kind regards, Kirill.

p.s.: One more thing: when I first tried to use lexsort, I got this strange exception:

    np.lexsort(arr, axis=1)

    ---------------------------------------------------------------------------
    AxisError                                 Traceback (most recent call last)
    <ipython-input-278-5162b6ccb8f6> in <module>()
    ----> 1 np.lexsort(ls, axis=1)

    AxisError: axis 1 is out of bounds for array of dimension 1
There are two mistakes in your PS. The immediate error comes from the
fact that lexsort accepts an iterable of 1D arrays, so when you pass
in arr as the argument, it is treated as an iterable over the rows,
each of which is 1D. 1D arrays do not have an axis=1. You actually
want to iterate over the columns, so np.lexsort(arr.T) is the correct
phrasing of that. No idea about the speed difference.
-Joe
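To make the point about rows versus columns concrete, here is a small sketch. The tiny array size and the variable names idx_rows, idx_cols and by_last_col are only for illustration and are not from the thread. It shows that lexsort takes each row of a 2D array as one key, so passing arr directly yields one index per column, while passing arr.T yields one index per row.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(-100, 100, size=(6, 3))

# Passing arr directly: the 6 rows are taken as 6 keys of length 3,
# so the result has only 3 indices (one per column position).
idx_rows = np.lexsort(arr)

# Passing arr.T: the 3 columns are taken as 3 keys of length 6,
# so the result has 6 indices (one per row of arr).
idx_cols = np.lexsort(arr.T)

# The last key (column 2) is the primary one, so the reordered rows
# are non-decreasing in their last column.
by_last_col = arr[idx_cols]
print(idx_rows.shape, idx_cols.shape)
```

Note that np.lexsort(arr.T) uses the last column as the primary key, so it does not by itself give the usual left-to-right row ordering.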
On Fri, Oct 20, 2017 at 6:00 AM, Kirill Balunov wrote:
<snip>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Thank you Joseph, you gave me an idea, and now the fastest version (for big
arrays) on my laptop is:
np.lexsort(arr[:, ::-1].T)
For me the strangest thing is the order of the keys: what was the idea
behind keeping them right-to-left? How does this relate to lexicographic order?
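As a sanity check on the three variants discussed so far, the sketch below compares them on a small array. It makes no claim about speed. The names by_view, by_keys and by_reversed are illustrative, and the structured dtype is built from arr.dtype rather than int so that the view is valid regardless of the platform's default integer size (an assumption added here, not part of the original code).

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(-100, 100, size=(1000, 3))

# Variant 1: sort a structured view in place (done on a copy so that
# arr itself stays unsorted for the other variants).
dt = np.dtype([('x', arr.dtype), ('y', arr.dtype), ('z', arr.dtype)])
by_view = arr.copy()
by_view.view(dt).sort(axis=0)

# Variant 2: explicit keys, primary key (column 0) passed last.
by_keys = arr[np.lexsort([arr[:, 2], arr[:, 1], arr[:, 0]])]

# Variant 3: reverse the columns, then transpose so columns become keys.
by_reversed = arr[np.lexsort(arr[:, ::-1].T)]

print(np.array_equal(by_view, by_keys), np.array_equal(by_keys, by_reversed))
```

All three should produce the same row-sorted array; variants 2 and 3 pass the same keys in the same order and differ only in how the keys are built.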
2017-10-20 17:11 GMT+03:00 Joseph Fox-Rabinovitz wrote:
<snip>
I do not think that there is any particular relationship between the
order of the keys and lexicographic order. The key order is just a
convention, which is clearly documented. I agree that it is a bit
counter-intuitive for anyone who has used Excel or MATLAB, but it is
ingrained in the API at this point.
-Joe
On Fri, Oct 20, 2017 at 3:03 PM, Kirill Balunov wrote:
<snip>
On Fri, Oct 20, 2017 at 1:40 PM, Joseph Fox-Rabinovitz < jfoxrabinovitz@gmail.com> wrote:
I do not think that there is any particular relationship between the order of the keys and lexicographic order. The key order is just a convention, which is clearly documented. I agree that it is a bit counter-intuitive for anyone that has used excel or MATLAB, but it is ingrained in the API at this point.
When I wrote lexsort for numarray, together with the typed sorting routines, I went back and forth on the key order, but finally decided that the simplest thing would be to leave them in the same order as the sorts. That requires a bit of knowledge as to what the effect of that is, but if one remembers that the last sort dominates, it isn't too bad.
<snip>
Chuck
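One way to read "the same order as the sorts" is that lexsort behaves like a sequence of stable sorts applied key by key, first key first, so that the last sort decides the overall order. The sketch below illustrates that reading on a small array. It is not the actual lexsort implementation, and the names keys and idx are illustrative.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(-100, 100, size=(1000, 3))

# Keys in the order they would be passed to lexsort:
# least significant first, primary key last.
keys = [arr[:, 2], arr[:, 1], arr[:, 0]]

# Apply one stable sort per key, in the same order. Each later sort
# preserves the earlier ordering among equal elements, so the last
# sort dominates.
idx = np.arange(len(arr))
for key in keys:
    idx = idx[np.argsort(key[idx], kind='stable')]

print(np.array_equal(idx, np.lexsort(keys)))
```

Under this reading, the reversed key order follows from how repeated stable sorting works: the sort that should matter most has to run last.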
participants (3)
- Charles R Harris
- Joseph Fox-Rabinovitz
- Kirill Balunov