Hi all,
This is a question which has been bugging me for a while. I have an (N,
3) array where N ~ 16 of points. These points are all unique and
separated by a reasonable distance.
I wish to sort these points into a canonical order in a fashion which is
robust against small perturbations. In other words changing any
component of any of the points by an epsilon ~ 1e-12 should not affect
the resulting sorted order.
Considering a direct application of np.lexsort:
In [6]: my_array = np.array([[-0.5, 0, 2**0.5],
[0.5, 0, 2**0.5 - 1e-15]])
In [7]: my_array[np.lexsort(my_array.T)]
Out[7]: array([[ 0.5 , 0. , 1.41421356],
[-0.5 , 0. , 1.41421356]])
however, if the small 1e-15 perturbation is removed the order changes to
the 'natural' ordering. Hence, np.lexsort is out.
Rounding the array before sorting is not suitable either; just because
(a - b) < epsilon does not mean that np.around(a, decimals=x) ==
np.around(b, decimals=b).
I am therefore looking for an efficient (= within a factor of 10 of
np.lexsort) solution to the problem. I've looked at writing my own
comparison function cmp(x, y) which looks at the next dimension if
abs(x[i] - y[i]) < epsilon however using this with sorted is thousands
of times slower. Given that I have well over 100,000 of these arrays
this is nuisance.
My other idea is to therefore find a means of quickly replacing all
numbers within 10*epsilon of a given number in an array with that
number. This should permit the application of np.lexsort in order to
obtain the desired ordering (which is what I'm interesting in).
However, I am yet to figure out how to do this efficiently.
Before I throw in the towel and drop down to C are there any other neat
tricks I am missing?
Regards, Freddie.