[Numpy-discussion] Robust Sorting of Points

Freddie Witherden freddie at witherden.org
Sun Oct 27 14:28:11 EDT 2013


Hi all,

This is a question which has been bugging me for a while.  I have an (N,
3) array where N ~ 16 of points.  These points are all unique and
separated by a reasonable distance.

I wish to sort these points into a canonical order in a fashion which is
robust against small perturbations.  In other words changing any
component of any of the points by an epsilon ~ 1e-12 should not affect
the resulting sorted order.

Considering a direct application of np.lexsort:

  In [6]: my_array = np.array([[-0.5, 0, 2**0.5],
                               [0.5, 0, 2**0.5 - 1e-15]])

  In [7]: my_array[np.lexsort(my_array.T)]
  Out[7]:  array([[ 0.5       ,  0.        ,  1.41421356],
                  [-0.5       ,  0.        ,  1.41421356]])

however, if the small 1e-15 perturbation is removed the order changes to
the 'natural' ordering.  Hence, np.lexsort is out.

Rounding the array before sorting is not suitable either; just because
(a - b) < epsilon does not mean that np.around(a, decimals=x) ==
np.around(b, decimals=b).

I am therefore looking for an efficient (= within a factor of 10 of
np.lexsort) solution to the problem.  I've looked at writing my own
comparison function cmp(x, y) which looks at the next dimension if
abs(x[i] - y[i]) < epsilon however using this with sorted is thousands
of times slower.  Given that I have well over 100,000 of these arrays
this is nuisance.

My other idea is to therefore find a means of quickly replacing all
numbers within 10*epsilon of a given number in an array with that
number.  This should permit the application of np.lexsort in order to
obtain the desired ordering (which is what I'm interesting in).
However, I am yet to figure out how to do this efficiently.

Before I throw in the towel and drop down to C are there any other neat
tricks I am missing?

Regards, Freddie.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131027/153d8aca/attachment.sig>


More information about the NumPy-Discussion mailing list