[Numpy-discussion] Robust Sorting of Points

Charles R Harris charlesr.harris at gmail.com
Mon Oct 28 00:13:26 EDT 2013


On Sun, Oct 27, 2013 at 12:28 PM, Freddie Witherden
<freddie at witherden.org>wrote:

> Hi all,
>
> This is a question which has been bugging me for a while.  I have an (N,
> 3) array where N ~ 16 of points.  These points are all unique and
> separated by a reasonable distance.
>
> I wish to sort these points into a canonical order in a fashion which is
> robust against small perturbations.  In other words changing any
> component of any of the points by an epsilon ~ 1e-12 should not affect
> the resulting sorted order.
>
> Considering a direct application of np.lexsort:
>
>   In [6]: my_array = np.array([[-0.5, 0, 2**0.5],
>                                [0.5, 0, 2**0.5 - 1e-15]])
>
>   In [7]: my_array[np.lexsort(my_array.T)]
>   Out[7]:  array([[ 0.5       ,  0.        ,  1.41421356],
>                   [-0.5       ,  0.        ,  1.41421356]])
>
> however, if the small 1e-15 perturbation is removed the order changes to
> the 'natural' ordering.  Hence, np.lexsort is out.
>
> Rounding the array before sorting is not suitable either; just because
> (a - b) < epsilon does not mean that np.around(a, decimals=x) ==
> np.around(b, decimals=b).
>
> I am therefore looking for an efficient (= within a factor of 10 of
> np.lexsort) solution to the problem.  I've looked at writing my own
> comparison function cmp(x, y) which looks at the next dimension if
> abs(x[i] - y[i]) < epsilon however using this with sorted is thousands
> of times slower.  Given that I have well over 100,000 of these arrays
> this is nuisance.
>
> My other idea is to therefore find a means of quickly replacing all
> numbers within 10*epsilon of a given number in an array with that
> number.  This should permit the application of np.lexsort in order to
> obtain the desired ordering (which is what I'm interesting in).
> However, I am yet to figure out how to do this efficiently.
>
> Before I throw in the towel and drop down to C are there any other neat
> tricks I am missing?
>
>
Sort them as strings?

In [17]: a = np.random.random((5,3))

In [18]: a
Out[18]:
array([[ 0.64734085,  0.71582772,  0.82743219],
       [ 0.92769057,  0.46880266,  0.73836167],
       [ 0.11904745,  0.56834934,  0.16144849],
       [ 0.59186013,  0.90698496,  0.56950572],
       [ 0.7317681 ,  0.93495724,  0.19217244]])

In [19]: a.view('S24').sort(0)

In [20]: a
Out[20]:
array([[ 0.7317681 ,  0.93495724,  0.19217244],
       [ 0.64734085,  0.71582772,  0.82743219],
       [ 0.59186013,  0.90698496,  0.56950572],
       [ 0.92769057,  0.46880266,  0.73836167],
       [ 0.11904745,  0.56834934,  0.16144849]])

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131027/2d358f72/attachment.html>


More information about the NumPy-Discussion mailing list