[Numpy-discussion] Robust Sorting of Points

Charles R Harris charlesr.harris at gmail.com
Mon Oct 28 00:16:37 EDT 2013


On Sun, Oct 27, 2013 at 10:13 PM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
>
> On Sun, Oct 27, 2013 at 12:28 PM, Freddie Witherden <freddie at witherden.org
> > wrote:
>
>> Hi all,
>>
>> This is a question which has been bugging me for a while.  I have an (N,
>> 3) array where N ~ 16 of points.  These points are all unique and
>> separated by a reasonable distance.
>>
>> I wish to sort these points into a canonical order in a fashion which is
>> robust against small perturbations.  In other words changing any
>> component of any of the points by an epsilon ~ 1e-12 should not affect
>> the resulting sorted order.
>>
>> Considering a direct application of np.lexsort:
>>
>>   In [6]: my_array = np.array([[-0.5, 0, 2**0.5],
>>                                [0.5, 0, 2**0.5 - 1e-15]])
>>
>>   In [7]: my_array[np.lexsort(my_array.T)]
>>   Out[7]:  array([[ 0.5       ,  0.        ,  1.41421356],
>>                   [-0.5       ,  0.        ,  1.41421356]])
>>
>> however, if the small 1e-15 perturbation is removed the order changes to
>> the 'natural' ordering.  Hence, np.lexsort is out.
>>
>> Rounding the array before sorting is not suitable either; just because
>> (a - b) < epsilon does not mean that np.around(a, decimals=x) ==
>> np.around(b, decimals=b).
>>
>> I am therefore looking for an efficient (= within a factor of 10 of
>> np.lexsort) solution to the problem.  I've looked at writing my own
>> comparison function cmp(x, y) which looks at the next dimension if
>> abs(x[i] - y[i]) < epsilon however using this with sorted is thousands
>> of times slower.  Given that I have well over 100,000 of these arrays
>> this is nuisance.
>>
>> My other idea is to therefore find a means of quickly replacing all
>> numbers within 10*epsilon of a given number in an array with that
>> number.  This should permit the application of np.lexsort in order to
>> obtain the desired ordering (which is what I'm interesting in).
>> However, I am yet to figure out how to do this efficiently.
>>
>> Before I throw in the towel and drop down to C are there any other neat
>> tricks I am missing?
>>
>>
> Sort them as strings?
>
> In [17]: a = np.random.random((5,3))
>
> In [18]: a
> Out[18]:
> array([[ 0.64734085,  0.71582772,  0.82743219],
>        [ 0.92769057,  0.46880266,  0.73836167],
>        [ 0.11904745,  0.56834934,  0.16144849],
>        [ 0.59186013,  0.90698496,  0.56950572],
>        [ 0.7317681 ,  0.93495724,  0.19217244]])
>
> In [19]: a.view('S24').sort(0)
>
> In [20]: a
> Out[20]:
> array([[ 0.7317681 ,  0.93495724,  0.19217244],
>        [ 0.64734085,  0.71582772,  0.82743219],
>        [ 0.59186013,  0.90698496,  0.56950572],
>        [ 0.92769057,  0.46880266,  0.73836167],
>        [ 0.11904745,  0.56834934,  0.16144849]])
>
>
nvm, that won't work.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131027/ea84e214/attachment.html>


More information about the NumPy-Discussion mailing list