# [Numpy-discussion] Robust Sorting of Points

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Oct 27 16:22:34 EDT 2013

```On Sun, Oct 27, 2013 at 3:22 PM, Freddie Witherden
<freddie at witherden.org> wrote:
> On 27/10/13 18:54, Daniele Nicolodi wrote:
>> On 27/10/2013 19:42, Freddie Witherden wrote:
>>> On 27/10/13 18:35, Nathaniel Smith wrote:
>>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
>>>> <freddie at witherden.org> wrote:
>>>>> Hi all,
>>>>>
>>>>> This is a question which has been bugging me for a while.  I have an (N,
>>>>> 3) array where N ~ 16 of points.  These points are all unique and
>>>>> separated by a reasonable distance.
>>>>>
>>>>> I wish to sort these points into a canonical order in a fashion which is
>>>>> robust against small perturbations.  In other words changing any
>>>>> component of any of the points by an epsilon ~ 1e-12 should not affect
>>>>> the resulting sorted order.
>>>>
>>>> I don't understand how this is possible even in principle.
>>>>
>>>>
>>>>  a = [0, 0, 0]
>>>>  b = [0, 0, 1e-12]
>>>>
>>>> According to your criterion, either a or b should go first -- I don't
>>>> know which. Let's say our canonical ordering decides that a goes
>>>> first. But if you perturb both of them, then you have
>>>>
>>>>  a = [0, 0, 1e-12]
>>>>  b = [0, 0, 0]
>>>>
>>>> And now your requirement says that a still has to go first. But if a
>>>> goes first this time, then b had to go first the last time, by
>>>
>>> Not exactly; in your case the distance between a and b is of the order
>>> epislon.  However, my points are "all unique and separated by a
>>> reasonable distance."  This requires at least one of the components of
>>> any two points to differ in all instances, permitting an ordering to be
>>> defined.  (Where if epislon ~ 1e-12 the minimum instance between any two
>>> points is of order ~ 1e-6.)
>>
>> Do you mean that all you points are distributed around some fixed points
>> in your space?  In this case, it looks like what you are looking for is
>> categorization or clustering and not sorting.  Once you perform
>> clustering, you can simply define an arbitrary order in which report the
>> content of each cluster.  If this is not the case, the problem that
>> Nathaniel highlishts is still present.
>
> I am not entirely sure what you mean here.  If x is my array of points
> of size (16, 3) then I am guarenteeing that
>
>   np.min(scipy.spatial.distance.pdist(x)) >= 1e-6
>
> In this instance I am unsure how the issue highlighted by Nathaniel
> might arise.  Of course it is (very) possible that I am missing
> something, however, I believe under the terms of this constraint that it
> is always possible to define an order with which to iterate through the
> points which is invarient to shuffling of the points and small
> pertubations of the components.

If the epsilon or scale depends on the column, then, I think, divmod
should work to cut off the noise

>>> my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
array([[-0.5       ,  0.        ,  1.41421356],
[ 0.5       ,  0.        ,  1.41421356]])

Josef

>
> Regards, Freddie.
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

```