[Numpy-discussion] Robust Sorting of Points

Freddie Witherden freddie at witherden.org
Sun Oct 27 15:22:33 EDT 2013


On 27/10/13 18:54, Daniele Nicolodi wrote:
> On 27/10/2013 19:42, Freddie Witherden wrote:
>> On 27/10/13 18:35, Nathaniel Smith wrote:
>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
>>> <freddie at witherden.org> wrote:
>>>> Hi all,
>>>>
>>>> This is a question which has been bugging me for a while.  I have an (N,
>>>> 3) array where N ~ 16 of points.  These points are all unique and
>>>> separated by a reasonable distance.
>>>>
>>>> I wish to sort these points into a canonical order in a fashion which is
>>>> robust against small perturbations.  In other words changing any
>>>> component of any of the points by an epsilon ~ 1e-12 should not affect
>>>> the resulting sorted order.
>>>
>>> I don't understand how this is possible even in principle.
>>>
>>> Say your points are
>>>
>>>  a = [0, 0, 0]
>>>  b = [0, 0, 1e-12]
>>>
>>> According to your criterion, either a or b should go first -- I don't
>>> know which. Let's say our canonical ordering decides that a goes
>>> first. But if you perturb both of them, then you have
>>>
>>>  a = [0, 0, 1e-12]
>>>  b = [0, 0, 0]
>>>
>>> And now your requirement says that a still has to go first. But if a
>>> goes first this time, then b had to go first the last time, by
>>> symmetry. Thus your criterion is self-contradictory...?
>>
>> Not exactly; in your case the distance between a and b is of the order
>> epislon.  However, my points are "all unique and separated by a
>> reasonable distance."  This requires at least one of the components of
>> any two points to differ in all instances, permitting an ordering to be
>> defined.  (Where if epislon ~ 1e-12 the minimum instance between any two
>> points is of order ~ 1e-6.)
> 
> Do you mean that all you points are distributed around some fixed points
> in your space?  In this case, it looks like what you are looking for is
> categorization or clustering and not sorting.  Once you perform
> clustering, you can simply define an arbitrary order in which report the
> content of each cluster.  If this is not the case, the problem that
> Nathaniel highlishts is still present.

I am not entirely sure what you mean here.  If x is my array of points
of size (16, 3) then I am guarenteeing that

  np.min(scipy.spatial.distance.pdist(x)) >= 1e-6

In this instance I am unsure how the issue highlighted by Nathaniel
might arise.  Of course it is (very) possible that I am missing
something, however, I believe under the terms of this constraint that it
is always possible to define an order with which to iterate through the
points which is invarient to shuffling of the points and small
pertubations of the components.

Regards, Freddie.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131027/da474b1a/attachment.sig>


More information about the NumPy-Discussion mailing list