[Numpy-discussion] Robust Sorting of Points

Jonathan March jmarch at enthought.com
Sun Oct 27 17:05:02 EDT 2013


If an "almost always works" solution is good enough, then sort on the
distance to some fixed random point that is in the vicinity of your N
points.

Jonathan


On Sun, Oct 27, 2013 at 3:51 PM, Freddie Witherden <freddie at witherden.org>wrote:

> On 27/10/13 20:22, josef.pktd at gmail.com wrote:
> > On Sun, Oct 27, 2013 at 3:22 PM, Freddie Witherden
> > <freddie at witherden.org> wrote:
> >> On 27/10/13 18:54, Daniele Nicolodi wrote:
> >>> On 27/10/2013 19:42, Freddie Witherden wrote:
> >>>> On 27/10/13 18:35, Nathaniel Smith wrote:
> >>>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
> >>>>> <freddie at witherden.org> wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> This is a question which has been bugging me for a while.  I have
> an (N,
> >>>>>> 3) array where N ~ 16 of points.  These points are all unique and
> >>>>>> separated by a reasonable distance.
> >>>>>>
> >>>>>> I wish to sort these points into a canonical order in a fashion
> which is
> >>>>>> robust against small perturbations.  In other words changing any
> >>>>>> component of any of the points by an epsilon ~ 1e-12 should not
> affect
> >>>>>> the resulting sorted order.
> >>>>>
> >>>>> I don't understand how this is possible even in principle.
> >>>>>
> >>>>> Say your points are
> >>>>>
> >>>>>  a = [0, 0, 0]
> >>>>>  b = [0, 0, 1e-12]
> >>>>>
> >>>>> According to your criterion, either a or b should go first -- I don't
> >>>>> know which. Let's say our canonical ordering decides that a goes
> >>>>> first. But if you perturb both of them, then you have
> >>>>>
> >>>>>  a = [0, 0, 1e-12]
> >>>>>  b = [0, 0, 0]
> >>>>>
> >>>>> And now your requirement says that a still has to go first. But if a
> >>>>> goes first this time, then b had to go first the last time, by
> >>>>> symmetry. Thus your criterion is self-contradictory...?
> >>>>
> >>>> Not exactly; in your case the distance between a and b is of the order
> >>>> epislon.  However, my points are "all unique and separated by a
> >>>> reasonable distance."  This requires at least one of the components of
> >>>> any two points to differ in all instances, permitting an ordering to
> be
> >>>> defined.  (Where if epislon ~ 1e-12 the minimum instance between any
> two
> >>>> points is of order ~ 1e-6.)
> >>>
> >>> Do you mean that all you points are distributed around some fixed
> points
> >>> in your space?  In this case, it looks like what you are looking for is
> >>> categorization or clustering and not sorting.  Once you perform
> >>> clustering, you can simply define an arbitrary order in which report
> the
> >>> content of each cluster.  If this is not the case, the problem that
> >>> Nathaniel highlishts is still present.
> >>
> >> I am not entirely sure what you mean here.  If x is my array of points
> >> of size (16, 3) then I am guarenteeing that
> >>
> >>   np.min(scipy.spatial.distance.pdist(x)) >= 1e-6
> >>
> >> In this instance I am unsure how the issue highlighted by Nathaniel
> >> might arise.  Of course it is (very) possible that I am missing
> >> something, however, I believe under the terms of this constraint that it
> >> is always possible to define an order with which to iterate through the
> >> points which is invarient to shuffling of the points and small
> >> pertubations of the components.
> >
> >
> > If the epsilon or scale depends on the column, then, I think, divmod
> > should work to cut off the noise
> >
> >>>> my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
> > array([[-0.5       ,  0.        ,  1.41421356],
> >        [ 0.5       ,  0.        ,  1.41421356]])
>
> An interesting proposal.  However, it appears to have the same issues as
> the rounding approach.  Consider:
>
> In [5]: a, b = 1.0 + 1e-13, 1.0 - 1e-13
>
> In [6]: abs(a - b) < 1e-12
> Out[6]: True
>
> In [7]: divmod(a, 1e-6)[0] == divmod(b, 1e-6)[0]
> Out[7]: False
>
> Hence should np.lexsort encounter such a pair it will consider a and b
> to be different even though they are within epsilon of one another.
>
> Regards, Freddie.
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131027/43554864/attachment.html>


More information about the NumPy-Discussion mailing list