Sorting without transitivity

Tim Peters tim_one at email.msn.com
Sun Apr 20 18:28:19 CEST 2003


[Frank Niessink]
> [It's been while since I've dealt with transitivity and stuff, so
> I hope I got the terminology right]
>
> I have objects that are not transitivily ordered, e.g.:
>
> A > B
> A == C
> B == C
>
> Now I have a list [A, B, C] and I want to sort it.

What does sorting mean to you?  What it means to most people <wink> is that
a list x is sorted if and only if

    x[i] <= x[i+1]

for each i in 0, 1, ..., len(x)-2.  But the list [A, C, B] is sorted by that
definition, given your odd meanings for "==" and "<".  You don't accept [A,
C, B] as being sorted, though, so you need to explain what sorting means to
you.  Perhaps you consider list x to be sorted if and only if

    x[i] <= x[j]

for all i in 0, 1, ..., len(x)-2 and j in i, i+1, ..., len(x)-1.  When you
make up meanings for < and ==, it requires proof first that such an ordering
always exists.

> The resulting list should have B before A, I don't care where C is.

This prompted people to suggest a topological sort, which may be what you
want (it's hard to say given just one example).  What they haven't suggested
is an efficient way to prepare input for a topsort algorithm -- those
usually work with an explicit sequence of (predecessor, successor) pairs,
but you don't have such a sequence.  You could create one with a quadratic
algorithm comparing every Point r to every other Point s, materializing a
pair (r, s) only when r < s.

It may be possible to do better than that:

"""
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Point(%d, %d)'%(self.x, self.y)

    def __cmp__(self, other):
        if self.x == other.x and self.y == other.y:
            return 0
        elif self.x <= other.x and self.y <= other.y:
            return -1
        elif self.x >= other.x and self.y >= other.y:
            return 1
        else:
            return 0
"""

This __cmp__ makes the intent of code like

        self.failUnless(pointList.index(a) < pointList.index(b))

pretty mysterious; e.g.,

    [Point(20, 3)].index(Point(3, 20))

is 0:  Point.__cmp__ says Point(3, 20) is equal to Point(20, 3), and index()
uses __cmp__.

Glossing over that (I'll just suggest it's unlikely that .index() does what
you want either), note that in Python's lexicographic tuple comparison,

    (a, b) < (c, d)

iff

    a < c or (a == c and b < d)

(a, b) < (c, d) does not imply Point(a, b) < Point(c, d); for example,

    (3, 20) < (20, 3)
but
    Point(3, 20) == Point(20, 3)

However, I think the other direction works:  Point(a, b) < Point(c, d)
implies (a, b) < (c, d).  So whenever Point.__cmp__ thinks p1 < p2, native
tuple comparison would also think (p1.x, p1.y) < (p2.x, p2.y).  So if you
let Python sort naturally on (self.x, self.y) tuples, it should respect your
notion of "<" too (in addition to calling some pairs "<" that Point.__cmp__
doesn't call "<"):

def point_sort(s):  # s is a list of Points
    temp = [(p.x, p.y, p) for p in s]
    temp.sort()
    return [t[-1] for t in temp]

s = [Point(i, j) for i in range(5) for j in range(5)]
import random
random.shuffle(s)
print s
print point_sort(s)

displays (for example -- random.shuffle() may produce any permutation):

[Point(0, 1), Point(1, 0), Point(1, 3), Point(2, 0), Point(1, 4),
 Point(3, 2), Point(4, 0), Point(0, 2), Point(0, 4), Point(1, 1),
 Point(3, 3), Point(2, 3), Point(4, 1), Point(2, 4), Point(2, 1),
 Point(0, 3), Point(3, 4), Point(0, 0), Point(4, 3), Point(2, 2),
 Point(3, 1), Point(4, 4), Point(1, 2), Point(3, 0), Point(4, 2)]

[Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3), Point(0, 4),
 Point(1, 0), Point(1, 1), Point(1, 2), Point(1, 3), Point(1, 4),
 Point(2, 0), Point(2, 1), Point(2, 2), Point(2, 3), Point(2, 4),
 Point(3, 0), Point(3, 1), Point(3, 2), Point(3, 3), Point(3, 4),
 Point(4, 0), Point(4, 1), Point(4, 2), Point(4, 3), Point(4, 4)]

Using big words <wink>, a partial ordering can be extended to a total
ordering (usually in many different ways), and a sort wrt such a total
ordering necessarily respects the partial ordering it extends.  In this
case, native tuple comparison appears to be a suitable extension of your
partial ordering.






More information about the Python-list mailing list