Sorting without transitivity
Tim Peters
tim_one at email.msn.com
Sun Apr 20 12:28:19 EDT 2003
[Frank Niessink]
> [It's been while since I've dealt with transitivity and stuff, so
> I hope I got the terminology right]
>
> I have objects that are not transitivily ordered, e.g.:
>
> A > B
> A == C
> B == C
>
> Now I have a list [A, B, C] and I want to sort it.
What does sorting mean to you? What it means to most people <wink> is that
a list x is sorted if and only if
x[i] <= x[i+1]
for each i in 0, 1, ..., len(x)-2. But the list [A, C, B] is sorted by that
definition, given your odd meanings for "==" and "<". You don't accept [A,
C, B] as being sorted, though, so you need to explain what sorting means to
you. Perhaps you consider list x to be sorted if and only if
x[i] <= x[j]
for all i in 0, 1, ..., len(x)-2 and j in i, i+1, ..., len(x)-1. When you
make up meanings for < and ==, it requires proof first that such an ordering
always exists.
> The resulting list should have B before A, I don't care where C is.
This prompted people to suggest a topological sort, which may be what you
want (it's hard to say given just one example). What they haven't suggested
is an efficient way to prepare input for a topsort algorithm -- those
usually work with an explicit sequence of (predecessor, successor) pairs,
but you don't have such a sequence. You could create one with a quadratic
algorithm comparing every Point r to every other Point s, materializing a
pair (r, s) only when r < s.
It may be possible to do better than that:
"""
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __repr__(self):
return 'Point(%d, %d)'%(self.x, self.y)
def __cmp__(self, other):
if self.x == other.x and self.y == other.y:
return 0
elif self.x <= other.x and self.y <= other.y:
return -1
elif self.x >= other.x and self.y >= other.y:
return 1
else:
return 0
"""
This __cmp__ makes the intent of code like
self.failUnless(pointList.index(a) < pointList.index(b))
pretty mysterious; e.g.,
[Point(20, 3)].index(Point(3, 20))
is 0: Point.__cmp__ says Point(3, 20) is equal to Point(20, 3), and index()
uses __cmp__.
Glossing over that (I'll just suggest it's unlikely that .index() does what
you want either), note that in Python's lexicographic tuple comparison,
(a, b) < (c, d)
iff
a < c or (a == c and b < d)
(a, b) < (c, d) does not imply Point(a, b) < Point(c, d); for example,
(3, 20) < (20, 3)
but
Point(3, 20) == Point(20, 3)
However, I think the other direction works: Point(a, b) < Point(c, d)
implies (a, b) < (c, d). So whenever Point.__cmp__ thinks p1 < p2, native
tuple comparison would also think (p1.x, p1.y) < (p2.x, p2.y). So if you
let Python sort naturally on (self.x, self.y) tuples, it should respect your
notion of "<" too (in addition to calling some pairs "<" that Point.__cmp__
doesn't call "<"):
def point_sort(s): # s is a list of Points
temp = [(p.x, p.y, p) for p in s]
temp.sort()
return [t[-1] for t in temp]
s = [Point(i, j) for i in range(5) for j in range(5)]
import random
random.shuffle(s)
print s
print point_sort(s)
displays (for example -- random.shuffle() may produce any permutation):
[Point(0, 1), Point(1, 0), Point(1, 3), Point(2, 0), Point(1, 4),
Point(3, 2), Point(4, 0), Point(0, 2), Point(0, 4), Point(1, 1),
Point(3, 3), Point(2, 3), Point(4, 1), Point(2, 4), Point(2, 1),
Point(0, 3), Point(3, 4), Point(0, 0), Point(4, 3), Point(2, 2),
Point(3, 1), Point(4, 4), Point(1, 2), Point(3, 0), Point(4, 2)]
[Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3), Point(0, 4),
Point(1, 0), Point(1, 1), Point(1, 2), Point(1, 3), Point(1, 4),
Point(2, 0), Point(2, 1), Point(2, 2), Point(2, 3), Point(2, 4),
Point(3, 0), Point(3, 1), Point(3, 2), Point(3, 3), Point(3, 4),
Point(4, 0), Point(4, 1), Point(4, 2), Point(4, 3), Point(4, 4)]
Using big words <wink>, a partial ordering can be extended to a total
ordering (usually in many different ways), and a sort wrt such a total
ordering necessarily respects the partial ordering it extends. In this
case, native tuple comparison appears to be a suitable extension of your
partial ordering.
More information about the Python-list
mailing list