More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)

Tim Peters tim.one@home.com
Sat, 20 Jan 2001 15:10:51 -0500


[Tim]
> ...
> 4. We've again given up on avoiding surprises in *simple* comparisons
>    among builtin types, like (under current CVS):
>
> >>> 1 < [1] < 0L < 1
> 1
> >>> 1 < 1
> 0
> >>>

I really dislike that.  Here's a consequence at a higher level:

N = 5
x = [1 for i in range(N)] + \
    [[1] for i in range(N)] + \
    [0L for i in range(N)]

x.sort()
print x

from random import shuffle
tries = failures = 0
while failures < 5:
    tries += 1
    y = x[:]
    shuffle(y)
    y.sort()
    if x != y:
        print "oops, on try number", tries
        print y
        failures += 1

and here's a typical run (2.1a1):

[1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L, 0L]
oops, on try number 3
[0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
oops, on try number 5
[[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
oops, on try number 6
[0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
oops, on try number 7
[[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
oops, on try number 8
[0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L]

I've often used list.sort() on a heterogeneous list simply to bring the
elements of the same type next to each other.  But as "try number 5" shows,
I can no longer rely on even getting all the lists together.  Indeed,
heterogenous list.sort() has become a very bad (biased and slow)
implementation of random.shuffle() <wink>.

Under 2.0, the program never prints "oops", because the only violations of
transitivity in 2.0's ordering of builtin types were bugs in the
implementation (none of which show up in this simple test case); 2.0's
.sort() *always* produces

[0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]

The base trick in 2.0 was sound:  when falling back to the "compare by name
of the type" last resort, treat all numeric types as if they had the same
name.

While Python can't enforce that any user-defined __cmp__ is consistent, I
think it should continue to set a good example in the way it implements its
own comparisons.

grumblingly y'rs  - tim