Q: sort's key and cmp parameters

Paul Rubin http
Mon Oct 5 20:27:19 EDT 2009


Raymond Hettinger <python at rcn.com> writes:
> FWIW, you could also just flatten it to:  [(1,3,7,5), (19,23)].
> 
> The point is that the tree comparisons you presented have a
> predetermined order of values to compare, so they either be recast a
> flattened list of comparison values or the tree itself can be cast
> in a list of lists form.  Either way, the O(n log n) step of doing
> the actual comparisons runs much faster than if you coded a
> recursive cmp function written in pure python.

Possibly so, but asymptotically it's the same, and anyway 

> >  Say you want to change the numeric comparisons so that
> > even numbers always sort before odd numbers, ie.
> >     -4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...
> 
> This is too easy:
> 
>     >>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
>     >>> s.sort()
>     >>> s.sort(key=lambda x: x%2)
>     >>> s
>     [-4, -2, 0, 2, 4, -3, -1, 1, 3]

I don't think so:

    >>> s = [0, 2, -2, 3, 1, -1, 4, -4, -3]
    >>> s.sort(key=lambda x:x%2)
    >>> s
    [0, 2, -2, 4, -4, 3, 1, -1, -3]

s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed
to being able to write one's natural thought pattern if that pattern
happens to be a comparison function?  Who wants to concoct these
ad-hoc encodings for every ordering?

> As long as the values of a tree get compared in a predetermined order,
> there will always be a flattened list equivalent that works faster
> using a key function.  If you're going to concoct something isn't
> easily transformable to a key function, I think your best bet is to
> create a comparison where the trees have an interaction other than
> comparing values at identical node positions; 

How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate
representations of various real numbers as possibly infinite sequences
of digits:

   def gen_pi():
      yield 3
      yield 1
      yield 4
      # ... compute infinite stream one digit at a time

Comparison is obvious if (hmmm....) you can guarantee that you're
sorting unique elements and that the sort function won't compare an
element with itself (otherwise I guess you could cut off comparison at
a million digits or whatever):

   def cmp(gen1, gen2):
     for d,e in izip(gen1(), gen2()):
        c = cmp(d,e)
        if c: return c

I don't see any clean way to handle that with key=, though of course
maybe I'm missing something.  

> P.S.  I accept than you hate sorting in Py3.x.  There's no need to
> convince me of that.  I was just curious about your one real-world
> use case and whether I could find a straight-forward key function
> that would get the job done.

I don't hate sorting in py3.x.  I understand the concept that py3.x
introduces incompatibilities to make things better.  What I don't like
is the approach of breaking stuff gratutiously with no benefit.  If
the positional arg for cmp is a kludge, why not switch it to a keyword
arg instead of eliminating it?  

I don't subscribe to the cult of the real-world use case.  I'm not
smart enough to anticipate every way anyone might ever want to use
code that I write.  So I try to identify the obvious "basis vectors"
of sensible functionality, implement those, and make sure that the
entire space of combinations works the way it should, without worrying
about whether anyone will care about any specific combination.  There
is no real-world use case for the constant assignment

 a = 0x36b20b32c927eaf68bb40d87d251049db98da0c03a0e8c791f04ee486ec2f7ee

which I'm sure nobody has ever seen before (I just made it from
os.urandom).  But if Python somehow failed to parse that number (and
only that number), that would be considered an unacceptable bug,
because a straightforward combination of primitives had failed to work
as it should.

Any book on sorting spends most of its time on comparison sorting,
which is the primitive operation.  key= is very useful but it is
a derived operation, not the other way around.



More information about the Python-list mailing list