tricky sorting

Jeff Epler jepler at unpythonic.net
Mon Oct 14 17:01:01 EDT 2002


On Mon, Oct 14, 2002 at 01:46:04PM -0700, les wrote:
> Hi,
> suppose I have a list of following
> ##########################################
> abdng  I    10-11
> abdns  IV   43-45
> rshts  I    1201-1210
> qurst  XIII 87-98
> ..
> andng  IX   18293-19283
> 
> etc
> #################################
> I would like to sort by two fields: first sort by the second column (which is in 
> roman numerals) and then I would like to sort by the last column (only the "from"
> part only) 
> so the above result would be 
> abdng I 10-11
> rshts I 1201-1210
> abdns IV 43-45
> qurst XIII 87-98
> andng IX  18293-19283
> 
> I can do this in C with a compare function but i don't know how
> to write this special compare function in python. 
> can some one help me do this ? thanks

Assuming you have a list 'l' with the above items as strings, you could
try something like the following:
    def sortfunc(a, b):
        a = a.split()
        b = b.split()
        af = int(a[2].split("-")[0])
        bf = int(b[2].split("-")[0])
        an = roman2int(a[1])
        bn = roman2int(b[1])

        return cmp(an, bn) or cmp(af, bf)

    l.sort(sortfunc)

Read the documentation for more information on the "sort" method of the
"list" object, and the builtin "cmp" function.  You'll need to write
"roman2int" yourself.  You might want to replace the "parsing" with
something a little more robust.

You might get a little more sophisticated and create an object for each
line:
    class Record:
        def __init__(self, line):
            a, b, c = line.split()
            c, d = c.split("-")

            self.a = a
            self.b = roman2int(b)
            self.c = int(c)
            self.d = int(d)

        def __str__(self):
            return "%s %s %s-%s" % (self.a, self.b, self.c, self.d)

        def __cmp__(self, other):
            return cmp(self.b, other.b) or cmp(self.c, other.c)

    l = [Record(l) for l in f]
    l.sort()  # uses __cmp__

Again, you might want to replace the parsing with something more robust,
and also give more meaningful names to the attributes.


You might also consider the "DSU" (decorate, sort, undecorate) pattern:
    def decorate(rec):
        return (rec.b, rec.c, rec)
    def undecorate(tup):
        return tup[-1]

    l = [decorate(i) for i in l]
    l.sort()
    l = [undecorate(i) for i in l]
it can be faster than sort with the "compare" function specified.

Jeff




More information about the Python-list mailing list