[Tutor] Sorting a list of lists aka nested lists

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Sun Aug 14 07:52:04 CEST 2005



> >AMK has written a nice mini-tutorial on how to use a list's sort()
> >effectively:
> >
> >    http://www.amk.ca/python/howto/sorting/sorting.html
> >
> >Does his tutorial make sense, or are there parts in there that are
> >baffling?  Please feel free to ask questions on it, and we'll try to
> >help.
> >
> I actually saw this page this morning, which leads me to believe what I
> want is something like:
>
>     Quant.sort(lambda x, y: x-y)
>
> except how does x and y relate to the list of lists I created?


Hi Jim,

Whenever we're sorting lists of objects, we have to tell Python how to
detect whenever two things are misordered.

Let's make this concrete: say that we have a list of names, like this:

#######
>>> names = ['hamlet', 'rosencrantz', 'guildenstern']
#######

We'd like to order things "alphabetically".

(... and pretend that we don't know about list.sort() for the moment.
*grin*)


How could we go about doing this?  One way is to scan across the list,
pairwise.  As we look at adjacent pairs of words, we can check to see if
those words are misordered, and if so, swap!  I'll step through this just
so you get the general idea of what's so useful about a comparison
function.

Let's start looking at the first two names in the list.

#######
>>> names[0], names[1]
('hamlet', 'rosencrantz')
#######

Are they in the right order?  We know they are, but again, let's pretend
that we don't.  *grin*  So the question really is: how does Python know if
they're in the right order?

We can ask Python this "are they in order?" question by using the built-in
cmp() function:

#######
>>> cmp(names[0], names[1])
-1
#######

If cmp gives a negative number, that means that names[0] is "less than"
names[1], and so, yes, they are.  So we'll leave the first two list
elements alone for the moment.


Now let's look at the next pair:

######
>>> cmp(names[1], names[2])
1
######

Ok, this is different.  cmp() gives a positive value if the first argument
and the second argument are misordered.  They should be swapped around!
So let's swap them:

#######
>>> names[1], names[2] = names[2], names[1]
#######


Intuitively, we've increased the "order" of our list by doing the swap.
Just to make sure things are fine, let's cmp() both pairs again, and make
sure everything comes out as negative.

#######
>>> cmp(names[0], names[1])
1
#######

Oh, now the first two elements of our list are misordered.  Let's swap
again.

######
>>> names[0], names[1] = names[1], names[0]
######


Do we have to swap anymore?  Let's check again:

######
>>> cmp(names[0], names[1])
-1
>>> cmp(names[1], names[2])
-1
######

Ok, good.  No more need to swap things around anymore.  Now that we
swapped them, let's look at our list:

#######
>>> names
['guildenstern', 'hamlet', 'rosencrantz']
#######

Everything's in the right "alphabetical" order now.  Hurrah!

What Python actually does in sorting a list is a little different that
what we described above.  But conceptually, it uses a comparison function
to detect disorder, and increases the order of a list by swapping and
moving things around, until everything's truly sorted.  These techniques
are called "comparison-based" sorts, because they compare between two
elements at a time.


Let's change the problem in a slightly wacky way: what if we'd like to
order the names by their third letter?

#######
def my_custom_cmp(first_name, second_name):
    return cmp(first_name[2], second_name[2])
#######

Given any two names, we check to see if their respective third letters are
misordered.  Notice that we've actually reused the built-in cmp()
operator, but got it to concentrate on the third letter of the given
names.  Let's try it out:

#######
>>> my_custom_cmp("danny", "jim")
1
>>> my_custom_cmp("jim", "danny")
-1
>>> my_custom_cmp("jim", "jim")
0
#######


Once we have this customized comparison function, we can pass it to
sort(), and it'll know to use our comparison operator when it's checking
for disorder:

#######
>>> names.sort(my_custom_cmp)
>>> names
['guildenstern', 'hamlet', 'rosencrantz']
#######

Hmmmm.  No change.  But that's ok; that's perfectly correct.


Ok, let's see if we order by the lengths of words:

#######
>>> def another_custom_cmp(w1, w2):
...     return cmp(len(w1), len(w2))
...
>>> names.sort(another_custom_cmp)
>>> names
['hamlet', 'rosencrantz', 'guildenstern']
#######

We've come full-circle.  *grin* But I hope that this makes sense.


Please feel free to ask questions on any part of this.  Good luck!



More information about the Tutor mailing list