[Tutor] Sorting multiple sequences

Steven D'Aprano steve at pearwood.info
Sat Mar 12 01:16:30 CET 2011

Dinesh B Vadhia wrote:
> I want to sort two sequences with different data types but both with an equal number of elements eg.
> f = [0.21, 0.68, 0.44, ..., 0.23]
> i = [6, 18, 3, ..., 45]
> The obvious solution is to use zip ie. pairs = zip(f,i) followed by pairs.sort().  This is fine 

It doesn't sound fine to me. Sorting pairs of items is *not* the same as 
sorting each sequence separately, except by accident. Even with the 
small example shown, you can see this:

 >>> f = [0.21, 0.68, 0.44, 0.23]
 >>> i = [6, 18, 3, 45]
 >>> sorted(f); sorted(i)  # sorting individually
[0.21, 0.23, 0.44, 0.68]
[3, 6, 18, 45]

 >>> pairs = sorted(zip(f, i))  # sorting as pairs
 >>> print(pairs)
[(0.21, 6), (0.23, 45), (0.44, 3), (0.68, 18)]
 >>> list(zip(*pairs))  # Split the pairs into separate sequences.
[(0.21, 0.23, 0.44, 0.68), (6, 45, 3, 18)]

In Python, the fastest way to sort multiple sequences is to sort 
multiple sequences. No tricks, nothing fancy, just:

f.sort()
i.sort()

Don't use sorted() unless you have to keep the unsorted list as well, 
because sorted() makes a copy of the data. In other words, don't do this:

f = sorted(f)  # No! Bad!

but you can do this:

old_f = f
f = sorted(f)
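To make the in-place versus copy distinction concrete, here is a small sketch using the example lists from the question. It relies only on the built-in list.sort() method, which reorders the list in place and returns None, and the sorted() function, which builds and returns a new list.

```python
f = [0.21, 0.68, 0.44, 0.23]
i = [6, 18, 3, 45]

# list.sort() reorders the existing list in place and returns None.
result = f.sort()
print(result)       # None
print(f)            # [0.21, 0.23, 0.44, 0.68]

# sorted() builds a brand new list, so the original is left untouched.
old_i = i
i = sorted(i)
print(old_i)        # [6, 18, 3, 45]
print(i)            # [3, 6, 18, 45]
print(old_i is i)   # False: two separate list objects now exist
```

Only reach for sorted() when you actually need both lists afterwards; otherwise the extra copy is wasted work and memory.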

> but my sequences contain 10,000+ elements and the sort is performed thousands of times.  Is there a faster solution?

Ten thousand elements is not very many.

Why do you need to sort thousands of times? What are you doing with the 
data that it needs repeated sorting?

Python's sort routine is implemented in C, highly optimized, and is 
extremely fast. It is especially fast if the data is already almost 
sorted. So if you have a list of sorted data, and you add one item to 
the end, and re-sort, that will be *extremely* fast. There is literally 
nothing you can write in pure Python that will even come close to the 
speed of Python's sort.
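One way to check the claim about nearly sorted data on your own machine is a quick timeit comparison, sketched below. The list size of 10,000 comes from the question; the repeat count of 200 and the helper name sort_copy are arbitrary choices for illustration. The absolute numbers will vary from machine to machine, so none are claimed here.

```python
import random
import timeit

N = 10_000  # roughly the size mentioned in the question

random_data = [random.random() for _ in range(N)]

# Already sorted data with one new item appended at the end.
nearly_sorted = sorted(random_data)
nearly_sorted.append(random.random())


def sort_copy(data):
    # Hypothetical helper: sort a fresh copy so every timing run
    # starts from the same unsorted (or nearly sorted) input.
    data = list(data)
    data.sort()
    return data


t_random = timeit.timeit(lambda: sort_copy(random_data), number=200)
t_nearly = timeit.timeit(lambda: sort_copy(nearly_sorted), number=200)

print(f"random data:        {t_random:.4f} s")
print(f"nearly sorted data: {t_nearly:.4f} s")
```

Both timings include the cost of copying the list, so the comparison between them is fair even though neither number is a pure sort time.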

Unless you have profiled your application and discovered that sorting is 
the bottleneck making the app too slow, you are engaged in premature 
optimization. Don't try to guess what makes your code slow; measure!
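As a minimal sketch of what measuring might look like, the standard library's cProfile and pstats modules can report where the time actually goes. The process() and main() functions below are hypothetical stand-ins for the real application; only the profiling calls are the point. If list.sort turns out to be a small fraction of the cumulative time, optimizing it will not help.

```python
import cProfile
import pstats
import random


def process(f, i):
    # Hypothetical stand-in for the real work: sort, then use the data.
    f.sort()
    i.sort()
    return sum(x * y for x, y in zip(f, i))


def main():
    # Hypothetical workload: a few batches of 10,000-element lists.
    for _ in range(20):
        f = [random.random() for _ in range(10_000)]
        i = [random.randrange(100) for _ in range(10_000)]
        process(f, i)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Show the ten entries with the largest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```

In the report, the built-in sort shows up as its own entry for the sort method of list objects, which makes it easy to see whether sorting is really the bottleneck.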
