[Tutor] Sorting multiple sequences

Dinesh B Vadhia dineshbvadhia at hotmail.com
Sat Mar 12 09:18:30 CET 2011


The two sequences are pairs and have to be sorted as such.  There are in fact hundreds of thousands of pairs of sequences (with different elements) with each sequence containing 10,000+ elements.

Looks like pairs = sorted(zip(g,h)) is the best option?

Dinesh

Message: 6
Date: Sat, 12 Mar 2011 11:16:30 +1100
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] Sorting multiple sequences
Message-ID: <4D7ABB5E.9080506 at pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dinesh B Vadhia wrote:
> I want to sort two sequences with different data types but both with an equal number of elements eg.
> 
> f = [0.21, 0.68, 0.44, ..., 0.23]
> i = [6, 18, 3, ..., 45]
> 
> The obvious solution is to use zip ie. pairs = zip(f,i) followed by pairs.sort().  This is fine 

It doesn't sound fine to me. Sorting pairs of items is *not* the same as 
sorting each sequence separately, except by accident. Even with the 
small example shown, you can see this:

 >>> f = [0.21, 0.68, 0.44, 0.23]
 >>> i = [6, 18, 3, 45]
 >>> sorted(f); sorted(i)  # sorting individually
[0.21, 0.23, 0.44, 0.68]
[3, 6, 18, 45]

 >>> pairs = sorted(zip(f, i))  # sorting as pairs
 >>> print(pairs)
[(0.21, 6), (0.23, 45), (0.44, 3), (0.68, 18)]
 >>> list(zip(*pairs))  # Split the pairs into separate sequences.
[(0.21, 0.23, 0.44, 0.68), (6, 45, 3, 18)]


In Python, the fastest way to sort multiple sequences is to sort 
multiple sequences. No tricks, nothing fancy, just:

f.sort()
i.sort()

Don't use sorted() unless you have to keep the unsorted list as well, 
because sorted makes a copy of the data. In other words, don't do this:

f = sorted(f)  # No! Bad!

but you can do this:

old_f = f
f = sorted(f)


> but my sequences contain 10,000+ elements and the sort is performed thousands of times.  Is there a faster solution?


Ten thousand elements is not very many.

Why do you need to sort thousands of times? What are you doing with the 
data that it needs repeated sorting?

Python's sort routine is implemented in C, highly optimized, and is 
extremely fast. It is especially fast if the data is already almost 
sorted. So if you have a list of sorted data, and you add one item to 
the end, and re-sort, that will be *extremely* fast. There is literally 
nothing you can write in pure Python that will even come close to the 
speed of Python's sort.

Unless you have profiled your application and discovered that sorting is 
the bottleneck making the app too slow, you are engaged in premature 
optimization. Don't try and guess what makes your code slow, measure!
-- 
Steven

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20110312/ca32fc3a/attachment.html>


More information about the Tutor mailing list