[Tutor] Sorting multiple sequences
Dinesh B Vadhia
dineshbvadhia at hotmail.com
Sat Mar 12 09:18:30 CET 2011
The two sequences are pairs and have to be sorted as such. There are in fact hundreds of thousands of pairs of sequences (with different elements) with each sequence containing 10,000+ elements.
Looks like pairs = sorted(zip(g,h)) is the best option?
Date: Sat, 12 Mar 2011 11:16:30 +1100
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] Sorting multiple sequences
Message-ID: <4D7ABB5E.9080506 at pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dinesh B Vadhia wrote:
> I want to sort two sequences with different data types but both with an equal number of elements eg.
> f = [0.21, 0.68, 0.44, ..., 0.23]
> i = [6, 18, 3, ..., 45]
> The obvious solution is to use zip ie. pairs = zip(f,i) followed by pairs.sort(). This is fine
It doesn't sound fine to me. Sorting pairs of items is *not* the same as
sorting each sequence separately, except by accident. Even with the
small example shown, you can see this:
>>> f = [0.21, 0.68, 0.44, 0.23]
>>> i = [6, 18, 3, 45]
>>> sorted(f); sorted(i) # sorting individually
[0.21, 0.23, 0.44, 0.68]
[3, 6, 18, 45]
>>> pairs = sorted(zip(f, i)) # sorting as pairs
[(0.21, 6), (0.23, 45), (0.44, 3), (0.68, 18)]
>>> list(zip(*pairs)) # Split the pairs into separate sequences.
[(0.21, 0.23, 0.44, 0.68), (6, 45, 3, 18)]
In Python, the fastest way to sort multiple sequences is to sort
multiple sequences. No tricks, nothing fancy, just:
Don't use sorted() unless you have to keep the unsorted list as well,
because sorted makes a copy of the data. In other words, don't do this:
f = sorted(f) # No! Bad!
but you can do this:
old_f = f
f = sorted(f)
> but my sequences contain 10,000+ elements and the sort is performed thousands of times. Is there a faster solution?
Ten thousand elements is not very many.
Why do you need to sort thousands of times? What are you doing with the
data that it needs repeated sorting?
Python's sort routine is implemented in C, highly optimized, and is
extremely fast. It is especially fast if the data is already almost
sorted. So if you have a list of sorted data, and you add one item to
the end, and re-sort, that will be *extremely* fast. There is literally
nothing you can write in pure Python that will even come close to the
speed of Python's sort.
Unless you have profiled your application and discovered that sorting is
the bottleneck making the app too slow, you are engaged in premature
optimization. Don't try and guess what makes your code slow, measure!
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Tutor