[Tutor] Sorting multiple sequences
Steven D'Aprano
steve at pearwood.info
Sat Mar 12 01:16:30 CET 2011
Dinesh B Vadhia wrote:
> I want to sort two sequences with different data types but both with an equal number of elements eg.
>
> f = [0.21, 0.68, 0.44, ..., 0.23]
> i = [6, 18, 3, ..., 45]
>
> The obvious solution is to use zip ie. pairs = zip(f,i) followed by pairs.sort(). This is fine
It doesn't sound fine to me. Sorting pairs of items is *not* the same as
sorting each sequence separately, except by accident. Even with the
small example shown, you can see this:
>>> f = [0.21, 0.68, 0.44, 0.23]
>>> i = [6, 18, 3, 45]
>>> sorted(f); sorted(i) # sorting individually
[0.21, 0.23, 0.44, 0.68]
[3, 6, 18, 45]
>>> pairs = sorted(zip(f, i)) # sorting as pairs
>>> print(pairs)
[(0.21, 6), (0.23, 45), (0.44, 3), (0.68, 18)]
>>> list(zip(*pairs)) # Split the pairs into separate sequences.
[(0.21, 0.23, 0.44, 0.68), (6, 45, 3, 18)]
In Python, the fastest way to sort multiple sequences is to sort
multiple sequences. No tricks, nothing fancy, just:
f.sort()
i.sort()
Don't use sorted() unless you have to keep the unsorted list as well,
because sorted makes a copy of the data. In other words, don't do this:
f = sorted(f) # No! Bad!
but you can do this:
old_f = f
f = sorted(f)
> but my sequences contain 10,000+ elements and the sort is performed thousands of times. Is there a faster solution?
Ten thousand elements is not very many.
Why do you need to sort thousands of times? What are you doing with the
data that it needs repeated sorting?
Python's sort routine is implemented in C, highly optimized, and is
extremely fast. It is especially fast if the data is already almost
sorted. So if you have a list of sorted data, and you add one item to
the end, and re-sort, that will be *extremely* fast. There is literally
nothing you can write in pure Python that will even come close to the
speed of Python's sort.
Unless you have profiled your application and discovered that sorting is
the bottleneck making the app too slow, you are engaged in premature
optimization. Don't try and guess what makes your code slow, measure!
--
Steven
More information about the Tutor
mailing list