[Tutor] Generate list-of-transforms
Steven D'Aprano
steve at pearwood.info
Thu Dec 5 11:52:02 CET 2013
On Thu, Dec 05, 2013 at 09:55:27AM +0000, Alan Gauld wrote:
> On 05/12/13 03:56, R. Alan Monroe wrote:
> >Given two lists, before and after a sort:
> > 0 1 2 3 4
> > ---------
> >before: 3 1 2 5 4
> >after: 1 2 3 4 5
> >
> >Is there a well-known algorithm that can give me the
> >list-of-transforms that got me from before to after?
>
> No.
> The reason being that it depends on the sorting algorithm
> used, and there are many.
But they all give the same end result, and hence the same
transformation. If you want to go from A to B, the transformation
remains A->B regardless of whether you go directly to B or via C and D.
The specific sort algorithm doesn't effect the final result.
In the above case, regardless of whether you use TimSort (Python's
built-in sort algorithm) or Quicksort or Bozo Sort (look it up), the
transformation is the same:
- the item in position 0 moves to position 2;
- the item in position 1 moves to position 0;
- the item in position 2 moves to position 1;
- the item in position 3 moves to position 4; and
- the item in position 4 moves to position 3.
Which gives the mapping:
{0: 2, 1: 0, 2: 1, 3: 4, 4: 3}
If we just list the end position, the starting position can be implied
by the order of the list:
[2, 0, 1, 4, 3]
There is a name for this: it is called a RANK TABLE. In the example
given:
values = [3, 1, 2, 5, 4]
the rank table gives the rank of each element, that is, the position
they would get after sorting. In this case, the rank table is not
terribly exciting, because the items are equal to their ranks, less 1
because Python counts indexes from 0 rather than 1:
3 is the 3rd smallest item, so position 2
1 is the 1st smallest item, so position 0
2 is the 2nd smallest item, so position 1
5 is the 5th smallest item, so position 4
4 is the 4th smallest item, so position 3
and the rank table would be [2, 0, 1, 4, 3], which gives exactly the
ending positions of the transformation. Purely by virtue of the items
being the consecutive numbers 1 through 5, the rank of each number is
just itself minus 1.
A more interesting example might be:
values = ["dog", "fox", "cat", "ape", "cow"]
ranks:
dog = 4th => position 3
fox = 5th => position 4
cat = 2nd => position 1
ape = 1st => position 0
cow = 3rd => position 2
and so the rank table would be: [3, 4, 1, 0, 2]
The opposite to a rank table is an INDEX TABLE. That gives the index of
the value which would end up in that position after sorting. In this
example:
values = ["dog", "fox", "cat", "ape", "cow"]
sorted = ["ape", "cat", "cow", "dog", "fox"]
the 1st element (ape) came from index 3
the 2nd element (cat) came from index 2
the 3rd element (cow) came from index 4
the 4th element (dog) came from index 0
the 5th element (fox) came from index 1
which gives us an index table of [3, 2, 4, 0, 1].
We can easy calculate the index table using the DSU (Decorate, Sort,
Undecorate) idiom:
py> values = ["dog", "fox", "cat", "ape", "cow"]
py> decorated = list(zip(values, range(len(values))))
py> decorated.sort()
py> indexes = [t[1] for t in decorated]
py> print(indexes)
[3, 2, 4, 0, 1]
Here's a way that should be faster, but less flexible (you can't pass a
key function to the sort method).
indexes = sorted(range(len(values)), key=values.__getitem__)
Once you have an index table, it is easy to convert it into a rank
table:
py> ranks = [None]*len(values)
py> for j, idx in enumerate(indexes):
... ranks[idx] = j
...
py> print(ranks)
[3, 4, 1, 0, 2]
which is the rank table I gave earlier.
Bringing it back to the original (boring) example:
values = [3, 1, 2, 5, 4]
sorted = [1, 2, 3, 4, 5]
and the associated tables are:
indexes = [1, 2, 0, 4, 3]
ranks = [2, 0, 1, 4, 3]
--
Steven
More information about the Tutor
mailing list