[Tutor] Generate list-of-transforms

Thu Dec 5 11:52:02 CET 2013

On Thu, Dec 05, 2013 at 09:55:27AM +0000, Alan Gauld wrote:
> On 05/12/13 03:56, R. Alan Monroe wrote:
> >Given two lists, before and after a sort:
> >          0 1 2 3 4
> >          ---------
> >before:  3 1 2 5 4
> >after:   1 2 3 4 5
> >
> >Is there a well-known algorithm that can give me the
> >list-of-transforms that got me from before to after?
> 
> No.
> The reason being that it depends on the sorting algorithm
> used, and there are many.

But they all give the same end result, and hence the same 
transformation. If you want to go from A to B, the transformation 
remains A->B regardless of whether you go directly to B or via C and D. 
The specific sort algorithm doesn't effect the final result.

In the above case, regardless of whether you use TimSort (Python's 
built-in sort algorithm) or Quicksort or Bozo Sort (look it up), the 
transformation is the same:

- the item in position 0 moves to position 2;
- the item in position 1 moves to position 0;
- the item in position 2 moves to position 1;
- the item in position 3 moves to position 4; and
- the item in position 4 moves to position 3.

Which gives the mapping:

{0: 2, 1: 0, 2: 1, 3: 4, 4: 3}

If we just list the end position, the starting position can be implied 
by the order of the list:

[2, 0, 1, 4, 3]

There is a name for this: it is called a RANK TABLE. In the example 
given:

values = [3, 1, 2, 5, 4]

the rank table gives the rank of each element, that is, the position 
they would get after sorting. In this case, the rank table is not 
terribly exciting, because the items are equal to their ranks, less 1 
because Python counts indexes from 0 rather than 1:

3 is the 3rd smallest item, so position 2
1 is the 1st smallest item, so position 0
2 is the 2nd smallest item, so position 1
5 is the 5th smallest item, so position 4
4 is the 4th smallest item, so position 3

and the rank table would be [2, 0, 1, 4, 3], which gives exactly the 
ending positions of the transformation. Purely by virtue of the items 
being the consecutive numbers 1 through 5, the rank of each number is 
just itself minus 1.

A more interesting example might be:

values = ["dog", "fox", "cat", "ape", "cow"]
ranks:    
  dog = 4th => position 3
  fox = 5th => position 4
  cat = 2nd => position 1
  ape = 1st => position 0
  cow = 3rd => position 2

and so the rank table would be: [3, 4, 1, 0, 2]

The opposite to a rank table is an INDEX TABLE. That gives the index of 
the value which would end up in that position after sorting. In this 
example:

values = ["dog", "fox", "cat", "ape", "cow"]
sorted = ["ape", "cat", "cow", "dog", "fox"]

the 1st element (ape) came from index 3
the 2nd element (cat) came from index 2
the 3rd element (cow) came from index 4
the 4th element (dog) came from index 0
the 5th element (fox) came from index 1

which gives us an index table of [3, 2, 4, 0, 1].

We can easy calculate the index table using the DSU (Decorate, Sort, 
Undecorate) idiom:

py> values = ["dog", "fox", "cat", "ape", "cow"]
py> decorated = list(zip(values, range(len(values))))
py> decorated.sort()
py> indexes = [t[1] for t in decorated]
py> print(indexes)
[3, 2, 4, 0, 1]

Here's a way that should be faster, but less flexible (you can't pass a 
key function to the sort method).

indexes = sorted(range(len(values)), key=values.__getitem__)

Once you have an index table, it is easy to convert it into a rank 
table:

py> ranks = [None]*len(values)
py> for j, idx in enumerate(indexes):
...     ranks[idx] = j
...
py> print(ranks)
[3, 4, 1, 0, 2]

which is the rank table I gave earlier.

Bringing it back to the original (boring) example:

values = [3, 1, 2, 5, 4]
sorted = [1, 2, 3, 4, 5]

and the associated tables are:

indexes = [1, 2, 0, 4, 3]
ranks = [2, 0, 1, 4, 3]

-- 
Steven