[Python-Dev] Tunning binary insertion sort algorithm in Timsort.

Tim Peters tim.peters at gmail.com
Tue Mar 10 06:28:22 CET 2015


[nha pham <phqnha at gmail.com>]
> Thank you very much. I am very happy that I got a reply from Tim Peter.

My pleasure to speak with you too :-)


> You are correct, my mistake.
>
> The python code should be:
>     for i in range(low+1,high):              //because we already add
> data[low]
>         x = data[i]
>         if x >= data[i-1]:
>
> After I fix it, here is the result:
>
> random array 10^6:
> Old binsort:  1.3322
> New binsort: 1.0015
> ratio: 0.33
>
> You are right, it is not ten times faster anymore. I will update other
> results soon.
>
> I do check the result of two sorting methods many times to make sure they
> are the same. It is just because I do not know how to put assert into the
> timeit.Timer class.

`assert` is just another Python statement.  You simply add it to the
code - there's nothing tricky about this.  You could, e.g., simply
copy and paste the `assert`s I suggested last time.

Before you do, trying adding `print index` to your inner loops, and
make SIZE much smaller (say, 1000) so you're not overwhelmed with
output.  You'll be surprised by what you see on the second (and
following) CHUNKs.  For example, in both `sample` and `new` it will
print 900 ninety nine times in a row when doing the last CHUNK.  The
code still isn't doing what you intend.  Until it does, timing it
makes little sense :-)

> I am pretty sure about this.

Note that I'm talking about the Python code here, the code you run
through timeit.  You cannot have checked the results of running _that_
specific code, because it doesn't work at all.  You may have checked
_other_ code many times.  We may get to that later, but since I speak
Python, I'm not going to understand what you're doing until we have
Python code that works ;-)


More information about the Python-Dev mailing list