<div dir="ltr">Thank you very much. I am very happy that I got a reply from Tim Peter.<div><br></div><div>You are correct, my mistake. </div><div><br></div><div>The python code should be:</div><div><span style="font-size:13px">    for i in range(low+1,high):              //because we already add data[low]</span><br style="font-size:13px"><span style="font-size:13px">        x = data[i]</span><br style="font-size:13px"><span style="font-size:13px">        if x >= data[i-1]:</span><br></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">After I fix it, here is the result:</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">random array 10^6:</span></div><div><span style="font-size:13px">Old binsort:  1.3322</span></div><div><span style="font-size:13px">New binsort: 1.0015</span></div><div><span style="font-size:13px">ratio: 0.33</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">You are right, it is not ten times faster anymore. I will update other results soon.</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">I do check the result of two sorting methods many times to make sure they are the same. It is just because I do not know how to put assert into the timeit.Timer class. I am pretty sure about this.</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">I will try to write the proof more clearly, sorry for </span>inconvenience.</div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">Thank you very much.</span></div><div><span style="font-size:13px">Nha Pham. </span></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 9, 2015 at 9:27 PM, Tim Peters <span dir="ltr"><<a href="mailto:tim.peters@gmail.com" target="_blank">tim.peters@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">[nha pham <<a href="mailto:phqnha@gmail.com">phqnha@gmail.com</a>>]<br>

<span class="">> Statement_1: With an array of size N or less than N, we need at most log2(N)<br>

> comparisons to find a value<br>

> (or a position, incase the search miss), using the binary search algorithm.<br>

><br>

> proof: This statement is trivia, and I believe, someone outthere already<br>

> proved it.<br>

<br>

</span>Sorry for the quick message here.  It's just a simple point where it<br>

will pay not to get off on a wrong foot ;-)<br>

<br>

Correct:  for an array of size N, binary search can require as many as<br>

ceiling(log2(N+1)) comparisons.<br>

<br>

That's because there are N+1 possible results for an array of size N.<br>

For example, for an array of size 3, [A, B, C], "the answer" may be<br>

"before A", "between A and B", "between B and C", or "after C".  3<br>

elements, 3+1 = 4 possible results.  log2(3) comparisons are not<br>

enough to distinguish among 4 results.<br>

<br>

Make it trivial, an array of length 1.  Then 1 comparison is obviously<br>

necessary and sufficient in all cases.  And, indeed,<br>

ceiling(log2(1+1)) = 1.  log2(1) equals 0, too small.<br>

<br>

For the rest, I haven't been able to understand your description or<br>

your pseudo-code.  I'll try harder.  Some things clearly aren't doing<br>

what you _intend_ them to do.  For example, in your Python code, each<br>

time through the outer loop you're apparently trying to sort the next<br>

CHUNK elements, but you end up appending CHUNK+1 values to data2 (or<br>

data3).<br>

<br>

Or in this part:<br>

<br>

    for i in range(low,high):<br>

        x = data[i]<br>

        if x >= data[i-1]:<br>

<br>

the first time that loop is executed low == 0, and so i == 0 on the<br>

first iteration, and so the conditional is<br>

<br>

       if x >= data[0-1]<br>

<br>

That's referencing data[-1], which is the very last element in data -<br>

which has nothing to do with the CHUNK you're trying to sort at the<br>

time.<br>

<br>

So there are a number of errors here, which makes it that much harder<br>

to sort out (pun intended <wink>) what you're trying to do.  It would<br>

help you to add some asserts to verify your code is doing what you<br>

_hope_ it's doing.  For example, add<br>

<br>

    assert data2[low: high] == sorted(data[low: high])<br>

    assert len(data2) == high<br>

<br>

to the end of your `sample` loop, and similarly for data3 in your<br>

`new` loop.  Until those asserts all pass, you're not testing code<br>

that's actually sorting correctly.  Repair the errors and you almost<br>

certainly won't find `new` running over 10 times faster than `sample`<br>

anymore.  I don't know what you _will_ discover, though.  If the code<br>

doesn't have to sort correctly, there are much simpler ways to make it<br>

run _very_ much faster ;-)<br>

</blockquote></div><br></div></div>