[Python-Dev] Tunning binary insertion sort algorithm in Timsort.
Neil Girdhar
mistersheik at gmail.com
Mon Mar 9 21:42:48 CET 2015
It may be that the comparison that you do is between two elements that are
almost always in the same cache line whereas the binary search might often
incur a cache miss.
On Mon, Mar 9, 2015 at 2:49 PM, nha pham <phqnha at gmail.com> wrote:
> I do not know exactly, one thing I can imagine is: it turns the worst case
> of binary insertion sort to best case.
> With sorted array in range of 32 or 64 items, built from zero element. The
> new element you put into the sorted list has a high chance of being the
> smallest or the the highest of the sorted list (or nearly highest or nearly
> smallest)
>
> If that case happen, the old binary insertion sort will have the
> investigate all the list, while with my idea, it just have to compare more
> 1-2 times.
> I will try to run more test an more thinking to make sure though.
>
> On Mon, Mar 9, 2015 at 11:48 AM, nha pham <phqnha at gmail.com> wrote:
>
>> I do not know exactly, one thing I can imagine is: it turns the worst
>> case of binary insertion sort to best case.
>> With sorted array in range of 32 or 64 items, built from zero element.
>> The new element you put into the sorted list has a high chance of being the
>> smallest or the the highest of the sorted list (or nearly highest or nearly
>> smallest)
>>
>> If that case happen, the old binary insertion sort will have the
>> investigate all the list, while with my idea, it just have to compare more
>> 1-2 times.
>> I will try to run more test an more thinking to make sure though.
>>
>>
>>
>> On Mon, Mar 9, 2015 at 10:39 AM, Isaac Schwabacher <ischwabacher at wisc.edu
>> > wrote:
>>
>>> On 15-03-08, nha pham
>>> wrote:
>>> >
>>> > We can optimize the TimSort algorithm by optimizing its binary
>>> insertion sort.
>>> >
>>> > The current version of binary insertion sort use this idea:
>>> >
>>> > Use binary search to find a final position in sorted list for a new
>>> element X. Then insert X to that location.
>>> >
>>> > I suggest another idea:
>>> >
>>> > Use binary search to find a final postion in sorted list for a new
>>> element X. Before insert X to that location, compare X with its next
>>> element.
>>> >
>>> > For the next element, we already know if it is lower or bigger than X,
>>> so we can reduce the search area to the left side or on the right side of X
>>> in the sorted list.
>>>
>>> I don't understand how this is an improvement, since with binary search
>>> the idea is that each comparison cuts the remaining list to search in half;
>>> i.e., each comparison yields one bit of information. Here, you're spending
>>> a comparison to cut the list to search at the element you just inserted,
>>> which is probably not right in the middle. If you miss the middle, you're
>>> getting on average less than a full bit of information from your
>>> comparison, so you're not reducing the remaining search space by as much as
>>> you would be if you just compared to the element in the middle of the list.
>>>
>>> > I have applied my idea on java.util. ComparableTimSort.sort() and
>>> testing. The execute time is reduced by 2%-6% with array of random integer.
>>>
>>> For all that, though, experiment trumps theory...
>>>
>>> > Here is detail about algorithm and testing:
>>> https://github.com/nhapq/Optimize_binary_insertion_sort
>>> >
>>> > Sincerely.
>>> >
>>> > phqnha
>>>
>>
>>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/mistersheik%40gmail.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20150309/5d1f65c1/attachment.html>
More information about the Python-Dev
mailing list