Optimize binary insertion sort algorithm in Timsort.

This describes an optimization for "binary insertion sort" (BINSORT for short). BINSORT has been implemented in Python, CyThon, and Timsort (the default Array.sort() in JAVA SE 7 and JAVA SE 8) I have read the BINSORT in Timsort http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b13... and I think that I can make a little more optimization. ================= The old BINSORT: The basic idea is to use binary search on the sorted list to find a final position for a new element X, then insert X to the sorted list. [SORTED_LIST], [X in UNSORTED_LIST] // pick X in UNSORTED_LIST index = binarySeach([SORTED_LIST]) // use binary search to find appropriate location for X in SORTED_LIST [SORTED_LIST].add(index, X) // insert X to index location ================== New BINSORT: [SORTED_LIST], [A] // A is an UNSORTED_LIST j = compare(A[i], A[i+1]) // pick A[i], compare to next element index = binarySeach([SORTED_LIST], A[i]) // use binary search to find // appropriate location for A[i] in SORTED_LIST, and remember the index [SORTED_LIST].add(index, A[i]) // insert A[i] to index location // Now for A[+1], where already know where it should be, compare to A[i] if j >= 0: // A[i] > A[i+1], so A[i+1] will be in the right side of A[i] // We only have to search on a reduced Array: index = binarySearch(SORTED_LIST[index : length(SORTED_LIST)], A[i+1]) else: // we search on the left side of of A[i] index = binarySearch(SORTED_LIST[0 : index], A[i+1]) [SORTED_LIST].add(index, A[i+1]) // insert A[i+1] to index location //repeat the loop ================== Run test. Intuitively, new BINSORT will be better if the Array become bigger, because it reduce the array to search with the cost of only 1 more comparison. I only care about small array, with the length < 100 (we have known that in Timsort, the list is divided to chunks with length 64, then apply BINSORT to them). So I make a big Array, divide them, and apply new BINSORT in each chunk, then compare to OLD BINSORT. The source code is in the bottom of this document. Here is the results: cpuinfo: model name : Intel(R) Core(TM) i3 CPU M 350 @ 2.27GHz stepping : 2 microcode : 0xc cpu MHz : 933.000 cache size : 3072 KB ----- random array: ARRAY_SIZE: 1000000 CHUNK_SIZE: 100 DATA: randint(0, 1000000) OLD BINSORT: 81.45754 new BINSORT: 5.26754 RATIO: (OLD - new) / new = 14.464 --- incremental array: ARRAY_SIZE: 1000000 CHUNK_SIZE: 100 DATA: range(0, 1000000) OLD BINSORT: 81.87927 new BINSORT: 5.41651 RATIO: (OLD - new) / new = 14.11659 --- decremental array: ARRAY_SIZE: 1000000 CHUNK_SIZE: 100 DATA: range(0, 1000000) OLD BINSORT: 81.45723 new BINSORT: 5.09823 RATIO: (OLD - new) / new = 14.97753 ---- all equal array: ARRAY_SIZE: 1000000 CHUNK_SIZE: 100 DATA: 50000 OLD BINSORT: 40.46027 new BINSORT: 5.41221 RATIO: (OLD - new) / new = 6.47573 ==================== What should we do next: - Tuning my test code (I have just graphed it). - Test other cases, and bigger array (my laptop cannot handle array more than 10^6.) - Modifying Timsort in java.util. and test if it is better. ==================== My test code, written by Python. from timeit import Timer setup ="""\ import bisect from random import randint from timeit import Timer SIZE = 1000000 CHUNK = 100 NUM_CHUNK = SIZE/CHUNK data = [] data2 = [] data3 = [] for i in range(0,SIZE): data.append(randint(0,1000000)) #data.append(i) #data = data[::-1] """ sample ="""\ for j in range(0,NUM_CHUNK): low = CHUNK*j high= low + CHUNK data2.append(data[low]) index = low for i in range(low,high): x = data[i] index = bisect.bisect_right(data2[low:], x, low, len(data2) - low-1) data2.insert(index, x) """ new ="""\ for j in range(0,NUM_CHUNK): low = CHUNK*j high= low + CHUNK data3.append(data[low]) index = low for i in range(low,high): x = data[i] if x >= data[i-1]: index = bisect.bisect_right(data3[low:len(data3) - low-1], x, index, len(data3) - low-1) else: index = bisect.bisect_right(data3[low:index], x, low, index) data3.insert(index, x) """ t2 = Timer(stmt = sample, setup=setup) a = t2.timeit(1) print a t3 = Timer(stmt = new, setup=setup) b = t3.timeit(1) print b print (str((a - b)/b)) ==================================== Nha Pham Mar 07 2015
participants (1)
-
nha pham