[Tutor] Dumb Luck with the Partitioning Problem
Sat, 18 Aug 2001 00:48:01 -0400
Hmm. Dumb brute force is still winning? OK, so let's go to the other extreme.
The attached implements the Karmarkar-Karp heuristic for this problem. It's
very, very, very clever, and runs in an eyeblink. I won't try to explain it
in full. Suffice it to say that at each step it decides to put the two
largest numbers remaining into different sets -- but doesn't decide *which*
sets until the very end! To a first approximation, this isn't entirely
unlike the effect Steven got by splitting the numbers according to
even-or-odd indices (which split the two largest numbers from the start).
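To make the differencing step concrete, here is a minimal sketch (Python 3 syntax, not the attached code) that tracks only the values, so it reports the final difference between the two sums without recovering the sets themselves. The name kk_difference and the use of heapq are illustrative assumptions; the attachment keeps a sorted list instead. It also assumes the inputs are nonnegative.

```python
import heapq

def kk_difference(nums):
    """Return the difference between the two set sums that KK differencing yields."""
    # heapq is a min-heap, so store negated values to pop the largest first.
    heap = [-x for x in nums]
    heapq.heapify(heap)
    while len(heap) > 1:
        biggest = -heapq.heappop(heap)
        next_biggest = -heapq.heappop(heap)
        # Committing the two largest to different sets leaves only their
        # difference to be balanced by the remaining numbers.
        heapq.heappush(heap, -(biggest - next_biggest))
    return -heap[0] if heap else 0

print(kk_difference([8, 7, 6, 5, 4]))  # 2
```

On [8, 7, 6, 5, 4] the list shrinks as [6, 5, 4, 1], [4, 1, 1], [3, 1], [2]. A perfect split exists here (8+7 against 6+5+4), so this small case also shows that KK is a heuristic and not an exact method.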
For N=50, it prints
Set 1 sum 119.51791973220098
Set 2 sum 119.51788087131982
and the "distance to target" measure is half of the last number the program
prints (the difference between the two sums), a bit less than 2e-5.
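The arithmetic behind that figure can be checked directly from the two sums printed above. The target for each set is half the total, so each set misses it by half the difference. The variable names below are illustrative only.

```python
sum1 = 119.51791973220098
sum2 = 119.51788087131982

# Each set should ideally sum to half of the grand total.
target = (sum1 + sum2) / 2

# Both sets are equally far from the target, on opposite sides.
distance = abs(sum1 - target)
half_difference = abs(sum1 - sum2) / 2

print("distance to target", distance)
print("half the difference", half_difference)
```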
A truly remarkable property of KK is that it does better the *larger* the
input set. Run it with N=3000, for example, and after a few seconds it prints
Set 1 sum 54785.845251704624
Set 2 sum 54785.845251704624
An interesting online paper about this approach is korf-ckk.pdf (or .ps), at
Note that dumb brute force *still* did better than KK, although it took a
lot more computer time to find something better. On the other hand, dumb
brute force was very easy to program, easy to dream up, and easy to get
right the first time. I made several errors while implementing KK, its
inventors must have spent days dreaming it up, and it took at least an hour
of my time to get the bugs out. Is two minutes of computer time worth an
hour of mine? Sure -- but only when it's just for fun <0.9 wink>.
from __future__ import nested_scopes
import bisect

class _Num:
    def __init__(self, value, index):
        self.value = value
        self.i = index

    def __lt__(self, other):
        return self.value < other.value

# This implements the Karmarkar-Karp heuristic for partitioning a set
# in two, i.e. into two disjoint subsets s.t. their sums are
# approximately equal. It produces only one result, in O(N*log N)
# time. A remarkable property is that it loves large sets: in
# general, the more numbers you feed it, the better it does.
class Partition:
    def __init__(self, nums):
        self.nums = nums
        sorted = [_Num(nums[i], i) for i in range(len(nums))]
        # run() pops the largest from the end, so keep ascending order.
        sorted.sort()
        self.sorted = sorted

    def run(self):
        sorted = self.sorted[:]
        N = len(sorted)
        connections = [[] for i in range(N)]

        while len(sorted) > 1:
            bigger = sorted.pop()
            smaller = sorted.pop()
            # Force these into different sets, by "drawing a
            # line" connecting them.
            i, j = bigger.i, smaller.i
            connections[i].append(j)
            connections[j].append(i)
            diff = bigger.value - smaller.value
            assert diff >= 0
            bisect.insort(sorted, _Num(diff, i))

        # Now sorted contains only 1 element x, and x.value is
        # the difference between the subsets' sums.

        # Theorem: The connections matrix represents a spanning tree
        # on the set of index nodes, and any tree can be 2-colored.
        # 2-color this one (with "colors" 0 and 1).
        index2color = [None] * N

        def color(i, c):
            if index2color[i] is not None:
                assert index2color[i] == c
                return
            index2color[i] = c
            for j in connections[i]:
                color(j, 1 - c)

        color(0, 0)

        # Partition the indices by their colors.
        subsets = [[], []]
        for i in range(N):
            subsets[index2color[i]].append(i)
        return subsets

import math

N = 50
x = [math.sqrt(i) for i in range(1, N+1)]
p = Partition(x)
s, t = p.run()
sum1 = 0L
sum2 = 0L
for i in s:
    sum1 += x[i]
for i in t:
    sum2 += x[i]
print "Set 1 sum", repr(sum1)
print "Set 2 sum", repr(sum2)
print "difference", repr(abs(sum1 - sum2))
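
For readers on a current interpreter, the same idea can be sketched in Python 3 syntax. This is an illustration under stated assumptions, not the attached code: the function name kk_partition is hypothetical, heapq stands in for the sorted list plus bisect.insort, and an explicit stack stands in for the recursive color() helper. It assumes nonnegative inputs.

```python
import heapq
import math

def kk_partition(nums):
    """Split range(len(nums)) into two index lists using KK differencing."""
    n = len(nums)
    if n == 0:
        return [], []
    # heapq is a min-heap, so store negated values to pop the largest first.
    heap = [(-value, i) for i, value in enumerate(nums)]
    heapq.heapify(heap)
    connections = [[] for _ in range(n)]
    while len(heap) > 1:
        neg_big, i = heapq.heappop(heap)
        neg_small, j = heapq.heappop(heap)
        # "Draw a line" between i and j: they must land in different sets.
        connections[i].append(j)
        connections[j].append(i)
        # Push back the negated difference (big - small), keeping index i.
        heapq.heappush(heap, (neg_big - neg_small, i))
    # The lines form a spanning tree; 2-color it with an explicit stack.
    colors = [None] * n
    colors[0] = 0
    stack = [0]
    while stack:
        i = stack.pop()
        for j in connections[i]:
            if colors[j] is None:
                colors[j] = 1 - colors[i]
                stack.append(j)
    # Partition the indices by their colors.
    subsets = ([], [])
    for i, c in enumerate(colors):
        subsets[c].append(i)
    return subsets

x = [math.sqrt(i) for i in range(1, 51)]
s, t = kk_partition(x)
sum1 = sum(x[i] for i in s)
sum2 = sum(x[i] for i in t)
print("Set 1 sum", sum1)
print("Set 2 sum", sum2)
print("difference", abs(sum1 - sum2))
```

The heap keeps each pop and push at O(log N), and the explicit stack avoids depending on the interpreter's recursion limit when the tree happens to be deep. Ties between equal values may be broken differently than with bisect.insort, so the two sets can differ from the attachment's in such cases.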