[Tutor] Dumb Luck with the Partitioning Problem

Sat, 18 Aug 2001 00:48:01 -0400

This is a multi-part message in MIME format.

------=_NextPart_000_0008_01C1277F.6EAF5A00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hmm.  Dumb brute force is still winning?  OK, so let's go to the other
extreme <wink>.

The attached implements the Karmarkar-Karp heuristic for this problem.  It's
very, very, very clever, and runs in an eyeblink.  I won't try to explain it
in full.  Suffice it to say that at each step it decides to put the two
largest numbers remaining into different sets -- but doesn't decide *which*
sets until the very end!  To a first approximation, this isn't entirely
unlike the effect Steven got by splitting the numbers according to
even-or-odd indices (which split the two largest numbers from the start).

For N=50, it prints

Set 1 sum 119.51791973220098
Set 2 sum 119.51788087131982
difference 3.8860881161895122e-005

and the "distance to target" measure is half of the last number printed, a
bit less than 2e-5.

A truly remarkable property of KK is that it does better the *larger* the
input set.  Run it with N=3000, for example, and after a few seconds it
prints

Set 1 sum 54785.845251704624
Set 2 sum 54785.845251704624
difference 0.0

An interesting online paper about this approach is korf-ckk.pdf (or .ps), at

    http://www.cs.pdx.edu/~bart/cs510cs/papers/

Note that dumb brute force *still* did better than KK, although it took a
lot more computer time to find something better.  On the other hand, dumb
brute force was very easy to program, easy to dream up, and easy to get
right the first time.  I made several errors while implementing KK, its
inventors must have spent days dreaming it up, and it took at least an hour
of my time to get the bugs out.  Is two minutes of computer time worth an
hour of mine?  Sure -- but only when it's just for fun <0.9 wink>.

------=_NextPart_000_0008_01C1277F.6EAF5A00
Content-Type: text/plain;
	name="Karp.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="Karp.py"

from __future__ import nested_scopes

import bisect

class _Num:
    def __init__(self, value, index):
        self.value = value
        self.i = index

    def __lt__(self, other):
        return self.value < other.value

# This implements the Karmarkar-Karp heuristic for partitioning a set
# in two, i.e. into two disjoint subsets s.t. their sums are
# approximately equal.  It produces only one result, in O(N*log N)
# time.  A remarkable property is that it loves large sets:  in
# general, the more numbers you feed it, the better it does.

class Partition:
    def __init__(self, nums):
        self.nums = nums
        sorted = [_Num(nums[i], i) for i in range(len(nums))]
        sorted.sort()
        self.sorted = sorted

    def run(self):
        sorted = self.sorted[:]
        N = len(sorted)
        connections = [[] for i in range(N)]

        while len(sorted) > 1:
            bigger  = sorted.pop()
            smaller = sorted.pop()

            # Force these into different sets, by "drawing a
            # line" connecting them.
            i, j = bigger.i, smaller.i
            connections[i].append(j)
            connections[j].append(i)

            diff = bigger.value - smaller.value
            assert diff >= 0
            bisect.insort(sorted, _Num(diff, i))

        # Now sorted contains only 1 element x, and x.value is
        # the difference between the subsets' sums.

        # Theorem:  The connections matrix represents a spanning tree
        # on the set of index nodes, and any tree can be 2-colored.
        # 2-color this one (with "colors" 0 and 1).

        index2color = [None] * N

        def color(i, c):
            if index2color[i] is not None:
                assert index2color[i] == c
                return
            index2color[i] = c
            for j in connections[i]:
                color(j, 1-c)

        color(0, 0)

        # Partition the indices by their colors.
        subsets = [[], []]
        for i in range(N):
            subsets[index2color[i]].append(i)

        return subsets

N = 50
import math
x = [math.sqrt(i) for i in range(1, N+1)]
p = Partition(x)
s, t = p.run()
sum1 = 0L
sum2 = 0L
for i in s:
    sum1 += x[i]
for i in t:
    sum2 += x[i]
print "Set 1 sum", repr(sum1)
print "Set 2 sum", repr(sum2)
print "difference", repr(abs(sum1 - sum2))

------=_NextPart_000_0008_01C1277F.6EAF5A00--