Improve this recursive code please!

Tue May 6 16:14:29 EDT 2003

This is great stuff.  I should have realized that the results dictionary was
the memory hog, and not all the recursive calls.

At first I thought the shelve module might be a good alternative to a standard
dict, but that won't work because I'm using tuples as keys, and shelve
requires that keys are strings.

I printed out your message and I'll study it some more tonight until I
understand it completely.

I also want to write a generator/iterator approach for this problem, just to
further my python skills.

Thanks for the valuable message.

Steven Taschuk wrote:

> Quoth waxmop:
>   [...]
> > The program works ok, especially when numbricks and numbins are under 5
> > each.  However, I need to run this program when numbricks = 100 or more,
> > and I'm concerned about memory issues.
> >
> > Is there a way I can avoid building a giant stack?  This program doesn't
> > need to run fast at all.  [...]
>
> You're right to be concerned about memory consumption, but not in
> the stack per se.  Your search is equivalent to this:
>
>     def findarrangements(bricksleft, bins, results):
>         if bricksleft > 0:
>             # some bricks yet to be assigned
>             for i in range(len(bins)):
>                 # try putting next brick in bin i
>                 bins[i] += 1
>                 findarrangements(bricksleft - 1, bins, results)
>                 bins[i] -= 1
>         else:
>             # all bricks have been put in bins
>             results[tuple(bins)] = 1
>
> In this version it is clear that, at each recursive call, the
> value of bricksleft decreases by one, and the recursion stops when
> there are no bricks left.  Thus the depth of the recursion is no
> more than the total number of bricks, which is not so bad.
>
> The real memory hog is the results dict, which maintains in memory
> every result tuple.  That's a lot of tuples for even modest
> numbers of bricks and bins:
>
>     1 brick in 1 bin:   1 arrangement
>     2 bricks in 2 bins: 3 arrangements
>     3 bricks in 3 bins: 10 arrangements
>     4 bricks in 4 bins: 35 arrangements
>     5 bricks in 5 bins: 126 arrangements
>     6 bricks in 6 bins: 462 arrangements
>     7 bricks in 7 bins: 1716 arrangements
>
> As you can see, the number of arrangements grows much more rapidly
> than the number of bricks or bins, which makes it the larger
> concern if you want to scale up to hundreds of bricks or bins.
>
> So, to reduce the memory consumption of this program, I'd start by
> not storing the whole set of results.  A quick and dirty way to do
> this is simply to print the tuple in mk_tree_r and forget about
> it, rather than storing it to be printed later.  For example:
>
>     def mk_tree(numbricks, numbins):
>         bin = [0]*numbins
>         mk_tree_r(numbricks, numbins, bin)
>
>     def mk_tree_r(numbricks, numbins, bin):
>         if countbricks(bin) < numbricks:
>             for i in range(numbins):
>                 bin[i] += 1
>                 mk_tree_r(numbricks, numbins, bin)
>                 bin[i] -= 1
>         else:
>             print tuple(bin)
>
> Unfortunately this approach loses functionality: it prints
> duplicate results, and it does not print them in sorted order.  To
> recover this functionality, I'd suggest a change of algorithm.
>
> First let's look at the duplicates: the above version produces
> them because it tries assigning each brick to each bin; for
> example, when putting two bricks in two bins, it can produce (1,
> 1) in two ways:
>     1:  Put brick 1 in bin 1, then put brick 2 in bin 2.
>     2:  Put brick 1 in bin 2, then put brick 2 in bin 1.
> But because we don't care about the difference between brick 1 and
> brick 2, these produce the same arrangement.
>
> So, instead of assigning bricks one by one, let's try assigning to
> the bins one by one.  Something like this:
>
>     def findarrangements(bricksleft, binsleft, bins):
>         if binsleft > 1:
>             # More than one bin left, and we need to distribute
>             # bricksleft bricks among them.
>             for n in range(bricksleft+1):
>                 # Find arrangements in which next bin has n bricks.
>                 bins.append(n)
>                 findarrangements(binsleft - 1, bricksleft - n, bins)
>                 bins.pop()
>         else:
>             # Only one bin left; put all remaining bricks there.
>             bins.append(bricksleft)
>             print tuple(binsbuilt)
>
> This version's recursion depth is no more than the number of bins;
> it prints no duplicates, as desired, and only keeps one result in
> memory at a time.  It also, happily, prints the results in a
> sorted order; this order happens to be the reverse of what your
> program prints.  (I leave fixing this to you.  You *don't* need to
> put all the tuples in a list and .reverse() it.)
>
> Also note that this version, like your program, considers (5,0,0)
> to be distinct from (0,5,0) and (0,0,5), for example.  I don't
> know whether that's what you want, but it too could be changed
> without much trouble.
>
> One disadvantage of these functions as I've written them is that
> they do the printing themselves; this is undesirable if you ever
> want to do different things with the results, since then you have
> to change the search code.  There's ways to separate the
> what-to-do-with-results part from the search-for-results part
> without consuming excessive amounts of memory (chiefly generators
> or callbacks); if you're interested, ask.
>
> And for large enough values, you'll hit the maximum recursion
> depth of the interpreter; this varies by version and
> implementation.  (In CPython 2.3 it's 1000 out of the box, I
> think, so with what I propose above, you won't be able to find
> arrangements for more than about a thousand bins.)  There are ways
> to deal with this too (such as managing your own search stack
> instead of using the function call stack).  Again, if you're
> interested, ask.
>
> --
> Steven Taschuk                            staschuk at telusplanet.net
> Every public frenzy produces legislation purporting to address it.
>                                                   (Kinsley's Law)