Improve this recursive code please!
waxmop
waxmop at sarcastic-horse.com
Tue May 6 16:14:29 EDT 2003
This is great stuff. I should have realized that the results dictionary was
the memory hog, and not all the recursive calls.
At first I thought the shelve module might be a good alternative to a standard
dict, but that won't work because I'm using tuples as keys, and shelve
requires that keys are strings.
I printed out your message and I'll study it some more tonight until I
understand it completely.
I also want to write a generator/iterator approach for this problem, just to
further my python skills.
Thanks for the valuable message.
Steven Taschuk wrote:
> Quoth waxmop:
> [...]
> > The program works ok, especially when numbricks and numbins are under 5
> > each. However, I need to run this program when numbricks = 100 or more,
> > and I'm concerned about memory issues.
> >
> > Is there a way I can avoid building a giant stack? This program doesn't
> > need to run fast at all. [...]
>
> You're right to be concerned about memory consumption, but not in
> the stack per se. Your search is equivalent to this:
>
> def findarrangements(bricksleft, bins, results):
> if bricksleft > 0:
> # some bricks yet to be assigned
> for i in range(len(bins)):
> # try putting next brick in bin i
> bins[i] += 1
> findarrangements(bricksleft - 1, bins, results)
> bins[i] -= 1
> else:
> # all bricks have been put in bins
> results[tuple(bins)] = 1
>
> In this version it is clear that, at each recursive call, the
> value of bricksleft decreases by one, and the recursion stops when
> there are no bricks left. Thus the depth of the recursion is no
> more than the total number of bricks, which is not so bad.
>
> The real memory hog is the results dict, which maintains in memory
> every result tuple. That's a lot of tuples for even modest
> numbers of bricks and bins:
>
> 1 brick in 1 bin: 1 arrangement
> 2 bricks in 2 bins: 3 arrangements
> 3 bricks in 3 bins: 10 arrangements
> 4 bricks in 4 bins: 35 arrangements
> 5 bricks in 5 bins: 126 arrangements
> 6 bricks in 6 bins: 462 arrangements
> 7 bricks in 7 bins: 1716 arrangements
>
> As you can see, the number of arrangements grows much more rapidly
> than the number of bricks or bins, which makes it the larger
> concern if you want to scale up to hundreds of bricks or bins.
>
> So, to reduce the memory consumption of this program, I'd start by
> not storing the whole set of results. A quick and dirty way to do
> this is simply to print the tuple in mk_tree_r and forget about
> it, rather than storing it to be printed later. For example:
>
> def mk_tree(numbricks, numbins):
> bin = [0]*numbins
> mk_tree_r(numbricks, numbins, bin)
>
> def mk_tree_r(numbricks, numbins, bin):
> if countbricks(bin) < numbricks:
> for i in range(numbins):
> bin[i] += 1
> mk_tree_r(numbricks, numbins, bin)
> bin[i] -= 1
> else:
> print tuple(bin)
>
> Unfortunately this approach loses functionality: it prints
> duplicate results, and it does not print them in sorted order. To
> recover this functionality, I'd suggest a change of algorithm.
>
> First let's look at the duplicates: the above version produces
> them because it tries assigning each brick to each bin; for
> example, when putting two bricks in two bins, it can produce (1,
> 1) in two ways:
> 1: Put brick 1 in bin 1, then put brick 2 in bin 2.
> 2: Put brick 1 in bin 2, then put brick 2 in bin 1.
> But because we don't care about the difference between brick 1 and
> brick 2, these produce the same arrangement.
>
> So, instead of assigning bricks one by one, let's try assigning to
> the bins one by one. Something like this:
>
> def findarrangements(bricksleft, binsleft, bins):
> if binsleft > 1:
> # More than one bin left, and we need to distribute
> # bricksleft bricks among them.
> for n in range(bricksleft+1):
> # Find arrangements in which next bin has n bricks.
> bins.append(n)
> findarrangements(binsleft - 1, bricksleft - n, bins)
> bins.pop()
> else:
> # Only one bin left; put all remaining bricks there.
> bins.append(bricksleft)
> print tuple(binsbuilt)
>
> This version's recursion depth is no more than the number of bins;
> it prints no duplicates, as desired, and only keeps one result in
> memory at a time. It also, happily, prints the results in a
> sorted order; this order happens to be the reverse of what your
> program prints. (I leave fixing this to you. You *don't* need to
> put all the tuples in a list and .reverse() it.)
>
> Also note that this version, like your program, considers (5,0,0)
> to be distinct from (0,5,0) and (0,0,5), for example. I don't
> know whether that's what you want, but it too could be changed
> without much trouble.
>
> One disadvantage of these functions as I've written them is that
> they do the printing themselves; this is undesirable if you ever
> want to do different things with the results, since then you have
> to change the search code. There's ways to separate the
> what-to-do-with-results part from the search-for-results part
> without consuming excessive amounts of memory (chiefly generators
> or callbacks); if you're interested, ask.
>
> And for large enough values, you'll hit the maximum recursion
> depth of the interpreter; this varies by version and
> implementation. (In CPython 2.3 it's 1000 out of the box, I
> think, so with what I propose above, you won't be able to find
> arrangements for more than about a thousand bins.) There are ways
> to deal with this too (such as managing your own search stack
> instead of using the function call stack). Again, if you're
> interested, ask.
>
> --
> Steven Taschuk staschuk at telusplanet.net
> Every public frenzy produces legislation purporting to address it.
> (Kinsley's Law)
More information about the Python-list
mailing list