Classical FP problem in python : Hamming problem
tim.peters at gmail.com
Mon Jan 24 22:29:17 EST 2005
> For all the algorithms that run after their tail in an FP way, like the
> Hamming problem, or the Fibonacci sequence, (but unlike the Sieve of Eratosthenes
> -- there's a subtle difference), i.e. all those algorithms that typically
> rely upon recursion to get the beginning of the generated elements in order
> to generate the next one ...
> ... so for this family of algorithms, we have, somehow, to abandon recursion,
> and to explicitly maintain one or more lists of what has already been generated.
> One solution for this is suggested in "test_generators.py". But I think that
> this solution is evil as it looks like recursion is used but it is not and it
> depends heavily on how the m235 function (for example) is invoked.
Well, yes -- "Heh. Here's one way to get a shared list, complete with
an excruciating namespace renaming trick" was intended to warn you in
advance that it wasn't pretty <wink>.
> Furthermore, it is _NOT_ memory-efficient as claimed: it leaks!
Yes. But there are two solutions to the problem in that file, and the
second one is in fact extremely memory-efficient compared to the first
one. "Efficiency" here was intended in a relative sense.
> It internally maintains a list of generated results that grows forever, even
> though there is no point keeping results that have already been "consumed". Just
> for a gross estimate, on my system, memory usage for this program grows
> rapidly, reaching 21.6% after 5 minutes. CPU usage varies between 31% and 36%.
> Ugly and inefficient.
Try the first solution in the file for a better understanding of what
"inefficient" means <wink>.
> The solution of Jeff Epler is far more elegant. The "itertools.tee" function
> does just what we want. It internally maintains a list of what has been
> generated and deletes the "consumed" elements as the algorithm unrolls. To follow
> up my gross estimate, memory usage grows from 1.2% to 1.9% after 5 minutes
> (probably only affected by the growing size of the huge integers). CPU usage
> varies between 27% and 29%.
> Beautiful and efficient.
Yes, it is better. tee() didn't exist when generators (and
test_generators.py) were written, so of course nothing in the test
file uses them.
> You might think that we shouldn't be that fussy about memory usage when
> generating 100-digit numbers, but we're talking about a whole family of
> useful FP algorithms; and the best way to implement them should be
Possibly -- there really aren't many Pythonistas who care about this.
> For this family of algorithms, itertools.tee is the way to go.
> I think that the semi-tutorial in "test_generators.py" should be updated to
> use "tee". Or, at least, a severe warning comment should be written.
Please submit a patch. The purpose of that file is to test
generators, so you should add a third way of doing it, not replace the
two ways already there. It should also contain prose explaining why
the third way is better (just as there's prose now explaining why the
second way is better than the first).