Classical FP problem in Python: Hamming problem

Francis Girard francis.girard at free.fr
Tue Jan 25 01:27:21 EST 2005


Ok I'll submit the patch with the prose pretty soon.
Thank you
Francis Girard
FRANCE

On Tuesday, 25 January 2005 at 04:29, Tim Peters wrote:
> [Francis Girard]
>
> > For all the algorithms that chase their own tail in an FP way, like the
> > Hamming problem or the Fibonacci sequence (but unlike the Sieve of
> > Eratosthenes -- there's a subtle difference), i.e. all those algorithms
> > that typically rely upon recursion to get at the beginning of the
> > generated elements in order to generate the next one ...
> >
> > ... so for this family of algorithms we have, somehow, to abandon
> > recursion and to explicitly maintain one or more lists of what has been
> > generated.
> >
> > One solution for this is suggested in "test_generators.py". But I think
> > that this solution is evil as it looks like recursion is used but it is
> > not and it depends heavily on how the m235 function (for example) is
> > invoked.
>
> Well, yes -- "Heh.  Here's one way to get a shared list, complete with
> an excruciating namespace renaming trick" was intended to warn you in
> advance that it wasn't pretty <wink>.
>
> > Furthermore, it is _NOT_ memory efficient as claimed: it leaks!
>
> Yes.  But there are two solutions to the problem in that file, and the
> second one is in fact extremely memory-efficient compared to the first
> one.  "Efficiency" here was intended in a relative sense.
>
> > It internally maintains a list of generated results that grows forever,
> > even though it is useless to keep results that have already been
> > "consumed". Just for a gross estimate: on my system, the memory usage of
> > this program grows rapidly, reaching 21.6% after 5 minutes. CPU usage
> > varies between 31% and 36%. Ugly and inefficient.
>
> Try the first solution in the file for a better understanding of what
> "inefficient" means <wink>.
>
> > The solution of Jeff Epler is far more elegant. The "itertools.tee"
> > function does just what we want. It internally maintains a list of what
> > has been generated and deletes the "consumed" elements as the algorithm
> > unrolls. To continue with my gross estimate, memory usage grows from
> > 1.2% to 1.9% after 5 minutes (probably affected only by the growing size
> > of the huge integers). CPU usage varies between 27% and 29%.
> > Beautiful and efficient.
>
> Yes, it is better.  tee() didn't exist when generators (and
> test_generators.py) were written, so of course nothing in the test
> file uses them.
>
> > You might think that we shouldn't be that fussy about memory usage when
> > generating 100-digit numbers, but we're talking about a whole family of
> > useful FP algorithms; the best way to implement them should be
> > established.
>
> Possibly -- there really aren't many Pythonistas who care about this.
>
> > For this family of algorithms, itertools.tee is the way to go.
> >
> > I think that the semi-tutorial in "test_generators.py" should be updated
> > to use "tee". Or, at the very least, a strong warning comment should be
> > added.
>
> Please submit a patch.  The purpose of that file is to test
> generators, so you should add a third way of doing it, not replace the
> two ways already there.  It should also contain prose explaining why
> the third way is better (just as there's prose now explaining why the
> second way is better than the first).
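For readers following along, the tee-based approach discussed above can be sketched like this. This is only an illustration of the technique, not the actual code from test_generators.py nor from the patch being discussed; the helper names (times, merge, m235) are my own:

```python
from itertools import tee

def times(k, stream):
    # Multiply every element of a sorted stream by k (stream stays sorted).
    for x in stream:
        yield k * x

def merge(a, b):
    # Merge two sorted infinite streams, dropping duplicates.
    x, y = next(a), next(b)
    while True:
        if x < y:
            yield x
            x = next(a)
        elif y < x:
            yield y
            y = next(b)
        else:
            yield x
            x, y = next(a), next(b)

def m235():
    # Hamming numbers: 2^i * 3^j * 5^k in increasing order.
    # The generator feeds on tee'd copies of its own output; tee()
    # discards each buffered element once every copy has consumed it,
    # so memory stays bounded by the lag between the readers.
    def _m235():
        yield 1
        for n in merge(times(2, m2),
                       merge(times(3, m3), times(5, m5))):
            yield n
    m2, m3, m5, result = tee(_m235(), 4)
    return result
```

The point relevant to the memory discussion: the internal buffers of tee() only hold the elements that the slowest copy has not yet consumed, instead of every result ever generated.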




More information about the Python-list mailing list