Itertools generator injection

This is closely related to https://bugs.python.org/issue40230 - "Itertools.product() Out of Memory Errors." `itertools.product()` completely consumes all input iterables before yielding any values, which can cause memory issues in certain (extreme) cases.

I recently needed to partially iterate through the product of two very large iterators which were themselves power sets of some input values. For unlucky inputs these iterators were enormous (but not infinite), and `itertools.product()` used up all the memory on my machine, even though I sometimes only needed the first few values.

My solution was to write my own version of `itertools.product()` that takes generator functions instead of iterables. This allowed me to recreate and loop through the required iterables multiple times without ever having to store all their values. Product tuples are produced immediately, and the iteration can be completed for large inputs without runaway memory consumption.

I'm proposing that equivalent functions accepting functions/generators, maybe `fproduct()`, `fcombinations()`, etc., be added to `itertools`. e.g. a simplified `fproduct` for 2 inputs:

```
def fproduct(f1, f2):
    for a in f1():
        for b in f2():
            yield (a, b)
```

Would others find this useful? Are there any drawbacks I'm missing?

Best,
Alastair
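For a concrete (hypothetical) example of the intended use: with `fproduct` as above and the usual power-set recipe from the itertools docs, enormous inputs can be re-created on demand instead of stored:

```
from itertools import chain, combinations, islice

def powerset(values):
    # itertools-recipes powerset: yields all 2**len(values) subsets lazily
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))

xs = range(30)  # powerset(xs) has 2**30 elements -- far too many to store
pairs = fproduct(lambda: powerset(xs), lambda: powerset(xs))
print(list(islice(pairs, 3)))  # the first few tuples yield immediately
# [((), ()), ((), (0,)), ((), (1,))]
```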

I'm assuming things like this are what you're talking about:

```
def lazy_product2(*args, repeat=1):
    "recursive algorithm"
    args = args * repeat
    if not args:
        yield ()
        return
    for all_but_last in lazy_product2(*args[:-1]):
        for last in args[-1]():
            yield all_but_last + (last,)

def lazy_product(*args, repeat=1):
    "non-recursive algorithm"
    funcs = args * repeat
    iterators = [iter(f()) for f in funcs]
    try:
        result = [next(it) for it in iterators]
    except StopIteration:
        return
    rev_range = range(len(iterators))[::-1]
    sentinel = object()
    while True:
        yield tuple(result)
        # get the next one
        for i in rev_range:
            result[i] = next(iterators[i], sentinel)
            if result[i] is sentinel:
                iterators[i] = iter(funcs[i]())
                result[i] = next(iterators[i], sentinel)
                if result[i] is sentinel:
                    # No new data this time around
                    return
            else:
                # stop "carrying", we incremented one.
                # ready to yield.
                break
        else:
            # no break
            return

import itertools

assert (list(lazy_product(lambda: range(3), lambda: range(2), repeat=2))
        == list(itertools.product(range(3), range(2), repeat=2))
        == list(lazy_product2(lambda: range(3), lambda: range(2), repeat=2)))

infinite = lazy_product(itertools.count, lambda: "abc")
print([next(infinite) for i in range(10)])
# [(0, 'a'), (0, 'b'), (0, 'c'),
#  (1, 'a'), (1, 'b'), (1, 'c'),
#  (2, 'a'), (2, 'b'), (2, 'c'),
#  (3, 'a')]
```

I think these don't generally fit with the "feel" of the itertools module though, since almost everything there accepts iterables and returns iterators, and it's very focused on performance. Accepting no-arg callables that return iterators doesn't feel like an obvious API to me.

Another option is to accept iterables that repeatedly get iter() called on them, but that's hard to make work with generators, and people confuse iterables and iterators anyway, so that might be too subtle. Maybe it's the sort of thing that just goes into a personal library of small utility functions.
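To spell out the generator problem (an illustrative example, not from the original post): calling iter() on a generator returns the very same object, so once it is exhausted there is no way to "rewind" it by calling iter() again:

```
def gen():
    yield from range(3)

g = gen()
assert iter(g) is g            # iter() on a generator is the generator itself
assert list(g) == [0, 1, 2]
assert list(iter(g)) == []     # exhausted; re-iteration produces nothing

xs = [0, 1, 2]                 # a list, by contrast, re-iterates fine
assert list(iter(xs)) == [0, 1, 2]
assert list(iter(xs)) == [0, 1, 2]
```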

Hi

I see two issues. The first is present behaviour. The second is alternative ways of ordering the elements of a Cartesian product.

Here's an example of the present behaviour:

```
>>> iter_range = iter(range(100))
>>> prod = itertools.product(iter_range, "abcdef")
>>> next(iter_range)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

It's not what I expected before I read this thread, and it is what I expected after I read this thread. I regard it as a bug. I expect iterators to be lazy rather than greedy when consuming input. This helps make for efficient and non-blocking pipelines. I see no reason for the product to consume any items until it has been asked to yield a member of the product.

The second issue is that there are at least three sensible ways to iterate a Cartesian product. The itertools.product function is, as the docs say, equivalent to nested for loops. On ordered input it's equivalent to lexicographic ordering. The squares on a chessboard are labelled a1, a2, a3, ..., b1, b2, b3, ..., h7, h8 (using lexicographic order).

There are as many fractions as there are counting numbers. The usual way to prove this is to order unsimplified fractions p/q via the triple (p + q, p, q). This gives a diagonal ordering. The third order is via subsquares. In other words, order via the triple (max(p, q), p, q).

I suggest, on the basis of batteries included, that all three methods be included in itertools, perhaps with the names lexicproduct, diagproduct and maxproduct. Finally, make product a synonym for lexicproduct. This allows the programmer to say, without writing a comment, that they've considered the alternatives and chosen the function also known as product. A sketch of what diagproduct might look like is below.

with best regards

Jonathan
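Here is a minimal sketch of what a two-input diagproduct could look like (the name and all details are illustrative, not an existing API). It caches consumed values so each input, possibly infinite, is iterated exactly once:

```
from itertools import count, islice

def diagproduct(iterable1, iterable2):
    # Sketch: yield pairs of the Cartesian product in diagonal (Cantor)
    # order, i.e. sorted by (i + j, i, j). Consumed values are cached so
    # each input is iterated only once and may be infinite.
    xs, ys = [], []
    it1, it2 = iter(iterable1), iter(iterable2)
    done1 = done2 = False
    for total in count():
        # Grow each cache toward index `total`, if the input allows.
        while not done1 and len(xs) <= total:
            try:
                xs.append(next(it1))
            except StopIteration:
                done1 = True
        while not done2 and len(ys) <= total:
            try:
                ys.append(next(it2))
            except StopIteration:
                done2 = True
        if (done1 and not xs) or (done2 and not ys):
            return  # an empty input makes an empty product
        emitted = False
        for i in range(total + 1):
            j = total - i
            if i < len(xs) and j < len(ys):
                yield xs[i], ys[j]
                emitted = True
        if done1 and done2 and not emitted:
            return  # both inputs finite and all diagonals exhausted

assert (list(islice(diagproduct(count(0), count(0)), 6))
        == [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)])
```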

On Fri, Mar 19, 2021 at 8:59 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
> I think these don't generally fit with the "feel" of the itertools module
> though since almost everything there accepts iterables and returns iterators
itertools.product as written barely seems to fit. It might as well accept only sequences, since it immediately converts everything into a sequence anyway. It's also documented as "equivalent to a nested for-loop", which is plainly untrue for several different reasons.

A simple improvement would be to keep the first argument as an uncached iterator if repeat=1. Not only would that improve memory usage in many cases, but it would allow for things like product(count(1), ...), which ought to work in my opinion.

> Another option is to accept iterables that repeatedly get iter() called on
> them, but that's hard to make work with generators

That variant of itertools.product (one which is truly equivalent to a nested for loop) would have to be a new function, since it wouldn't be backward compatible. The lack of a convenient way to make it work with generators feels like a hole in the standard library that should be fixed:

```
from itertools import count

class ReiterableGenerator:  # defined in itertools
    def __init__(self, f, *args, **kwargs):
        self._f, self._args, self._kwargs = f, args, kwargs

    def __iter__(self):
        return self._f(*self._args, **self._kwargs)

# isprime: some primality predicate, assumed to be defined elsewhere
primes = ReiterableGenerator(lambda: (n for n in count(2) if isprime(n)))
```
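For illustration (my example, not from the post), such a wrapper restores repeated iteration, which is exactly what a truly nested-for-loop product would need:

```
evens = ReiterableGenerator(lambda: (n for n in range(10) if n % 2 == 0))

# Each iter() call builds a fresh generator, so the wrapper can be
# looped over repeatedly -- unlike a bare generator.
assert list(evens) == [0, 2, 4, 6, 8]
assert list(evens) == [0, 2, 4, 6, 8]

# Nested loops over the same wrapper behave like a true nested for-loop:
pairs = [(a, b) for a in evens for b in evens]
assert len(pairs) == 25
```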

I have very often wanted versions of itertools.product(), itertools.permutations(), and itertools.combinations() that don't concretize their iterators. They very much violate the spirit of itertools, and hard-lock my machine when I forget the bad design and try infinite iterators. I don't think we necessarily need variations for every sensible ordering of produced items. But not freezing would be a good start.

Hm, maybe there is something worthwhile here. I implemented a Python version with the same semantics as itertools.product, except only consuming the iterators when absolutely necessary; still only consuming each iterator once, but building its pools incrementally. See https://gist.github.com/sweeneyde/fb6734d7b9f7d17e132c28af9ecb6270

```
from itertools import count

it = LazyProductObject(count(0), count(0), "ab", repeat=3)
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 0, "a")
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 0, "b")
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 1, "a")
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 1, "b")
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 2, "a")
assert next(it) == (0, 0, "a", 0, 0, "a", 0, 2, "b")
```

It looks like it could be made to have a similar tight loop as the current implementation if we decided to re-write it in C.
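To make the incremental-pool idea concrete, here is a minimal two-input sketch (mine, not the gist's code): the second input is consumed lazily on the first pass and cached, then replayed from the cache for every later value of the first input:

```
from itertools import count, islice, product

def lazy_product_two(iterable1, iterable2):
    it2 = iter(iterable2)
    pool2 = []           # cache of values already pulled from iterable2
    exhausted2 = False
    for a in iterable1:
        i = 0
        while True:
            if i < len(pool2):
                b = pool2[i]          # replay the cache
            elif exhausted2:
                break
            else:
                try:
                    b = next(it2)     # first pass: consume lazily and cache
                except StopIteration:
                    exhausted2 = True
                    break
                pool2.append(b)
            yield a, b
            i += 1
        if exhausted2 and not pool2:
            return  # empty second input: the product is empty

# Same results as itertools.product on finite inputs...
assert list(lazy_product_two("ab", range(3))) == list(product("ab", range(3)))
# ...but nothing is concretized up front, so infinite inputs don't hang:
assert (list(islice(lazy_product_two(count(0), "xy"), 3))
        == [(0, 'x'), (0, 'y'), (1, 'x')])
```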
participants (5)
- Alastair Stanley
- Ben Rudiak-Gould
- David Mertz
- Dennis Sweeney
- Jonathan Fine