On Tue, Oct 26, 2010 at 3:10 AM, Steven D'Aprano wrote:
Perhaps I'm missing something, but to my mind, that's an awfully complicated solution for such a simple problem. Here's my attempt:
def multi_reduce(iterable, funcs):
    it = iter(iterable)
    # Seed every collector with the first item, then fold each
    # remaining item into every collector.
    collectors = [next(it)] * len(funcs)
    for x in it:
        for i, f in enumerate(funcs):
            collectors[i] = f(collectors[i], x)
    return collectors
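For concreteness, a quick usage sketch (the sample data and the choice of operator.add are just illustrative):

from operator import add

# min, max, and a running sum computed in one pass over the data.
print(multi_reduce([5, 2, 9, 1], [min, max, add]))  # -> [1, 9, 17]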
I've called it multi_reduce rather than parallel_reduce, because it doesn't execute the functions in parallel. By my testing on Python 3.1.1, multi_reduce is consistently ~30% faster than the generator-based solution for lists of 1,000 to 10,000,000 items.
So what am I missing? What does your parallel_reduce give us that multi_reduce doesn't?
You're right; the point I wanted to prove was that generators are better than threads, but the code was based on emulating reduce(). The generalization I was aiming for is that it is convenient to write a generator that does some kind of computation over a sequence of items and returns a result at the end, and then to have a driver that feeds a single sequence to a bunch of such generators. This is more powerful once you try to use reduce to compute, e.g., the average of the numbers fed to it -- of course you can do it using a function of (state, value), but it is so much easier to write as a loop! (At least for me -- if you do nothing but write Haskell all day, I'm sure it comes naturally. :-)

def avg():
    total = 0
    count = 0
    try:
        while True:
            value = yield
            total += value
            count += 1
    except GeneratorExit:
        raise StopIteration(total / count)

The essential boilerplate here is

try:
    while True:
        value = yield
        <use value>
except GeneratorExit:
    raise StopIteration(<compute result>)

No doubt functional aficionados will snub this, but in Python this should run much faster than the same thing written as a reduce-ready function, due to the function-call overhead (which wasn't a problem in the min/max example, since those are built-ins).

BTW, this episode led me to better understand my objection to reduce() as the universal hammer: my problem with writing avg() using reduce is that the function one feeds into reduce is asymmetric -- its first argument must be some state, e.g. a tuple (total, count), and the second argument must be the next value. This is the moment at which my head reliably explodes -- even though it has no problem visualizing reduce() with a *symmetric* function like +, min or max.

Also note that the reduce()-based solution would need a separate function to extract the desired result (total / count) from the state (total, count), and for multi_reduce() you would have to supply a separate list of such functions, or use some other hacky approach.
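To make the driver idea concrete, here is a minimal sketch of a parallel_reduce-style driver for such generators (the name and the mechanics are assumptions, not code from this message). Two caveats: gen.close() swallows the StopIteration a generator raises in response to GeneratorExit, so the driver uses gen.throw() and catches the result itself; and under PEP 479 (Python 3.7+) raising StopIteration inside a generator becomes a RuntimeError, so this sketch follows the 3.1-era idiom used above, where a modern avg() would end with "return total / count" instead.

def parallel_reduce(iterable, gen_funcs):
    # One consumer generator per reduction; advance each to its
    # first yield so it is ready to receive values.
    gens = [gf() for gf in gen_funcs]
    for g in gens:
        next(g)
    # Feed the single input sequence to every consumer.
    for value in iterable:
        for g in gens:
            g.send(value)
    # Ask each consumer for its result.  close() would discard the
    # StopIteration, so throw GeneratorExit and catch it here.
    results = []
    for g in gens:
        try:
            g.throw(GeneratorExit)
        except StopIteration as exc:
            results.append(exc.args[0])
    return results

print(parallel_reduce([1.0, 2.0, 3.0, 4.0], [avg]))  # -> [2.5]

For contrast, a sketch of the reduce()-based avg() described above, with its asymmetric step function and separate extraction step (the name avg_step and the sample data are illustrative):

from functools import reduce

def avg_step(state, value):
    # state is a (total, count) tuple; value is the next item --
    # the asymmetric signature complained about above.
    total, count = state
    return (total + value, count + 1)

total, count = reduce(avg_step, [1.0, 2.0, 3.0, 4.0], (0, 0))
print(total / count)  # -> 2.5, via the separate extraction step

-- 
--Guido van Rossum (python.org/~guido)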