[Python-ideas] [Python-Dev] minmax() function returning (minimum, maximum) tuple of a sequence
Guido van Rossum
guido at python.org
Tue Oct 26 17:43:00 CEST 2010
On Tue, Oct 26, 2010 at 3:10 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> Perhaps I'm missing something, but to my mind, that's an awfully complicated
> solution for such a simple problem. Here's my attempt:
>
> def multi_reduce(iterable, funcs):
>     it = iter(iterable)
>     collectors = [next(it)]*len(funcs)
>     for x in it:
>         for i, f in enumerate(funcs):
>             collectors[i] = f(collectors[i], x)
>     return collectors
>
> I've called it multi_reduce rather than parallel_reduce, because it doesn't
> execute the functions in parallel. By my testing on Python 3.1.1,
> multi_reduce is consistently ~30% faster than the generator based solution
> for lists with 1000 - 10,000,000 items.
>
> So what am I missing? What does your parallel_reduce give us that
> multi_reduce doesn't?
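For reference, a usage sketch of the quoted multi_reduce as reconstructed above (not part of the original exchange), computing min and max of a list in a single pass:

numbers = [3, 1, 4, 1, 5, 9, 2, 6]
lo, hi = multi_reduce(numbers, [min, max])  # one pass over the data
print(lo, hi)  # -> 1 9
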
You're right, the point I wanted to prove was that generators are
better than threads, but the code was based on emulating reduce(). The
generalization that I was aiming for was that it is convenient to
write a generator that does some kind of computation over a sequence
of items and returns a result at the end, and then have a driver that
feeds a single sequence to a bunch of such generators. This is more
powerful once you try to use reduce to compute e.g. the average of the
numbers fed to it -- of course you can do it using a function of
(state, value) but it is so much easier to write as a loop! (At least
for me -- if you do nothing but write Haskell all day I'm sure it
comes naturally. :-)

def avg():
    total = 0
    count = 0
    try:
        while True:
            value = yield
            total += value
            count += 1
    except GeneratorExit:
        raise StopIteration(total / count)

The essential boilerplate here is

try:
    while True:
        value = yield
        <use value>
except GeneratorExit:
    raise StopIteration(<compute result>)

No doubt functional aficionados will snub this, but in Python, this
should run much faster than the same thing written as a reduce-ready
function, due to the function overhead (which wasn't a problem in the
min/max example since those are built-ins).
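The driver that feeds a single sequence to a bunch of such consumers is not shown in this message. A minimal sketch of what such a driver could look like (not the code from the original thread; note that on Python 3.7+, PEP 479 turns a raise StopIteration(...) inside a generator into RuntimeError, so the consumer below carries its result with a plain return instead):

def parallel_reduce(iterable, consumer_factories):
    # Create one consumer generator per factory and advance each to its
    # first `yield` so it is ready to receive values via send().
    gens = [factory() for factory in consumer_factories]
    for g in gens:
        next(g)
    # Feed every value of the single input sequence to every consumer.
    for value in iterable:
        for g in gens:
            g.send(value)
    # Ask each consumer for its result.  Throwing GeneratorExit makes the
    # consumer leave its loop; the result comes back inside the
    # StopIteration that ends the generator.
    results = []
    for g in gens:
        try:
            g.throw(GeneratorExit)
        except StopIteration as exc:
            results.append(exc.value)
        else:
            raise RuntimeError("consumer ignored GeneratorExit")
    return results

def avg():
    # Same consumer as above, except that the result is returned rather
    # than raised as StopIteration (required on Python 3.7+ by PEP 479).
    total = 0
    count = 0
    try:
        while True:
            value = yield
            total += value
            count += 1
    except GeneratorExit:
        return total / count

print(parallel_reduce([3, 1, 4, 1, 5], [avg]))  # -> [2.8]
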
BTW, this episode led me to better understand my objection to
reduce() as the universal hammer: my problem with writing avg() using
reduce is that the function one feeds into reduce is asymmetric -- its
first argument must be some state, e.g. a tuple (total, count), and
the second argument must be the next value. This is the moment that my
head reliably explodes -- even though it has no problem visualizing
reduce() using a *symmetric* function like +, min or max.
Also note that the reduce() based solution would have to have a
separate function to extract the desired result (total / count) from
the state (total, count), and for multi_reduce() you would have to
supply a separate list of such extraction functions, or use some other
hacky approach.
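For contrast, a sketch (not from the original message) of the reduce()-based formulation being objected to: the folding function takes asymmetric (state, value) arguments, and a separate finishing step turns the final state into the answer.

from functools import reduce

def step(state, value):
    # state is the running (total, count) pair; value is the next item
    total, count = state
    return (total + value, count + 1)

def finish(state):
    # separate extraction step: turn the final state into the result
    total, count = state
    return total / count

numbers = [3, 1, 4, 1, 5]
print(finish(reduce(step, numbers, (0, 0))))  # -> 2.8
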
--
--Guido van Rossum (python.org/~guido)