[Python-ideas] [Python-Dev] minmax() function returning (minimum, maximum) tuple of a sequence

Steven D'Aprano steve at pearwood.info
Tue Oct 26 12:10:59 CEST 2010


Guido van Rossum wrote:

> This should not require threads.
> 
> Here's a bare-bones sketch using generators:
> 
> def reduce_collector(func):
>     outcome = None
>     while True:
>         try:
>             val = yield
>         except GeneratorExit:
>             break
>         if outcome is None:
>             outcome = val
>         else:
>             outcome = func(outcome, val)
>     raise StopIteration(outcome)
> 
> def parallel_reduce(iterable, funcs):
>     collectors = [reduce_collector(func) for func in funcs]
>     values = [None for _ in collectors]
>     for i, coll in enumerate(collectors):
>         try:
>             next(coll)
>         except StopIteration as err:
>             values[i] = err.args[0]
>             collectors[i] = None
>     for val in iterable:
>         for i, coll in enumerate(collectors):
>             if coll is not None:
>                 try:
>                     coll.send(val)
>                 except StopIteration as err:
>                     values[i] = err.args[0]
>                     collectors[i] = None
>     for i, coll in enumerate(collectors):
>         if coll is not None:
>             try:
>                 coll.throw(GeneratorExit)
>             except StopIteration as err:
>                 values[i] = err.args[0]
>     return values

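For concreteness, the intended call looks something like this (a quick
check at the interactive prompt, using min and max as the reducing
functions):

    >>> parallel_reduce([3, 1, 4, 1, 5, 9, 2, 6], [min, max])
    [1, 9]

(A caveat for anyone trying this on a later Python: PEP 479, the
default behaviour from 3.7 on, turns a StopIteration raised inside a
generator into RuntimeError, so the final "raise StopIteration(outcome)"
would have to be written as "return outcome".)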

Perhaps I'm missing something, but to my mind, that's an awfully 
complicated solution for such a simple problem. Here's my attempt:

def multi_reduce(iterable, funcs):
    it = iter(iterable)
    # Seed every collector with the first item of the iterable.
    collectors = [next(it)]*len(funcs)
    for x in it:
        for i, f in enumerate(funcs):
            collectors[i] = f(collectors[i], x)
    return collectors
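
Called with the same sample input as the example above, it gives the
same answer:

    >>> multi_reduce([3, 1, 4, 1, 5, 9, 2, 6], [min, max])
    [1, 9]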

I've called it multi_reduce rather than parallel_reduce, because it 
doesn't execute the functions in parallel. By my testing on Python 
3.1.1, multi_reduce is consistently ~30% faster than the generator-based 
solution for lists with 1,000 to 10,000,000 items.
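
If anyone wants to check the numbers, a timeit harness along these
lines will do; this is a sketch rather than my exact script (it assumes
both functions are defined in the same file), and the absolute figures
will of course vary by machine:

    from timeit import Timer
    setup = ("from __main__ import multi_reduce, parallel_reduce\n"
             "data = list(range(1000000))")
    for stmt in ("multi_reduce(data, [min, max])",
                 "parallel_reduce(data, [min, max])"):
        # Best of three runs, one pass over the million-item list each.
        print(stmt, min(Timer(stmt, setup).repeat(3, number=1)))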

So what am I missing? What does your parallel_reduce give us that 
multi_reduce doesn't?



-- 
Steven



