[Python-ideas] [Python-Dev] minmax() function returning (minimum, maximum) tuple of a sequence

Wed Oct 13 23:54:31 CEST 2010

On Mon, Oct 11, 2010 at 10:18 PM, Tal Einat wrote:
> Masklinn wrote:
>> On 2010-10-11, at 02:55 , Zac Burns wrote:
>>>
>>> Unfortunately this solution seems incompatable with the implementations with
>>> for loops in min and max (EG: How do you switch functions at the right
>>> time?) So it might take some tweaking.
>> As far as I know, there is no way to force lockstep iteration of arbitrary functions in Python. Though an argument could be made for adding coroutine capabilities to builtins and library functions taking iterables, I don't think that's on the books.
>>
>> As a result, this function would devolve into something along the lines of
>>
>>    def apply(iterable, *funcs):
>>        return map(lambda c: c[0](c[1]), zip(funcs, tee(iterable, len(funcs))))
>>
>> which would run out of memory on very long or nigh-infinite iterables due to tee memoizing all the content of the iterator.
>
> We recently needed exactly this -- to do several running calculations
> in parallel on an iterable. We avoided using co-routines and just
> created a RunningCalc class with a simple interface, and implemented
> various running calculations as sub-classes, e.g. min, max, average,
> variance, n-largest. This isn't very fast, but since generating the
> iterated values is computationally heavy, this is fast enough for our
> uses.
>
> Having a standard method to do this in Python, with implementations
> for common calculations in the stdlib, would have been nice.
>
> I wouldn't mind trying to work up a PEP for this, if there is support
> for the idea.

After some thought, I've found a way to make running several "running
calculations" in parallel fast. Speed should be comparable to having
used the non-running variants.

The method is to give each running calculation "blocks" of values
instead of just one at a time. The apply_in_parallel(iterable,
block_size=1000, *running_calcs) function would get blocks of values
from the iterable and pass them to each running calculation
separately. So RunningMax would look something like this:

class RunningMax(RunningCalc):
    def __init__(self):
        self.max_value = None

    def feed(self, value):
        if self.max_value is None or value > self.max_value:
            self.max_value = value

    def feedMultiple(self, values):
        self.feed(max(values))

feedMultiple() would have a naive default implementation in the base class.

Now this is non-trivial and can certainly be useful. Thoughts? Comments?

- Tal Einat