[Python-ideas] [Python-Dev] minmax() function returning (minimum, maximum) tuple of a sequence
Tal Einat
taleinat at gmail.com
Wed Oct 13 23:54:31 CEST 2010
On Mon, Oct 11, 2010 at 10:18 PM, Tal Einat wrote:
> Masklinn wrote:
>> On 2010-10-11, at 02:55 , Zac Burns wrote:
>>>
>>> Unfortunately this solution seems incompatible with the implementations with
>>> for loops in min and max (EG: How do you switch functions at the right
>>> time?) So it might take some tweaking.
>> As far as I know, there is no way to force lockstep iteration of arbitrary functions in Python. Though an argument could be made for adding coroutine capabilities to builtins and library functions taking iterables, I don't think that's on the books.
>>
>> As a result, this function would devolve into something along the lines of
>>
>> from itertools import tee
>>
>> def apply(iterable, *funcs):
>>     return map(lambda c: c[0](c[1]), zip(funcs, tee(iterable, len(funcs))))
>>
>> which would run out of memory on very long or nigh-infinite iterables due to tee memoizing all the content of the iterator.
>
> We recently needed exactly this -- to do several running calculations
> in parallel on an iterable. We avoided using co-routines and just
> created a RunningCalc class with a simple interface, and implemented
> various running calculations as sub-classes, e.g. min, max, average,
> variance, n-largest. This isn't very fast, but since generating the
> iterated values is computationally heavy, this is fast enough for our
> uses.
>
> Having a standard method to do this in Python, with implementations
> for common calculations in the stdlib, would have been nice.
>
> I wouldn't mind trying to work up a PEP for this, if there is support
> for the idea.
After some thought, I've found a way to make running several "running
calculations" in parallel fast. Speed should be comparable to using
the non-running variants.

The method is to give each running calculation "blocks" of values
instead of just one at a time. The apply_in_parallel(iterable,
block_size=1000, *running_calcs) function would get blocks of values
from the iterable and pass each block to every running calculation
separately. So RunningMax would look something like this:

class RunningMax(RunningCalc):
    def __init__(self):
        self.max_value = None

    def feed(self, value):
        if self.max_value is None or value > self.max_value:
            self.max_value = value

    def feedMultiple(self, values):
        self.feed(max(values))

feedMultiple() would have a naive default implementation in the base class.
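
To make the idea concrete, here is a rough sketch of how the pieces might fit together. It is only an illustration under assumptions: the RunningCalc base class body, the RunningSum example, and the use of itertools.islice to cut blocks are my guesses at one possible shape, not a finished design. Note also that this sketch takes the calculations as a list rather than as *running_calcs, so that block_size can be passed by keyword without colliding with the positional arguments.

```python
from itertools import islice


class RunningCalc:
    """Hypothetical base class; the interface is assumed from the description above."""

    def feed(self, value):
        raise NotImplementedError

    def feedMultiple(self, values):
        # Naive default: hand the values over one at a time.
        for value in values:
            self.feed(value)


class RunningMax(RunningCalc):
    def __init__(self):
        self.max_value = None

    def feed(self, value):
        if self.max_value is None or value > self.max_value:
            self.max_value = value

    def feedMultiple(self, values):
        # apply_in_parallel below never passes an empty block,
        # so max() will not see an empty sequence.
        self.feed(max(values))


class RunningSum(RunningCalc):
    """Illustrative calculation that relies on the default feedMultiple()."""

    def __init__(self):
        self.total = 0

    def feed(self, value):
        self.total += value


def apply_in_parallel(iterable, running_calcs, block_size=1000):
    """Feed blocks of at most block_size values to every running calculation.

    Only one block is held in memory at a time, so this also works on
    very long iterables.
    """
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            break
        for calc in running_calcs:
            calc.feedMultiple(block)
    return running_calcs


# Usage: one pass over a generator, two calculations.
running_max, running_sum = RunningMax(), RunningSum()
apply_in_parallel((x * x for x in range(-5, 4)),
                  [running_max, running_sum], block_size=4)
```

Since each block is a plain list, a calculation can delegate to the fast builtins (max(), sum(), and so on) for the bulk of the work and only combine one result per block in Python code.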
Now this is non-trivial and can certainly be useful. Thoughts? Comments?
- Tal Einat