reduce()--what is it good for? (was: Re: reduce() anomaly?)

Alex Martelli aleaxit at yahoo.com
Fri Nov 7 18:17:26 EST 2003


Erik Max Francis wrote:

> Alex Martelli wrote:
> 
>> However, so many of reduce's practical use cases are eaten up by sum,
>> that reduce is left without real use cases to justify its existence.
> 
> Any reduction that doesn't involve summing won't be handled by sum.
> Flattening a list of (non-recursive) lists is a good example.  reduce is

If this is a good example, what's a bad one?

>>> sum([ ['a list'], ['of', 'non'], ['recursive', 'lists'] ], [])
['a list', 'of', 'non', 'recursive', 'lists']

sum, exactly like reduce(operator.add, ..., has bad performance for this
kind of one-level flattening, by the way.  A plain loop on .extend is much
better than either: whenever a loop of x += y has much better performance
than one of x = x + y -- and the latter is what both sum and
reduce(operator.add, ... are condemned to do -- you should consider
choosing a simple loop if performance has any importance at all.
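To spell out the difference that parenthesis alludes to, compare these two
loops (listoflists being any list of lists, as in the example further down):

    x = []
    for y in listoflists:
        x = x + y    # builds a brand-new list each pass: O(len(x) + len(y))

    x = []
    for y in listoflists:
        x += y       # in-place, same as x.extend(y): amortized O(len(y))

The first keeps recopying everything accumulated so far, which is where the
O(N**2) behavior comes from; the second only ever touches the new items.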

> already in the language; removing an existing, builtin function seems
> totally inappropriate given that it's there for a reason and there will
> be no replacement.

Existing builtin functions _will_ be removed in 3.0: Guido is on record
as stating that (at both Europython and OSCON -- I don't recall if he
had already reached that determination at PythonUK time).  They
exist for a reason, but when that reason is: "once upon a time, we
thought (perhaps correctly, given the way the rest of the language and 
library was at the time) that they were worth having", that's not 
sufficient reason to weigh down the language forever with their 
not-useful-enough weight.  The alternatives to removing those parts that 
aren't useful enough any more are, either to stop Python's development 
forever, or to make Python _bloated_ with several ways to perform the
same tasks.  I much prefer to "lose" the "legacy only real use" built-ins
(presumably to some kind of legacy.py module whence they can easily
be installed to keep some old and venerable piece of code working) than
to choose either of those tragic alternatives.

3.0 is years away, but functions that are clearly slated for
obsolescence then should, IMHO, already be avoided in new code;
particularly because the obsolescence is due to the existence of better
alternatives.  You claim "there will be no replacement", but in fact I
have posted almost a dozen possible replacements for 'reduce' for a case
that was being suggested as a good one for it, and _every_ one of them
is better than reduce; I have posted superior replacements for ALL uses
of reduce in the standard library (except those that test reduce itself
-- but if testing itself is reduce's only good use, well, you see my
point...:-).  I don't intend to spend much more time pointing out how
reduce can be replaced by better code in real use cases, by the way :-).
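Before I stop, though, here's the flavor of such replacements, in two
representative sketches (not the exact code I posted):

    import operator

    numbers = [1, 2, 3, 4]
    reduce(operator.add, numbers, 0)    # the reduce spelling
    sum(numbers)                        # clearer, and at least as fast

    strings = ['a', 'b', 'c']
    reduce(operator.add, strings, '')   # O(N**2) string concatenation
    ''.join(strings)                    # the idiomatic O(N) replacement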


>> But comparing plain Python code to a built-in that's almost bereft of
>> good use cases, and finding the plain Python code _faster_ on such a
>> regular basis, is IMHO perfectly legitimate.
> 
> reduce will be at least as fast as writing an explicit loop.

You are wrong: see the almost-a-dozen cases I posted to try and demolish
one suggested "good use" of reduce.

> Potentially more if the function object used is itself a builtin
> function.

In one of the uses of reduce in the standard library, for which I posted
superior replacements today, the function object was int.__mul__ -- builtin
enough for you?  Yet I showed how using recursion and memoization instead of 
reduce would speed up that case by many, many times.
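I won't repeat that exact code here, but the general idea is easy to
sketch (the function and cache names below are mine, just for
illustration): multiply operands of similar sizes via balanced recursion,
so the large-integer multiplications that dominate the cost stay as cheap
as possible, and memoize results so repeated calls pay only once --
whereas reduce is condemned to fold strictly left-to-right, multiplying
an ever-huger running product by one small factor at a time:

    def product(seq, lo=0, hi=None):
        # balanced divide-and-conquer product of seq[lo:hi]
        if hi is None:
            hi = len(seq)
        if hi - lo == 0:
            return 1
        if hi - lo == 1:
            return seq[lo]
        mid = (lo + hi) // 2
        return product(seq, lo, mid) * product(seq, mid, hi)

    _fact_cache = {}
    def factorial(n):
        # memoized: each n is computed at most once
        if n not in _fact_cache:
            _fact_cache[n] = product(range(1, n + 1))
        return _fact_cache[n]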

Another classic example of reduce being hopelessly slow, despite using
a built-in function, is exactly the "flatten a 1-level list" case mentioned 
above.  Try:

    x = []
    x = reduce(operator.add, listoflists, x)

vs:

    x = []
    for L in listoflists:
        x.extend(L)

for a sufficiently big listoflists, and you'll see... (the latter, if need
be, can get another nice little multiplicative speedup by hoisting the
x.extend lookup, but the key issue is the O(N**2) reduce vs the O(N)
loop...).

[alex at lancelot src]$ timeit.py -c -s'xs=[[x] for x in range(999)]' -s'import operator' 'x=[]' 'x=reduce(operator.add, xs, x)'
100 loops, best of 3: 8.7e+03 usec per loop
[alex at lancelot src]$ timeit.py -c -s'xs=[[x] for x in range(999)]' -s'import operator' 'x=[]' 'for L in xs: x.extend(L)'
1000 loops, best of 3: 860 usec per loop
[alex at lancelot src]$ timeit.py -c -s'xs=[[x] for x in range(999)]' -s'import operator' 'x=[]; xex=x.extend' 'for L in xs: xex(L)'
1000 loops, best of 3: 560 usec per loop

See what I mean?  Already for a mere 999 1-item lists, the plain Python
code is 10 times faster than reduce, and if that's not quite enough and
you want it 15 times faster instead, that's trivial to get, too.
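For completeness: if even the explicit loop feels like too much typing,
itertools (new in 2.3) gives yet another O(N) way to spell the same
one-level flattening:

    import itertools
    x = list(itertools.chain(*listoflists))

Same asymptotics as the .extend loop, and arguably even clearer about
intent.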


Alex




