# reduce()--what is it good for? (was: Re: reduce() anomaly?)

Alex Martelli aleax at aleax.it
Fri Nov 7 18:13:33 CET 2003

```Erik Max Francis wrote:

> But reduce isn't simply intended for adding up numbers.  It's for doing
> any kind of reduction.

However, so many of reduce's practical use cases are eaten up by sum
that reduce is left without real use cases to justify its existence.

If you consider the couple of uses left in the standard library...
e.g., take csv.py:

quotechar = reduce(lambda a, b, quotes=quotes:
                   (quotes[a] > quotes[b]) and a or b, quotes.keys())

...a masterpiece of clarity, right?  The simple Python alternative,

quotechar = None
for k in quotes:
    if not quotechar or quotes[k] > quotes[quotechar]:
        quotechar = k

may be deemed boring, but then why not go for speed & concision with...:

quotechar = max([ (v,k) for k,v in quotes.iteritems() ])[-1]

...?-)  All 4 uses in csv.py are similar to this, and the one
in difflib.py:

matches = reduce(lambda sum, triple: sum + triple[-1],
                 self.get_matching_blocks(), 0)

is clearly best expressed in Python 2.3 as:

matches = sum([ triple[-1] for triple in self.get_matching_blocks() ])
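
(One can check the equivalence on made-up data; in today's Python,
reduce has moved to functools:)

from functools import reduce

# hypothetical (i, j, size) triples, like get_matching_blocks() returns
blocks = [(0, 0, 3), (4, 5, 2), (9, 9, 0)]

via_reduce = reduce(lambda total, triple: total + triple[-1], blocks, 0)
via_sum = sum([triple[-1] for triple in blocks])
assert via_reduce == via_sum == 5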

The only other uses are in Lib/test/ -- and even there, apart from
tests of reduce itself, all are "reduce(add..." [i.e., sum!] save
for *one*...:

def factorial(n):
    return reduce(int.__mul__, xrange(1, n), 1)

Even in that one case (and apart from the confusing choice of having
factorial(n) return the factorial of n-1...), the most simplistic
implementation:

def fac(n):
    result = 1
    for i in xrange(2, n):
        result *= i
    return result

is only 3 times slower, and, if one is in a hurry, recursion and
memoization are obviously preferable:

def facto(n, _memo={1:1}):
    try: return _memo[n]
    except KeyError:
        result = _memo[n] = (n-1) * facto(n-1)
        return result

the performance numbers being:

[alex at lancelot bo]$ timeit.py -c -s'import facs' 'facs.factorial(13)'
100000 loops, best of 3: 10.3 usec per loop

[alex at lancelot bo]$ timeit.py -c -s'import facs' 'facs.fac(13)'
10000 loops, best of 3: 32 usec per loop

[alex at lancelot bo]$ timeit.py -c -s'import facs' 'facs.facto(13)'
1000000 loops, best of 3: 1.26 usec per loop
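
(For the record, the three variants do agree: each computes (n-1)!,
given the off-by-one convention noted above.  In modern Python, reduce
lives in functools and range replaces xrange:)

from functools import reduce

def factorial(n):
    return reduce(int.__mul__, range(1, n), 1)

def fac(n):
    result = 1
    for i in range(2, n):
        result *= i
    return result

def facto(n, _memo={1: 1}):
    try: return _memo[n]
    except KeyError:
        result = _memo[n] = (n - 1) * facto(n - 1)
        return result

# all three return (n-1)!, so for n=5 that's 4! == 24
assert factorial(5) == fac(5) == facto(5) == 24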

> Yes, I'm sure specific reductions based on summing are faster than using
> reduce.  But that goes without saying.  Furthermore, Numeric is not
> builtin to Python, so that seems a red herring to me.  Either you should
> compare builtins to builtins, or not.

If you want APL-ish functionality with Python, Numeric is where you'll
find it (one day numarray, Numeric's successor, may finally gain entrance
into the standard library, but, don't hold your breath...).
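
(That builtin-vs-builtin comparison is easy to reproduce; a sketch with
today's timeit module and functools.reduce, on made-up data:)

import operator
import timeit
from functools import reduce

data = list(range(1000))  # arbitrary sample data

# both spellings compute the same total...
assert sum(data) == reduce(operator.add, data, 0) == 499500

# ...but sum, a dedicated builtin, is typically several times faster
t_sum = timeit.timeit(lambda: sum(data), number=1000)
t_red = timeit.timeit(lambda: reduce(operator.add, data, 0), number=1000)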

But comparing plain Python code to a built-in that's almost bereft of
good use cases, and finding the plain Python code _faster_ on such a
regular basis, is IMHO perfectly legitimate.  If a built-in gives me
obfuscated or slow code, where plain good old "let's code it out in
Python" gains clarity or speed, then it's time for that built-in to
go.  'reduce' exists for purely legacy reasons, and, IMHO, would count
as a wart were it not for Python's admirable determination to keep
old code running (however, even that determination can be overdone,
and I look forward to the 3.0 release where old by-now-unwarranted
built-ins can be put out to pasture...).

Alex

```