map/filter/reduce/lambda opinions and background unscientific mini-survey
rrr at ronadam.com
Mon Jul 4 03:36:29 CEST 2005
Steven D'Aprano wrote:
> On Sun, 03 Jul 2005 19:31:02 +0000, Ron Adam wrote:
>>First on removing reduce:
>>1. There is no reason why reduce can't be put in a functional module
> Don't disagree with that.
>>you can write the equivalent yourself. It's not that hard to do, so it
>>isn't that big of a deal to not have it as a built in.
> Same goes for sum. Same goes for product, ...
Each item needs to stand on its own. It's a much stronger argument for
removing something because something else fulfills its need and is
easier or faster to use than just saying we need x because we have y.
In this case sum and product fulfill 90% (estimate of course) of reduce's
use cases. It may actually be as high as 99% for all I know. Or it may
be less. Anyone care to try and put a real measurement on it?
> ...which doesn't have that many
> common usages apart from calculating the geometric mean, and let's face
> it, most developers don't even know what the geometric mean _is_.
I'm neutral on adding product myself.
> If you look back at past discussions about sum, you will see that there is
> plenty of disagreement about how it should work when given non-numeric
> arguments, eg strings, lists, etc. So it isn't so clear what sum should do.
Testing shows sum() to be over twice as fast as either using reduce or a
for-loop. I think the disagreements will be sorted out.
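
A minimal sketch of how such a comparison could be run. It assumes Python 3, where reduce lives in functools, and the helper names (total_with_loop, total_with_reduce, time_it) are illustrative rather than taken from the test described above. Actual ratios will vary by machine and Python version.

```python
import operator
import timeit
from functools import reduce  # reduce is in functools on Python 3

data = list(range(1, 10001))

def total_with_loop(seq):
    """Total a sequence with an explicit for-loop."""
    total = 0
    for item in seq:
        total += item
    return total

def total_with_reduce(seq):
    """Total a sequence with reduce and operator.add."""
    return reduce(operator.add, seq, 0)

def time_it(func, repeat=50):
    """Return the total seconds taken by `repeat` calls of func(data)."""
    return timeit.timeit(lambda: func(data), number=repeat)

if __name__ == "__main__":
    candidates = [("sum", sum),
                  ("reduce", total_with_reduce),
                  ("for-loop", total_with_loop)]
    for name, func in candidates:
        print(name, time_it(func))
```

All three approaches return the same total; only the elapsed times differ.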
>>2. Reduce calls a function on every item in the list, so its
>>performance isn't much better than the equivalent code using a for-loop.
> That is an optimization issue. Especially when used with the operator
> module, reduce and map can be significantly faster than for loops.
I tried it... it made about a 1% improvement in the builtin reduce and
an equal improvement in the function that used the for loop.
The inline for loop also performed about the same.
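
For anyone wanting to repeat the experiment, here is a rough sketch of the lambda versus operator module comparison. It assumes Python 3 (reduce imported from functools), and the function names are made up for illustration. No timings are claimed here; run it to see what your own setup reports.

```python
import operator
import timeit
from functools import reduce

numbers = list(range(1, 200))

def product_lambda(seq):
    """Multiply all items using a Python-level lambda."""
    return reduce(lambda x, y: x * y, seq, 1)

def product_operator(seq):
    """Multiply all items using operator.mul, which is implemented in C."""
    return reduce(operator.mul, seq, 1)

if __name__ == "__main__":
    for func in (product_lambda, product_operator):
        elapsed = timeit.timeit(lambda: func(numbers), number=1000)
        print(func.__name__, elapsed)
```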
>> *** (note, that list.sort() has the same problem. I would support
>>replacing it with a sort that uses an optional 'order-list' as a sort
>>key. I think its performance could be increased a great deal by
>>removing the function call reference.) ***
>>Second, the addition of sum & product:
>>1. Sum, and less so Product, are fairly common operations so they have
>>plenty of use case arguments for including them.
> Disagree about product, although given that sum is in the language, it
> doesn't hurt to put product as well for completion and those few usages.
I'm not convinced about product either, but if I were to review my
statistics textbooks, I could probably find more uses for it. I suspect
that there may be a few common uses for it that are frequent enough to
make it worth adding. But it might be better in a module.
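
As an illustration of what a module-level product might look like, here is a small sketch together with the geometric mean use mentioned above. The names product and geometric_mean are hypothetical, and the code assumes Python 3 with reduce imported from functools.

```python
import operator
from functools import reduce

def product(values, start=1):
    """Multiply all values together; an empty input returns `start`."""
    return reduce(operator.mul, values, start)

def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    values = list(values)
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    return product(values) ** (1.0 / len(values))
```

For example, geometric_mean([2, 8]) multiplies to 16 and takes the square root. Very long inputs could overflow a float here, so a production version might sum logarithms instead.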
>>2. They don't need to call a pre-defined function between every item, so
>>they can be completely handled internally by C code. They will be much
>>much faster than equivalent code using reduce or a for-loop. This
>>represents a speed increase for every program that totals or subtotals a
>>list, or finds a product of a set.
> I don't object to adding sum and product to the language. I don't object
> to adding zip. I don't object to list comps. Functional, er, functions
> are a good thing. We should have more of them, not less.
Yes, we should have lots of functions to use, in the library, but not
necessarily in builtins.
>>>But removing reduce is just removing
>>>functionality for no other reason, it seems, than spite.
>>No, not for spite. It's more a matter of increasing the overall
>>performance and usefulness of Python without making it more complicated.
>> In order to add new stuff that is better thought out, some things
>>will need to be removed or else the language will continue to grow and
>>be another visual basic.
> Another slippery slope argument.
Do you disagree or agree? Or are you undecided?
>>Having sum and product built in has a clear advantage in both
>>performance and potential frequency of use, whereas reduce doesn't have
>>the same performance advantage and most people don't use it anyway, so
>>why have it built in if sum and product are?
> Because it is already there.
Hmm.. I know a few folks, good people, but they keep everything to the
point of not being able to find anything because they have so much.
They can always think of reasons to keep things, "It's worth something",
"it means something to me", "I'm going to fix it", "I'm going to sell
it", "I might need it", etc.
"Because it is already there" sounds like one of those types of reasons.
>>Why not just code it as a
>>function and put it in your own module?
> Yes, let's all re-invent the wheel in every module! Why bother having a
> print statement, when it is so easy to write your own:
> def myprint(obj):
Yes, Guido wants to make print a function in Python 3000. The good
thing about this is you can call your function just 'p' and save some
Actually, I think i/o functions should be grouped in an interface
module. That way you choose the interface that best fits your need. It
may have a print if it's a console, or it may have a widget if it's a gui.
> Best of all, you can customize print to do anything you like, _and_ it is
> a function.
>> def reduce( f, seq):
>>     x = 0
>>     for y in seq:
>>         x = f(x,y)
>>     return x
> Because that is far less readable, and you take a performance hit.
They come out pretty close as far as I can tell.
import time

def reduce_f(f, seq):
    # Seed the accumulator with the first item, then fold in the rest.
    x = seq[0]
    for y in seq[1:]:
        x = f(x, y)
    return x

# Time the builtin reduce.
t = time.time()
r2 = reduce(lambda x,y: x*y, range(1,10000))
t2 = time.time()-t
print 'reduce builtin:', t2

# Time the pure-Python equivalent.
t = time.time()
r1 = reduce_f(lambda x,y: x*y, range(1,10000))
t2 = time.time()-t
print 'reduce_f: ', t2

if r1!=r2: print "results not equal"
reduce builtin: 0.156000137329
reduce builtin: 0.15700006485
reduce builtin: 0.141000032425
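
A slightly fuller pure-Python equivalent is sketched below. It assumes Python 3 and adds an optional initial value and empty-input handling, which the short version above does not have. The sentinel name _MISSING is purely illustrative.

```python
import operator

_MISSING = object()  # sentinel meaning "no initial value was supplied"

def reduce_f(func, iterable, initial=_MISSING):
    """Fold `iterable` from the left with the two-argument `func`."""
    it = iter(iterable)
    if initial is _MISSING:
        try:
            acc = next(it)  # use the first item as the starting value
        except StopIteration:
            raise TypeError("reduce_f() of empty iterable with no initial value")
    else:
        acc = initial
    for item in it:
        acc = func(acc, item)
    return acc

if __name__ == "__main__":
    print(reduce_f(operator.mul, range(1, 6)))
    print(reduce_f(operator.add, [], 0))
```

Because it works on any iterable rather than slicing a list, it avoids copying the sequence with seq[1:].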
>>But I suspect that most people would just do what I currently do and
>>write the for-loop to do what they want directly instead of using lambda
> That's your choice. I'm not suggesting we remove for loops and force you
> to use reduce. Or even list comps.
Just don't force me to use decorators! ;-)
Nah, they're ok too, but it did take me a little while to understand
their finer points.