clarification

samwyse dejanews at email.com
Sun Aug 19 08:03:39 EDT 2007


Alex Martelli wrote:

> Of course, hoisting the unbound method out of the loops can afford the
> usual small optimization.  But my point is that, in Python, these
> operations (like, say, the concatenation of a sequence of lists, etc)
> are best performed "in place" via loops calling mutator methods such as
> update and intersection_update (or a list's extend, etc), rather than
> "functionally" (building and tossing away many intermediate results).
> E.g., consider:
> 
> brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
> 100000 loops, best of 3: 18.8 usec per loop
> 
> brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'    
> 10000 loops, best of 3: 87.2 usec per loop
> 
> Even in a case as tiny as this one, "reduce" is taking over 4 times
> longer than the loop with the in-place mutator -- and it only gets
> worse, as we're talking about O(N squared) vs O(N) performance!  Indeed,
> this is part of what makes reduce an "attractive nuisance"...;-).  [[And
> so is sum, if used OTHERWISE than for the documented purpose, computing
> "the sum of a sequence of numbers": a loop with r.extend is similarly
> faster, to concatenate a sequence of lists, when compared to sum(sol,
> [])...!!!]]
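
(Spelling out the quoted comparison as a plain script, for anyone who
wants to reproduce it without -mtimeit; the asserts are my addition,
just to show the two spellings agree:)

sos = [set(range(x, x+4)) for x in range(0, 100, 3)]

r = set()
for x in sos:                        # one result set, mutated in place
    r.update(x)
r2 = reduce(set.union, sos, set())   # a fresh, ever-growing set per step
assert r == r2                       # same answer, different cost

sol = [range(3) for x in range(5)]   # any list of lists works here
r3 = []
for l in sol:
    r3.extend(l)
assert r3 == sum(sol, [])            # ditto for the sum(sol, []) aside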

Curiously, when I attempt the actual problem at hand (set intersection) 
rather than something altogether different (set union) on my Dell laptop 
running Win2K Pro, I get the following results:

stmt = 'r=reduce(set.intersection, los)'
best of 3: 21.9 usec per loop
stmt = 'r=intersect_all(los)'
best of 3: 29.3 usec per loop

So, the "nuisance" is more attractive than you thought.  Apparently, one 
can't really depend on "in place" mutator methods being more efficient 
that using functional techniques.  BTW, the above figures reflect 10,000 
iterations; using larger counts makes the difference even larger.  I 
suspect that this is because the Python interpreter has fewer chances to 
be interrupted by external processes when it's using 'reduce'.  This in 
turn indicates that both implementations actually work about same and 
your "O(n squared)" argument is irrelevant.

P.S. Here's the source I used for the timings:

from timeit import Timer

# The setup gives the timed statements everything they need: the
# hand-rolled loop version and the list of sets to intersect.
setup = """
def intersect_all(los):
    # Copy the first set, then shrink that copy in place against
    # the rest (so the input sets are never mutated).
    it = iter(los)
    result = set(it.next())
    for s in it:
        result.intersection_update(s)
    return result

not7 = range(2, 11)
not7.remove(7)
los = [set(range(0, 360, step)) for step in not7]
"""

number = 10000
repeat = 3
precision = 3

for stmt in 'r=reduce(set.intersection, los)', 'r=intersect_all(los)':
    t = Timer(stmt, setup)
    r = t.repeat(repeat, number)
    best = min(r)                 # take the best of the three runs,
    usec = best * 1e6 / number    # expressed as microseconds per loop
    print "stmt =", repr(stmt)
    print "best of %d: %.*g usec per loop" % (repeat, precision, usec)


