# My own accounting python euler problem

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Nov 10 14:59:56 CET 2009

On Sun, 08 Nov 2009 12:31:26 -0500, geremy condra wrote:

> What you're describing is the powerset operation. Here's the example
> from the python docs:
[...]
> What I find interesting is that running it through timeit, it is much
> slower than the code suggested by Dan Bishop.

Your test doesn't show what you think it shows. You shouldn't just
blindly apply timeit without testing to see that the functions return
what you think they return. Your test is, I'm afraid, totally bogus and
the conclusion you draw is completely wrong.

[...]
> #timeit.timeit("subsets1(x)", setup) doesn't appear to terminate
> timeit.timeit("subsets2(x)", setup)
> timeit.timeit("subsets3(x)", setup)

For the sake of speed, I've used a smaller x. Here are the results of
calling the three functions:

>>> x = range(3)
>>> subsets1(x)
[[2], [2, 0], [2, 1], [2, 1, 0], [], [0], [1], [1, 0]]
>>> subsets2(x)
<generator object subsets2 at 0xb7c6311c>
>>> subsets3(x)
<itertools.chain object at 0xb7c608ac>

The reason subsets1() "doesn't appear to terminate" is that you are
trying to list all the subsets of x = range(100). That's a LOT of
subsets: 2**100 to be precise, or approximately 1.2e30.

subsets2() and subsets3() return a generator function and an iterator
object respectively. There may be some overhead difference between those,
but that's trivial compared to the cost of generating the subsets
themselves.

A better way of doing the timing is as follows:

from itertools import chain, combinations
from timeit import Timer

# use a number small enough to calculate in a reasonable time
x = list(range(10))

def subsets1(L):
S = []
if (len(L) == 1):
return [L, []]
else:
for s in subsets1(L[1:]):
S.append(s)
S.append(s + [ L[0]])
return S

def subsets2(L):
if L:
for s in subsets2(L[1:]):
yield s
yield s + [L[0]]
else:
yield []

def subsets3(iterable):
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in
range(len(s)+1))

setup = "from __main__ import subsets1, subsets2, subsets3, x"

# Check the three functions return the same results:
x1 = sorted(subsets1(x))
x2 = sorted(subsets2(x))
x3 = sorted(list(t) for t in subsets1(x))
assert x1 == x2 == x3

# Set up some timers.
t1 = Timer("subsets1(x)", setup)
t2 = Timer("list(subsets2(x))", setup)
t3 = Timer("list(subsets3(x))", setup)

# And run them!
for t in (t1, t2, t3):
print min(t.repeat(number=1000, repeat=5))

The results I get are:

1.19647693634
0.901714801788
0.175387859344

Which as you can see, shows that the recipe in the docs is nearly ten
times faster than Dan's version. That's not surprising -- the docs
version uses highly optimized C code from itertools, while Dan's version
uses slow-ish Python code and recursion.

To show this is no fluke, I increased the size of x and ran the tests
again:

>>> x = list(range(15))  # 32000+ subsets
>>> x1 = sorted(subsets1(x))
>>> x2 = sorted(subsets2(x))
>>> x3 = sorted(list(t) for t in subsets1(x))
>>> assert x1 == x2 == x3
>>>
>>> t1 = Timer("subsets1(x)", setup)
>>> t2 = Timer("list(subsets2(x))", setup)
>>> t3 = Timer("list(subsets3(x))", setup)
>>> for t in (t1, t2, t3):
...     print min(t.repeat(number=1000, repeat=5))
...
45.283468008
33.9274909496
7.40781188011

--
Steven