Generators/iterators, Pythonicity, and primes
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Apr 5 07:24:44 EDT 2009
On Sun, 05 Apr 2009 00:28:17 -0400, Miles wrote:
> def prime_gen():
> primes = []
> return (primes.append(n) or n for n in count(2) if all(n%p for p
> in primes if p<=sqrt(n)))
>
> That version is only marginally faster than your original version. The
> biggest performance penalty is that the (... for p in primes ...)
> generator isn't aborted once p > sqrt(n).
That's a big performance penalty in practice. For n=100, it means that
ninety percent of the loop iterations are pointless; for n = 10,000 it
means that 99% of the loops are pointless; asymptotically, it means that
almost all the work done is wasted.
http://en.wikipedia.org/wiki/Almost_all
Consequently, retrieving the next prime becomes slower and slower as you
iterate over the generator. By my tests, it is about 40 times slower to
go from the 2000th prime to the 2001st prime (17389 and 17393) than it is
to go from the sixth prime to the seventh (13 and 17).
In other words, this is a crappy way to generate primes. It's a trick,
not a good algorithm: slow and inefficient, as well as obfuscated.
> Here's a less nifty but much more efficient version:
>
> def prime_gen():
> prime_list = [2]
> for p in prime_list: yield p
> for n in itertools.count(prime_list[-1] + 1):
> for p in prime_list:
> if p * p > n:
> prime_list.append(n)
> yield n
> break
> elif n % p == 0:
> break
> else:
> raise Exception("Shouldn't have run out of primes!")
That's pretty sweet, but we can make it even faster. We can speed things
up by incrementing by two instead of one, to avoid pointlessly testing
even numbers that we know must fail. We can also speed things up a tad by
making the common test first, and by storing squares rather than
recalculating them each time through the loop.
def prime_gen2():
yield 2
n = 3
yield n
prime_list = [(2, 4), (3, 9)]
it = itertools.count(4)
for n in it:
n = it.next()
for p,p2 in prime_list:
if n % p == 0:
break
elif p2 > n:
prime_list.append((n, n*n))
yield n
break
else:
raise RuntimeError("Shouldn't have run out of primes!")
Generating the first million primes is 10% faster using this compared to
your improved version.
--
Steven
More information about the Python-list
mailing list