Generators/iterators, Pythonicity, and primes

Steven D'Aprano steve at
Sun Apr 5 13:24:44 CEST 2009

On Sun, 05 Apr 2009 00:28:17 -0400, Miles wrote:

> def prime_gen():
>     primes = []
>     return (primes.append(n) or n for n in count(2) if all(n%p for p
> in primes if p<=sqrt(n)))
> That version is only marginally faster than your original version. The
> biggest performance penalty is that the (... for p in primes ...)
> generator isn't aborted once p > sqrt(n).

That's a big performance penalty in practice. For n=100, it means that 
ninety percent of the loop iterations are pointless; for n = 10,000 it 
means that 99% of the loops are pointless; asymptotically, it means that 
almost all the work done is wasted.

Consequently, retrieving the next prime becomes slower and slower as you 
iterate over the generator. By my tests, it is about 40 times slower to 
go from the 2000th prime to the 2001st prime (17389 and 17393) than it is 
to go from the sixth prime to the seventh (13 and 17).

In other words, this is a crappy way to generate primes. It's a trick, 
not a good algorithm: slow and inefficient, as well as obfuscated.

> Here's a less nifty but much more efficient version:
> def prime_gen():
>     prime_list = [2]
>     for p in prime_list: yield p
>     for n in itertools.count(prime_list[-1] + 1):
>         for p in prime_list:
>             if p * p > n:
>                 prime_list.append(n)
>                 yield n
>                 break
>             elif n % p == 0:
>                 break
>         else:
>             raise Exception("Shouldn't have run out of primes!")

That's pretty sweet, but we can make it even faster. We can speed things 
up by incrementing by two instead of one, to avoid pointlessly testing 
even numbers that we know must fail. We can also speed things up a tad by 
making the common test first, and by storing squares rather than 
recalculating them each time through the loop.

def prime_gen2():
    yield 2
    n = 3
    yield n
    prime_list = [(2, 4), (3, 9)]
    it = itertools.count(4)
    for n in it:
        n =
        for p,p2 in prime_list:
            if n % p == 0:
            elif p2 > n:
                prime_list.append((n, n*n))
                yield n
            raise RuntimeError("Shouldn't have run out of primes!")

Generating the first million primes is 10% faster using this compared to 
your improved version.


More information about the Python-list mailing list