Generators/iterators, Pythonicity, and primes

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Apr 5 07:24:44 EDT 2009


On Sun, 05 Apr 2009 00:28:17 -0400, Miles wrote:

> def prime_gen():
>     primes = []
>     return (primes.append(n) or n for n in count(2) if all(n%p for p
> in primes if p<=sqrt(n)))
> 
> That version is only marginally faster than your original version. The
> biggest performance penalty is that the (... for p in primes ...)
> generator isn't aborted once p > sqrt(n).

That's a big performance penalty in practice. For n=100, it means that 
ninety percent of the loop iterations are pointless; for n = 10,000 it 
means that 99% of the loops are pointless; asymptotically, it means that 
almost all the work done is wasted.

http://en.wikipedia.org/wiki/Almost_all


Consequently, retrieving the next prime becomes slower and slower as you 
iterate over the generator. By my tests, it is about 40 times slower to 
go from the 2000th prime to the 2001st prime (17389 and 17393) than it is 
to go from the sixth prime to the seventh (13 and 17).

In other words, this is a crappy way to generate primes. It's a trick, 
not a good algorithm: slow and inefficient, as well as obfuscated.


> Here's a less nifty but much more efficient version:
> 
> def prime_gen():
>     prime_list = [2]
>     for p in prime_list: yield p
>     for n in itertools.count(prime_list[-1] + 1):
>         for p in prime_list:
>             if p * p > n:
>                 prime_list.append(n)
>                 yield n
>                 break
>             elif n % p == 0:
>                 break
>         else:
>             raise Exception("Shouldn't have run out of primes!")

That's pretty sweet, but we can make it even faster. We can speed things 
up by incrementing by two instead of one, to avoid pointlessly testing 
even numbers that we know must fail. We can also speed things up a tad by 
making the common test first, and by storing squares rather than 
recalculating them each time through the loop.


def prime_gen2():
    yield 2
    n = 3
    yield n
    prime_list = [(2, 4), (3, 9)]
    it = itertools.count(4)
    for n in it:
        n = it.next()
        for p,p2 in prime_list:
            if n % p == 0:
                break
            elif p2 > n:
                prime_list.append((n, n*n))
                yield n
                break
        else:
            raise RuntimeError("Shouldn't have run out of primes!")

Generating the first million primes is 10% faster using this compared to 
your improved version.



-- 
Steven



More information about the Python-list mailing list