Prime number algo... what's wrong?

Sat Mar 29 22:44:08 EST 2003

[Alex Martelli, adds a cool refinement to the Python Cookbook sieve]
> ...
> Alternatively, therefore, one might choose a tiny variant for D's
> values, keeping them to single numbers but also ensuring that an
> entry is never duplicated:  it suffices to change the "while True:"
> loop's body to:
>
>         p = D.pop(q, None)
>         if p:
>             x = p + q
>             while x in D: x += p
>             D[x] = p
>         else:
>             D[q*q] = q
>             yield q
>         q += 1

The coolest thing about this dict-based sieve is that it takes storage
proportional to the number of primes less than N (== the maximum prime
desired), instead of storage proportional to N (which "the usual" sieve
algorithms require).  There's a well-known priority-queue sieve algorithm
that takes storage proportional to the number of primes less than sqrt(N).
Knuth Volume 3 spells that one out in an exercise (look in the index for
"prime", and follow the first reference).

Just to be extreme, here's a trick-laden version of that one using a dict
instead.  The tricks have mostly to do with avoiding multiples of 2 and 3.
Modulo the trickery (common to both), I find the dict-based version much
easier to follow than the priority-queue based one (which I won't show
here) -- and it turns out it's much faster too, at least in Python:

def sieve3(maximum):
    "Generate all primes <= maximum."
    if maximum >= 2:
        yield 2
    if maximum >= 3:
        yield 3
    # candidate takes on all integer values > 3 that aren't divisible by
    # 2 or 3.
    candidate = 5
    delta = 2  # flips between 2 and 4
    # Map i to (d, 6*p), where d is 2*p or 4*p, p is a prime dividing i,
    # and i+d is the next multiple of p larger than i not divisible by
    # 2 or 3.  As the algorithm proceeds, f will contain one entry for
    # each prime <= sqrt(maximum) discovered so far (excepting 2 and 3).
    f = {}
    fetch = f.pop
    adding = candidate**2 <= maximum
    while candidate <= maximum:
        if candidate in f:
            d, s = fetch(candidate)
            # candidate + d is next multiple of s/6 that isn't divisible
            # by 2 or 3
            i = candidate + d
            d = s-d
            while i in f:
                i += d
                d = s-d
            f[i] = d, s
        else:
            yield candidate
            if adding:
                sq = candidate**2
                f[sq] = delta * candidate, 6 * candidate
                adding = sq <= maximum
        candidate += delta
        delta = 6 - delta

This kind of algorithm seems nearly inconceivable if dicts aren't part of
your mental toolkit.  Every good algorithm contains a dict <wink>.