Prime number module
Alex Martelli
aleax at aleax.it
Tue Sep 30 09:47:15 EDT 2003
Tim Hochberg wrote:
...
> I believe you could implement a hybrid scheme that would be quite fast
> and still maintain nearly the same level of compression that you
> describe above. In addition to the above compressed data, also store,
> uncompressed, every Nth prime. A binary search will get you within N
> primes of your answer; to find the exact value, recreate those N primes.
> For an N of, for instance, 64, the level of compression would be minimally
> affected but should make finding the number of primes less than a given
> number much faster than the basic compressed scheme.
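The checkpoint idea Tim describes can be sketched in plain Python, with a small sieve standing in for the compressed prime store; the sieve, the `every = 64` spacing, and all helper names here are illustrative choices, not part of the original proposal:

```python
import bisect

def sieve_primes(limit):
    """Return all primes below `limit` (simple sieve of Eratosthenes)."""
    flags = bytearray([1]) * limit
    flags[0:2] = b'\x00\x00'
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i*i::i] = bytearray(len(flags[i*i::i]))
    return [i for i in range(limit) if flags[i]]

primes = sieve_primes(10000)      # stands in for the compressed prime data
every = 64
checkpoints = primes[::every]     # every 64th prime, kept uncompressed

def count_primes_upto(n):
    """Count primes <= n (for n below the sieve limit): binary-search the
    checkpoints, then walk at most `every` primes past the nearest one."""
    i = bisect.bisect_right(checkpoints, n)
    if i == 0:
        return 0                  # n is below the first prime
    k = (i - 1) * every           # index in `primes` of the last checkpoint <= n
    while k + 1 < len(primes) and primes[k + 1] <= n:
        k += 1
    return k + 1
```

In the real scheme the inner walk would regenerate up to N primes from the compressed deltas rather than index a full list, but the shape of the lookup is the same.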
Still thinking of gmpy's primitives, I came up with a variant of this...:
import gmpy
import array
import bisect
import warnings

highest_number_of_interest = gmpy.mpz(100*1000*1000)
def make_primes(put_them_here, every=1000):
    current_prime = gmpy.mpz(put_them_here[-1])
    count = 0
    while current_prime < highest_number_of_interest:
        current_prime = current_prime.next_prime()
        count += 1
        if count == every:
            put_them_here.append(int(current_prime))
            count = 0
try:
    savefile = open('primes.dat', 'rb')
except IOError:
    every_1000th_prime = array.array('l', [1])
    make_primes(every_1000th_prime)
    savefile = open('primes.dat', 'wb')
    every_1000th_prime.tofile(savefile)
    savefile.close()
else:
    every_1000th_prime = array.array('l', [])
    try: every_1000th_prime.fromfile(savefile, 6000)
    except EOFError: pass
    else:
        warnings.warn("Only %d primes (?)" % len(every_1000th_prime))
        warnings.warn("Highest is %d" % every_1000th_prime[-1])
        # could fill out and re-save the array here, if needed
    savefile.close()
print "%d primes stored (1 every 1000), highest is %d" % (
len(every_1000th_prime), every_1000th_prime[-1])
def nth_prime(N):
    N_div_1000, N_mod_1000 = divmod(N, 1000)
    try: prime = every_1000th_prime[N_div_1000]
    except IndexError:
        raise ValueError, "must be N<%d" % (
            1000*len(every_1000th_prime)+999)
    prime = gmpy.mpz(prime)
    for x in xrange(N_mod_1000):
        prime = prime.next_prime()
    return int(prime)
print "55555th prime is", nth_prime(55555)
def primes_upto(N):
    if N > highest_number_of_interest:
        raise ValueError, "must be N<%d" % highest_number_of_interest
    x = bisect.bisect_left(every_1000th_prime, N)
    if N == every_1000th_prime[x]: return x*1000+1
    prime = gmpy.mpz(every_1000th_prime[x-1])
    x = (x-1)*1000
    while prime < N:
        x += 1
        prime = prime.next_prime()
    if prime == N: x += 1
    return x
print "Up to 555555, there are %d primes" % primes_upto(555555)
The first run, which builds the file primes.dat (just 23048 bytes), took
almost 7 minutes of elapsed time on my machine (250 CPU seconds). But
any subsequent run is extremely fast:
[alex at lancelot perex]$ time python -O prio.py
5762 primes stored (1 every 1000), highest is 99991609
55555th prime is 686671
Up to 555555, there are 45741 primes
0.06user 0.00system 0:00.14elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (494major+297minor)pagefaults 0swaps
[alex at lancelot perex]$
I've coded this rapidly and sloppily, so I wouldn't be surprised if
there were off-by-one errors here and there (it needs to be checked
against some other, independent way to generate/check primes!), but I
think the fundamental ideas are sound.
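One such independent check: a small sieve gives reference versions of both queries, and a few well-known values can be compared directly. The helper names and the sieve bound of 700000 (just big enough to cover the 55555th prime quoted above) are my own choices here:

```python
import bisect

def sieve_primes(limit):
    """Return all primes below `limit` (sieve of Eratosthenes)."""
    flags = bytearray([1]) * limit
    flags[0:2] = b'\x00\x00'
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i*i::i] = bytearray(len(flags[i*i::i]))
    return [i for i in range(limit) if flags[i]]

ref = sieve_primes(700000)

def ref_nth_prime(n):
    """1-based, matching nth_prime above: ref_nth_prime(1) == 2."""
    return ref[n - 1]

def ref_primes_upto(n):
    """Count of primes <= n."""
    return bisect.bisect_right(ref, n)

# Spot-checks against well-known values; the script's nth_prime(55555)
# and primes_upto(555555) results can be compared against the reference
# functions the same way.
assert ref_nth_prime(100) == 541
assert ref_primes_upto(1000) == 168
```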
Alex