How can I speed up a script that iterates over a large range (600 billion)?

Chris Torek nospam at torek.net
Wed Jun 22 01:58:51 EDT 2011


Now that the exercise has been solved...

Instead of "really short code to solve the problem", how about 
some "really long code"? :-)

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

    $ factor 99999999999999999
    99999999999999999: 3 3 2071723 5363222357

The python code is rather too slow for this particular number (I
gave up on it finding 5363222357) but works quite well on 600851475143,
or even, e.g., 12186606004219:

    $ python factor.py 600851475143 12186606004219
    600851475143: 71 839 1471 6857
    12186606004219: 2071723 5882353

While searching for speed hacks I came across a few interesting
tricks, particularly TZ<omega>TZIOY's mod-30 scheme (erat3) at
stackoverflow.com (see URLs in the comments), which only really
works well in Python 2.7 and later (using itertools.compress).
Here is the end result (run with 2.5 and 2.7, I have no 3.x installed
anywhere convenient, and of course the print calls would change):

import itertools
import sys

def count(start = 0, step = 1):
    """
    Yield count starting from <start> and counting up by <step>.
    Same as itertools.count() except for the <step> argument, and
    allowing non-"int" arguments.
    
    Python 2.7 and later provides this directly, via itertools.

    Note: it's usually faster to use itertools.islice(itertools.count(...))
    than to run the "while True" loop below, so we do that if we can.
    """
    if (sys.version_info[0] > 2 or
            (sys.version_info[0] == 2 and sys.version_info[1] >= 7)):
        return itertools.count(start, step)
    if isinstance(start, int) and isinstance(step, int):
        if step == 1:
            return itertools.count(start)
        if 1 < step < 5: # 5 upper bound is a guess
            return itertools.islice(itertools.count(start), 0, None, step)
    def f(start, step):
        while True:
            yield start
            start += step
    return f(start, step)

# Sieve of Eratosthenes-based prime-number generator.
#
# Based on David Eppstein's for python 2.3(?) and subsequent
# discussion -- see
# http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
#
# See also:
# http://oreilly.com/pub/a/python/excerpt/pythonckbk_chap1/index1.html?page=last
#
# http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/3796442#3796442
def primes():
    """
    Yields sequence of prime numbers via Sieve of Eratosthenes.
    """
    # Keep the state from the last time we abandoned the
    # generator, in primes.primes and primes.D.  We can then
    # very quickly re-yield previously-saved primes.  We're
    # going to skip even numbers below, we so prime the
    # "primes so far" list with 2.
    #
    # For the fancy (erat3) version, we need to pre-provide
    # 2, 3, and 5, and pre-load D.  Having done that we can
    # start either version at 7.
    try:
        primes.primes
    except AttributeError:
        primes.primes = [2, 3, 5]
    for p in primes.primes:
        yield p
    # OK, now we need a mapping holding known-composite values
    # (numbers "struck from the sieve").
    try:
        D = primes.D
    except AttributeError:
        D = primes.D = { 9: 3, 25: 5 }
    # D[q] exists if q is composite; its value is the first prime
    # number that proves that q is composite.  (However, only odd
    # numbers appear in D.)
    q = p + 2 # where we start sieve-ing, below
    if sys.version_info[0] == 2 and sys.version_info[1] < 7:
        for q in count(q, 2):
            p = D.pop(q, None)
            if p is None:
                # q was not marked composite, so it's prime; moreover,
                # q-squared is composite (and odd) and its first prime
                # factor is q.
                primes.primes.append(q)
                D[q * q] = q
                yield q
            else:
                # q is composite, p is its first prime factor -- e.g.,
                # q=9 and p=3, or q=15 and p=3.  Extend the sieve:
                # find first natural number k where q + 2kp is not already
                # already known as composite.  (e.g., if q=9 and p=3, k=1
                # so that we mark D[15] as composite, with 3 as its first
                # prime factor.)  Note that we are skipping even numbers,
                # so p and q are both odd, so q+p is even, q+2p is odd,
                # q+3p is even, q+4p is odd, etc.  We don't need to mark
                # even-numbered composites in D, which is why we only look
                # at q + 2kp.
                twop = p + p
                x = q + twop # next odd multiple of p
                while x in D: # skip already-known composites
                    x += twop
                D[x] = p
    else:
        #       7  9 11 13 15 17 19 21 23 25 27 29 31 33 35
        MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
        MODULOS = frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

        # If we started counting from 7, we'd want:
        #   itertools.compress(itertools.count(7,2), itertools.cycle(MASK))
        # But we start counting from q which means we need to clip off
        # the first ((q - 7) % 30) // 2 items:
        offset = ((q - 7) % 30) // 2
        for q in itertools.compress(itertools.count(q, 2),
                itertools.islice(itertools.cycle(MASK), offset, None, 1)):
            p = D.pop(q, None)
            if p is None:
                D[q * q] = q
                primes.primes.append(q)
                yield q
            else:
                twop = p + p
                x = q + twop
                while x in D or (x % 30) not in MODULOS:
                    x += twop
                D[x] = p

def factors(num):
    """
    Return all the prime factors of the given number.
    """
    if num < 0:
        num = -num
    if num < 2:
        return
    for p in primes():
        q, r = divmod(num, p)
        while r == 0:
            yield p
            if q == 1:
                return
            num = q
            q, r = divmod(num, p)

if __name__ == '__main__':
    for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
        try:
            arg = int(arg)
        except ValueError, error:
            print error
        else:
            print '%d:' % arg,
            for fac in factors(arg):
                print fac,
                sys.stdout.flush()
            print
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list