[Tutor] Prime Factorization Tool

Steven D'Aprano steve at pearwood.info
Fri Dec 2 04:19:53 CET 2011


Robert Sjoblom wrote:
>> So you can roughly double the speed of your function by skipping even
>> numbers other than two. Remember that 2 itself is prime, but any other
>> multiple of 2 is not. Take that test outside of the loop, and then loop over
>> every second number starting with 3.
>>
> 
> So, if I understood this right, my function should look like this then:
> def isprime(n):
>     if n == 1:
>         return False
>     elif n == 2:
>         return True
>     for x in range(2, round(sqrt(n))+1):
>         if n % x == 0:
>             return False
>     else:
>         return True

Not quite; you're still starting the loop at 2, and testing every number.

Firstly, you need to exclude all even numbers apart from two.

     elif n == 2:  # the only even prime
         return True
     elif n%2 == 0:  # even numbers have no remainder when divided by 2
         return False


Secondly, you should replace the loop testing every number starting at two:

for x in range(2, round(sqrt(n))+1):
     ...


with one that starts at three and only tests odd numbers:

for x in range(3, int(sqrt(n))+1, 2):
     ...


> This works like expected, but I think my factorization function has a
> really bad problem somewhere.

Let's start by doing a factorization by hand. Suppose we want to factorize 
195942. It makes sense to start dividing by the smallest numbers we can, and 
move up. And remember to check for repeated factors.

Is 2 a factor? Yes, because 195942 % 2 == 0. So our first factor is 2, and 
we're left with 195942/2 = 97971 still to be factorized. That is, we have 
partially factorized 195942 = 2 * 97971.

Is 2 a factor of 97971? No, so we can move on to the next potential factor.

Is 3 a factor of 97971? Yes, so we add 3 to our list of factors, and we have a 
partial factorization of 195942 = 2*3 * 32657.

Is 3 a factor of 32657? No, so move on.

There is no point in checking whether 4 is a factor, because if it were, we 
would have caught it earlier when we divided by 2. Same for 6, 8, 10, and all 
other even numbers. So we skip all even numbers apart from two.

Is 5 a factor of 32657? No.
Is 7 a factor of 32657? No.
Is 9 a factor of 32657? No.
Is 11 a factor of 32657? No.
Is 13 a factor of 32657? No.
Is 15 a factor of 32657? No.

Is 17 a factor of 32657? Yes, so we add 17 to our list of factors, and we have 
a partial factorization of 195942 = 2*3*17 * 1921.
Is 17 a factor of 1921? Yes, so we add 17 to the factors (it is a repeated 
factor), and have 195942 = 2*3*17*17 * 113.

And we are done: 113 is a prime number, and 195942 = 2*3*17*17*113.

The tricky part is the end. How do we know 113 is a prime number without 
dividing by all the possible factors? Well, we know it isn't divisible by 2, 
because if it were, we would have found those factors earlier when we divided 
by 2. And it isn't divisible by 3, for the same reason. And so on -- we know 
113 isn't divisible by anything smaller than 17. So the smallest *potential* 
factor is 19: if there were a smaller factor, we would have already found it.

But how do we know that 113 is prime without even testing? Why couldn't 19 be 
a factor? Because 19 is larger than sqrt(113), which is a little more than 10. 
Another way to say it, is that 19**2 > 113. Since 19 is the smallest potential 
factor, and 19*19 is too big, that tells us that 19 isn't a factor and 
therefore there are no more factors to test.

So our first version looks like this:

def factorize(n):
     factors = []
     while n%2 == 0:
         factors.append(2)
         n = n/2
     d = 3
     while d**2 <= n:
         while n % d == 0:
             factors.append(d)
             n = n/d
         d = d+2
     if n != 1:
         factors.append(n)
     return factors



Notice that we don't use the isprime() function!

Potential improvements:

* As it stands, the function does the wrong thing for numbers less than two. 
Fix that problem.

* We test for potential factors 9, 21, 27, ... which are multiples of 3, and 
therefore cannot be a factor. Is there a way to skip them?

* Likewise for potential factors which are multiples of 5, 7, 11, etc. We 
really only want to test using prime numbers 2, 3, 5, 7, 11, 13, ... and skip 
all multiples. Is there a way to have the factorize() function only check with 
prime factors? What if you had a function called primes() which returned all 
the prime numbers as needed? How would you write primes()?

* Can we use the isprime() function to speed this up? Or will it slow it down?


Food for experimentation.


-- 
Steven


More information about the Tutor mailing list