[Tutor] pickle problems

Richard D. Moores rdmoores at gmail.com
Thu Aug 23 01:32:57 CEST 2012


On Wed, Aug 22, 2012 at 11:54 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 23/08/12 02:17, Richard D. Moores wrote:
>>
>> I've incorporated many of the suggestions I've received here, plus some important tips from Case Van Horsen, the gmpy2 maintainer, and I believe one of its developers.
>>
>> Here's a function, factor_integer(), for quickly factoring any integer
>> up to 1e17:<http://pastebin.com/BLfV0EMw>.
>
>
> That relies on a pre-existing cache of prime numbers. If you don't use
> those cached prime numbers, do you know how long your code takes?

Much longer.

>> Larger integers will be factored eventually -- the wait can be long or
>> short. Probably long if they require the services of lines 82-91.
>>
>> Examples of extremes, both between 1e25 and 1e26:
>> 2,835,334,663,465,375,591,838,337  [3, 19, 37, 71, 947,
>> 19994908447741489]  1.172 secs
>> 9,349,337,574,247,205,482,125,105  [3, 5, 2027, 2311296661,
>> 133039358281] 402.5 secs
>
>
> I'm astounded by the times you quote there.

Excellent!

> The first example includes the prime 19994908447741489, which is *not*
> in the cache, since it is bigger than 1e17. And yet it only takes about
> a second to find all six primes. If true, that's astonishingly good for
> something written in pure Python.

My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.

C:\Windows\System32>python -m timeit -v -r 5 -s "from gmpy2 import
is_prime" "is_prime(19994908447741489)"
10 loops -> 0.0008 secs
100 loops -> 0.0073 secs
1000 loops -> 0.0409 secs
10000 loops -> 0.394 secs
raw times: 0.392 0.393 0.394 0.392 0.393
10000 loops, best of 5: 39.2 usec per loop

and to show its speed for enormous prime ints:
>>> is_prime(987987987199949084477414897654765465435634564357034523045023453475389457937283)
True

C:\Windows\System32>python -m timeit -v -r 5 -s "from gmpy2 import
is_prime" "is_prime(987987987199949084477414897654765465435634564357034523045023453475389457937283)"
10 loops -> 0.0077 secs
100 loops -> 0.0765 secs
1000 loops -> 0.758 secs
raw times: 0.757 0.758 0.757 0.757 0.758
1000 loops, best of 5: 757 usec per loop

Also, of the prime factors of  2,835,334,663,465,375,591,838,337, all
but the last one, 19994908447741489, are small and quickly found.
Once factors 3, 19, 37, 71, 947 are found, 19994908447741489 is simply
the quotient of
 2835334663465375591838337 // (3*19*37*71*947), and is_prime confirms
its primacy.

> The second example includes the prime 133039358281, which is smaller
> than 1e17 and so should be in the cache and therefore found (almost)
> instantly.

No, the cache has only primes up to 100,000,000, the last one being 99,999,989.

> And yet you claim it takes nearly seven minutes to find it.
> If correct, that's astonishingly awful given that you have the prime
> numbers already cached.

The delay is caused by the 2nd largest factor being a big one,
2311296661 which is greater than 100 million, whereas in the first
example, the 2nd largest factor is a puny 947 -- but most important, I
think, is that it smaller than 100 million.

> Can you explain those timings? How did you measure them?

Instead of just line 99 at the end of the pasted script, I had

from time import clock as t

t0 = t()
print(factor_integer(2835334663465375591838337))
t1 = t()
print("time =", round((t1-t0), 3))
"""
OUTPUT:
2,835,334,663,465,375,591,838,337 [3, 19, 37, 71, 947, 19994908447741489]
1.121
"""

Essentially the same algorithm is employed in the other script I
pasted at <http://pastebin.com/itqvuMXj>, factor_random_integers().
I've run factor_random_integers a couple of more times, with
num_integers, min_length, max_length = 10, 25, 25. See
<http://pastebin.com/LbdP2wjL>, "More output from
factor_random_integers".

Dick


More information about the Tutor mailing list