A question.

Alex Martelli aleaxit at yahoo.com
Tue Dec 12 07:43:02 EST 2000


"DeHa" <deha at poczta.onet.pl> wrote in message
news:9152a0$acb$1 at panorama.wcss.wroc.pl...
> Hi All!
>
>     I am new to Python, and just try to write my first program. But I have
a
> small problem.
>
>     My program should find first numbers in the set from 1 to 100000. I
made

I assume you mean 'prime' numbers.

> such a code:
>
> first = range(100001)
> first = first[1:]

first = range(1,100001)

would be equivalent.  The presence of '1' there, right at
the start, incidentally means that if the following code
worked at intended it would presumably delete everything
else from the list (as everything is divisible by one).


> for i in first:
>     if i != []:
> #until end of the list

i will never equal an empty-list, as first's items are just integers.

>         for j in first[first.index(i):]:                           #find
all
> numbers divided by i
>             if j % i == 0:
>                 first[first.index(j): first.index(j)] = []    #and delete
it
> from the list

Modifying the same list on which you are looping leads to
undefined behavior.

The 'for j' starts by assigning to j the value i -- that is what
first[first.index(i)] is worth -- and of course j % i WILL then
be zero and the if will succeed -- if the body of the if did
anything, such as removing something from the list, it would
start by removing the very thing it has found.

But the body of the if does nothing.  first[x:x] is an empty
list for any x, so assigning an emptylist to it is a no-op.

All in all, then, there are many problems here.  Perhaps
foundational, the fact that you're looping on items but
then repeatedly find out you need *indices* instead.  So,
why not loop on indices -- should be faster, might be
clearer.

Let's start with a range of candidates-for-primality that
does NOT start from 1, but from 2:

prime_candidates = range(2,100001)

i = 0
while i<len(prime_candidates):
    # loop invariant: prime_candidates[i] is prime
    aprime = prime_candidates[i]
    j = i + 1    # we must NOT remove the i-th item itself!!!
    while j<len(prime_candidates):
        # note we EITHER delete one item OR increment j,
        # as len(prime_candidates)-j goes down by ONE AT A TIME...
        if prime_candidates[j] % aprime == 0:    # divisible by the prime
            del prime_candidates[j]
        else:
            j += 1
    i += 1


The remaining problem with this is its woeful inefficiency.  The
algorithm is quadratic with the top of the range, and each step
of it is horribly slow, too -- dividing, deleting from the middle
of a long list, etc.  Here are a couple of runtimes in seconds:

With a top-of-range of 1001:

D:\PySym>python prim.py
0.145141311217
found 168 primes

and with a top-of-range of 10001:

D:\PySym>python prim.py
7.39378988285
found 1229 primes

I'm not patient enough to run it for 100001, but it would
surely be well over 350 seconds.


A very similar idea can be exploited, *without* such inefficiencies,
if the 'prime_candidates' list is NOT continuously modified.  The
'deletion' can then take the form, for example, of just setting a
bit in another list 'parallel' to the main one.  This also lets us
apply Eratosthenes' original ideas -- rather than doing any
division to check the remainders, we just use the size of the
latest prime number found as the 'step' to use for further walking
on the list!  Again, we'll loop on indices -- more convenient.

Also, we don't actually need to keep 'prime_candidates' around,
since any 'prime_candidates[i]' is just i+2 -- easier to compute
it on the fly (again, Eratosthenes' idea).

total_length = 100001 - 2 + 1        # 'len(prime_candidates)'
primality_bits = [1]*total_length

for i in range(0, total_length):
    if not primality_bits[i]: continue
    # we proceed iff "prime_candidates[i]" is prime
    aprime = i + 2
    for j in range(i+aprime, total_length, aprime):
        primality_bits[j] = 0


Notice how much simpler this is now.

And, correspondingly faster... again for top values of
1001, 10001, 100001:

D:\PySym>python prima.py
0.009498131886
found 168 primes

D:\PySym>python prima.py
0.0985063468943
found 1229 primes

D:\PySym>python prima.py
1.07798139764
found 9592 primes

Not quite linear, but almost. Also: about 145 times faster
than the first version for 1001, about 740 times for 10001
(probably more still, for 100001).


Dynamical, interpreted languages such as Python are
not *mostly* about performance -- but hundreds or even
thousands of times of slowdown/speedup can easily be
had in simple cases by paying a _little_ attention to
performance issues.  I think it can be worth it...


Alex






More information about the Python-list mailing list