[Tutor] Generate Prime Numbers

Danny Yoo dyoo at hashcollision.org
Sat May 30 21:56:47 CEST 2015


I'll review the code a bit.


> import time
>
> starttime=time.time()
>
> class Number:
>     def __init__(self,number,p=None,n=None):
>         self.no=number
>         self.marked=None
>         self.p=p
>         self.n=n

It would be helpful to document what the types of 'p' and 'n' are
here.  Without any comments, I don't have good expectations on what to
expect yet.  And without types, documentation in Python is doubly
crucial to help folks understand what to expect.



> node=Number(2,None,None)
> start=node
>
> counter=1
> for i in range(3,110000):
>     counter+=1
>     newnode=Number(i,node)
>     node.n=newnode
>     node=newnode


Ok, I see.  Your structure earlier is meant to represent a
doubly-linked linked data structure.  The fields 'n' stands for
"next", and 'p' stands for "previous", and you're constructing a
linked list of numbers, each of which should have links to the next
and previous elements.  You're basically building up a thread of
nodes:

    3 -> 4 -> 5 -> 6 -> ...

The code above doesn't seem to do the backwards linking with the 'p'
previous attribute.  You might want to remove that from your data
structure definition if you're not using it.


... but that being said: do you need to represent linked structure
here?  Linked structure is very flexible, but with certain costs:
"random" access is slower because you have to keep following links to
get from one end of the list to another point.  If you want to mark
every 5th element in the collection, a linked list will force you to
walk every intermediate element in between, as you do later on in your
'while' loop.  In contrast, an array-like structure will let you
access nodes by index very quickly: you can jump to every fifth
element directly, skipping over the other ones.


That is, it's important to note that the version with linked lists
isn't slow because it's using classes: it's slow fundamentally because
it's not actually taking advantage of the structure of numbers and the
ability to do quick, random access based on numbers.


You should see improvement by using an array-like list.  Something like:

###################################
class Number:
    def __init__(self,number):
        self.no=number
        self.marked=None

numbers = []
for i in range(110000):
    newnumber = Number(i)
    numbers.append(newnumber)
###################################

to initialize the basic structures.  Here, numbers[3] will be the
Number(3), numbers[6] will be the Number(6), and so on.  This direct
correspondence between index and the value at that index is what makes
array-like structures very useful.  With this ability to directly
index, the code that does the sieving no longer needs to manually
march down links: it can index over the multiples of n.

Try it with the array-like list.  You should find it instructive.
You've pretty much got the rest of the code ready: it should be almost
identical with the dict-based code you present later.


More information about the Tutor mailing list