Thread locking question.

Sat May 9 12:02:17 EDT 2009

On May 9, 8:36 am, Piet van Oostrum <p... at cs.uu.nl> wrote:
> >>>>> grocery_stocker <cdal... at gmail.com> (gs) wrote:
> >gs> The following code gets data from 5 different websites at the "same
> >gs> time".
> >gs> #!/usr/bin/python
> >gs> import Queue
> >gs> import threading
> >gs> import urllib2
> >gs> import time
> >gs> hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
> >gs>          "http://ibm.com", "http://apple.com"]
> >gs> queue = Queue.Queue()
> >gs> class MyUrl(threading.Thread):
> >gs>     def __init__(self, queue):
> >gs>         threading.Thread.__init__(self)
> >gs>         self.queue = queue
> >gs>     def run(self):
> >gs>         while True:
> >gs>             host = self.queue.get()
> >gs>             if host is None:
> >gs>                 break
> >gs>             url = urllib2.urlopen(host)
> >gs>             print url.read(1024)
> >gs>             #self.queue.task_done()
> >gs> start = time.time()
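> >gs> def main():
> >gs>     threads = []
> >gs>     for i in range(5):
> >gs>         t = MyUrl(queue)
> >gs>         t.setDaemon(True)
> >gs>         t.start()
> >gs>         threads.append(t)
> >gs>     for host in hosts:
> >gs>         print "pushing", host
> >gs>         queue.put(host)
> >gs>     for i in range(5):
> >gs>         queue.put(None)  # one sentinel per worker so every thread exits
> >gs>     for t in threads:
> >gs>         t.join()  # wait for all workers, not only the last one started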
> >gs> if __name__ == "__main__":
> >gs>     main()
> >gs>     print "Elapsed Time: %s" % (time.time() - start)
> >gs> How does the parallel download work if each thread has a lock? When
> >gs> the program opens www.yahoo.com, it places a lock on the thread,
> >gs> right? If so, then doesn't that mean the other 4 sites have to wait
> >gs> for the thread to release the lock?
>
> No. Where does it set a lock? There is only a short lock period in the queue
> when an item is put in the queue or got from the queue. And of course we
> have the GIL, but this is released as soon as a long-running (blocking)
> operation is started - in this case, while the Internet communication is
> in progress.
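
The point about the GIL being released during blocking operations can be
illustrated with a small sketch. It is written for Python 3 and is not part
of the original program; `fake_download` and `run_parallel` are hypothetical
names, and `time.sleep()` is only an assumed stand-in for the network wait
inside `urlopen()`. Like a blocking socket read, it releases the GIL while it
waits, so the five waits overlap instead of adding up.

```python
import threading
import time


def fake_download(delay):
    # Assumption: time.sleep() stands in for a blocking network read.
    # It releases the GIL while waiting, so other threads can run.
    time.sleep(delay)


def run_parallel(n_threads=5, delay=0.2):
    # Start all threads first, then wait for all of them to finish.
    threads = [threading.Thread(target=fake_download, args=(delay,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


elapsed = run_parallel()
# Run one after another, five waits of 0.2 s would take about 1 s.
print("Elapsed: %.2f s for 5 overlapping waits of 0.2 s" % elapsed)
```

If each thread held a lock for the whole wait, the elapsed time would be
close to the sum of the delays rather than close to a single delay.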

Maybe I'm being a bit daft, but what prevents the data from www.yahoo.com
from being mixed up with the data from www.google.com? Doesn't using
queue() prevent the data from being mixed up?
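
One way to look at the question above is the following sketch, which is a
hedged illustration and not the code from this thread. It assumes Python 3
(so the module is `queue`, not `Queue`), and `fake_fetch`, `worker`, `tasks`
and `results` are hypothetical names; `fake_fetch` replaces the real
`urlopen()` call so that no network is needed. The queue hands each host to
exactly one thread, and the downloaded data lives in that thread's local
variables, so it cannot end up attached to another host.

```python
import queue
import threading


def fake_fetch(host):
    # Hypothetical stand-in for urlopen(host).read(1024).
    return "data from " + host


def worker(tasks, results):
    while True:
        host = tasks.get()         # each queued item goes to exactly one thread
        if host is None:           # sentinel: no more work for this thread
            break
        data = fake_fetch(host)    # 'host' and 'data' are local to this thread
        results.put((host, data))  # keep the host and its data together


hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]
tasks = queue.Queue()
results = queue.Queue()

threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(5)]
for t in threads:
    t.start()
for host in hosts:
    tasks.put(host)
for _ in threads:
    tasks.put(None)                # one sentinel per worker
for t in threads:
    t.join()

# All workers have finished, so draining the results queue is safe here.
fetched = {}
while not results.empty():
    host, data = results.get()
    fetched[host] = data

for host in hosts:
    print(host, "->", fetched[host])
```

The queue's role is handing out work safely; what keeps the data apart is
that every thread has its own stack and therefore its own `host` and `data`.
Printing directly from several threads is a separate matter: the output
written to the terminal may interleave even though the data in memory does
not, which is why this sketch collects results and prints them from the main
thread.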