Thread locking question.
Piet van Oostrum
piet at cs.uu.nl
Sat May 9 15:43:57 EDT 2009
>>>>> grocery_stocker <cdalten at gmail.com> (gs) wrote:
>gs> Maybe I'm being a bit daft, but what prevents the data from www.yahoo.com
>gs> from being mixed up with the data from www.google.com? Doesn't using
>gs> queue() prevent the data from being mixed up?
Nothing in your script prevents the data from getting mixed up. Now it
seems from some experimentation that the print statements might be atomic,
although I can't find anything about that in the Python docs, and I think
you shouldn't count on that. I would expect it not to be atomic when it
does blocking I/O.
If I make your example more complete, printing the documents completely,

        while True:
            host = self.queue.get()
            if host is None:
                break
            url = urllib2.urlopen(host)
            while True:
                txt = url.read(1024)
                if not txt: break
                print txt,

then the documents will get mixed up in the output. Likewise, if you
want to put them in a shared data structure, you must use locking when you
insert them (for example you could put them in another Queue).
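
A minimal sketch of the "another Queue" idea, written for Python 3 (so the module is `queue` rather than `Queue`): each worker reads its whole document first and then puts one `(host, text)` tuple on a second queue, so chunks from different documents can never interleave. The names `fetch`, `worker`, `collect` and `results` are hypothetical, and `fetch` only fakes a download so that the example runs without a network.

```python
import queue
import threading

def fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(); no network access.
    return "contents of " + host

def worker(hosts, results):
    while True:
        host = hosts.get()
        if host is None:          # sentinel: no more work for this thread
            break
        # One put per complete document; Queue does its own locking.
        results.put((host, fetch(host)))

def collect(host_list, nthreads=2):
    hosts, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(hosts, results))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for host in host_list:
        hosts.put(host)
    for _ in threads:
        hosts.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    # All workers are done, so the results queue can be drained safely.
    return dict(results.get() for _ in host_list)
```

Calling `collect(["www.yahoo.com", "www.google.com"])` gives back one complete entry per host, whatever order the threads happened to finish in.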
The Queue you use here only prevents the urls from getting mixed up, but
it has no effect on the further processing.
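
If the documents really have to be written out by the worker threads themselves, one option is to hold a lock for the whole document rather than for each chunk. This is only a sketch under assumptions: `output_lock`, `emit_document` and `demo` are made-up names, the chunks are passed in as a list instead of coming from `url.read(1024)`, the output goes to an in-memory buffer, and it is Python 3.

```python
import io
import threading

output_lock = threading.Lock()   # hypothetical name; shared by all workers

def emit_document(host, chunks, out):
    # Hold the lock for the entire document so that chunks from
    # different threads cannot be interleaved in the output.
    with output_lock:
        out.write("--- %s ---\n" % host)
        for chunk in chunks:
            out.write(chunk)
        out.write("\n")

def demo():
    out = io.StringIO()
    jobs = [("www.yahoo.com", ["ya", "hoo"]),
            ("www.google.com", ["goo", "gle"])]
    threads = [threading.Thread(target=emit_document, args=(h, c, out))
               for h, c in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out.getvalue()
```

The order of the two documents in the result still depends on scheduling, but each document stays in one piece.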
As I told my students two days ago: you shouldn't do thread programming
unless you have thoroughly studied the subject.
By the way there is another flaw in your program: you do the join only
on the last spawned thread. Because the threads are daemonic, all other
threads that are still working will be killed prematurely when this
thread finishes and the main program exits.
The code should be like this:

    threads = []
    for i in range(5):
        t = MyUrl(queue)
        t.setDaemon(True)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

Or just don't make them daemonic.
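
A sketch of the non-daemonic variant, in Python 3. Non-daemon threads keep the interpreter alive until they return, so each one needs a way to stop; here that is one `None` sentinel per thread. `Worker` is a hypothetical stand-in for your `MyUrl` class that only records the host names instead of downloading anything, and `run_all` and `seen` are likewise names invented for the example.

```python
import queue
import threading

class Worker(threading.Thread):
    # Hypothetical stand-in for MyUrl; records hosts instead of fetching them.
    def __init__(self, hosts, seen):
        threading.Thread.__init__(self, daemon=False)  # explicitly non-daemonic
        self.hosts = hosts
        self.seen = seen

    def run(self):
        while True:
            host = self.hosts.get()
            if host is None:              # sentinel ends this thread
                break
            self.seen.put(host)

def run_all(host_list, nthreads=5):
    hosts, seen = queue.Queue(), queue.Queue()
    threads = [Worker(hosts, seen) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for host in host_list:
        hosts.put(host)
    for _ in threads:
        hosts.put(None)                   # one sentinel per thread
    for t in threads:
        t.join()                          # not needed for a clean exit, but
                                          # lets us read the results afterwards
    return sorted(seen.get() for _ in host_list), [t.daemon for t in threads]
```

Without the sentinels the workers would block forever in `get()` and the program would never exit, which is the price of not making them daemonic.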
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org