Threaded Design Question

MRAB google at mrabarnett.plus.com
Thu Aug 9 20:39:23 EDT 2007


On Aug 9, 7:25 pm, half.ital... at gmail.com wrote:
> Hi all!  I'm implementing one of my first multithreaded apps, and have
> gotten to a point where I think I'm going off track from a standard
> idiom.  Wondering if anyone can point me in the right direction.
>
> The script will run as a daemon and watch a given directory for new
> files.  Once it determines that a file has finished moving into the
> watch folder, it will kick off a process on one of the files.  Several
> of these could be running at any given time up to a max number of
> threads.
>
> Here's how I have it designed so far.  The main thread starts a
> Watch(threading.Thread) class that loops and searches a directory for
> files.  It has been passed a Queue.Queue() object (watch_queue), and
> as it finds new files in the watch folder, it adds the file name to
> the queue.
>
> The main thread then grabs an item off the watch_queue, and kicks off
> processing on that file using another class Worker(threading.Thread).
>
> My problem is with communicating between the threads as to which files
> are currently processing, or are already present in the watch_queue so
> that the Watch thread does not continuously add unneeded files to the
> watch_queue to be processed.  For example, Watch() finds a file to be
> processed and adds it to the queue.  The main thread sees the file on
> the queue and pops it off and begins processing.  Now the file has
> been removed from the watch_queue, and Watch() thread has no way of
> knowing that the other Worker() thread is processing it, and shouldn't
> pick it up again.  So it will see the file as new and add it to the
> queue again.  P.S. The file is deleted from the watch folder after it
> has finished processing, so that's how I'll know which files to
> process in the long term.
>
I would suggest something like the following in the watch thread:

import os
import time

# folder and process_queue (a Queue.Queue) are assumed to be set up elsewhere
seen_files = {}

while True:
	# look for new files
	for name in os.listdir(folder):
		if name not in seen_files:
			# Queue objects have put(), not add()
			process_queue.put(name)
		seen_files[name] = True

	# forget any missing files and mark the others as not seen,
	# ready for next time
	seen_files = dict((name, False) for name, seen in seen_files.items() if seen)

	time.sleep(1)
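
The same bookkeeping can also be done with a set instead of a dict of flags. What follows is only a rough sketch, not a drop-in replacement: the function name scan_once is made up for illustration, it uses the Python 3 module name queue (Queue in Python 2), and it assumes that every name returned by os.listdir is ready to be processed, so the "has the file finished moving" check would still have to be added.

```python
import os
import queue


def scan_once(folder, seen_files, process_queue):
    """Do one polling pass over folder (hypothetical helper).

    Names that were not present on the previous pass are put on
    process_queue. The returned set holds only the names present now,
    so a name is forgotten as soon as its file leaves the folder.
    """
    current = set(os.listdir(folder))
    for name in sorted(current - seen_files):
        process_queue.put(name)
    return current
```

In the watch thread this would be called as seen_files = scan_once(folder, seen_files, process_queue) inside the loop, followed by time.sleep(1). Because a name is dropped once its file has been deleted, a new file that later arrives with the same name is queued again.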



