Threaded Design Question
Justin T.
jmtulloss at gmail.com
Thu Aug 9 22:23:09 EDT 2007
On Aug 9, 5:39 pm, MRAB <goo... at mrabarnett.plus.com> wrote:
> On Aug 9, 7:25 pm, half.ital... at gmail.com wrote:
>
> > Hi all! I'm implementing one of my first multithreaded apps, and have
> > gotten to a point where I think I'm going off track from a standard
> > idiom. Wondering if anyone can point me in the right direction.
>
> > The script will run as a daemon and watch a given directory for new
> > files. Once it determines that a file has finished moving into the
> > watch folder, it will kick off a process on one of the files. Several
> > of these could be running at any given time up to a max number of
> > threads.
>
> > Here's how I have it designed so far. The main thread starts a
> > Watch(threading.Thread) class that loops and searches a directory for
> > files. It has been passed a Queue.Queue() object (watch_queue), and
> > as it finds new files in the watch folder, it adds the file name to
> > the queue.
>
> > The main thread then grabs an item off the watch_queue, and kicks off
> > processing on that file using another class Worker(threading.Thread).
>
> > My problem is with communicating between the threads as to which files
> > are currently processing, or are already present in the watch_queue so
> > that the Watch thread does not continuously add unneeded files to the
> > watch_queue to be processed. For example...Watch() finds a file to be
> > processed and adds it to the queue. The main thread sees the file on
> > the queue and pops it off and begins processing. Now the file has
> > been removed from the watch_queue, and Watch() thread has no way of
> > knowing that the other Worker() thread is processing it, and shouldn't
> > pick it up again. So it will see the file as new and add it to the
> > queue again. P.S. The file is deleted from the watch folder after it
> > has finished processing, so that's how I'll know which files to
> > process in the long term.
>
> I would suggest something like the following in the watch thread:
>
> seen_files = {}
>
> while True:
>     # look for new files
>     for name in os.listdir(folder):
>         if name not in seen_files:
>             process_queue.add(name)
>         seen_files[name] = True
>
>     # forget any missing files and mark the others as not seen, ready
>     # for next time
>     seen_files = dict((name, False) for name, seen in seen_files.items()
>                       if seen)
>
>     time.sleep(1)
Hmm, this wouldn't work. It's not thread safe and the last line before
you sleep doesn't make any sense.
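
One way to make the bookkeeping thread safe, as a rough sketch only: keep the set of "already queued or in progress" names behind a lock, have the watch thread claim a name before queueing it, and have the worker release the name only after it has deleted the file. This assumes Python 3, where the Queue module is named queue. FileTracker, scan_once, worker, and the process callback are names made up for illustration; none of them come from the posts above.

```python
import os
import queue
import threading


class FileTracker:
    """Thread-safe record of files that are queued or being processed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def claim(self, name):
        """Track name and return True, or return False if already tracked."""
        with self._lock:
            if name in self._active:
                return False
            self._active.add(name)
            return True

    def release(self, name):
        """Forget name once its worker is done with it."""
        with self._lock:
            self._active.discard(name)


def scan_once(folder, tracker, watch_queue):
    """One pass of the watch loop: enqueue only files nobody has claimed."""
    for name in sorted(os.listdir(folder)):
        if tracker.claim(name):
            watch_queue.put(name)


def worker(folder, tracker, watch_queue, process):
    """Handle names from the queue until a None sentinel arrives."""
    while True:
        name = watch_queue.get()
        if name is None:
            break
        path = os.path.join(folder, name)
        try:
            process(path)    # hypothetical per-file job supplied by the caller
            os.remove(path)  # the design above deletes the file when finished
        finally:
            # Release only after the delete, so the next scan cannot
            # pick up a file that is still sitting in the folder.
            tracker.release(name)
```

The watch thread would call scan_once in a loop with a sleep between passes, and the main thread would start up to the maximum number of worker threads on the same queue. If process raises, this sketch lets the exception end that worker and the file becomes eligible again on the next scan; a real daemon would probably want to log the failure and move the file aside instead.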