Parallel/Multiprocessing script design question
Ivan Voras
ivoras at fer.hr
Thu Sep 13 09:53:45 EDT 2007
Amit N wrote:
> About 800+ 10-15MB files are generated daily that need to be processed. The
> processing consists of different steps that the files must go through:
>
> -Uncompress
> -FilterA
> -FilterB
> -Parse
> -Possibly compress parsed files for archival
You can take one of two straightforward approaches:
1 - Create one program, start N instances of it, where N is the number
of CPUs/cores, and let each instance process one file to completion.
You'll probably need an "overseer" program to start them and dispatch
jobs to them. The easiest way is to start your processes with the first
N files, then monitor them for completion; when any of them finishes,
start another with the next file in the queue, and so on (see the first
sketch below).
2 - Create a program / process for each of these steps and let the steps
operate independently, but feed the output of one step to the input of
the next. You'll probably need some buffering and more flow control, so
that if (for example) "FilterA" is slower than "Uncompress", the
"Uncompress" process is signaled to wait until "FilterA" needs more data
(see the second sketch below). The key is that, as long as all the steps
run at approximately the same speed, they can run in parallel.
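Here is a minimal sketch of the first approach, using the
multiprocessing API (available as the third-party "processing" package
at the time of this post; it is the standard multiprocessing module as
of Python 2.6). The process_file body and the input glob are
hypothetical stand-ins for your real pipeline:

import glob
from multiprocessing import Pool

def process_file(path):
    # Hypothetical worker: runs the whole chain (uncompress, FilterA,
    # FilterB, parse, archive) on one file, start to finish.
    # ... real processing goes here ...
    return path

if __name__ == '__main__':
    files = glob.glob('incoming/*.gz')  # hypothetical input location
    pool = Pool()                       # one worker per CPU/core by default
    # imap_unordered() acts as the "overseer": it hands the next file
    # to whichever worker finishes first, keeping every core busy.
    for done in pool.imap_unordered(process_file, files):
        print 'finished', done
    pool.close()
    pool.join()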
Note that both approaches are in principle independent of whether you
use threads or processes, apart from the communication between the
steps/stages. But keep in mind that CPython's global interpreter lock
prevents threads from executing Python code in parallel, so you need
separate processes if your goal is parallel execution.
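And a minimal sketch of the second, pipelined approach: bounded queues
between the stages provide the buffering and flow control mentioned
above, because put() blocks when the downstream stage falls behind.
Stage bodies and file names are hypothetical placeholders, and only two
of the five stages are shown:

from multiprocessing import Process, Queue

def uncompress(inq, outq):
    for path in iter(inq.get, None):    # None marks end-of-stream
        data = open(path, 'rb').read()  # ... real uncompression here ...
        outq.put(data)                  # blocks while FilterA lags behind
    outq.put(None)                      # pass the marker downstream

def filter_a(inq, outq):
    for data in iter(inq.get, None):
        outq.put(data)                  # ... apply FilterA here ...
    outq.put(None)

if __name__ == '__main__':
    paths = Queue()
    raw = Queue(maxsize=4)              # small buffer between stages
    filtered = Queue(maxsize=4)
    stages = [Process(target=uncompress, args=(paths, raw)),
              Process(target=filter_a, args=(raw, filtered))]
    for p in stages:
        p.start()
    for name in ['a.gz', 'b.gz']:       # hypothetical input files
        paths.put(name)
    paths.put(None)
    for result in iter(filtered.get, None):
        pass                            # ... parse / archive here ...
    for p in stages:
        p.join()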