Newbie help for using multiprocessing and subprocess packages for creating child processes
Rob Newman
rlnewman at ucsd.edu
Tue Jun 16 15:13:58 EDT 2009
Hi All,
I am new to Python, and have a very specific task to accomplish. I
have a command line shell script that takes two arguments:
create_graphs.sh -v --sta=STANAME
where STANAME is a string 4 characters long.
create_graphs creates a series of graphs using Matlab (among other 3rd
party packages).
Right now I can run this happily by hand, but I have to manually
execute the command for each STANAME. What I want is to have a Python
script that I pass a list of STANAMEs to, and it acts like a daemon
and spawns as many child processes as there are processors on my
server (64), until it goes through all the STANAMEs (about 200).
I posted a message on Stack Overflow (ref: http://stackoverflow.com/questions/884650/python-spawn-parallel-child-processes-on-a-multi-processor-system-use-multipro)
and was recommended to use the multiprocessing and subprocess
packages. In the Stack Overflow answers, it was suggested that I use
the process pool class in multiprocessing. However, the server I have
to use is a Sun Sparc (T5220, Sun OS 5.10) and there is a known issue
with sem_open() (ref: http://bugs.python.org/issue3770), so it appears
I cannot use the process pool class.
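
Since the job here is only to launch external commands, one option that may sidestep the sem_open() issue is to not use multiprocessing at all and instead keep a bounded set of subprocess.Popen children. What follows is a minimal sketch under assumptions, not a tested solution for the Sparc server: it uses Python 3 syntax, the names run_bounded, max_children and poll_interval are hypothetical, and the demo commands are stand-ins for create_graphs.sh.

```python
import subprocess
import sys
import time


def run_bounded(commands, max_children, poll_interval=0.1):
    """Run each argv list in `commands`, keeping at most `max_children` alive.

    Returns a list of (argv, exit_code) pairs in completion order.
    """
    pending = list(commands)
    running = []    # (argv, Popen) pairs for children that are still alive
    finished = []   # (argv, exit_code) pairs for children that have exited
    while pending or running:
        # Top up the set of running children until the limit is reached.
        while pending and len(running) < max_children:
            argv = pending.pop(0)
            running.append((argv, subprocess.Popen(argv)))
        # poll() returns None while a child is alive, else its exit code.
        still_running = []
        for argv, proc in running:
            code = proc.poll()
            if code is None:
                still_running.append((argv, proc))
            else:
                finished.append((argv, code))
        running = still_running
        if running:
            time.sleep(poll_interval)
    return finished


# Stand-in commands: each child is a Python interpreter that exits at once.
demo_commands = [[sys.executable, "-c", "pass"] for _ in range(5)]
demo_results = run_bounded(demo_commands, max_children=2)
print("children finished: %d" % len(demo_results))
```

On the 64-processor server the call would presumably be run_bounded(commands, max_children=64), with one argv list per STANAME.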
So, below is my test script (controller.py). Rather than firing off my
shell script (which takes ~10 mins to run per STANAME), it just calls
the 'ls' command on a file I know exists:
#!/path/to/python

import sys
import os
import json
import multiprocessing
import subprocess

def work(verbose,staname):
    print 'function:',staname
    print 'parent process:', os.getppid()
    print 'process id:', os.getpid()
    print "ls /path/to/file/"+staname+"_info.pf"
    # cmd will eventually get replaced with the shell script with the
    # verbose and staname options
    cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':

    report_sta_list = ['B10A','B11A','BNLO']

    # Print out the complete station list for testing
    print report_sta_list

    # Get the number of processors available
    num_processes = multiprocessing.cpu_count()

    print 'Number of processes: %s' % (num_processes)
    print 'Now trying to assign all the processors'

    threads = []

    len_stas = len(report_sta_list)

    print "+++ Number of stations to process: %s" % (len_stas)

    # run until all the threads are done, and there is no data left
    while len(threads) < len(report_sta_list):

        # if we aren't using all the processors AND there is still data
        # left to compute, then spawn another thread
        print "+++ Starting to set off all child processes"

        if( len(threads) < num_processes ):
            this_sta = report_sta_list.pop()
            print "+++ Station is %s" % (this_sta)
            p = multiprocessing.Process(target=work,args=['v',this_sta])
            p.start()
            print p, p.is_alive()
            threads.append(p)
        else:
            for thread in threads:
                if not thread.is_alive():
                    threads.remove(thread)
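
Two details of the loop above are worth checking separately from the subprocess call. First, the while condition compares a list that grows (threads) with one that shrinks (report_sta_list), so with three stations it stops after starting two. Second, calling remove() on a list while iterating over it can skip elements. The sketch below illustrates only the second point; FakeProcess is a hypothetical stand-in that has nothing but is_alive(), and reap_in_place and reap_by_rebuild are illustrative names.

```python
class FakeProcess(object):
    """Hypothetical stand-in for multiprocessing.Process (is_alive() only)."""

    def __init__(self, alive):
        self._alive = alive

    def is_alive(self):
        return self._alive


def reap_in_place(procs):
    # Same pattern as the script: remove finished entries while iterating.
    for p in procs:
        if not p.is_alive():
            procs.remove(p)
    return procs


def reap_by_rebuild(procs):
    # Build a new list instead of mutating the one being iterated.
    return [p for p in procs if p.is_alive()]


procs = [FakeProcess(False), FakeProcess(False), FakeProcess(True)]
left_in_place = reap_in_place(list(procs))   # works on a copy of the list
left_rebuilt = reap_by_rebuild(procs)

# Only one process is alive, but the in-place version skips a finished one.
print("in place: %d, rebuilt: %d" % (len(left_in_place), len(left_rebuilt)))
```

Rebuilding the list keeps the bookkeeping accurate, which matters once the number of running children is used to decide whether another one may be started.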
However, I seem to be running into a whole series of errors:
myhost{rt}62% controller.py
['B10A', 'B11A', 'BNLO']
Number of processes: 64
Now trying to assign all the processors
+++ Number of stations to process: 3
+++ Starting to set off all child processes
+++ Station is BNLO
<Process(Process-1, started)> True
+++ Starting to set off all child processes
+++ Station is B11A
function: BNLO
parent process: 22341
process id: 22354
ls /path/to/file/BNLO_info.pf
<Process(Process-2, started)> True
function: B11A
parent process: 22341
process id: 22355
ls /path/to/file/B11A_info.pf
Process Process-1:
Traceback (most recent call last):
  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "controller.py", line 104, in work
    return subprocess.call(cmd, shell=False)
  File "/opt/csw/lib/python/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Process Process-2:
Traceback (most recent call last):
  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "controller.py", line 104, in work
    return subprocess.call(cmd, shell=False)
  File "/opt/csw/lib/python/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
The files are there:
myhost{me}11% ls -la /path/to/files/BNLO_info.pf
-rw-rw-r-- 1 me group 391 May 19 22:40 /path/to/files/BNLO_info.pf
myhost{me}12% ls -la /path/to/file/B11A_info.pf
-rw-rw-r-- 1 me group 391 May 19 22:27 /path/to/files/B11A_info.pf
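
The OSError is consistent with how subprocess treats a list when shell=False: the first element is taken as the program name and the remaining elements as its arguments. A one-element list such as ["ls /path/to/file/BNLO_info.pf"] therefore asks for an executable whose name is that entire string, which does not exist even though the .pf file does. Here is a minimal sketch of the difference, written in Python 3 syntax; build_ls_cmd and build_graph_cmd are hypothetical helpers, create_graphs.sh is assumed to be on the PATH, and the runnable demo uses the Python interpreter as a stand-in command.

```python
import subprocess
import sys


def build_ls_cmd(staname):
    # Program name and argument are separate list elements.
    return ["ls", "/path/to/file/" + staname + "_info.pf"]


def build_graph_cmd(staname, verbose=True):
    # Assumes create_graphs.sh can be found on the PATH.
    cmd = ["create_graphs.sh"]
    if verbose:
        cmd.append("-v")
    cmd.append("--sta=" + staname)
    return cmd


# Separate elements: runs the interpreter with the arguments "-c" and "pass".
ok_status = subprocess.call([sys.executable, "-c", "pass"], shell=False)

# One string: the whole text is looked up as the name of an executable.
try:
    subprocess.call([sys.executable + " -c pass"], shell=False)
    one_string_failed = False
except OSError:
    one_string_failed = True

print("separate arguments exit status: %s" % ok_status)
print("one-string command raised OSError: %s" % one_string_failed)
```

With this change, work() could pass build_ls_cmd(staname) to subprocess.call for testing and later switch to build_graph_cmd(staname).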
I might be doing this completely wrong, but I thought this would be
the way to list the files dynamically. Admittedly this is just a
stepping stone to running the actual shell script I want to run. Can
anyone point me in the right direction or offer any advice for using
these packages?
Thanks in advance for any help or insight.
- Rob