Newbie help for using multiprocessing and subprocess packages for creating child processes

Matt HellZFury+Python at gmail.com
Tue Jun 16 21:47:37 CEST 2009


Try replacing:
    cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
with:
    cmd = [ "ls", "/path/to/file/"+staname+"_info.pf" ]

Basically, the first is the conceptual equivalent of executing the
following in Bash:
    'ls /path/to/file/FOO_info.pf'
The second is this:
    'ls' '/path/to/file/FOO_info.pf'

The first tries to run a command whose name is literally 'ls /path...',
space included. No such command exists, which is where the
"OSError: [Errno 2] No such file or directory" comes from. The second
searches your PATH for a command named 'ls' and gives it the argument
'/path...'
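
Here is a small sketch of the difference that you can run anywhere. Two
assumptions of mine: it is written for a current Python 3 rather than the
Python 2 in your script, and it uses sys.executable (the running
interpreter) as a stand-in for 'ls' so that it does not depend on any
particular file existing. The run() helper is just a name I made up for
the example.

```python
import subprocess
import sys


def run(cmd):
    """Return the child's exit status, or None if the program was not found."""
    try:
        return subprocess.call(cmd, shell=False)
    except OSError as exc:
        # With shell=False the first list element is the program name,
        # so a name with embedded spaces usually cannot be found.
        print("could not execute %r: %s" % (cmd, exc))
        return None


# One list element: the whole string is taken as the program name.
one_string = [sys.executable + " -c pass"]

# Separate elements: program name first, then one element per argument.
split_args = [sys.executable, "-c", "pass"]

status_one = run(one_string)      # fails before any child is started
status_split = run(split_args)    # runs the interpreter with "-c pass"
```

The first call should fail the same way your 'ls' call does, and the
second should return the child's exit status.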

Also, I think this is cleaner (but it’s up to personal preference):
    cmd = [ "ls", "/path/to/file/%s_info.pf" % staname]
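
The same rule applies once you swap 'ls' for the real script. A minimal
sketch, assuming create_graphs.sh is on your PATH and takes -v and
--sta=STANAME exactly as in your message; build_cmd is a hypothetical
helper name, not something from your script:

```python
def build_cmd(staname, verbose=True):
    """Build the argument list for create_graphs.sh (hypothetical helper)."""
    cmd = ["create_graphs.sh"]
    if verbose:
        cmd.append("-v")
    # "--sta=B10A" is a single argument, so it stays a single list element.
    cmd.append("--sta=%s" % staname)
    return cmd


print(build_cmd("B10A"))
```

You would then pass the result straight to
subprocess.call(build_cmd(staname), shell=False) inside work().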

________________________
~Matthew Strax-Haber
Northeastern University, CCIS & CBA
Co-op, NASA Langley Research Center
Student Government Association, Special Interest Senator
Resident Student Association, SGA Rep & General Councilor
Chess Club, Treasurer
E-mail: strax-haber.m=AT=neu.edu

On Tue, Jun 16, 2009 at 3:13 PM, Rob Newman<rlnewman at ucsd.edu> wrote:
> Hi All,
>
> I am new to Python, and have a very specific task to accomplish. I have a
> command line shell script that takes two arguments:
>
> create_graphs.sh -v --sta=STANAME
>
> where STANAME is a string 4 characters long.
>
> create_graphs creates a series of graphs using Matlab (among other 3rd party
> packages).
>
> Right now I can run this happily by hand, but I have to manually execute the
> command for each STANAME. What I want is to have a Python script that I pass
> a list of STANAMEs to, and it acts like a daemon and spawns as many child
> processes as there are processors on my server (64), until it goes through
> all the STANAMES (about 200).
>
> I posted a message on Stack Overflow (ref:
> http://stackoverflow.com/questions/884650/python-spawn-parallel-child-processes-on-a-multi-processor-system-use-multipro) and
> was recommended to use the multiprocessing and subprocess packages. In the
> Stack Overflow answers, it was suggested that I use the process pool class
> in multiprocessing. However, the server I have to use is a Sun Sparc (T5220,
> Sun OS 5.10) and there is a known issue with sem_open() (ref:
> http://bugs.python.org/issue3770), so it appears I cannot use the process
> pool class.
>
> So, below is my script (controller.py) that I have attempted to use as a
> test, that just calls the 'ls' command on a file I know exists rather than
> firing off my shell script (which takes ~ 10 mins to run per STANAME):
>
> #!/path/to/python
>
> import sys
> import os
> import json
> import multiprocessing
> import subprocess
>
> def work(verbose,staname):
>  print 'function:',staname
>  print 'parent process:', os.getppid()
>  print 'process id:', os.getpid()
>  print "ls /path/to/file/"+staname+"_info.pf"
>  # cmd will eventually get replaced with the shell script with the verbose
> and staname options
>  cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
>  return subprocess.call(cmd, shell=False)
>
> if __name__ == '__main__':
>
>  report_sta_list = ['B10A','B11A','BNLO']
>
>  # Print out the complete station list for testing
>  print report_sta_list
>
>  # Get the number of processors available
>  num_processes = multiprocessing.cpu_count()
>
>  print 'Number of processes: %s' % (num_processes)
>
>  print 'Now trying to assign all the processors'
>
>  threads = []
>
>  len_stas = len(report_sta_list)
>
>  print "+++ Number of stations to process: %s" % (len_stas)
>
>  # run until all the threads are done, and there is no data left
>  while len(threads) < len(report_sta_list):
>
>    # if we aren't using all the processors AND there is still data left to
>    # compute, then spawn another thread
>
>    print "+++ Starting to set off all child processes"
>
>    if( len(threads) < num_processes ):
>
>      this_sta = report_sta_list.pop()
>
>      print "+++ Station is %s" % (this_sta)
>
>      p = multiprocessing.Process(target=work,args=['v',this_sta])
>
>      p.start()
>
>      print p, p.is_alive()
>
>      threads.append(p)
>
>    else:
>
>      for thread in threads:
>
>        if not thread.is_alive():
>
>          threads.remove(thread)
>
> However, I seem to be running into a whole series of errors:
>
> myhost{rt}62% controller.py
> ['B10A', 'B11A', 'BNLO']
> Number of processes: 64
> Now trying to assign all the processors
> +++ Number of stations to process: 3
> +++ Starting to set off all child processes
> +++ Station is BNLO
> <Process(Process-1, started)> True
> +++ Starting to set off all child processes
> +++ Station is B11A
> function: BNLO
> parent process: 22341
> process id: 22354
> ls /path/to/file/BNLO_info.pf
> <Process(Process-2, started)> True
> function: B11A
> parent process: 22341
> process id: 22355
> ls /path/to/file/B11A_info.pf
> Process Process-1:
> Traceback (most recent call last):
>  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in
> _bootstrap
>    self.run()
>  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
>    self._target(*self._args, **self._kwargs)
>  File "controller.py", line 104, in work
>    return subprocess.call(cmd, shell=False)
>  File "/opt/csw/lib/python/subprocess.py", line 444, in call
>    return Popen(*popenargs, **kwargs).wait()
>  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
>    errread, errwrite)
>  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
>    raise child_exception
> OSError: [Errno 2] No such file or directory
> Process Process-2:
> Traceback (most recent call last):
>  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in
> _bootstrap
>    self.run()
>  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
>    self._target(*self._args, **self._kwargs)
>  File "controller.py", line 104, in work
>    return subprocess.call(cmd, shell=False)
>  File "/opt/csw/lib/python/subprocess.py", line 444, in call
>    return Popen(*popenargs, **kwargs).wait()
>  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
>    errread, errwrite)
>  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
>    raise child_exception
> OSError: [Errno 2] No such file or directory
>
> The files are there:
>
> mhost{me}11% ls -la /path/to/files/BNLO_info.pf
> -rw-rw-r--   1 me       group     391 May 19 22:40
> /path/to/files/BNLO_info.pf
> myhost{me}12% ls -la /path/to/file/B11A_info.pf
> -rw-rw-r--   1 me       group     391 May 19 22:27
> /path/to/files/B11A_info.pf
>
> I might be doing this completely wrong, but I thought this would be the way
> to list the files dynamically. Admittedly this is just a stepping stone to
> running the actual shell script I want to run. Can anyone point me in the
> right direction or offer any advice for using these packages?
>
> Thanks in advance for any help or insight.
> - Rob
> --
> http://mail.python.org/mailman/listinfo/python-list
>


