Newbie help for using multiprocessing and subprocess packages for creating child processes

Rob Newman rlnewman at ucsd.edu
Tue Jun 16 17:11:00 EDT 2009


Thanks Matt - that worked.

Kind regards,
- Rob

On Jun 16, 2009, at 12:47 PM, Matt wrote:

> Try replacing:
>    cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
> with:
>    cmd = [ "ls", "/path/to/file/"+staname+"_info.pf" ]
>
> Basically, the first is the conceptual equivalent of executing the
> following in Bash:
> 'ls /path/to/file/FOO_info.pf'
> The second is this:
> 'ls' '/path/to/file/FOO_info.pf'
>
> The first searches for a command in your PATH named 'ls /path...'. The
> second searches for a command named 'ls' and gives it the argument
> '/path...'.
>
> Also, I think this is cleaner (but it’s up to personal preference):
>    cmd = [ "ls", "/path/to/file/%s_info.pf" % staname]
>
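The difference between the two forms can be checked directly. What follows is only a minimal sketch, written for Python 3 (the script in this thread is Python 2). The helper names build_cmd and try_run and the station name FOO are made up for illustration; shlex.split is the standard-library function for turning a shell-style string into an argument list.

```python
import shlex
import subprocess
import sys


def build_cmd(staname):
    """Return the argument list for listing one station's info file.

    build_cmd is a hypothetical helper; the path is the placeholder
    used in this thread.
    """
    return ["ls", "/path/to/file/%s_info.pf" % staname]


def try_run(cmd):
    """Run cmd without a shell.

    Returns the exit status, or None if the program itself could not
    be started (for example OSError: [Errno 2]).
    """
    try:
        return subprocess.call(cmd, shell=False,
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    except OSError:
        return None


# One string in a one-element list: the whole string is the program name.
bad = ["ls /path/to/file/FOO_info.pf"]
# One list element per argument: program "ls" plus a single argument.
good = build_cmd("FOO")

if __name__ == "__main__":
    # shlex.split converts the single-string form into the list form.
    print(shlex.split(bad[0]) == good)
    # No program with this name exists, so it cannot be started.
    print(try_run(bad))
    # "ls" is started; its exit status depends on whether the file exists.
    print(try_run(good))
    # The same contrast using the running interpreter instead of ls.
    print(try_run([sys.executable + " -c pass"]))
    print(try_run([sys.executable, "-c", "pass"]))
```

With shell=False, each list element is handed to the program as one argument and nothing is split on spaces, which is why the one-element list fails before ls ever runs.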
> ________________________
> ~Matthew Strax-Haber
> Northeastern University, CCIS & CBA
> Co-op, NASA Langley Research Center
> Student Government Association, Special Interest Senator
> Resident Student Association, SGA Rep & General Councilor
> Chess Club, Treasurer
> E-mail: strax-haber.m=AT=neu.edu
>
> On Tue, Jun 16, 2009 at 3:13 PM, Rob Newman<rlnewman at ucsd.edu> wrote:
>> Hi All,
>>
>> I am new to Python, and have a very specific task to accomplish. I have a
>> command line shell script that takes two arguments:
>>
>> create_graphs.sh -v --sta=STANAME
>>
>> where STANAME is a string 4 characters long.
>>
>> create_graphs creates a series of graphs using Matlab (among other 3rd party
>> packages).
>>
>> Right now I can run this happily by hand, but I have to manually execute the
>> command for each STANAME. What I want is to have a Python script that I pass
>> a list of STANAMEs to, and it acts like a daemon and spawns as many child
>> processes as there are processors on my server (64), until it goes through
>> all the STANAMEs (about 200).
>>
>> I posted a message on Stack Overflow (ref:
>> http://stackoverflow.com/questions/884650/python-spawn-parallel-child-processes-on-a-multi-processor-system-use-multipro)
>> and was recommended to use the multiprocessing and subprocess packages. In the
>> Stack Overflow answers, it was suggested that I use the process pool class
>> in multiprocessing. However, the server I have to use is a Sun Sparc (T5220,
>> Sun OS 5.10) and there is a known issue with sem_open() (ref:
>> http://bugs.python.org/issue3770), so it appears I cannot use the process
>> pool class.
>>
>> So, below is my script (controller.py) that I have attempted to use as a
>> test, that just calls the 'ls' command on a file I know exists rather than
>> firing off my shell script (which takes ~ 10 mins to run per STANAME):
>>
>> #!/path/to/python
>>
>> import sys
>> import os
>> import json
>> import multiprocessing
>> import subprocess
>>
>> def work(verbose,staname):
>>  print 'function:',staname
>>  print 'parent process:', os.getppid()
>>  print 'process id:', os.getpid()
>>  print "ls /path/to/file/"+staname+"_info.pf"
>>  # cmd will eventually get replaced with the shell script with the verbose
>>  # and staname options
>>  cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
>>  return subprocess.call(cmd, shell=False)
>>
>> if __name__ == '__main__':
>>
>>  report_sta_list = ['B10A','B11A','BNLO']
>>
>>  # Print out the complete station list for testing
>>  print report_sta_list
>>
>>  # Get the number of processors available
>>  num_processes = multiprocessing.cpu_count()
>>
>>  print 'Number of processes: %s' % (num_processes)
>>
>>  print 'Now trying to assign all the processors'
>>
>>  threads = []
>>
>>  len_stas = len(report_sta_list)
>>
>>  print "+++ Number of stations to process: %s" % (len_stas)
>>
>>  # run until all the threads are done, and there is no data left
>>  while len(threads) < len(report_sta_list):
>>
>>    # if we aren't using all the processors AND there is still data left to
>>    # compute, then spawn another thread
>>
>>    print "+++ Starting to set off all child processes"
>>
>>    if( len(threads) < num_processes ):
>>
>>      this_sta = report_sta_list.pop()
>>
>>      print "+++ Station is %s" % (this_sta)
>>
>>      p = multiprocessing.Process(target=work,args=['v',this_sta])
>>
>>      p.start()
>>
>>      print p, p.is_alive()
>>
>>      threads.append(p)
>>
>>    else:
>>
>>      for thread in threads:
>>
>>        if not thread.is_alive():
>>
>>          threads.remove(thread)
>>
>> However, I seem to be running into a whole series of errors:
>>
>> myhost{rt}62% controller.py
>> ['B10A', 'B11A', 'BNLO']
>> Number of processes: 64
>> Now trying to assign all the processors
>> +++ Number of stations to process: 3
>> +++ Starting to set off all child processes
>> +++ Station is BNLO
>> <Process(Process-1, started)> True
>> +++ Starting to set off all child processes
>> +++ Station is B11A
>> function: BNLO
>> parent process: 22341
>> process id: 22354
>> ls /path/to/file/BNLO_info.pf
>> <Process(Process-2, started)> True
>> function: B11A
>> parent process: 22341
>> process id: 22355
>> ls /path/to/file/B11A_info.pf
>> Process Process-1:
>> Traceback (most recent call last):
>>  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
>>    self.run()
>>  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
>>    self._target(*self._args, **self._kwargs)
>>  File "controller.py", line 104, in work
>>    return subprocess.call(cmd, shell=False)
>>  File "/opt/csw/lib/python/subprocess.py", line 444, in call
>>    return Popen(*popenargs, **kwargs).wait()
>>  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
>>    errread, errwrite)
>>  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
>>    raise child_exception
>> OSError: [Errno 2] No such file or directory
>> Process Process-2:
>> Traceback (most recent call last):
>>  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
>>    self.run()
>>  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
>>    self._target(*self._args, **self._kwargs)
>>  File "controller.py", line 104, in work
>>    return subprocess.call(cmd, shell=False)
>>  File "/opt/csw/lib/python/subprocess.py", line 444, in call
>>    return Popen(*popenargs, **kwargs).wait()
>>  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
>>    errread, errwrite)
>>  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
>>    raise child_exception
>> OSError: [Errno 2] No such file or directory
>>
>> The files are there:
>>
>> myhost{me}11% ls -la /path/to/file/BNLO_info.pf
>> -rw-rw-r--   1 me       group     391 May 19 22:40 /path/to/file/BNLO_info.pf
>> myhost{me}12% ls -la /path/to/file/B11A_info.pf
>> -rw-rw-r--   1 me       group     391 May 19 22:27 /path/to/file/B11A_info.pf
>>
>> I might be doing this completely wrong, but I thought this would be the way
>> to list the files dynamically. Admittedly this is just a stepping stone to
>> running the actual shell script I want to run. Can anyone point me in the
>> right direction or offer any advice for using these packages?
>>
>> Thanks in advance for any help or insight.
>> - Rob
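
For the larger goal described in the original message (keeping a fixed number of children busy until the station list is used up), one possible approach is to skip multiprocessing and drive subprocess.Popen directly, polling the children until they finish. The following is only a rough Python 3 sketch under that assumption, not a tested replacement for controller.py. The names run_all, graph_cmd and poll_interval are invented for illustration; the create_graphs.sh flags are the ones quoted in the message, and the script is assumed to be on PATH.

```python
import subprocess
import sys
import time


def run_all(stanames, make_cmd, max_children, poll_interval=0.1):
    """Run make_cmd(staname) for every station, at most max_children at once.

    Returns a dict mapping each station name to its exit status.
    """
    pending = list(stanames)
    running = {}   # staname -> Popen object
    results = {}   # staname -> exit status

    while pending or running:
        # Start new children while there is spare capacity.
        while pending and len(running) < max_children:
            sta = pending.pop(0)
            running[sta] = subprocess.Popen(make_cmd(sta))

        # Collect finished children; iterate over a copy so that
        # entries can be deleted safely inside the loop.
        for sta, proc in list(running.items()):
            status = proc.poll()   # None while the child is still running
            if status is not None:
                results[sta] = status
                del running[sta]

        if running:
            time.sleep(poll_interval)

    return results


def graph_cmd(staname):
    """Argument list for the shell script named in the original message."""
    return ["create_graphs.sh", "-v", "--sta=%s" % staname]


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere: each child is just
    # the current Python interpreter exiting immediately.
    demo = run_all(["B10A", "B11A", "BNLO"],
                   lambda sta: [sys.executable, "-c", "pass"],
                   max_children=2)
    print(demo)
```

To run the real job, the call would look like run_all(report_sta_list, graph_cmd, max_children=64). Popen raises OSError if the script cannot be found, so the first station would fail loudly rather than silently. Unlike the loop in controller.py, this version does not remove items from a list while iterating over it, and it keeps running until every child has been collected.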


