Sun Grid Engine / NFS and Python shell execution question

MRAB python at mrabarnett.plus.com
Thu Jul 22 11:31:55 EDT 2010


J.B. Brown wrote:
> Hello everyone, and thanks for your time to read this.
> 
> For quite some time, I have had a problem using Python's shell
> execution facilities in combination with a cluster computer
> environment (such as Sun Grid Engine (SGE)).
> In particular, I wish to repeatedly execute a number of commands in
> sub-shells or pipes within a single function, where each execution
> depends on the result of the previous one, so simply writing a
> brute-force script file and executing the commands in batch is not
> an option for me.
> 
> To isolate and exemplify my problem, I have created three files:
> (1) one which exemplifies the spirit of the code I wish to execute in Python
> (2) one which serves as the SGE execution script file, and actually
> calls python to execute the code in (1)
> (3) a simple shell script which executes (2) enough times to fill
> all processors on my computing cluster and leave an additional
> number of jobs waiting in the queue.
> 
> Here is the spirit of the experiment/problem:
> generateTest.py:
> ----------------------------------------------
> # Constants
> numParallelJobs = 100
> testCommand = "continue"   #"os.popen( \"clear\" )"
> loopSize = "1000"
> 
> # First, write file with test script.
> pythonScript = open( "testScript.py", "w" )
> pythonScript.write(
> """
> import os
> for i in range( 0, """ + loopSize + """ ):
>  for j in range( 0, """ + loopSize + """ ):
>   for k in range( 0, """ + loopSize + """ ):
>    for l in range( 0, """ + loopSize + """ ):
>     """ + testCommand + """
> """ )
> pythonScript.close()
> 
> # Second, write SGE script file to execute the Python script.
> sgeScript = open( "testScript.sge", "w" )
> sgeScript.write (
> """
> #$ -cwd
> #$ -N pythonTest
> #$ -e /export/home/jbbrown/errorLog
> #$ -o /export/home/jbbrown/outputLog
> python testScript.py
> """ )
> sgeScript.close()
> 
> # Finally, write script to run SGE script a specified number of times.
> import os
> launchScript = open( "testScript.sh", "w" )
> for i in range( 0, numParallelJobs ):
>  launchScript.write( "qsub testScript.sge" + os.linesep )
> launchScript.close()
> 
> ----------------------------------------------
> 
> Now, let's assume that I have about 50 processors available across 8
> compute nodes, with one NFS-mounted disk.
> If I run the code as above, where the jobs simply execute Python
> "continue" statements and do nothing else, the cluster head node
> reports no serious NFS daemon load.
> 
> However, if I change the code to use the os.popen() call shown as a
> comment above, or use os.system(), the NFS daemon load on my system
> skyrockets within seconds of the jobs being distributed to the
> compute nodes, even though I'm doing nothing but executing the
> clear-screen command, which doesn't even send any output to the
> location where stdout is logged.
> Even if I change the SGE script file to redirect standard output and
> error explicitly to /dev/null, I still have the same problem.
> 
> I believe the source of this problem is that os.popen() and
> os.system() calls spawn subshells which then read my shell resource
> files (.zshrc, .cshrc, .bashrc, etc.).
> But I don't see an alternative to os.popen(), os.popen2/3/4(), or
> os.system().
> os.exec*() cannot solve my problem, because it replaces the calling
> process with the new program, so the script which called os.exec*()
> stops executing.
> 
> Without rewriting a considerable amount of code (which performs
> cross-validation by repeatedly executing programs in a subshell) as
> a shell script filled with a large number of conditional statements,
> does anyone know of a way to execute external programs in the middle
> of a script without referencing the shell resource files located on
> an NFS-mounted directory?
> I have read through the help(os) documentation repeatedly, but just
> can't find a solution.
> 
> Even a small lead or thought would be greatly appreciated.
> 
Have you looked at the 'subprocess' module?
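
When you give it the command as a list of arguments and leave
shell=False (the default), it forks and execs the program directly:
no shell is started at all, so whatever startup files a shell would
read from your NFS mount are never touched. A minimal sketch
('clear' and 'ls' below are just stand-ins for whatever external
programs you actually run):

import subprocess

# Run a program directly. No shell is spawned, so no shell
# resource files are sourced on the compute node.
retcode = subprocess.call(["clear"])

# os.popen()-style usage: capture the command's stdout (and
# stderr) through pipes and wait for it to finish.
proc = subprocess.Popen(["ls", "-l"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = proc.communicate()
# proc.returncode holds the command's exit status.

If you pass shell=True instead, subprocess runs the command through
/bin/sh, much like os.system() does, and you're back where you
started.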


