[IPython-dev] 0.11rc1 : problem with tutorial for PBS in http://ipython.org/ipython-doc/dev/parallel/parallel_process.html
Johann Cohen-Tanugi
johann.cohentanugi at gmail.com
Mon Jul 4 16:01:33 EDT 2011
Good evening. I am still trying to get the PBS batch parallel code to work.
I had to comment out the "-t" line in launcher.py, but I am still puzzled by
the fact that there is no loop over n to start n different engines. Is
that because the "-t" line was there precisely to create an array of n subjobs?
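For context, a sketch of why no loop would be needed: in Torque-style PBS, a `#PBS -t 1-n` directive asks the scheduler to expand a single submission into n subjobs, so one `qsub` call can start n engines. The snippet below imitates the insertion step using the `job_array_regexp` and `job_array_template` values from the PBSLauncher source quoted further down. The function name `add_job_array` and the sample template are hypothetical, and the real launcher code may differ in detail. PBS Pro spells its array option `-J` rather than `-t`, which may be why `-t` is rejected on some installations.

```python
import re

# Values copied from the PBSLauncher class quoted later in this thread.
JOB_ARRAY_REGEXP = re.compile(r'#PBS\W+-t\W+[\w\d\-\$]+')
JOB_ARRAY_TEMPLATE = '#PBS -t 1-{n}'


def add_job_array(script, n):
    """Return `script` with a job-array directive for n subjobs.

    Hypothetical helper: if the template already carries a '#PBS -t ...'
    line it is returned unchanged; otherwise the directive is inserted
    right after the first line (the shebang).
    """
    if JOB_ARRAY_REGEXP.search(script):
        return script
    first_line, rest = script.split('\n', 1)
    return '\n'.join([first_line, JOB_ARRAY_TEMPLATE.format(n=n), rest])


# Sample user template (illustrative, not the tutorial's exact text).
template = "#!/bin/sh\n#PBS -V\n#PBS -N ipengine\necho engine\n"
print(add_job_array(template, 4))
```

A template that already contains its own `#PBS -t ...` line passes through untouched, which suggests one way to keep the launcher from editing a user-supplied file.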
Second question, more general: when using ipcluster, a controller and
several engines are created. Following the tutorial, all of them would
run in batch, which seems strange to me for the controller: batch queues
usually have time limits, and it is unavoidable that engines die when
the CPU time is exceeded, but I do not see why the controller should
suffer from this. What would be the rationale for running the controller
in batch rather than locally?
Third question: once the engines run in batch, I presume that they listen
for commands sent from any ipython session that I start interactively,
provided I use Client() with the correct permissions in terms of ports,
ssh, etc. Is that correct, i.e. is that indeed the idea?
Sorry to be dense about all this. I think it would be useful if the
batch doc page were supplemented with the final step, which amounts to
starting an interactive ipython session and connecting to the batch engines.
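A minimal sketch of that final step, assuming the cluster was started with `profile=pbs` and that the controller's connection files are readable from the machine running the interactive session. `Client`, `ids`, and `apply_sync` belong to the `IPython.parallel` client API; the helper name `connect_and_ping` is hypothetical, and the import is kept inside the function so that the snippet can be loaded on a machine without IPython's parallel package.

```python
def connect_and_ping(profile="pbs"):
    """Connect to a running cluster and get a greeting from every engine.

    Assumes `ipcluster start profile=<profile>` has already brought up the
    controller and engines, and that the client connection file written by
    the controller is visible from here.
    """
    # Imported lazily: requires IPython's parallel package (0.11-era layout).
    from IPython.parallel import Client

    rc = Client(profile=profile)  # reads the profile's client connection file
    print("engine ids: %s" % rc.ids)  # one id per registered engine

    dview = rc[:]  # DirectView over all engines
    # Run a trivial function on each engine and collect the results.
    return dview.apply_sync(lambda: "Hello, World")
```

Called from an interactive session, `connect_and_ping()` should return one entry per engine once the batch jobs have actually started and registered with the controller.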
will continue digging,
best.
Johann
On 07/04/2011 05:07 PM, Johann Cohen-Tanugi wrote:
> Hi there, my problem is that a line seems to be added to the template
> I am defining following the tutorial. The template proposed in the
> tutorial is modified at runtime as follows:
>
> #!/bin/sh
> #PBS -t 1-4<----------------- incorrect?
> #PBS -V
> #PBS -N ipengine
> /usr/local/bin/python
> /sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/ipengineapp.py
> profile_dir=/afs/in2p3.fr/home/t/tanugi/\
> .ipython/profile_pbs
>
>
> The problem I believe is in the job_array_template in :
>
> class PBSLauncher(BatchSystemLauncher):
>     """A BatchSystemLauncher subclass for PBS."""
>
>     submit_command = List(['qsub'], config=True,
>         help="The PBS submit command ['qsub']")
>     delete_command = List(['qdel'], config=True,
>         help="The PBS delete command ['qsub']")
>     job_id_regexp = Unicode(r'\d+', config=True,
>         help="Regular expresion for identifying the job ID [r'\d+']")
>
>     batch_file = Unicode(u'')
>     job_array_regexp = Unicode('#PBS\W+-t\W+[\w\d\-\$]+')
>     job_array_template = Unicode('#PBS -t 1-{n}')
>     queue_regexp = Unicode('#PBS\W+-q\W+\$?\w+')
>     queue_template = Unicode('#PBS -q {queue}')
>
>
> I looked at the PBS docs for versions 10 and 11 and did not see any '-t'
> option. When I try to run, I get:
> [tanugi@ccali28 test_directory]$ ipcluster start profile=pbs n=4
> [IPClusterStart] Using existing profile dir: u'/afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs'
> [IPClusterStart] Starting ipcluster with [daemon=False]
> [IPClusterStart] Creating pid file: /afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pid/ipcluster.pid
> [IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'/afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pbs_controller']
> [IPClusterStart] adding job array settings to batch script
> [IPClusterStart] Writing instantiated batch script: /afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pbs_controller
> unknown -t option
> ERROR:root:Error in periodic callback
> Traceback (most recent call last):
>   File "/sps/glast/users/cohen/IPYDEV/local/lib/python2.6/site-packages/zmq/eventloop/ioloop.py", line 432, in _run
>     self.callback()
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/ipclusterapp.py", line 364, in start_controller
>     self.profile_dir.location
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 943, in start
>     return super(PBSControllerLauncher, self).start(1, profile_dir)
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 902, in start
>     job_id = self.parse_job_id(output)
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 854, in parse_job_id
>     raise LauncherError("Job id couldn't be determined: %s" % output)
> LauncherError: Job id couldn't be determined:
>
> I am not sure yet about the traceback, but the "unknown -t option" is
> clear. Furthermore, I wonder whether we really want to add lines to a
> template file provided by the user.
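
A possible reading of the traceback, sketched under assumptions: `qsub` rejected the script, printed "unknown -t option" (presumably on stderr), and returned nothing on stdout, so the launcher's search for a job ID with `job_id_regexp` (`r'\d+'`, from the class quoted above) found no match. The `LauncherError` class and `parse_job_id` function below are stand-ins written for this illustration, not the IPython source, and the sample job ID string is made up.

```python
import re

JOB_ID_REGEXP = r'\d+'  # value quoted from PBSLauncher above


class LauncherError(Exception):
    """Stand-in for the exception named in the traceback."""


def parse_job_id(output):
    """Extract the first run of digits from the submit command's output."""
    match = re.search(JOB_ID_REGEXP, output)
    if match is None:
        raise LauncherError("Job id couldn't be determined: %s" % output)
    return match.group()


# A successful submission typically prints something containing the job ID.
print(parse_job_id("12345.example-server"))

# An empty stdout (qsub failed) reproduces the error message seen above.
try:
    parse_job_id("")
except LauncherError as err:
    print("LauncherError: %s" % err)
```

Under this reading the traceback is only a consequence of the rejected `-t` directive, not a separate problem.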
>
> best,
> Johann