<div dir="ltr"><div class="markdown-here-wrapper" id="markdown-here-wrapper-573798" style><p style="margin:1.2em 0px!important">Can you inspect the <code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:nowrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline">pbs_engines</code> template, and see if anything looks wrong? Can you submit it manually, with <code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:nowrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline">qsub ./pbs_engines</code>?</p>
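<p style="margin:1.2em 0px!important">If the generated script turns out to be missing something your queue requires, one option is to give the engine launcher an explicit template. Below is a minimal sketch, assuming the <code>batch_template</code> option of <code>PBSEngineSetLauncher</code> as described in the IPython parallel docs. The <code>#PBS -l</code> resource lines and their values are placeholders that I made up, so adjust them to whatever your site expects:</p>
<pre><code># ipcluster_config.py (sketch only; the template text is illustrative,
# not IPython's built-in default)
c = get_config()

c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'
c.PBSLauncher.queue = 'long'

# {queue}, {n} and {profile_dir} are filled in by ipcluster when it
# writes the pbs_engines script.
c.PBSEngineSetLauncher.batch_template = """#!/bin/sh
#PBS -V
#PBS -N ipengine
#PBS -q {queue}
#PBS -t 1-{n}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
ipengine --profile-dir="{profile_dir}"
"""
</code></pre>
<p style="margin:1.2em 0px!important">After restarting ipcluster, the regenerated <code>pbs_engines</code> file should show those values filled in, which makes it easier to compare against a job script that you know runs on that queue.</p>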
</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Sep 13, 2013 at 3:38 AM, James <span dir="ltr">&lt;<a href="mailto:jamesresearching@gmail.com" target="_blank">jamesresearching@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Dear all,<br><br>I'm having a lot of trouble setting up IPython parallel on a PBS cluster, and I would really appreciate any help.<br>
<br>The architecture is a standard PBS cluster - a head node with slave nodes. I connect to the head node from my laptop over ssh.<br>
<br>The client (laptop) -> Head node connection seems simple enough. The problem is the engines.<br><br>Ignoring
the laptop for a moment, I'll just focus on running IPython on the head
node, with the engines on a slave node. I assume this is a reasonable
way of working?<br>
<br>I did the following on the head node, following instructions at <a href="http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode" target="_blank">http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode</a> :<br>
<br>$ ipython profile create --parallel --profile=pbs<br><br>The config files are as follows:<br><br>$ cat ipcluster_config.py<br>c = get_config()<br>c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'<br>c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'<br>
c.PBSLauncher.queue = 'long'<br>c.IPClusterEngines.n = 2 # Run 2 cores on 1 node or 2 nodes with all cores? Not sure.<br><br>$ cat ipengine_config.py<br>c = get_config()<br><br>Then execute on the head node:<br>$ ipcluster start --profile=pbs -n 2<br>
2013-09-10 15:02:46,771.771 [IPClusterStart] Using existing profile dir: u'/home/username/.ipython/profile_pbs'<br>2013-09-10 15:02:46.777 [IPClusterStart] Starting ipcluster with [daemon=False]<br>2013-09-10 15:02:46.778 [IPClusterStart] Creating pid file: /home/username/.ipython/profile_pbs/pid/ipcluster.pid<br>
2013-09-10 15:02:46.778 [IPClusterStart] Starting Controller with PBSControllerLauncher<br>2013-09-10 15:02:46.792 [IPClusterStart] Job submitted with job id: '2830'<br>2013-09-10 15:02:47.793 [IPClusterStart] Starting 2 Engines with PBSEngineSetLauncher<br>
2013-09-10 15:02:47.808 [IPClusterStart] Job submitted with job id: '2831'<br><br>Then the queue shows<br>$ qstat<br>Job id Name User Time Use S Queue<br>------------------------- ---------------- --------------- -------- - -----<br>
2830[].master ipcontroller username 0 Q long <br>2831[].master ipengine username 0 Q long <br><br>And
they just hang there, queued forever. I assume the engines at least
should be running? The full output of "qstat -f" doesn't give the
reason for the queuing, although normally it would. There are more than 4
nodes available.<br>
<br></div>$ qstat -f<br>Job Id: 2831[].master.domain<br> Job_Name = ipengine<br> Job_Owner = username@master.domain<br> job_state = Q<br> queue = long<br> server = [head node's domain address]<br> Checkpoint = u<br>
ctime = Tue Sep 10 15:02:47 2013<br> Error_Path = master.domain:/home/username/<div dir="ltr">ipengine.e2831<br> Hold_Types = n<br> Join_Path = n<br> Keep_Files = n<br> Mail_Points = a<br> mtime = Tue Sep 10 15:02:47 2013<br>
Output_Path = master.domain:/home/username/ipengine.o2831<br> Priority = 0<br> qtime = Tue Sep 10 15:02:47 2013<br> Rerunable = True<br> [...]<br> etime = Tue Sep 10 15:02:47 2013<br> submit_args = ./pbs_engines<br>
job_array_request = 1-2<br> fault_tolerant = False<br> submit_host = master.domain<br> init_work_dir = /home/username<br><div><br>It
also seems strange that the ipcontroller is launched through PBS. I
thought this should be on the head node, so I changed
'PBSControllerLauncher' to 'LocalControllerLauncher'. Then it doesn't
queue, but I don't know if what I'm doing is correct.<br>
<br>Any help would be really greatly appreciated.<br><br>Thank you.<span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>James<br></div></font></span></div>
</div>
<br></blockquote></div><br></div>