[IPython-dev] SciPy Sprint summary

Justin Riley justin.t.riley at gmail.com
Sun Jul 18 15:05:16 EDT 2010


Matthieu,

I agree that password-less ssh is a common configuration on HPC clusters 
and it would be useful to have the option of using SSH to copy the furl 
file to each host before launching engines with SGE/PBS/LSF. I'll see 
about hacking this in when I get some more time.
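
For the scp approach, the helper I have in mind would look roughly like
the following (a sketch only; none of this is in my fork yet, and the
function and variable names are made up):

import subprocess

# Sketch: push the controller's engine furl out to every node before
# submitting the engine jobs. Assumes password-less ssh/scp is already
# set up; 'hosts' would come from the cluster configuration.
def distribute_furl(furl_path, hosts):
    for host in hosts:
        subprocess.check_call(
            ['ssh', host, 'mkdir', '-p', '.ipython/security'])
        subprocess.check_call(
            ['scp', furl_path, '%s:.ipython/security/' % host])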

BTW, I just added experimental support for LSF to my fork. I can't test
the code given that I don't have access to an LSF system, but in theory
it should work (again using job arrays) provided the ~/.ipython/security
folder is shared.
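
Since I can't verify against a real LSF system, here is the rough shape
of the script template the LSF code generates; treat it as a sketch, as
the exact template in my fork may differ:

# Approximation of the generated LSF submission script. The job array
# directive "[1-8]" fans one submission out into eight tasks, i.e.
# eight engines; %J and %I are LSF's job id and task index. The script
# would be fed to LSF with "bsub < script" so the #BSUB directives are
# read.
n = 8
lsf_script = """#!/bin/sh
#BSUB -J ipengine[1-%d]
#BSUB -o ipengine.%%J.%%I.out
ipengine
""" % n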

~Justin

On 07/18/2010 02:24 PM, Matthieu Brucher wrote:
> Hi,
>
> I also prefer the first option, as it is the configuration I'm most
> comfortable with. Besides, people may already have this configured.
>
> Matthieu
>
> 2010/7/18 Justin Riley <justin.t.riley at gmail.com>:
>> Hi Matthieu,
>>
>> At least for the modifications I made, no, not yet. This is exactly what I'm
>> asking about in the second paragraph of my response. The new SGE/PBS support
>> will work with multiple hosts assuming the ~/.ipython/security folder is
>> NFS-shared on the cluster.
>>
>> If that's not the case, then AFAIK we have two options:
>>
>> 1. scp the furl file from ~/.ipython/security to each host's
>> ~/.ipython/security folder.
>>
>> 2. put the contents of the furl file directly inside the job script
>> used to start the engines
>>
>> The first option relies on the user having password-less ssh configured
>> properly to each node on the cluster. ipcluster would first need to scp
>> the furl and then launch the engines using PBS/SGE.
>>
>> The second option is the easiest approach given that it only requires
>> SGE to be installed; however, it's probably not the best idea to put the
>> furl file in the job script itself, for security reasons. I'm curious to
>> get opinions on this. This would require slight code modifications.
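>>
>> To make option 2 concrete, here's the shape it could take (hypothetical
>> code, not what's in my fork): read the furl on the submission host and
>> write it back out via a heredoc in the generated job script, so each
>> engine host recreates it locally.
>>
>> import os
>>
>> # Sketch of option 2. Note that the secret furl ends up readable in
>> # the job script itself, which is exactly the security concern above.
>> furl_path = os.path.expanduser(
>>     '~/.ipython/security/ipcontroller-engine.furl')
>> furl = open(furl_path).read().strip()
>>
>> job_script = """#!/bin/sh
>> #$ -t 1-%(n)d
>> mkdir -p ~/.ipython/security
>> cat > ~/.ipython/security/ipcontroller-engine.furl <<'EOF'
>> %(furl)s
>> EOF
>> ipengine
>> """ % {'n': 8, 'furl': furl}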
>>
>> ~Justin
>>
>> On 07/18/2010 01:13 PM, Matthieu Brucher wrote:
>>>
>>> Hi,
>>>
>>> Does IPython now support sending engines to nodes that do not have the
>>> same $HOME as the main instance? This is what kept me from properly
>>> testing IPython with LSF some months ago :|
>>>
>>> Matthieu
>>>
>>> 2010/7/18 Justin Riley <justin.t.riley at gmail.com>:
>>>>
>>>> Hi Satra/Brian,
>>>>
>>>> I modified your code to use the job array feature of SGE. I've also made
>>>> it so that users don't need to specify --sge-script if they don't need a
>>>> custom SGE launch script. My guess is that most users will start
>>>> without --sge-script and only resort to it when the generated launch
>>>> script no longer meets their needs. More details are in the
>>>> git log here:
>>>>
>>>> http://github.com/jtriley/ipython/tree/0.10.1-sge
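>>>>
>>>> If I've wired the options up correctly, the common case should now just
>>>> be (flag names are per my branch, so double-check against the code):
>>>>
>>>> $ ipcluster sge -n 8
>>>>
>>>> with --sge-script=your_script.sh only needed once the generated script
>>>> isn't sufficient.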
>>>>
>>>> Also, I need to test this, but I believe this code will fail if the
>>>> folder containing the furl file is not NFS-mounted on the SGE cluster.
>>>> Another option besides requiring NFS is to scp the furl file to each
>>>> host as is done in the ssh mode of ipcluster; however, this would
>>>> require password-less ssh to be configured properly (maybe not so bad).
>>>> Another option is to dump the generated furl file into the job script
>>>> itself. This has the advantage of only needing SGE installed but
>>>> certainly doesn't seem like the safest practice. Any thoughts on how to
>>>> approach this?
>>>>
>>>> Let me know what you think.
>>>>
>>>> ~Justin
>>>>
>>>> On 07/18/2010 12:05 AM, Brian Granger wrote:
>>>>>
>>>>> Is the array jobs feature what you want?
>>>>>
>>>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs
>>>>>
>>>>> Brian
>>>>>
>>>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger <ellisonbg at gmail.com>
>>>>>   wrote:
>>>>>>
>>>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh <satra at mit.edu>
>>>>>>   wrote:
>>>>>>>
>>>>>>> hi,
>>>>>>>
>>>>>>> i've pushed my changes to:
>>>>>>>
>>>>>>> http://github.com/satra/ipython/tree/0.10.1-sge
>>>>>>>
>>>>>>> notes:
>>>>>>>
>>>>>>> 1. it starts cleanly. i can connect and execute things. when i kill
>>>>>>> using ctrl-c, the messages appear to indicate that everything shut
>>>>>>> down well. however, the sge ipengine jobs are still running.
>>>>>>
>>>>>> What version of Python and Twisted are you running?
>>>>>>
>>>>>>> 2. the pbs option appears to require mpi to be present. i don't
>>>>>>> think one can launch multiple engines using pbs without mpi or
>>>>>>> without the workaround i've applied to the sge engine. basically it
>>>>>>> submits an sge job for each engine that i want to run. i would love
>>>>>>> to know if a single job can launch multiple engines on an sge/pbs
>>>>>>> cluster without mpi.
>>>>>>
>>>>>> I think you are right that pbs needs to use mpirun/mpiexec to start
>>>>>> multiple engines using a single PBS job.  I am not that familiar with
>>>>>> SGE; can you start multiple processes without mpi and with just a
>>>>>> single SGE job?  If so, let's try to get that working.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>>> cheers,
>>>>>>>
>>>>>>> satra
>>>>>>>
>>>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh <satra at mit.edu>
>>>>>>>   wrote:
>>>>>>>>
>>>>>>>> hi justin,
>>>>>>>>
>>>>>>>> i hope to test it out tonight. from what fernando and i discussed,
>>>>>>>> this should be relatively straightforward. once i'm done i'll push
>>>>>>>> it to my fork of ipython and announce it here for others to test.
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>>
>>>>>>>> satra
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley
>>>>>>>> <justin.t.riley at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> This is great news. Right now StarCluster just takes advantage of
>>>>>>>>> password-less ssh already being installed and runs:
>>>>>>>>>
>>>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py
>>>>>>>>>
>>>>>>>>> This works fine for now; however, having SGE support would allow
>>>>>>>>> ipcluster's load to be accounted for by the queue.
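>>>>>>>>>
>>>>>>>>> For anyone unfamiliar with the ssh mode, cluster_file.py is just a
>>>>>>>>> small Python config file, roughly like this (quoting from memory,
>>>>>>>>> so double-check the names against the docs):
>>>>>>>>>
>>>>>>>>> # engines per host; send_furl asks ipcluster to copy the furl
>>>>>>>>> # file to each host over ssh
>>>>>>>>> send_furl = True
>>>>>>>>> engines = {'node001': 4, 'node002': 4}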
>>>>>>>>>
>>>>>>>>> Is Satra on the list? I have experience with SGE and could help
>>>>>>>>> with the code if needed. I can also help test this functionality.
>>>>>>>>>
>>>>>>>>> ~Justin
>>>>>>>>>
>>>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote:
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger
>>>>>>>>>> <ellisonbg at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the post.  You should also know that it looks like
>>>>>>>>>>> someone
>>>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1.
>>>>>>>>>>
>>>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to
>>>>>>>>>> Brian
>>>>>>>>>> for the pointers), and he said he might actually already have some
>>>>>>>>>> code for it.  I suspect we'll get this in soon.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> f
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Brian E. Granger, Ph.D.
>>>>>> Assistant Professor of Physics
>>>>>> Cal Poly State University, San Luis Obispo
>>>>>> bgranger at calpoly.edu
>>>>>> ellisonbg at gmail.com
>>>>>>