[IPython-dev] SciPy Sprint summary

Brian Granger ellisonbg at gmail.com
Mon Jul 19 01:06:31 EDT 2010


Justin,

Here is a quick code review:

* I like the design of the BatchEngineSet.  This will be easy to port to
  0.11.
* If we are going to ship default submission templates, we need to expose
  the queue name on the command line (see the sketch after this list).
  This shouldn't be too tough.
* Have you tested this with Python 2.6?  I saw you mention that the
  engines are shutting down cleanly now.  What did you do to fix that?
  I am running into the same issue in 0.11, so any info you can provide
  would be helpful.
* For now, let's stick with the assumption of a shared $HOME for the furl files.
* The biggest thing is for people to test this thoroughly.  I don't have
  SGE/PBS/LSF access right now, so it is a bit difficult for me to help.  I
  have a cluster coming later in the summer, but it is not here yet.  Once
  people have tested it well and are satisfied with it, let's merge it.
* If we can update the documentation about how the PBS/SGE support works,
  that would be great.  The file is here:

http://github.com/jtriley/ipython/blob/8fef6d80ee4f69898351653b773029b36e118a64/docs/source/parallel/parallel_process.txt
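
As a rough illustration of the queue-name point above (this is not the
actual ipcluster code; the template contents, the build_sge_script helper,
and the idea of an --sge-queue style flag are assumptions for the example):

import subprocess  # unused here, shown only to mirror how qsub would be called later

# Default SGE submission template; "queue" would be filled from a
# command-line option (a hypothetical --sge-queue flag, say).
DEFAULT_SGE_TEMPLATE = """#!/bin/sh
#$ -V
#$ -cwd
#$ -q %(queue)s
#$ -N ipengine
ipengine
"""

def build_sge_script(queue):
    """Return the default job script with the requested queue filled in."""
    return DEFAULT_SGE_TEMPLATE % {"queue": queue}

print(build_sge_script("all.q"))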

Once these small changes have been made and everyone has tested, we
can merge it for the 0.10.1 release.
Thanks for doing this work, Justin and Satra!  It is fantastic!  Just
so you all know where this is going in 0.11:

* We are going to get rid of Twisted in ipcluster.  This means we have
  to rewrite the process management code to use something like popen
  (see the sketch after this list).
* We have a new configuration system in 0.11.  This allows users to maintain
  cluster profiles that are a set of configuration files for a particular
  cluster setup.  This makes it easy for a user to have multiple clusters
  configured, which they can then start by name.  The logging, security, etc.
  is also different for each cluster profile.
* It will be quite a bit of work to get everything working in 0.11, so I am
  glad we are getting good PBS/SGE support in 0.10.1.
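
A rough sketch of the Twisted-to-popen point above (the class and method
names are placeholders, not the 0.11 API):

import subprocess

class EngineLauncher(object):
    """Illustrative stand-in for the 0.11 launchers; real class names differ."""

    def __init__(self, n):
        self.n = n
        self.procs = []

    def start(self):
        # Each engine is an ordinary child process; no Twisted reactor involved.
        self.procs = [subprocess.Popen(["ipengine"]) for _ in range(self.n)]

    def stop(self):
        for p in self.procs:
            p.terminate()
            p.wait()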

Cheers,

Brian

On Sun, Jul 18, 2010 at 11:18 AM, Justin Riley <justin.t.riley at gmail.com> wrote:
> Hi Matthieu,
>
> At least for the modifications I made, no, not yet. This is exactly what
> I'm asking about in the second paragraph of my response. The new SGE/PBS
> support will work with multiple hosts assuming the ~/.ipython/security
> folder is NFS-shared on the cluster.
>
> If that's not the case, then AFAIK we have two options:
>
> 1. scp the furl file from ~/.ipython/security to each host's
> ~/.ipython/security folder.
>
> 2. put the contents of the furl file directly inside the job script
> used to start the engines
>
> The first option relies on the user having password-less ssh configured
> properly to each node on the cluster. ipcluster would first need to scp
> the furl and then launch the engines using PBS/SGE.
>
> The second option is the easiest approach given that it only requires
> SGE to be installed; however, it's probably not the best idea to put the
> furl file in the job script itself for security reasons. I'm curious to
> get opinions on this. This would require slight code modifications.
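
For concreteness, option 1 could look roughly like this (a sketch only: the
host names, furl path, and the distribute_furl helper are made up for the
example, and it assumes password-less ssh is already working):

import os
import subprocess

FURL = os.path.expanduser("~/.ipython/security/ipcontroller-engine.furl")

def distribute_furl(hosts, furl=FURL):
    """Copy the engine furl into ~/.ipython/security on each host."""
    for host in hosts:
        # Make sure the remote security directory exists, then copy the furl.
        subprocess.check_call(["ssh", host, "mkdir", "-p", "~/.ipython/security"])
        subprocess.check_call(["scp", furl, "%s:~/.ipython/security/" % host])

# distribute_furl(["node001", "node002"])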
>
> ~Justin
>
> On 07/18/2010 01:13 PM, Matthieu Brucher wrote:
>> Hi,
>>
>> Does IPython now support sending engines to nodes that do not have the
>> same $HOME as the main instance? This is what kept me from properly
>> testing IPython with LSF some months ago :|
>>
>> Matthieu
>>
>> 2010/7/18 Justin Riley<justin.t.riley at gmail.com>:
>>> Hi Satra/Brian,
>>>
>>> I modified your code to use the job array feature of SGE. I've also made
>>> it so that users don't need to specify --sge-script if they don't need a
>>> custom SGE launch script. My guess is that most users will start without
>>> --sge-script and only resort to it when the generated launch script no
>>> longer meets their needs. More details in the git log here:
>>>
>>> http://github.com/jtriley/ipython/tree/0.10.1-sge
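
For readers unfamiliar with job arrays, the idea is roughly this (a sketch,
not Justin's actual code; the script contents and the submit_engine_array
helper are illustrative):

import subprocess

def submit_engine_array(n_engines, script_path="sge_engines.sh"):
    """Submit one SGE array job whose tasks each start an ipengine."""
    script = ("#!/bin/sh\n"
              "#$ -V\n"
              "#$ -cwd\n"
              "#$ -N ipengine\n"
              "#$ -t 1-%d\n"   # array job: one task per engine
              "ipengine\n" % n_engines)
    with open(script_path, "w") as f:
        f.write(script)
    # A single qsub call starts all the engines.
    subprocess.check_call(["qsub", script_path])

# submit_engine_array(8)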
>>>
>>> Also, I need to test this, but I believe this code will fail if the
>>> folder containing the furl file is not NFS-mounted on the SGE cluster.
>>> Another option besides requiring NFS is to scp the furl file to each
>>> host as is done in the ssh mode of ipcluster; however, this would
>>> require password-less ssh to be configured properly (maybe not so bad).
>>> Another option is to dump the generated furl file into the job script
>>> itself. This has the advantage of only needing SGE installed but
>>> certainly doesn't seem like the safest practice. Any thoughts on how to
>>> approach this?
>>>
>>> Let me know what you think.
>>>
>>> ~Justin
>>>
>>> On 07/18/2010 12:05 AM, Brian Granger wrote:
>>>> Is the array jobs feature what you want?
>>>>
>>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs
>>>>
>>>> Brian
>>>>
>>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger<ellisonbg at gmail.com>    wrote:
>>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh<satra at mit.edu>    wrote:
>>>>>> hi ,
>>>>>>
>>>>>> i've pushed my changes to:
>>>>>>
>>>>>> http://github.com/satra/ipython/tree/0.10.1-sge
>>>>>>
>>>>>> notes:
>>>>>>
>>>>>> 1. it starts cleanly. i can connect and execute things. when i kill using
>>>>>> ctrl-c, the messages appear to indicate that everything shut down well.
>>>>>> however, the sge ipengine jobs are still running.
>>>>>
>>>>> What version of Python and Twisted are you running?
>>>>>
>>>>>> 2. the pbs option appears to require mpi to be present. i don't think one
>>>>>> can launch multiple engines using pbs without mpi or without the workaround
>>>>>> i've applied to the sge engine. basically it submits an sge job for each
>>>>>> engine that i want to run. i would love to know if a single job can launch
>>>>>> multiple engines on a sge/pbs cluster without mpi.
>>>>>
>>>>> I think you are right that pbs needs to use mpirun/mpiexec to start
>>>>> multiple engines using a single PBS job.  I am not that familiar with
>>>>> SGE; can you start multiple processes without mpi and with just a
>>>>> single SGE job?  If so, let's try to get that working.
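
For the PBS case, the single-job-with-MPI approach could look roughly like
this (a sketch only; the PBS directives and the submit_pbs_mpi_engines
helper are illustrative, not existing ipcluster code):

import subprocess

def submit_pbs_mpi_engines(n_engines, script_path="pbs_engines.sh"):
    """Submit one PBS job that fans the engines out with mpiexec."""
    script = ("#!/bin/sh\n"
              "#PBS -V\n"
              "#PBS -N ipengine\n"
              "#PBS -l nodes=%d\n"
              "cd $PBS_O_WORKDIR\n"
              "mpiexec -n %d ipengine\n" % (n_engines, n_engines))
    with open(script_path, "w") as f:
        f.write(script)
    subprocess.check_call(["qsub", script_path])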
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Brian
>>>>>
>>>>>> cheers,
>>>>>>
>>>>>> satra
>>>>>>
>>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh<satra at mit.edu>    wrote:
>>>>>>>
>>>>>>> hi justin,
>>>>>>>
>>>>>>> i hope to test it out tonight. from what fernando and i discussed, this
>>>>>>> should be relatively straightforward. once i'm done i'll push it to my fork
>>>>>>> of ipython and announce it here for others to test.
>>>>>>>
>>>>>>> cheers,
>>>>>>>
>>>>>>> satra
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley<justin.t.riley at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> This is great news. Right now StarCluster just takes advantage of
>>>>>>>> password-less ssh already being installed and runs:
>>>>>>>>
>>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py
>>>>>>>>
>>>>>>>> This works fine for now; however, having SGE support would allow
>>>>>>>> ipcluster's load to be accounted for by the queue.
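
For reference, the cluster file passed to "ipcluster ssh" in the 0.10
series is a small Python file roughly along these lines (the host names are
placeholders, and the exact option names should be checked against the 0.10
parallel docs):

# cluster_file.py (illustrative)
send_furl = True
engines = {
    "node001": 2,  # hostname -> number of engines to start there
    "node002": 2,
}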
>>>>>>>>
>>>>>>>> Is Satra on the list? I have experience with SGE and could help with the
>>>>>>>> code if needed. I can also help test this functionality.
>>>>>>>>
>>>>>>>> ~Justin
>>>>>>>>
>>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote:
>>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger<ellisonbg at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Thanks for the post.  You should also know that it looks like someone
>>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1.
>>>>>>>>>
>>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian
>>>>>>>>> for the pointers), and he said he might actually already have some
>>>>>>>>> code for it.  I suspect we'll get this in soon.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> f
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com


