[IPython-dev] SciPy Sprint summary

Justin Riley justin.t.riley at gmail.com
Sun Jul 18 12:58:45 EDT 2010


Turns out that Torque/PBS also supports job arrays. I've updated my
0.10.1-sge branch with PBS job array support. It works well with Torque
2.4.6. I also tested the SGE support against 6.2u3.

Since the code is extremely similar between PBS and SGE, I decided to
update the BatchEngineSet base class to handle the core job array logic.
Given that PBS and SGE are the only subclasses, I figured this was OK. If
not, it should be easy to break it out again.
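
For anyone following along, here is a rough sketch of the idea. It is not
the code in the branch: the class names below are made up for illustration,
and it assumes only that both schedulers accept a "-t 1-N" array directive
and expose the task index as SGE_TASK_ID (SGE) or PBS_ARRAYID (Torque). The
shared template lives in a base class and each scheduler only supplies its
directive prefix and task-id variable.

```python
from string import Template


class JobArrayScript:
    """Hypothetical base class holding the shared job-array logic."""

    prefix = None        # scheduler directive prefix, set by subclasses
    task_id_var = None   # environment variable holding the array task index

    # "$$" is a literal dollar sign, so "$$$task_id_var" renders as e.g.
    # "$SGE_TASK_ID" in the generated shell script.
    template = Template(
        "#!/bin/sh\n"
        "$prefix -t 1-$n\n"
        "# one engine per array task; $$$task_id_var is the task index\n"
        "$command\n"
    )

    def render(self, n, command="ipengine"):
        """Return a launch script that starts n engines as one array job."""
        if n < 1:
            raise ValueError("need at least one engine")
        return self.template.substitute(
            prefix=self.prefix,
            n=n,
            task_id_var=self.task_id_var,
            command=command,
        )


class SGEJobArrayScript(JobArrayScript):
    prefix = "#$"
    task_id_var = "SGE_TASK_ID"


class PBSJobArrayScript(JobArrayScript):
    prefix = "#PBS"
    task_id_var = "PBS_ARRAYID"
```

Calling SGEJobArrayScript().render(4) gives a script whose second line is
"#$ -t 1-4", so a single submission asks the queue for four engine tasks
instead of submitting one job per engine.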

~Justin

On 07/18/2010 03:43 AM, Justin Riley wrote:
> Hi Satra/Brian,
>
> I modified your code to use the job array feature of SGE. I've also made
> it so that users don't need to specify --sge-script if they don't need a
> custom SGE launch script. My guess is that most users will choose not to
> specify --sge-script first and resort to using --sge-script when the
> generated launch script no longer meets their needs. More details in the
> git log here:
>
> http://github.com/jtriley/ipython/tree/0.10.1-sge
>
> Also, I need to test this, but I believe this code will fail if the
> folder containing the furl file is not NFS-mounted on the SGE cluster.
> Another option besides requiring NFS is to scp the furl file to each
> host as is done in the ssh mode of ipcluster. However, this would
> require password-less ssh to be configured properly (maybe not so bad).
> Another option is to dump the generated furl file into the job script
> itself. This has the advantage of only needing SGE installed but
> certainly doesn't seem like the safest practice. Any thoughts on how to
> approach this?
>
> Let me know what you think.
>
> ~Justin
>
> On 07/18/2010 12:05 AM, Brian Granger wrote:
>> Is the array jobs feature what you want?
>>
>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs
>>
>> Brian
>>
>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger<ellisonbg at gmail.com> wrote:
>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh<satra at mit.edu> wrote:
>>>> hi ,
>>>>
>>>> i've pushed my changes to:
>>>>
>>>> http://github.com/satra/ipython/tree/0.10.1-sge
>>>>
>>>> notes:
>>>>
>>>> 1. it starts cleanly. i can connect and execute things. when i kill using
>>>> ctrl-c, the messages appear to indicate that everything shut down well.
>>>> however, the sge ipengine jobs are still running.
>>>
>>> What version of Python and Twisted are you running?
>>>
>>>> 2. the pbs option appears to require mpi to be present. i don't think one
>>>> can launch multiple engines using pbs without mpi or without the workaround
>>>> i've applied to the sge engine. basically it submits an sge job for each
>>>> engine that i want to run. i would love to know if a single job can launch
>>>> multiple engines on a sge/pbs cluster without mpi.
>>>
>>> I think you are right that pbs needs to use mpirun/mpiexec to start
>>> multiple engines using a single PBS job. I am not that familiar with
>>> SGE, can you start multiple processes without mpi and with just a
>>> single SGE job? If so, let's try to get that working.
>>>
>>> Cheers,
>>>
>>> Brian
>>>
>>>> cheers,
>>>>
>>>> satra
>>>>
>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh<satra at mit.edu> wrote:
>>>>>
>>>>> hi justin,
>>>>>
>>>>> i hope to test it out tonight. from what fernando and i discussed, this
>>>>> should be relatively straightforward. once i'm done i'll push it to my fork
>>>>> of ipython and announce it here for others to test.
>>>>>
>>>>> cheers,
>>>>>
>>>>> satra
>>>>>
>>>>>
>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley<justin.t.riley at gmail.com> wrote:
>>>>>>
>>>>>> This is great news. Right now StarCluster just takes advantage of
>>>>>> password-less ssh already being installed and runs:
>>>>>>
>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py
>>>>>>
>>>>>> This works fine for now, however, having SGE support would allow
>>>>>> ipcluster's load to be accounted for by the queue.
>>>>>>
>>>>>> Is Satra on the list? I have experience with SGE and could help with the
>>>>>> code if needed. I can also help test this functionality.
>>>>>>
>>>>>> ~Justin
>>>>>>
>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote:
>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger<ellisonbg at gmail.com> wrote:
>>>>>>>> Thanks for the post. You should also know that it looks like someone
>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1.
>>>>>>>
>>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian
>>>>>>> for the pointers), and he said he might actually already have some
>>>>>>> code for it. I suspect we'll get this in soon.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> f
>>>>>>
>>>>>> _______________________________________________
>>>>>> IPython-dev mailing list
>>>>>> IPython-dev at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Brian E. Granger, Ph.D.
>>> Assistant Professor of Physics
>>> Cal Poly State University, San Luis Obispo
>>> bgranger at calpoly.edu
>>> ellisonbg at gmail.com
>>>
>>
>>
>>
>



