[IPython-dev] SciPy Sprint summary

Brian Granger ellisonbg at gmail.com
Tue Jul 20 16:04:04 EDT 2010


Great!  I mean great that you and Justin are testing and debugging this.

Brian

On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh <satra at mit.edu> wrote:
> hi brian,
>
> i ran into a problem (my engines were not starting) and justin and i are
> going to try and figure out what's causing it.
>
> cheers,
>
> satra
>
>
> On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>
>> Satra,
>>
>> If you could test this as well, that would be great.  Thanks.  Justin,
>> let us know when you think it is ready to go with the documentation
>> and testing.
>>
>> Cheers,
>>
>> Brian
>>
>> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley <justin.t.riley at gmail.com>
>> wrote:
>> > On 07/19/2010 01:06 AM, Brian Granger wrote:
>> >> * I like the design of the BatchEngineSet.  This will be easy to port
>> >>   to 0.11.
>> > Excellent :D
>> >
>> >> * I think if we are going to have default submission templates, we need
>> >>   to expose the queue name to the command line.  This shouldn't be too
>> >>   tough.
>> >
>> > Added --queue option to my 0.10.1-sge branch and tested this with SGE
>> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code
>> > that *should* work with LSF.
>> >
>> >> * Have you tested this with Python 2.6?  I saw that you mentioned that
>> >>   the engines were shutting down cleanly now.  What did you do to fix
>> >>   that?  I am even running into that in 0.11 so any info you can provide
>> >>   would be helpful.
>> >
>> > I've been testing the code with Python 2.6. I didn't do anything special
>> > other than switch the BatchEngineSet to using job arrays (ie a single
>> > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4"
>> > the controller starts and the engines are launched and at that point the
>> > ipcluster session is running indefinitely. If I then ctrl-c the
>> > ipcluster session it catches the signal and calls kill() which
>> > terminates the engines by canceling the job. Is this the same situation
>> > you're trying to get working?
>> >
>> >> * For now, let's stick with the assumption of a shared $HOME for the
>> >>   furl files.
>> >> * The biggest thing is if people can test this thoroughly.  I don't
>> >>   have SGE/PBS/LSF access right now, so it is a bit difficult for me to
>> >>   help.  I have a cluster coming later in the summer, but it is not here
>> >>   yet.  Once people have tested it well and are satisfied with it, let's
>> >>   merge it.
>> >> * If we can update the documentation about how the PBS/SGE support
>> >>   works that would be great.  The file is here:
>> >
>> > That sounds fine to me. I'm testing this stuff on my workstation's local
>> > sge/torque queues and it works fine. I'll also test this with
>> > StarCluster and make sure it works on a real cluster. If someone else
>> > can test using LSF on a real cluster (with shared $HOME) that'd be
>> > great. I'll try to update the docs some time this week.
>> >
>> >>
>> >> Once these small changes have been made and everyone has tested, we
>> >> can merge it for the 0.10.1 release.
>> > Excellent :D
>> >
>> >> Thanks for doing this work Justin and Satra!  It is fantastic!  Just
>> >> so you all know where this is going in 0.11:
>> >>
>> >> * We are going to get rid of using Twisted in ipcluster.  This means we
>> >>   have to re-write the process management stuff to use things like popen.
>> >> * We have a new configuration system in 0.11.  This allows users to
>> >>   maintain cluster profiles that are a set of configuration files for a
>> >>   particular cluster setup.  This makes it easy for a user to have
>> >>   multiple clusters configured, which they can then start by name.  The
>> >>   logging, security, etc. is also different for each cluster profile.
>> >> * It will be quite a bit of work to get everything working in 0.11, so
>> >>   I am glad we are getting good PBS/SGE support in 0.10.1.
>> >
>> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in
>> > 0.11, I guess just let me know when is appropriate to start hacking.
>> >
>> > Thanks!
>> >
>> > ~Justin
>> >
>>
>>
>>
>> --
>> Brian E. Granger, Ph.D.
>> Assistant Professor of Physics
>> Cal Poly State University, San Luis Obispo
>> bgranger at calpoly.edu
>> ellisonbg at gmail.com
>
>



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
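
Illustrative note on the thread above: Justin describes launching all engines
with a single job-array submission (one qsub instead of N), exposing the queue
name, and cancelling the job when ipcluster is interrupted with ctrl-c. The
sketch below shows that idea only; it is not the code from the 0.10.1-sge
branch. The function names (build_array_script, cancel_command,
run_until_interrupted) and the "ipengine" job name are hypothetical. The
scheduler directives and the qdel/bkill commands are the standard ones for
SGE, Torque/PBS and LSF, but the LSF path is as untested here as it was in the
thread.

```python
# A minimal sketch, assuming one batch script per cluster start.
# Helper names are hypothetical; nothing here is submitted to a scheduler.

# Job-array directive per scheduler: one submission yields n engine tasks.
ARRAY_DIRECTIVES = {
    "sge": "#$ -t 1-{n}",
    "pbs": "#PBS -t 1-{n}",
    "lsf": "#BSUB -J ipengine[1-{n}]",
}

# Queue-selection directive per scheduler (what a --queue option would fill in).
QUEUE_DIRECTIVES = {
    "sge": "#$ -q {queue}",
    "pbs": "#PBS -q {queue}",
    "lsf": "#BSUB -q {queue}",
}

# Command used to cancel a submitted job.
CANCEL_COMMANDS = {"sge": "qdel", "pbs": "qdel", "lsf": "bkill"}


def build_array_script(scheduler, n, queue=None, engine_cmd="ipengine"):
    """Return the text of a batch script that starts n engines as one job array."""
    lines = ["#!/bin/sh", ARRAY_DIRECTIVES[scheduler].format(n=n)]
    if queue is not None:
        lines.append(QUEUE_DIRECTIVES[scheduler].format(queue=queue))
    lines.append(engine_cmd)
    return "\n".join(lines) + "\n"


def cancel_command(scheduler, job_id):
    """Return the argv list that would cancel the whole job array."""
    return [CANCEL_COMMANDS[scheduler], str(job_id)]


def run_until_interrupted(wait, cancel):
    """Block in wait(); on ctrl-c (KeyboardInterrupt) call cancel() once."""
    try:
        wait()
    except KeyboardInterrupt:
        cancel()


if __name__ == "__main__":
    print(build_array_script("sge", 4, queue="all.q"))
    print(cancel_command("sge", 1234))
```

Because the engines form one array job, a single cancel command on the job id
is enough to stop all of them, which matches the clean shutdown described in
the thread.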


