[IPython-dev] SciPy Sprint summary
Satrajit Ghosh
satra at mit.edu
Fri Jul 23 15:19:02 EDT 2010
If I add the following line to the SGE script to match my shell, it works fine.
Perhaps we should allow specifying the shell as an option, like the queue, and
by default set it to the user's shell?
#$ -S /bin/bash
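
For illustration, here's a minimal sketch of what a generated SGE submission
script header could look like if both the queue and the shell were exposed as
options (the directives and option wiring below are assumptions for
discussion, not the current ipcluster template):

    #!/bin/bash
    #$ -S /bin/bash    # interpret the job script with the user's shell
    #$ -q all.q        # target queue, e.g. from a --queue style option
    #$ -cwd            # run the job from the submission directory
    ipengine
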
cheers,
satra
On Wed, Jul 21, 2010 at 11:58 PM, Satrajit Ghosh <satra at mit.edu> wrote:
> hi justin,
>
> 1. By cleanly installed, do you mean SGE in addition to ipython/ipcluster?
>>
>
> no, just the python environment.
>
>
>> 2. From the job output you sent me previously (when it wasn't working) it
>> seems that there might have been a mismatch in the shell that was used given
>> that the output was complaining about "Illegal variable name". I've noticed
>> that SGE likes to assign csh by default on my system if I don't specify a
>> shell at install time. What is the output of "qconf -sq all.q | grep -i
>> shell" for you?
>>
>
> (nipype0.3)satra at sub:/tmp$ qconf -sq all.q | grep -i shell
> shell                 /bin/sh
> shell_start_mode      unix_behavior
>
> (nipype0.3)satra at sub:/tmp$ qconf -sq sub | grep -i shell
> shell                 /bin/csh
> shell_start_mode      posix_compliant
>
> (nipype0.3)satra at sub:/tmp$ qconf -sq twocore | grep -i shell
> shell                 /bin/bash
> shell_start_mode      posix_compliant
>
> only twocore worked; all.q and sub didn't. choosing either of those two puts
> the job in the qw state.
>
> my default shell is bash.
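>
> (A possible admin-side fix, assuming you have SGE admin rights, would be to
> edit the failing queues to match the working one; the values below are only
> a sketch:
>
>     $ qconf -mq all.q          # opens the queue config in an editor
>     shell                 /bin/bash
>     shell_start_mode      unix_behavior
>
> with unix_behavior, SGE honors the script's #! line instead of forcing the
> queue's configured shell.)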
>
> cheers,
>
> satra
>
>
>> Thanks!
>>
>> ~Justin
>>
>> On Wed, Jul 21, 2010 at 9:05 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>>
>>> hi justin.
>>>
>>> i really don't know what the difference is, but i clean installed
>>> everything and it works beautifully on SGE.
>>>
>>> cheers,
>>>
>>> satra
>>>
>>>
>>>
>>> On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>>
>>>> Great! I mean great that you and Justin are testing and debugging this.
>>>>
>>>> Brian
>>>>
>>>> On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>>>> > hi brian,
>>>> >
>>>> > i ran into a problem (my engines were not starting) and justin and i
>>>> > are going to try and figure out what's causing it.
>>>> >
>>>> > cheers,
>>>> >
>>>> > satra
>>>> >
>>>> >
>>>> > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>>> >>
>>>> >> Satra,
>>>> >>
>>>> >> If you could test this as well, that would be great. Thanks. Justin,
>>>> >> let us know when you think it is ready to go with the documentation
>>>> >> and testing.
>>>> >>
>>>> >> Cheers,
>>>> >>
>>>> >> Brian
>>>> >>
>>>> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley <justin.t.riley at gmail.com> wrote:
>>>> >> > On 07/19/2010 01:06 AM, Brian Granger wrote:
>>>> >> >> * I like the design of the BatchEngineSet. This will be easy to
>>>> >> >> port to 0.11.
>>>> >> > Excellent :D
>>>> >> >
>>>> >> >> * I think if we are going to have default submission templates, we
>>>> >> >> need to expose the queue name to the command line. This shouldn't
>>>> >> >> be too tough.
>>>> >> >
>>>> >> > Added a --queue option to my 0.10.1-sge branch and tested this with
>>>> >> > SGE 6.2u3 and Torque 2.4.6. I don't have LSF to test, but I added
>>>> >> > code that *should* work with LSF.
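>>>> >> >
>>>> >> > For example (the exact flag spelling is an assumption; check
>>>> >> > ipcluster --help on the branch, and all.q / batch are just example
>>>> >> > queue names):
>>>> >> >
>>>> >> >     $ ipcluster sge -n 8 --queue all.q
>>>> >> >     $ ipcluster pbs -n 8 --queue batch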
>>>> >> >
>>>> >> >> * Have you tested this with Python 2.6? I saw that you mentioned
>>>> >> >> that the engines were shutting down cleanly now. What did you do
>>>> >> >> to fix that? I am even running into that in 0.11, so any info you
>>>> >> >> can provide would be helpful.
>>>> >> >
>>>> >> > I've been testing the code with Python 2.6. I didn't do anything
>>>> >> > special other than switch the BatchEngineSet to using job arrays
>>>> >> > (i.e. a single qsub command instead of N qsubs). Now when I run
>>>> >> > "ipcluster sge -n 4" the controller starts and the engines are
>>>> >> > launched, and at that point the ipcluster session runs
>>>> >> > indefinitely. If I then ctrl-c the ipcluster session, it catches
>>>> >> > the signal and calls kill(), which terminates the engines by
>>>> >> > canceling the job. Is this the same situation you're trying to get
>>>> >> > working?
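>>>> >> >
>>>> >> > Roughly, the job-array approach boils down to one submission
>>>> >> > instead of N separate qsub calls, something like the following
>>>> >> > (a sketch, not the exact template from the branch):
>>>> >> >
>>>> >> >     $ qsub -t 1-4 sge_engine_script.sh   # one array job, 4 engine tasks
>>>> >> >     $ qdel <jobid>                       # what kill() amounts to: cancel the whole array
>>>> >> >
>>>> >> > where each task can tell itself apart via $SGE_TASK_ID, and
>>>> >> > sge_engine_script.sh is a hypothetical name for the generated
>>>> >> > engine script.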
>>>> >> >
>>>> >> >> * For now, let's stick with the assumption of a shared $HOME for
>>>> >> >> the furl files.
>>>> >> >> * The biggest thing is if people can test this thoroughly. I don't
>>>> >> >> have SGE/PBS/LSF access right now, so it is a bit difficult for me
>>>> >> >> to help. I have a cluster coming later in the summer, but it is
>>>> >> >> not here yet. Once people have tested it well and are satisfied
>>>> >> >> with it, let's merge it.
>>>> >> >> * If we can update the documentation about how the PBS/SGE support
>>>> >> >> works, that would be great. The file is here:
>>>> >> >
>>>> >> > That sounds fine to me. I'm testing this stuff on my workstation's
>>>> >> > local sge/torque queues and it works fine. I'll also test this
>>>> >> > with StarCluster and make sure it works on a real cluster. If
>>>> >> > someone else can test using LSF on a real cluster (with shared
>>>> >> > $HOME), that'd be great. I'll try to update the docs some time
>>>> >> > this week.
>>>> >> >
>>>> >> >>
>>>> >> >> Once these small changes have been made and everyone has tested,
>>>> >> >> we can merge it for the 0.10.1 release.
>>>> >> > Excellent :D
>>>> >> >
>>>> >> >> Thanks for doing this work, Justin and Satra! It is fantastic!
>>>> >> >> Just so you all know where this is going in 0.11:
>>>> >> >>
>>>> >> >> * We are going to get rid of using Twisted in ipcluster. This
>>>> >> >> means we have to re-write the process management stuff to use
>>>> >> >> things like popen.
>>>> >> >> * We have a new configuration system in 0.11. This allows users to
>>>> >> >> maintain cluster profiles that are a set of configuration files
>>>> >> >> for a particular cluster setup. This makes it easy for a user to
>>>> >> >> have multiple clusters configured, which they can then start by
>>>> >> >> name. The logging, security, etc. is also different for each
>>>> >> >> cluster profile.
>>>> >> >> * It will be quite a bit of work to get everything working in
>>>> >> >> 0.11, so I am glad we are getting good PBS/SGE support in 0.10.1.
>>>> >> >
>>>> >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster
>>>> >> > in 0.11; I guess just let me know when it's appropriate to start
>>>> >> > hacking.
>>>> >> >
>>>> >> > Thanks!
>>>> >> >
>>>> >> > ~Justin
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Brian E. Granger, Ph.D.
>>>> >> Assistant Professor of Physics
>>>> >> Cal Poly State University, San Luis Obispo
>>>> >> bgranger at calpoly.edu
>>>> >> ellisonbg at gmail.com
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Brian E. Granger, Ph.D.
>>>> Assistant Professor of Physics
>>>> Cal Poly State University, San Luis Obispo
>>>> bgranger at calpoly.edu
>>>> ellisonbg at gmail.com
>>>>
>>>
>>>
>>
>