[IPython-dev] Before a patch for LSF support

Matthieu Brucher matthieu.brucher at gmail.com
Sun Aug 9 15:34:40 EDT 2009


2009/8/9 Brian Granger <ellisonbg.net at gmail.com>:
>
>> I ran into another issue: on a cluster, the home folder may be
>> different from the one on the access box. In that case, the
>> .ipython/security directory does not exist and the engine will not
>> start (I've just tested this).
>
> Currently our model for ipcluster is that:
>
> .ipython/security is shared by all hosts and in the same location.  If you
> don't have this situation, you will have to manually move the .furl files
> around and tell ipengine where the .furl files are located.  You will also
> need to use persistent furl files.  Docs on all this are here:
>
> http://ipython.scipy.org/doc/stable/html/parallel/parallel_process.html

With ssh-based ipcluster, I didn't need to copy the furls, as I
launched it from the same host where I launched ipython.
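
To make the shared-directory assumption above concrete, here is a minimal
sketch that checks whether a security directory is visible on the current
host and lists the .furl files in it. The helper name find_furl_files is
hypothetical (it is not part of IPython), and the default path is simply
the .ipython/security location mentioned in this thread.

```python
import glob
import os


def find_furl_files(security_dir):
    """Return the sorted .furl files in security_dir, or [] if it is missing."""
    # Hypothetical helper, not an IPython API.
    if not os.path.isdir(security_dir):
        return []
    return sorted(glob.glob(os.path.join(security_dir, "*.furl")))


# Assumed default location, taken from the discussion above.
default_dir = os.path.join(os.path.expanduser("~"), ".ipython", "security")
furls = find_furl_files(default_dir)
if not furls:
    print("No .furl files under %s; they would have to be copied by hand."
          % default_dir)
```

Running this on a compute node whose home folder differs from the access
box would report an empty list, which matches the failure described above.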

> Let us know if you have other questions - this side of things can be very
> subtle.  Another thing to watch out for: some batch systems *require* the
> processes on compute nodes to call MPI_Init upon starting.  This can be
> accomplished by using mpi4py.  See how we do this in the mpiexec/mpirun
> versions of ipcluster.  But on some systems (depending on which MPI) that
> is not enough.  Some systems require that the *VERY FIRST* thing a process
> does is call MPI_Init.  On these systems you will need to build a custom
> version of the python binary that handles this correctly.  Again, mpi4py
> provides such a binary.  Hopefully you won't have to deal with these things!

I hope so! I don't think LSF requires calling MPI_Init, but first I
have to get access to the log files (I don't understand why they were
not copied, even though the job was submitted).


>> Also, I've tried to extract the job id (it seems it is needed), but
>> the BatchEngineSet.parse_job_id extracts everything that is matched by
>> the regexp describing a job (it uses group()). I had to put "Job
>> <(\d+)>" as a regexp, so group() returns, for instance, "Job <1234>"
>> instead of "1234". I may submit a patch to get group(1) and modify the
>> PBS regexp accordingly.
>
> Yes, you will *very* likely have to modify the various regexps.

Is the exact job ID needed? Perhaps to kill the job?
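
For reference, the group() versus group(1) difference described above can
be reproduced with the standard re module alone. This is only a sketch:
the sample submission line is an assumed example of scheduler output, not
text captured from a real run, and the variable names are illustrative.

```python
import re

# Assumed example of what a job submission command might print.
sample_output = "Job <1234> is submitted to default queue <normal>."

# The regexp quoted in this thread, with one capture group for the digits.
job_id_regexp = re.compile(r"Job <(\d+)>")
match = job_id_regexp.search(sample_output)

whole_match = match.group()   # the entire matched text: "Job <1234>"
job_id = match.group(1)       # only the first capture group: "1234"

print(whole_match)
print(job_id)
```

With group(1), the regexp can keep its surrounding context while still
yielding a bare ID that could be passed on to another command.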

Cheers,

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


