[IPython-dev] Fwd: interrupt/abort parallel jobs

MinRK benjaminrk at gmail.com
Fri Mar 7 12:50:41 EST 2014


On Fri, Mar 7, 2014 at 6:01 AM, Alistair Miles <alimanfoo at googlemail.com> wrote:

> Apologies for cross-posting, I originally sent this to ipython-user but a
> colleague suggested it would be better posted here.
>
>
> ---------- Forwarded message ----------
> From: Alistair Miles <alimanfoo at googlemail.com>
> Date: Fri, Mar 7, 2014 at 11:36 AM
> Subject: interrupt/abort parallel jobs
> To: ipython-user at scipy.org
>
>
> Hi all,
>
> I know this has been raised before, but I'm using IPython parallel with
> Sun Grid Engine and I could really do with the ability to interrupt one or
> more engines without restarting them. As I understand it this is currently
> not possible?
>

Remote interrupts require a process adjacent to the kernel in order to
deliver the signal. I have a plan for how to do this (IPEP 12), and hope to
have time to implement it in the next release or two.
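To illustrate why the adjacent process is needed: a busy engine never gets a chance to read an "interrupt" message off a socket, but a separate process on the same host can deliver a SIGINT that preempts it. Here is a rough stdlib-only sketch of that idea (not IPEP 12 itself; all names here are hypothetical, and the signal delivery assumes POSIX):

```python
import os
import signal
import subprocess
import sys
import textwrap

# A "worker" that simulates a long-running task: it installs a SIGINT
# handler, then spins in a loop without ever checking a message queue.
WORKER = textwrap.dedent("""
    import signal, sys, time

    def on_interrupt(signum, frame):
        # Reached even though the loop below never polls for messages.
        sys.exit(42)

    signal.signal(signal.SIGINT, on_interrupt)
    print("ready", flush=True)
    while True:          # simulates a task that never yields
        time.sleep(0.1)
""")

def interrupt_busy_worker():
    """Spawn the worker, then act as the 'adjacent process' that
    delivers SIGINT directly, bypassing any message channel."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER], stdout=subprocess.PIPE
    )
    worker.stdout.readline()            # wait until the handler is installed
    os.kill(worker.pid, signal.SIGINT)  # the adjacent process signals it
    return worker.wait()                # 42: the handler ran mid-task
```

The point is that `os.kill` works from outside the worker regardless of what the worker is doing, which is exactly what a socket-based control message cannot guarantee.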


>
> I have tried using AsyncResult.abort() or Client.abort(), but both of
> those seem ineffective. The former just blocks and hangs. I had a poke
> around the source code to try and figure out what these are actually doing,
> but it wasn't immediately clear. Some clarification on what these functions
> are expected to do would be great, e.g., do these just abort jobs that are
> queued but not yet executing? Or should it also somehow interrupt running
> jobs?
>

Abort sends a message on the control channel to the engine(s), indicating
that it should not execute the task when it arrives. It cannot abort a
*running* task, because the engine won't process the abort message until
after it has finished the task to be aborted.
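Those semantics can be shown with a toy engine loop (pure Python, not IPython's actual engine code): control messages are drained only *between* tasks, so an abort that arrives while a task is running takes effect only for tasks still in the queue.

```python
# Toy model of an engine's task loop. control_channel is a list of
# task ids for which abort messages have arrived.
def run_engine(tasks, control_channel):
    aborted, results = set(), {}
    for task_id, fn in tasks:
        # Abort messages are only processed here, between tasks.
        while control_channel:
            aborted.add(control_channel.pop(0))
        if task_id in aborted:
            results[task_id] = "aborted"   # never executed
        else:
            results[task_id] = fn()        # too late to abort once started
    return results

control = []

def task1():
    # Aborts for both tasks arrive *while* task 1 is running.
    control.extend([1, 2])
    return "done"

out = run_engine([(1, task1), (2, lambda: "done")], control)
# task 1 completes anyway; only the still-queued task 2 is aborted:
# out == {1: "done", 2: "aborted"}
```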


>
> Just to give you the background, typically I'm setting up some parallel
> computation, and I set it running but then realise I made a mistake or
> something isn't running as expected, so then want to interrupt all the
> engines to cancel the currently running jobs. Of course I can just qdel all
> the ipengines and then qsub some more, but this is a pain, especially if I
> have to rerun some common setup on all the engines and/or lose my place in
> the SGE queue, or if I only need to interrupt some but not all the engines.
>

The band-aid in the meantime is to keep track of the engines' PIDs and
hostnames, and send signals locally or via SSH:

import os
import signal
import socket

# rc is a connected IPython.parallel Client
pid_map = rc[:].apply_async(os.getpid).get_dict()
host_map = rc[:].apply_async(socket.gethostname).get_dict()

def interrupt_engine(eid):
    host = host_map[eid]
    pid = pid_map[eid]
    if host == socket.gethostname():
        # engine is on this machine; signal it directly
        os.kill(pid, signal.SIGINT)
    else:
        # remote engine; `!` shell escape works inside an IPython session
        !ssh $host kill -INT $pid

-MinRK


>
> Thanks,
> Alistair
> --
> Alistair Miles
> Head of Epidemiological Informatics
> Centre for Genomics and Global Health <http://cggh.org>
> The Wellcome Trust Centre for Human Genetics
> Roosevelt Drive
> Oxford
> OX3 7BN
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: alimanfoo at gmail.com
> Tel: +44 (0)1865 287721 ***new number***
>
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev
>
>