[IPython-dev] IPython Parallel: Dealing with code changes
adgaudio at gmail.com
Thu Oct 17 23:54:13 EDT 2013
Thanks everyone for building such a great tool! I have a question about
how one should deal with code changes when using IPython parallel.
The general problem I have is this: I run several deal tasks continuously
on IPython engines, and I'm also constantly pushing code changes to the
machines running my IPython parallel cluster(s). How should I architect my
system such that the engines handle code changes?
Here are a couple solutions that seem like good ideas but aren't to my
knowledge currently possible (please correct me if I'm wrong!):
- When a certain event happens (such as a code deploy), have all
existing engines restart themselves after they finish their currently
executing task. The engines will then run all outstanding jobs using the
new code base. More specifically, the first few of those pending jobs will
execute with an engine that hasn't imported any code yet.
- The other option I could see: Execute the tasks with an option that
tells the engine to treat the tasks as a proper python process so that: 1)
things like sys.exitfunc() are triggered properly when the task ends, and
2) imported modules are actually imported for the first time. Regarding
2), I know that currently, we can just import modules inside a function
call (ie view.apply(my_func_with_imports) ), but if the modules loaded by
the engine have any globally modified state, that modified state persists
across tasks. This has bitten me quite a few times.
I believe both of these options can also be solved with a soft restart of
the engines, but client.shutdown(targets=[...], restart=True) is currently
To work around the problem, I basically implemented a soft ipcluster
restart: Whenever I deploy new code to the machines, I simply start a new
ipython cluster on the machines, and I distinguish these clusters using the
--cluster-id option. (This is via a (significantly) modified version of
the default StarCluster IPython plugin). This plugin lets the old
ipcluster complete it's queue (and then it eventually gets shut down), and
all new incoming tasks go to the new ipcluster. This work around was
great, but we are starting to deploy code more frequently, which means that
there are multiple ipcluster instances on our machines at any given time,
and I foresee two problems: Multiple ipcluster instances on the same nodes
means the engines will begin competing for the limited cpu resources on one
node; And having multiple different task queues does make it a bit annoying
to keep track of which job is associated to which ipcluster, and it's also
difficult to know how many jobs are queued in general.
Do you have any suggestions as to what approach I should take? I would be
happy to work on a PR to support client.shutdown(restart=True) if you think
that is a good way to solve this problem, but I'd also like to know what
that PR would require, as I'm only slightly familiar with the
IPython.parallel code base.
Thank you in advance for your input!
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the IPython-dev