[SciPy-Dev] General discussion on parallelisation

Andrew Nelson andyfaff at gmail.com
Mon Jan 8 19:35:22 EST 2018


As part of https://github.com/scipy/scipy/pull/8259 I'm proposing that a
`workers` keyword be added to optimize.differential_evolution to
parallelise evaluation of the objective function over the trial population.

The proposal is that:

1. The `workers` keyword accepts either an integer or a map-like callable.
2. If an integer is supplied then the parallelisation is taken care of by
scipy (more on that later), with -1 signifying that all processors are to
be used.
3. If a map-like callable is supplied, e.g.
`multiprocessing.Pool.map`, `mpi4py.futures.MPIPoolExecutor.map`, etc., then
the parallelisation is taken care of by that callable. This allows the user
to specify the parallelisation configuration for their problem.
4. If `workers=1`, then computation will be done by the builtin `map`
function.
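
To make the proposal concrete, calls might look something like this (a
sketch of the proposed API, nothing settled; `func` and `bounds` stand in
for a real objective function and its bounds):

>>> from multiprocessing import Pool
>>> from scipy.optimize import differential_evolution
>>> # let scipy manage the parallelisation, using all processors
>>> res = differential_evolution(func, bounds, workers=-1)
>>> # or hand control to a user-configured pool via its map method
>>> with Pool(4) as p:
...     res = differential_evolution(func, bounds, workers=p.map)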

Now we come to the under-the-hood part. I've written something called
PoolWrapper (
https://github.com/andyfaff/scipy/blob/b14bb513c0ffb9807a67663d39b9ab399375d37d/scipy/_lib/_util.py#L343)
which wraps `multiprocessing.Pool` to achieve the behaviour outlined above.
It can be used as a context manager, or the user can decide when to close
the resources that PoolWrapper opens.
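
Usage would be along these lines (a sketch only; the constructor arguments
and call signature here are assumptions, see the linked source for the
real API):

>>> from scipy._lib._util import PoolWrapper
>>> # as a context manager the Pool is cleaned up automatically on exit
>>> with PoolWrapper(2) as mapper:
...     results = list(mapper(func, iterable))
>>> # or manage the lifetime manually, closing the Pool when finished
>>> mapper = PoolWrapper(-1)
>>> results = list(mapper(func, iterable))
>>> mapper.close()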

I've looked at using joblib instead of PoolWrapper. It seems useful, but
it lacks two pieces of functionality that are needed for this specific
problem:

1. joblib.Parallel doesn't have a map method (desirable to support point 3
above), so a small wrapper would have to be created anyway.
2. joblib.Parallel creates/destroys a multiprocessing.Pool each time the
Parallel object is `__call__`ed, which leads to significant overhead. One
can use the Parallel object as a context manager, which allows reuse of
the Pool, but I don't think that's doable when using the
DifferentialEvolutionSolver (DES) object as an iterator:

>>> # DES is private for now; imported here from its module for
>>> # illustration
>>> from scipy.optimize._differentialevolution import (
...     DifferentialEvolutionSolver)
>>> solver = DifferentialEvolutionSolver(func, bounds)
>>> # use the DES object as an iterator: each step performs one
>>> # generation and yields the current best solution
>>> for res in solver:
...     pass
>>> print(res)
>>> # or use the DES.solve method
>>> res = solver.solve()

Whilst the DES object is not currently public (it's used internally by the
differential_evolution function), it would be nice to expose it in the
future, and people will want to use both approaches. Unfortunately, with
the iterator approach, if we used joblib.Parallel we'd have to call
Parallel.__call__ inside DES.next(), which incurs the overhead of
creating/destroying Pools on every generation. For efficient use of
resources the Pool should persist for the lifetime of the DES object.
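
For reference, joblib's documented way of reusing the Pool is to hold the
Parallel object open as a context manager:

>>> from joblib import Parallel, delayed
>>> # the Pool is created once and reused for every call made inside
>>> # the with-block
>>> with Parallel(n_jobs=2) as parallel:
...     squares = parallel(delayed(pow)(i, 2) for i in range(10))
...     cubes = parallel(delayed(pow)(i, 3) for i in range(10))

The with-block would have to enclose the user's `for res in solver` loop,
which DES can't arrange from inside its own next().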

I also looked into `concurrent.futures.ProcessPoolExecutor`, but it's not
available in the Python 2.7 standard library, which scipy still supports.

The purpose of this email is to elicit feedback for developing a
parallelisation strategy for scipy: what should the public interface look
like, and what should scipy do under the hood?

Under the hood I think a mixture of PoolWrapper and joblib.Parallel could
be used (with scipy vendoring joblib).
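
To sketch how the dispatch on workers might look internally (all names
here are hypothetical, nothing is settled):

>>> from scipy._lib._util import PoolWrapper
>>> def _make_mapper(workers):
...     # hypothetical helper: turn the `workers` argument into a
...     # map-like callable
...     if callable(workers):
...         # user-supplied map-like callable (point 3)
...         return workers
...     if workers == 1:
...         # serial evaluation via the builtin map (point 4)
...         return map
...     # otherwise scipy manages its own Pool, with -1 meaning all
...     # processors (points 1 and 2)
...     return PoolWrapper(workers)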

A.
--
_____________________________________
Dr. Andrew Nelson


_____________________________________