As part of https://github.com/scipy/scipy/pull/8259 I'm proposing that a `workers` keyword is added to `optimize.differential_evolution` to parallelise some of the computation. The proposal is that:

1. The `workers` keyword accepts either an integer or an object with a map-like method.
2. If an integer is supplied then the parallelisation is taken care of by scipy (more on that later), with -1 signifying that all processors are to be used.
3. If an object with a map-like method is supplied, e.g. `multiprocessing.Pool.map`, `mpi4py.futures.MPIPoolExecutor.map`, etc., then the parallelisation is taken care of by that object. This allows the user to specify the parallelisation configuration for their problem.
4. If `workers=1`, then computation will be done by the builtin `map` function.

Now we come to the under-the-hood part. I've written something called PoolWrapper (https://github.com/andyfaff/scipy/blob/b14bb513c0ffb9807a67663d39b9ab399375d...) which wraps `multiprocessing.Pool` to achieve the behaviour outlined above. It can be used as a context manager, or the user of the object can decide when to close the resources opened by PoolWrapper.

I've looked at using joblib instead of PoolWrapper and it seems useful, but it doesn't have a couple of bits of functionality that are needed for this specific problem, viz:

5. joblib.Parallel doesn't have a map method (desirable to allow 3), so a small wrapper would have to be created anyway.
6. joblib.Parallel creates/destroys a multiprocessing.Pool each time the Parallel object is `__call__`ed. This leads to significant overhead. One can use the Parallel object with a context manager, which allows reuse of the Pool, but I don't think that's doable in the context of using the DifferentialEvolutionSolver (DES) object as an iterator:
```python
solver = DifferentialEvolutionSolver(func, bounds)

# use the DES object as an iterator
for it in solver:
    ...
res = next(solver)
print(res)

# use the DES.solve method
res = solver.solve()
```
Whilst the DES object is not currently public (it's called by the differential_evolution function) it would be nice to expose it in the future, and people will want to use both approaches. Unfortunately, with the first approach, if we used joblib.Parallel we'd have to use Parallel.__call__ in DES.next(), which has the overhead of creating/destroying Pools. For efficient use of resources the Pool should persist for the lifetime of the DES object.

I also looked into `concurrent.futures.ProcessPoolExecutor`, but it's not available for Python 2.7.

The purpose of this email is to elicit feedback for developing a parallelisation strategy for scipy - what does the public interface look like, and what does scipy do under the hood? Under the hood I think a mixture of PoolWrapper and joblib.Parallel could be used (with scipy vendoring joblib).

A.

--
_____________________________________
Dr. Andrew Nelson
_____________________________________
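For concreteness, the dispatch described in points 1-4 can be sketched like this (`MapWrapper` is a hypothetical minimal version; the real implementation is the PoolWrapper linked above):

```python
import multiprocessing


class MapWrapper(object):
    """Hypothetical sketch of a PoolWrapper-like helper: dispatch a
    `workers` argument to the appropriate map implementation."""

    def __init__(self, workers=1):
        self._own_pool = None
        if callable(workers):
            # a map-like callable, e.g. multiprocessing.Pool(4).map or
            # mpi4py.futures.MPIPoolExecutor().map
            self._map = workers
        elif workers == 1:
            # serial evaluation via the builtin map
            self._map = map
        else:
            # scipy manages the parallelisation; -1 means all processors
            n = None if workers == -1 else workers
            self._own_pool = multiprocessing.Pool(n)
            self._map = self._own_pool.map

    def __call__(self, func, iterable):
        return list(self._map(func, iterable))

    def close(self):
        # only close resources this wrapper itself created
        if self._own_pool is not None:
            self._own_pool.close()
            self._own_pool.join()
```

The point of the `close` method is exactly the one made above: a user-supplied map-like object stays under the user's control, while a scipy-created Pool is cleaned up by scipy.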
6. joblib.Parallel creates/destroys a multiprocessing.Pool each time the Parallel object is `__call__`ed. This leads to significant overhead. One can use the Parallel object with a context manager, which allows reuse of the Pool, but I don't think that's do-able in the context of using the DifferentialEvolutionSolver (DES) object as an iterator:
If you can use `__iter__` instead of `__next__` in your DifferentialEvolutionSolver class I think you can avoid this problem with something along the lines of:

```python
class DifferentialEvolutionSolver(object):
    ...
    def __iter__(self):
        with Parallel(...) ...:
            for it in ...:
                yield it
```

As for the map problem I don't know, but it's probably worth asking the `joblib` people if there is a suitable solution or workaround.

Eric
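The reason this pattern works is that a generator suspends inside the `with` block between `next()` calls, so a resource opened there stays alive for the generator's lifetime. A toy illustration, using a stand-in context manager rather than joblib.Parallel:

```python
from contextlib import contextmanager

events = []


@contextmanager
def pool():
    # stand-in for an expensive resource like a process pool
    events.append("open")
    try:
        yield
    finally:
        events.append("close")


def solver_iter(n):
    # the pool is opened once and persists across next() calls,
    # closing only when the generator is exhausted (or garbage collected)
    with pool():
        for i in range(n):
            yield i


it = solver_iter(3)
next(it)  # the pool is opened on the first next(), not on creation
next(it)
assert events == ["open"]
list(it)  # exhausting the generator closes the pool
assert events == ["open", "close"]
```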
On 9 January 2018 at 13:48, Eric Larson <larson.eric.d@gmail.com> wrote:
If you can use `__iter__` instead of `__next__` in your DifferentialEvolutionSolver class I think you can avoid this problem with something along the lines of:
```python
class DifferentialEvolutionSolver(object):
    ...
    def __iter__(self):
        with Parallel(...) ...:
            for it in ...:
                yield it
```
That seems Pythonic; I'll look into that in more detail. One hurdle might be when the two approaches are mixed: starting off with the generator, then using the solve method, etc.
On 9 January 2018 at 13:48, Eric Larson <larson.eric.d@gmail.com> wrote:
If you can use `__iter__` instead of `__next__` in your
DifferentialEvolutionSolver class I think you can avoid this problem with something along the lines of:
```python
class DifferentialEvolutionSolver(object):
    ...
    def __iter__(self):
        with Parallel(...) ...:
            for it in ...:
                yield it
```
As for the map problem I don't know, but it's probably worth asking the `joblib` people if there is a suitable solution or workaround.
In the scheme of things I'd prefer to retain the use of `__next__`, see https://github.com/scipy/scipy/pull/6923 for the direction of where I think things are headed.
5. joblib.Parallel doesn't have a map method (desirable to allow 3), so a small wrapper would have to be created anyway.
joblib has a custom backend framework that can be used for such a purpose (if I understand you correctly): https://pythonhosted.org/joblib/parallel.html#custom-backend-api-experimenta...

There are currently Yarn and dask.distributed backends that are getting better and better.
6. joblib.Parallel creates/destroys a multiprocessing.Pool each time the Parallel object is `__call__`ed. This leads to significant overhead. One can use the Parallel object with a context manager, which allows reuse of the Pool, but I don't think that's do-able in the context of using the DifferentialEvolutionSolver (DES) object as an iterator:
This is evolving. However, the reason behind this is that Pools get corrupted and lead to deadlocks. Olivier Grisel and Thomas Moreau are working on fixing this in the Python standard library (first PR merged recently)!

Part of the vision of joblib is to provide a very light mid-layer that can connect to multiprocessing and threading (though we are considering switching to concurrent.futures) as well as other backends. Hopefully this common language makes it easier to do things like embedding dask in numerical algorithms without a hard dependency (yes, we are working with the dask team on this).

Gaël
Reviving an old thread because there's movement on the PR (https://github.com/scipy/scipy/pull/8259) and it would be nice to get it merged.

On Wed, Jan 10, 2018 at 2:34 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
5. joblib.Parallel doesn't have a map method (desirable to allow 3), so a small wrapper would have to be created anyway.
joblib has a custom backend framework that can be used for such a purpose (if I understand you correctly):
https://pythonhosted.org/joblib/parallel.html#custom-backend-api-experimenta...
Updated link (status is still experimental): https://joblib.readthedocs.io/en/latest/parallel.html#custom-backend-api-exp...
There are currently a Yarn and a dask.distributed backend that are getting better and better.
There's also this JoblibPool that can be taken over: https://github.com/adrn/schwimmbad/blob/master/schwimmbad/jl.py#L14 Seems simpler than a backend still tagged experimental.
6. joblib.Parallel creates/destroys a multiprocessing.Pool each time the Parallel object is `__call__`ed. This leads to significant overhead. One can use the Parallel object with a context manager, which allows reuse of the Pool, but I don't think that's do-able in the context of using the DifferentialEvolutionSolver (DES) object as an iterator:
This is evolving. However, the reason behind this is that Pools get corrupted and lead to deadlocks. Olivier Grisel and Thomas Moreau are working on fixing this in the Python standard library (first PR merged recently)!
Anyone know the status of this? And can this issue be avoided by the new loky backend to joblib? Ralf
Part of the vision of joblib is to provide a very light mid-layer that can connect to multiprocessing and threading (though we are considering switching to concurrent.futures) as well as other backends. Hopefully this common language makes it easier to do things like embedding dask in numerical algorithms without a hard dependency (yes, we are working with the dask team on this).
Gaël

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev
On Sun, Sep 02, 2018 at 08:58:12PM -0700, Ralf Gommers wrote:
joblib has a custom backend framework that can be used for such a purpose (if I understand you correctly): https://pythonhosted.org/joblib/parallel.html#custom-backend-api-experimental
Updated link (status is still experimental): https://joblib.readthedocs.io/en/latest/parallel.html#custom-backend-api-exp...
Still experimental, but less and less :). We are using this in anger these days.
There's also this JoblibPool that can be taken over: https://github.com/adrn/schwimmbad/blob/master/schwimmbad/jl.py#L14 Seems simpler than a backend still tagged experimental.
Well, we have a fairly stringent definition of experimental. This feature is no longer very experimental.
This is evolving. However, the reason behind this is that Pool get corrupted and lead to deadlock. Olivier Grisel and Thomas Moreau are working on fixing this in the Python standard library (first PR merged recently)!
Anyone know the status of this? And can this issue be avoided by the new loky backend to joblib?
This has been merged and released for a while. Loky is now used in anger. So far, I think that people are happy with it. In particular it is more robust than multiprocessing.Pool (specifically, robust to segfaults), and the improvements have been contributed upstream to Python's concurrent.futures process pool executor (available in Python 3.7).

Joblib has lately been getting a lot of improvements to make it more robust and scalable [1]. It will still have some overhead, due to pickling. Pickling speed should be addressed by coordinating upstream changes in Python with implementations in numpy. Olivier Grisel has been coordinating with Python for this. I believe that PEP 574 [2] is related to these efforts. The specific challenge is to enable fast code paths in cloudpickle, which is necessary to pickle arbitrary objects and code.

While simpler multiprocessing-based code will sometimes give less overhead compared to joblib, it will probably be brittle.

I think that the best way to move forward from here would be to do some prototyping and experimentation.

Gaël

[1] Joblib changelog: https://joblib.readthedocs.io/en/latest/developing.html#latest-changes
[2] Pickling improvement PEP: https://www.python.org/dev/peps/pep-0574/
SNIP
In particular it is more robust than multiprocessing.Pool (specifically, robust to segfault), and the improvements have been contributed upstream to Python concurrent.futures's process pool executor (available in Python 3.7).
SNIP
While simpler multiprocessing-based code will sometimes give less overhead compared to joblib, it will probably be brittle.
Is there anything I can read to learn a bit more about this? Is it mainly related to how easy things are to pickle?
On Mon, Sep 03, 2018 at 07:30:22PM +1000, Andrew Nelson wrote:
SNIP
In particular it is more robust than multiprocessing.Pool (specifically, robust to segfault), and the improvements have been contributed upstream to Python concurrent.futures's process pool executor (available in Python 3.7).
SNIP
While simpler multiprocessing-based code will sometimes give less overhead compared to joblib, it will probably be brittle.
Is there anything I can read to learn a bit more about this? Is it mainly related to how easy things are to pickle?
The major problems are related to state inside multiprocessing.Pool that can be broken when child processes go astray. As a result, multiprocessing.Pool often gets into a deadlock (any heavy user of multiprocessing has probably experienced this). See for instance https://tommoral.github.io/talks/pyparis17/#37

Gaël
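For what it's worth, the contrast with concurrent.futures is easy to observe: when a worker process dies abruptly, ProcessPoolExecutor flags the pool as broken and raises, rather than hanging the way multiprocessing.Pool can. A small sketch (Python >= 3.3):

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash(_):
    # simulate a worker dying abruptly (e.g. a segfault);
    # os._exit bypasses all Python-level cleanup
    os._exit(1)


def main():
    with ProcessPoolExecutor(max_workers=1) as ex:
        fut = ex.submit(crash, None)
        try:
            fut.result()
        except BrokenProcessPool:
            # the executor reports the failure instead of deadlocking
            return "pool reported broken"


if __name__ == "__main__":
    print(main())
```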
This is merged and released since a while. Loky is now used in anger. So far, I think that people are happy with it. In particular it is more robust than multiprocessing.Pool (specifically, robust to segfault), and the improvements have been contributed upstream to Python concurrent.futures's process pool executor (available in Python 3.7).

<sheepishly> Ok, I managed to search and AFAICT the main issue is that if a child process terminates then the master process can hang waiting for it to return (https://bugs.python.org/issue22393).

So, reading the `concurrent.futures.ProcessPoolExecutor` documentation indicates that it is resistant to this issue. concurrent.futures is available in Python 3, but wasn't ported to 2.7.

On Mon, 3 Sep 2018 at 19:30, Andrew Nelson <andyfaff@gmail.com> wrote:
SNIP
In particular it is more robust than multiprocessing.Pool (specifically, robust to segfault), and the improvements have been contributed upstream to Python concurrent.futures's process pool executor (available in Python 3.7).
SNIP
While simpler multiprocessing-based code will sometimes give less overhead compared to joblib, it will probably be brittle.
Is there anything I can read to learn a bit more about this? Is it mainly related to how easy things are to pickle?
On Mon, Sep 3, 2018 at 3:34 AM Andrew Nelson <andyfaff@gmail.com> wrote:
<sheepishly> Ok, I managed to search and AFAICT the main issue is that if a child process terminates then the master process can hang waiting for it to return (https://bugs.python.org/issue22393).
This is merged and released since a while. Loky is now used in anger. So far, I think that people are happy with it. In particular it is more robust than multiprocessing.Pool (specifically, robust to segfault), and the improvements have been contributed upstream to Python concurrent.futures's process pool executor (available in Python 3.7).
Thanks for the details Gael!

So, reading the `concurrent.futures.ProcessPoolExecutor` documentation indicates that it is resistant to this issue. concurrent.futures is available in Python 3, but wasn't ported to 2.7.
The PR now uses multiprocessing on Python 2.7 and concurrent.futures on 3.x - this seems fine to me. We're not supporting 2.7 for that much longer, so the code can be simplified a bit when we drop 2.7.

Ralf
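For illustration, that sort of version dispatch only takes a few lines (`make_pool` is a hypothetical helper, not the PR's actual code):

```python
import sys


def make_pool(workers):
    """Hypothetical helper: return a (map_func, finalizer) pair backed by
    concurrent.futures on Python 3 and multiprocessing on Python 2.7."""
    if sys.version_info[0] >= 3:
        from concurrent.futures import ProcessPoolExecutor
        ex = ProcessPoolExecutor(max_workers=workers)
        return ex.map, ex.shutdown
    else:
        from multiprocessing import Pool
        p = Pool(workers)
        return p.map, p.close
```

Note that `Executor.map` and `Pool.map` have slightly different semantics (the former returns a lazy iterator, the latter a list), which is one of the small wrinkles such a shim has to paper over.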
On Mon, Sep 03, 2018 at 10:16:51AM -0700, Ralf Gommers wrote:
So, reading the `concurrent.futures.ProcessPoolExecutor` documentation indicates that it is resistant to this issue. concurrent.futures is available in Python3, but wasn't ported to 2.7.
The PR now uses multiprocessing on Python 2.7 and concurrent.futures on 3.x - this seems fine to me. We're not supporting 2.7 for that much longer, so the code can be simplified a bit when we drop 2.7
OK. I can think of two quite useful features that joblib adds:

* Support of dask.distributed as a backend, to distribute code across computers.
* Fallback to threading in case of nested parallelism, and in case of two levels of nesting, fall back to sequential to avoid overcommitting.

Gaël
On Mon, Sep 3, 2018 at 1:05 PM Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Mon, Sep 03, 2018 at 10:16:51AM -0700, Ralf Gommers wrote:
So, reading the `concurrent.futures.ProcessPoolExecutor` documentation indicates that it is resistant to this issue. concurrent.futures is available in Python 3, but wasn't ported to 2.7.
The PR now uses multiprocessing on Python 2.7 and concurrent.futures on 3.x - this seems fine to me. We're not supporting 2.7 for that much longer, so the code can be simplified a bit when we drop 2.7
OK. I can think of two quite useful features that joblib adds:
* Support of dask.distributed as a backend, to distribute code across computers.
* Fallback to threading in case of nested parallelism, and in case of two levels of nesting, fall back to sequential to avoid over committing.
Those are both quite useful. How would you add an API for those to SciPy functions (if that's necessary - I assume the threading fallback is automatic)?

Right now we have a single keyword `workers=1`, which behaves like scikit-learn's `n_jobs` (# of CPUs), and also accepts objects with a map() method like multiprocessing.Pool.

Ralf
On Mon, Sep 03, 2018 at 01:10:50PM -0700, Ralf Gommers wrote:
Right now we have a single keyword `workers=1`, which behaves like scikit-learn's n_jobs=1 (# of CPUs), and also accept objects with a map() method like multiprocessing.Pool
I think that we wouldn't mind adding a "map" to a Parallel object. Though it feels quite strange, as the Parallel object is meant to be a higher-level abstraction than the Pool.

Gaël
Right now we have a single keyword `workers=1`, which behaves like scikit-learn's n_jobs=1 (# of CPUs), and also accept objects with a map() method like multiprocessing.Pool
I think that we wouldn't mind adding a "map" to a Parallel object. Though it feels quite strange, as the Parallel object is meant to be a higher-level abstraction than the Pool.
The design also allows for things like `mpi4py.futures.MPIPoolExecutor` to be supplied.
On Mon, Sep 3, 2018 at 1:27 PM Andrew Nelson <andyfaff@gmail.com> wrote:
Right now we have a single keyword `workers=1`, which behaves like
scikit-learn's n_jobs=1 (# of CPUs), and also accept objects with a map() method like multiprocessing.Pool
I think that we wouldn't mind adding a "map" to a Parallel object. Though it feels quite strange, as the Parallel object is meant to be a higher-level abstraction than the Pool.
I agree that that's strange, and it looks like we don't need it.
The design also allows for things like `mpi4py.futures.MPIPoolExecutor` to be supplied.
Update for everyone: at the moment the PR is updated to use concurrent.futures.ProcessPoolExecutor on Python 3 by default, and multiprocessing on Python 2.7. The design is compatible with joblib; however, vendoring joblib and using it for the workers=int case is left to a future PR.

Unless there are more comments/thoughts, I plan to hit the green button after a final review/benchmark.

Cheers,
Ralf
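To see what actually gets dispatched each generation, the per-generation energy evaluation looks roughly like this in map terms (a schematic with a toy objective, not the PR's code):

```python
from concurrent.futures import ProcessPoolExecutor


def energy(x):
    # toy objective function: sum of squares
    return sum(xi ** 2 for xi in x)


def evaluate_population(population, map_func=map):
    # one generation's worth of function evaluations, dispatched through
    # whatever map-like the workers keyword resolved to
    return list(map_func(energy, population))


if __name__ == "__main__":
    pop = [(1, 2), (3, 4), (0, 0)]
    serial = evaluate_population(pop)
    with ProcessPoolExecutor(max_workers=2) as ex:
        parallel = evaluate_population(pop, ex.map)
    # same results whichever map-like does the work
    assert serial == parallel
```

Because the whole population of a generation is independent, this is an embarrassingly parallel map, which is why a single map-like hook is enough to support threads, processes, or MPI.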
On Mon, Sep 3, 2018 at 1:10 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Sep 3, 2018 at 1:05 PM Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
On Mon, Sep 03, 2018 at 10:16:51AM -0700, Ralf Gommers wrote:
So, reading the `concurrent.futures.ProcessPoolExecutor` documentation indicates that it is resistant to this issue. concurrent.futures is available in Python 3, but wasn't ported to 2.7.
The PR now uses multiprocessing on Python 2.7 and concurrent.futures on 3.x - this seems fine to me. We're not supporting 2.7 for that much longer, so the code can be simplified a bit when we drop 2.7
OK. I can think of two quite useful features that joblib adds:
* Support of dask.distributed as a backend, to distribute code across computers.
* Fallback to threading in case of nested parallelism, and in case of two levels of nesting, fall back to sequential to avoid over committing.
Those are both quite useful. How would you add an API for those to SciPy functions (if that's necessary - I assume threading fallback is automatic)?
Okay, found the answer I think (from http://matthewrocklin.com/blog/work/2017/02/07/dask-sklearn-simple):

```python
from joblib import parallel_backend

with parallel_backend('dask.distributed',
                      scheduler_host='scheduler-address:8786'):
    some_scipy_func(..., workers=N)
```

That would be quite nice. If we vendor joblib then the import can be:

```python
from scipy import parallel_backend
```

Ralf
I don't understand what is meant by the phrase "used in anger".

Phillip

On Mon, Sep 3, 2018 at 2:22 AM Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
I meant: used heavily.

Gaël

Sent from my phone. Please forgive typos and briefness.

On Sep 9, 2018, at 00:04, Phillip Feldman <phillip.m.feldman@gmail.com> wrote:
I don't understand what is meant by the phase "used in anger".
Phillip
participants (5)

- Andrew Nelson
- Eric Larson
- Gael Varoquaux
- Phillip Feldman
- Ralf Gommers