Combinatorial optimization in scipy.optimize
Hi SciPy-Dev

This question is motivated by the recent improvements to scipy.optimize.linear_sum_assignment. [0] Here, we get a much improved algorithm for solving minimum-weight perfect matching in bipartite graphs (which is excellent).

Now, this problem is one in a large family of combinatorial optimization problems on packing and covering, and what I'm wondering is whether SciPy should be the home for a general class of efficient algorithms for solving those.

Most of these can be phrased in terms of graph theory, and indeed networkx already contains implementations of many classical algorithms (I was surprised to find that it doesn't have anything matching linear_sum_assignment directly, though it does have an implementation of the Blossom algorithm). So there is some overlap of scope here, which is not a problem in and of itself, but it had me wondering where the split should be.

Just to phrase this in terms of a very incomplete list of examples (some of which have polynomial-time solutions, some of which don't), would there be appetite for including into scipy.optimize algorithms for solving, e.g.,

1. other covering/packing problems (e.g. minimum set/vertex/edge cover, maximum matching, ...),
2. maximum flow,
3. shortest path,
4. travelling salesman,
5. circuit minimization?

Søren

[0]: https://github.com/scipy/scipy/pull/10296
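For reference, a minimal usage sketch of the routine that prompted the thread, scipy.optimize.linear_sum_assignment; the cost matrix below is arbitrary illustration, not taken from the discussion:

```python
# Minimum-weight perfect matching (the assignment problem) on a
# small cost matrix, using the API discussed in this thread.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])
row_ind, col_ind = linear_sum_assignment(cost)
print(row_ind, col_ind)              # optimal assignment: rows -> columns
print(cost[row_ind, col_ind].sum())  # total cost of the matching: 5
```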
On Thu, Jun 27, 2019 at 4:08 AM Søren Fuglede Jørgensen <scipy-dev@fuglede.dk> wrote:
Hi SciPy-Dev
This question is motivated by the recent improvements to scipy.optimize.linear_sum_assignment. [0] Here, we get a much improved algorithm for solving minimum weight perfect matching in bipartite graphs (which is excellent).
Now, this problem is one in a large family of combinatorial optimization problems on packing and covering, and what I'm wondering is if SciPy should be the home for a general class of efficient algorithms for solving those?
Most of these can be phrased in terms of graph theory, and indeed networkx already contains implementations of many classical algorithms (I was surprised to find that it doesn't have anything matching linear_sum_assignment directly, though it does have an implementation of the Blossom algorithm). So there is some overlap of scope here, which is not a problem in and of itself, but it had me wondering where the split should be.
Just to phrase this in terms of a very incomplete list of examples (some of which have polynomial-time solutions, some of which don't), would there be appetite for including into scipy.optimize algorithms for solving, e.g.,

1. other covering/packing problems (e.g. minimum set/vertex/edge cover, maximum matching, ...),
2. maximum flow,
3. shortest path,
4. travelling salesman,
5. circuit minimization?
Some of these are in scipy.sparse.csgraph already. We indeed don't want to do everything that NetworkX already does; however, if it's a well-known algorithm that would fit in with what we already have and focus on performance (which NetworkX does not do), then I think there's scope for more. For sparse.csgraph, I think Jake implemented a set of standard algorithms that can be found in an introductory handbook on graph algorithms.

It also depends to some extent on who writes the code - technical code like that is often not the easiest to maintain, so it would make the conversation and review process a lot easier if there is a person (or two people) with a commitment to maintain the code going forward.

Cheers,
Ralf
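As a concrete example of the existing scipy.sparse.csgraph routines Ralf mentions (the graph and values below are just illustration):

```python
# All-pairs shortest paths on a small weighted directed graph,
# using the Dijkstra implementation in scipy.sparse.csgraph.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

graph = csr_matrix(np.array([[0, 1, 4],
                             [0, 0, 2],
                             [0, 0, 0]]))
dist = shortest_path(graph, method='D', directed=True)
print(dist[0, 2])  # 3.0, via the path 0 -> 1 -> 2 rather than the direct edge
```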
Just as information, might be useful. I remember that it was discussed, something over a year ago, whether or not to have numba as a dependency. I think it was Ralf who said that it would be nice if one could achieve with a simple decorator a speed-up with numba - if it is installed, and otherwise just keep the status quo.

The following simple wrap resulted in a speed-up of a factor of 2; the speed-up was pretty constant (in my short tests) for arrays from a couple of ten thousand up to 1e9 complex elements; for smaller arrays the gain was even higher.

```python
import numba as nb
import numpy as np

@nb.njit(nogil=True, fastmath=True, cache=True)
def l2norm(x):
    """Jitted version of np.linalg.norm(x, ord=None)."""
    return np.linalg.norm(x)
```

Again, just for the record, for future discussions on the topic.
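The optional-dependency pattern described here can be sketched roughly as below; `maybe_jit` is a hypothetical name for illustration, not poliastro's or SciPy's actual helper:

```python
# If numba is importable, JIT-compile the function; otherwise return it
# unchanged, keeping the pure-Python status quo.
import numpy as np

try:
    import numba

    def maybe_jit(func):
        return numba.njit(nogil=True, fastmath=True, cache=True)(func)
except ImportError:
    def maybe_jit(func):
        return func  # numba unavailable: plain Python fallback

@maybe_jit
def l2norm(x):
    """Same function as above; compiled only when numba is installed."""
    return np.linalg.norm(x)
```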
On 7/3/19 10:19 PM, Dieter Werthmüller wrote:
I think it was Ralf who said that it would be nice if one could achieve with a simple decorator a speed-up with numba - if it is installed, and otherwise just keep the status quo.
This is what we do in poliastro, for example: https://github.com/poliastro/poliastro/blob/v0.12.0/src/poliastro/core/_jit....

It also works on PyPy, which in theory should not need numba:

```
install_requires =
    ...
    numba>=0.39 ; implementation_name=='cpython'
```

Regards,
Juan Luis
On Thu, Jul 04, 2019 at 09:07:22AM +0200, Juan Luis Cano Rodríguez wrote:
On 7/3/19 10:19 PM, Dieter Werthmüller wrote:
I think it was Ralf who said that it would be nice if one could achieve with a simple decorator a speed-up with numba - if it is installed, and otherwise just keep the status quo.
This is what we do in poliastro for example:
https://github.com/poliastro/poliastro/blob/v0.12.0/src/poliastro/core/_jit....
Also works on PyPy, which in theory should not need numba:
```
install_requires =
    ...
    numba>=0.39 ; implementation_name=='cpython'
```
(A bit of self-advertisement here.) IMHO, this is where Pythran [0] shines: the input code is pure, high-level Python. And unlike numba, there's no need to explicitly loop over arrays; numpy operations are supported, so the performance loss when falling back to Python is acceptable. Just a maintainer opinion, though :-)

++ Serge

[0] https://github.com/serge-sans-paille/pythran
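To make the Pythran workflow concrete, a minimal sketch (the function and its export signature are illustrative; consult the Pythran docs for the constructs a given version supports):

```python
# pythran export l2norm(complex128[])
# The export comment above is all Pythran needs: the module compiles to a
# native extension, while the very same file still runs under plain CPython.
import numpy as np

def l2norm(x):
    return np.sqrt(np.sum(np.abs(x) ** 2))
```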
On Thu, Jul 4, 2019 at 12:37 AM Serge Guelton <serge.guelton@telecom-bretagne.eu> wrote:
On Thu, Jul 04, 2019 at 09:07:22AM +0200, Juan Luis Cano Rodríguez wrote:
On 7/3/19 10:19 PM, Dieter Werthmüller wrote:
I think it was Ralf who said that it would be nice if one could achieve with a simple decorator a speed-up with numba - if it is installed, and otherwise just keep the status quo.
I think there was some pushback on this idea, because it only works for a limited set of functionality. If the speed difference is 100x compared to some pure-Python for-loop code, the non-numba performance may be unacceptably slow. So it's only helpful for cases like speeding up np.linalg (aside: that particular one we shouldn't use, since we have scipy.linalg). I've talked with multiple people about Numba recently, and I think the broad consensus was that while @jit is nice during development, Numba needs better support for ahead-of-time compilation (a sketch of what that could look like follows at the end of this message).
This is what we do in poliastro for example:
https://github.com/poliastro/poliastro/blob/v0.12.0/src/poliastro/core/_jit....
Also works on PyPy, which in theory should not need numba:
```
install_requires =
    ...
    numba>=0.39 ; implementation_name=='cpython'
```
(a bit of self-advertisement here)
Thanks Serge. I appreciate your updates on Pythran. My impression is that Pythran is gaining in popularity, and with it being usable within Cython, adopted by Xtensor, and used by a new project like Transonic (and maybe more?), the maintenance situation and the risk of adopting Pythran seem to be going down.
IMHO, this is where Pythran [0] shines: the input code is pure, high-level Python. And unlike numba, there's no need to explicitly loop over arrays; numpy operations are supported, so the performance loss when falling back to Python is acceptable. Just a maintainer opinion, though :-)

++ Serge

[0] https://github.com/serge-sans-paille/pythran

Numba understands those too, I believe. Anyway, many of the more interesting cases for Cython/Numba/Pythran are when pure NumPy isn't an option because the algorithm isn't nicely expressible in vectorized form.

Cheers,
Ralf
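Regarding the ahead-of-time compilation point above: a rough sketch of Numba's pycc interface, the AOT route available at the time (module and function names here are illustrative):

```python
# Building this file produces a compiled extension module that can later
# be imported without numba installed at runtime.
import numpy as np
from numba.pycc import CC

cc = CC('fastmath_mod')  # hypothetical module name

@cc.export('l2norm', 'f8(c16[:])')  # float64 result from a complex128 array
def l2norm(x):
    return np.sqrt((np.abs(x) ** 2).sum())

if __name__ == '__main__':
    cc.compile()
```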
On Wed, Jul 03, 2019 at 12:04:38PM -0700, Ralf Gommers wrote:
Some of these are in scipy.sparse.csgraph already.
Thanks for the reference to scipy.sparse.csgraph; I had indeed missed that completely, and had I known about it, I might have actually expected something like linear_sum_assignment to live there, as touched upon in https://github.com/scipy/scipy/pull/10296.
We indeed don't want to do everything that NetworkX already does; however, if it's a well-known algorithm that would fit in with what we already have and focus on performance (which NetworkX does not do), then I think there's scope for more. For sparse.csgraph, I think Jake implemented a set of standard algorithms that can be found in an introductory handbook on graph algorithms.
I think it's definitely possible to find a handful of algorithms that match that description; the first that come to mind here would be the Edmonds-Karp and push-relabel algorithms for computing maximum flow (a rough sketch of the former follows below).
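To illustrate the kind of algorithm meant, a compact and deliberately unoptimized Edmonds-Karp sketch over a dense capacity matrix; a SciPy version would presumably operate on sparse structures instead:

```python
from collections import deque
import numpy as np

def edmonds_karp(capacity, s, t):
    """Maximum flow from s to t via shortest augmenting paths (BFS)."""
    n = capacity.shape[0]
    residual = capacity.astype(float)  # working copy of residual capacities
    flow = 0.0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = np.full(n, -1)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u, v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:  # no augmenting path left: flow is maximal
            return flow
        # Bottleneck capacity along the path found...
        bottleneck = np.inf
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v], v])
            v = parent[v]
        # ...then push that much flow along it.
        v = t
        while v != s:
            residual[parent[v], v] -= bottleneck
            residual[v, parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
```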
It also depends to some extent on who writes the code - technical code like that is often not the easiest to maintain, so it would make the conversation and review process a lot easier if there is a person (or two people) with a commitment to maintain the code going forward.
Makes sense. And I can see how that would also make you favor simpler versions of whatever algorithms might be relevant. Søren
participants (5)

- Dieter Werthmüller
- Juan Luis Cano Rodríguez
- Ralf Gommers
- Serge Guelton
- Søren Fuglede Jørgensen