Hi all, I spent some time rewriting the scipy.optimize.nonlin module: http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py The following things changed: - Support tolerance-based stopping conditions (cf. ticket #791) - Improved handling of input and return values from the functions, so that they are now easier to use. - Don't use np.matrix at all. - Jacobian approximations factored into classes; the iteration code is now in only one place, so trust-region handling or other improvements can be added to a single place later on. - There's now a line search to the direction inverting the Jacobian gave. (But there's no checking that it is an actual descent direction for some merit function.) - Rewrote docstrings. The routines should produce the same output as previously. The tests are still not very strong, however. But I have now some questions: * Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8? * Some of the algorithms in there don't appear to work too well, and some appear to be redundant. I'd like to clean up this a bit, leaving only known-good stuff in. * I'd like to remove `broyden2` as the actual Jacobian approximation in this appears to be the same as in `broyden3`, and there does not appear to be much difference in the work involved in the two. Ondrej, since you wrote the original code, do you think there is a reason to keep both? * `broyden1_modified` appears to be, in the end if you work out the matrix algebra, updating the inverse Jacobian in a way that corresponds to J := J + (y - J s / 2) s^T / ||s||^2 for the Jacobian (with s = dx, y = df). Apart from the factor 1/2, it's Broyden's good method. [1] One can also verify that the updated inverse Jacobian does not satisfy the quasi-Newton condition s = J^{-1} y, and that `broyden1_modified` doesn't generate the same sequence as `broyden1`. Hence, I'd like to remove this routine, unless there's some literature that shows that the above works better than Broyden's method; Ondrej, do you agree? .. [1] http://en.wikipedia.org/wiki/Broyden%27s_method http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula * Also, which articles were used as reference for the non-Quasi-Newton algorithms: - `broyden_modified`. This appears to be a bit exotic, and it uses several magic constants (`w0`, `wl`) whose meaning is not clear to me. A reference would be helpful here, also for the user who has to choose the parameters as appropriate for his/her specific problem. - `broyden_generalized`, `anderson`, `anderson2`. These appear to be variants of Anderson mixing, so probably we only want at most one of these. Also, broyden_generalized() == anderson(w0=0), am I correct? `anderson` and `anderson2` don't appear to function equivalently, and I suspect the update formula in the latter is wrong, since this algorithm can't solve any of the test problems. Do you have a reference for this? Is there a rule how `w0` should be chosen for some specific problem? - `excitingmixing`. A reference would be useful, to clarify the heuristics used. - `linearmixing`. I'm not sure that Scipy should advocate this method :) Linearmixing and excitingmixing also seem to require something from the objective function, possibly that the eigenvalues of its Jacobian have the opposite sign than `alpha`. For example, neither of them can find a solution for the equation ``I x = 0`` where ``I`` is the identity matrix (test function F2). So, I'm a bit tempted to remove also these routines, as it seems they probably are not too useful for general problems. * The code in there is still a bit naive implementation of the inexact (Quasi-)Newton method, and one could add eg. add trust-region handling or try to guarantee that the line search direction is always a decrease direction for a merit function. (I don't know if it's possible to do the latter unless one has a matrix representation of the Jacobian approximation.) So, I suspect there are problems for which eg. MINPACK code will find solutions, but for which the nonlin.py code fails. * One could add more algorithms suitable for large-scale problems; for example some limited-memory Broyden methods (eg. [1]) or Secant-Krylov methods [2]. .. [1] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps .. [2] D. A. Knoll and D. E. Keyes. Jacobian free Newton-Krylov methods. Journal of Computational Physics, 20(2):357–397, 2004. I have implementations for both types of algorithms that could possibly go in after some polishing. -- Pauli Virtanen
Pauli Virtanen wrote:
* Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8?
I would much prefer delaying this after 0.7 release. Unfortunately, I have nothing else to say, since I know nothing to optimization :) David
2008/11/30 David Cournapeau <david@ar.media.kyoto-u.ac.jp>:
Pauli Virtanen wrote:
* Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8?
I would much prefer delaying this after 0.7 release. Unfortunately, I have nothing else to say, since I know nothing to optimization :)
Postponing a patch often leads to the author losing interest, and to the patch never being applied. I don't know if we can afford that. I'd be quite happy to see this going into 0.7 rc2 and trying it out for beta 1 (or whatever the naming scheme is), especially if we can fine-tune the tests a bit. Cheers Stéfan
On Mon, Dec 1, 2008 at 7:10 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
Postponing a patch often leads to the author losing interest, and to the patch never being applied.
The patch should have been applied before. We are already in beta phase, a few days away from releasing scipy. I don't see the point of taking the time to do beta if we keep adding code, specially after the first beta. Also, rushing to add new code may lead to oversight some limitation, some API bug, etc... David
On Sun, Nov 30, 2008 at 8:32 PM, David Cournapeau <cournape@gmail.com>wrote:
On Mon, Dec 1, 2008 at 7:10 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
Postponing a patch often leads to the author losing interest, and to the patch never being applied.
The patch should have been applied before. We are already in beta phase, a few days away from releasing scipy. I don't see the point of taking the time to do beta if we keep adding code, specially after the first beta. Also, rushing to add new code may lead to oversight some limitation, some API bug, etc...
Agree. The code seems to need extensive rework and it would be better to take the time to get it right. Chuck
2008/12/1 David Cournapeau <cournape@gmail.com>:
first beta. Also, rushing to add new code may lead to oversight some limitation, some API bug, etc...
I'm happy to postpone, but then we should accept that the API may change just after 0.7. It concerns me that non-reviewed code becomes the standard quo, and then sets the API for the future. Also, if Pauli is willing to write some tests, we can commit those so long. Cheers Stéfan
Stéfan van der Walt wrote:
2008/12/1 David Cournapeau <cournape@gmail.com>:
first beta. Also, rushing to add new code may lead to oversight some limitation, some API bug, etc...
I'm happy to postpone, but then we should accept that the API may change just after 0.7. It concerns me that non-reviewed code becomes the standard quo, and then sets the API for the future.
Hey, that's one of the reason why I prefer avoiding changes during the beta timeline :) By rushing code into the release, we are less likely to review changes correctly; from the discussion, it looks quite likely that the rewrite is not ready yet. The only goal of a beta is to see whether we have unexpected failures, and solve them until the release. If we add new code, there is no point in doing beta. Even without taking into account this, having a window where no new feature is allowed is a really common process; even linux, which changes extremely rapidly, has a limited time window for merges. Every project with a proper release management enforce those very strictly, because that's the whole point, really. Otherwise, releases are nothing more than a snapshot of the source code at a given time The problem of people not seeing their code quickly should be solved by doing more frequent releases IMO; "freezing" the trunk for some time before releases helps that, rushing new code does not. It is not theoretical: almost all my time spent on scipy the last few weeks have been related to recently added code. cheers, David
2008/12/1 David Cournapeau <david@ar.media.kyoto-u.ac.jp>:
The only goal of a beta is to see whether we have unexpected failures, and solve them until the release. If we add new code, there is no point in doing beta. Even without taking into account this, having a window where no new feature is allowed is a really common process; even linux, which changes extremely rapidly, has a limited time window for merges.
You are preaching to the converted, but just to point out a subtlety: Linux and most other projects allow contributions to be accepted while they freeze, albeit to a different branch, tree, etc. We should have a similar mechanism in place (we've tried SVN branches without much success in the past).
Every project with a proper release management enforce those very strictly, because that's the whole point, really. Otherwise, releases are nothing more than a snapshot of the source code at a given time
Ideally, releases should simply be snapshots. In practice, that is seldom feasible.
before releases helps that, rushing new code does not. It is not theoretical: almost all my time spent on scipy the last few weeks have been related to recently added code.
Just because the old code builds, does not mean that there isn't deep underlying pathology. But yes, "rushing" code through is a bad idea. Only, in this case I trust the capabilities of the programmer, and if he were to provide unit tests, I think we would have been better off. But let me re-iterate, I am quite happy to postpone, as long as there is no back-lash upon changing the API just after 0.7 has been released. Cheers Stéfan
Stéfan van der Walt wrote:
You are preaching to the converted, but just to point out a subtlety: Linux and most other projects allow contributions to be accepted while they freeze, albeit to a different branch, tree, etc. We should have a similar mechanism in place (we've tried SVN branches without much success in the past).
I agree 100 % here; I certainly don't care about the code being put somewhere in svn which will not be used as the basis for the release (0.7 in this case). David
Mon, 01 Dec 2008 14:48:58 +0200, Stéfan van der Walt wrote:
I'm happy to postpone, but then we should accept that the API may change just after 0.7. It concerns me that non-reviewed code becomes the standard quo, and then sets the API for the future. Also, if Pauli is willing to write some tests, we can commit those so long.
To clarify: there are unit tests, cf. test_nonlin.py in the branch, old and a couple of new ones. The problem is that these currently only show that the algorithms (well, the quasi-Newton ones) find solutions to several test problems without problems, but not really that they work optimally. -- Pauli Virtanen
Mon, 01 Dec 2008 00:10:53 +0200, Stéfan van der Walt wrote:
2008/11/30 David Cournapeau <david@ar.media.kyoto-u.ac.jp>:
Pauli Virtanen wrote:
* Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8?
I would much prefer delaying this after 0.7 release. Unfortunately, I have nothing else to say, since I know nothing to optimization :)
Postponing a patch often leads to the author losing interest, and to the patch never being applied. I don't know if we can afford that. I'd be quite happy to see this going into 0.7 rc2 and trying it out for beta 1 (or whatever the naming scheme is), especially if we can fine-tune the tests a bit.
Well, I promise not to lose interest if this gets delayed until 0.8 :) The main reason for proposing this for 0.7.0 is that IMO having tolerance- based termination conditions for these methods makes them much more useful, especially as they give only NaNs out if you set iter (the fixed number of iterations they do) too high. But maybe it's best to play safe and ship 0.7.0 out first, and make sure the new nonlin.py is OK and well-tested in 0.8. -- Pauli Virtanen
Still no args input for inputting arguments to the function F? Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code. -gideon On Nov 29, 2008, at 2:40 PM, Pauli Virtanen wrote:
Hi all,
I spent some time rewriting the scipy.optimize.nonlin module:
http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py
The following things changed:
- Support tolerance-based stopping conditions (cf. ticket #791)
- Improved handling of input and return values from the functions, so that they are now easier to use.
- Don't use np.matrix at all.
- Jacobian approximations factored into classes; the iteration code is now in only one place, so trust-region handling or other improvements can be added to a single place later on.
- There's now a line search to the direction inverting the Jacobian gave. (But there's no checking that it is an actual descent direction for some merit function.)
- Rewrote docstrings.
The routines should produce the same output as previously. The tests are still not very strong, however.
But I have now some questions:
* Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8?
* Some of the algorithms in there don't appear to work too well, and some appear to be redundant. I'd like to clean up this a bit, leaving only known-good stuff in.
* I'd like to remove `broyden2` as the actual Jacobian approximation in this appears to be the same as in `broyden3`, and there does not appear to be much difference in the work involved in the two.
Ondrej, since you wrote the original code, do you think there is a reason to keep both?
* `broyden1_modified` appears to be, in the end if you work out the matrix algebra, updating the inverse Jacobian in a way that corresponds to
J := J + (y - J s / 2) s^T / ||s||^2
for the Jacobian (with s = dx, y = df). Apart from the factor 1/2, it's Broyden's good method. [1] One can also verify that the updated inverse Jacobian does not satisfy the quasi-Newton condition s = J^{-1} y, and that `broyden1_modified` doesn't generate the same sequence as `broyden1`.
Hence, I'd like to remove this routine, unless there's some literature that shows that the above works better than Broyden's method; Ondrej, do you agree?
.. [1] http://en.wikipedia.org/wiki/Broyden%27s_method http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
* Also, which articles were used as reference for the non-Quasi-Newton algorithms:
- `broyden_modified`. This appears to be a bit exotic, and it uses several magic constants (`w0`, `wl`) whose meaning is not clear to me.
A reference would be helpful here, also for the user who has to choose the parameters as appropriate for his/her specific problem.
- `broyden_generalized`, `anderson`, `anderson2`. These appear to be variants of Anderson mixing, so probably we only want at most one of these. Also, broyden_generalized() == anderson(w0=0), am I correct?
`anderson` and `anderson2` don't appear to function equivalently, and I suspect the update formula in the latter is wrong, since this algorithm can't solve any of the test problems. Do you have a reference for this?
Is there a rule how `w0` should be chosen for some specific problem?
- `excitingmixing`. A reference would be useful, to clarify the heuristics used.
- `linearmixing`. I'm not sure that Scipy should advocate this method :)
Linearmixing and excitingmixing also seem to require something from the objective function, possibly that the eigenvalues of its Jacobian have the opposite sign than `alpha`. For example, neither of them can find a solution for the equation ``I x = 0`` where ``I`` is the identity matrix (test function F2). So, I'm a bit tempted to remove also these routines, as it seems they probably are not too useful for general problems.
* The code in there is still a bit naive implementation of the inexact (Quasi-)Newton method, and one could add eg. add trust-region handling or try to guarantee that the line search direction is always a decrease direction for a merit function. (I don't know if it's possible to do the latter unless one has a matrix representation of the Jacobian approximation.) So, I suspect there are problems for which eg. MINPACK code will find solutions, but for which the nonlin.py code fails.
* One could add more algorithms suitable for large-scale problems; for example some limited-memory Broyden methods (eg. [1]) or Secant- Krylov methods [2].
.. [1] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps .. [2] D. A. Knoll and D. E. Keyes. Jacobian free Newton-Krylov methods. Journal of Computational Physics, 20(2):357–397, 2004.
I have implementations for both types of algorithms that could possibly go in after some polishing.
-- Pauli Virtanen
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-dev
On Sun, Nov 30, 2008 at 4:08 PM, Gideon Simpson <simpson@math.toronto.edu>wrote:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
The method used for the 1d zero finders might be useful here. Chuck
2008/11/30 Gideon Simpson <simpson@math.toronto.edu>:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
Why? Instead of optimize.whatever(F, args=extra) just use optimize.whatever(lambda x: F(x,extra)) Anne
On Sun, Nov 30, 2008 at 7:59 PM, Anne Archibald <aarchiba@physics.mcgill.ca>wrote:
2008/11/30 Gideon Simpson <simpson@math.toronto.edu>:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
Why?
Instead of optimize.whatever(F, args=extra) just use optimize.whatever(lambda x: F(x,extra))
Yeah, that was my thought years ago for the zeros functions. I was told no, no, no. I'm not sure there is a good reason beyond what folks are used to. Chuck
Sun, 30 Nov 2008 21:59:32 -0500, Anne Archibald wrote:
2008/11/30 Gideon Simpson <simpson@math.toronto.edu>:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
Why?
Instead of optimize.whatever(F, args=extra) just use optimize.whatever(lambda x: F(x,extra))
Exactly; I was under the impression that this 'args' business was some legacy from times when Python didn't support nested scopes, cf. eg. http://www.python.org/doc/2.1/ref/lambda.html http://www.python.org/dev/peps/pep-0227/ I don't see a need for the 'args' option except maybe for conformance with the other optimization functions in Scipy. But maybe there are also other reasons to include it? -- Pauli Virtanen
2008/12/1 Pauli Virtanen <pav@iki.fi>
Sun, 30 Nov 2008 21:59:32 -0500, Anne Archibald wrote:
2008/11/30 Gideon Simpson <simpson@math.toronto.edu>:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
Why?
Instead of optimize.whatever(F, args=extra) just use optimize.whatever(lambda x: F(x,extra))
Exactly; I was under the impression that this 'args' business was some legacy from times when Python didn't support nested scopes, cf. eg.
http://www.python.org/doc/2.1/ref/lambda.html http://www.python.org/dev/peps/pep-0227/
I don't see a need for the 'args' option except maybe for conformance with the other optimization functions in Scipy. But maybe there are also other reasons to include it?
I have noticed these args not even working well in some cases in integrate ode method (due to the fortran wrapper code not handling them). Eg vode.pyf has double precision intent(hide) :: rpar While the ode class does contain: def set_f_params(self,*args): """Set extra parameters for user-supplied function f.""" self.f_params = args return self I tend to use class functions, which have access to other class methods/attributes, so optimize.whatever(problem.F) Benny
That's a neat trick. Is that innate to python or a part of scipy? -gideon On Nov 30, 2008, at 9:59 PM, Anne Archibald wrote:
2008/11/30 Gideon Simpson <simpson@math.toronto.edu>:
Still no args input for inputting arguments to the function F?
Sorry to complain, but the absence of this has put me off using these routines as it would require a rewrite of much of my code.
Why?
Instead of optimize.whatever(F, args=extra) just use optimize.whatever(lambda x: F(x,extra))
Anne _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-dev
On Mon, Dec 1, 2008 at 11:39, Gideon Simpson <simpson@math.toronto.edu> wrote:
That's a neat trick. Is that innate to python or a part of scipy?
It's a Python thing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Hi Pauli and all! First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below. On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi all,
I spent some time rewriting the scipy.optimize.nonlin module:
http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py
The following things changed:
- Support tolerance-based stopping conditions (cf. ticket #791)
+1
- Improved handling of input and return values from the functions, so that they are now easier to use.
+1
- Don't use np.matrix at all.
+1
- Jacobian approximations factored into classes; the iteration code is now in only one place, so trust-region handling or other improvements can be added to a single place later on.
+1
- There's now a line search to the direction inverting the Jacobian gave. (But there's no checking that it is an actual descent direction for some merit function.)
+1 I would like this to test this on my codes how it behaves. If you tested on your codes and as long as one can turn this off (as one can), let's commit this and then improve.
- Rewrote docstrings.
in general +1, but I don't understand what all the implementations actually do. I suggest to also port my comments from the module docstring into the respective classes. I can help you with that, after we agree on other things below.
The routines should produce the same output as previously. The tests are still not very strong, however.
But I have now some questions:
* Getting this to 0.7.0; this is a complete rewrite, so is it too late, and is it better to wait for 0.8?
Yes, I think it should go into 0.8.
* Some of the algorithms in there don't appear to work too well, and some appear to be redundant. I'd like to clean up this a bit, leaving only known-good stuff in.
+1
* I'd like to remove `broyden2` as the actual Jacobian approximation in this appears to be the same as in `broyden3`, and there does not appear to be much difference in the work involved in the two.
Ondrej, since you wrote the original code, do you think there is a reason to keep both?
I think it is, at least for tutorial purposes and also as an easy way to check that all is fine. Because there may be some numericall differences, also for a lot of iterations, the memory consumption of broyden3 is not limited (is it?). Broyden2 just stores the NxN dense matrix (that can of course be big), but that's it.
* `broyden1_modified` appears to be, in the end if you work out the matrix algebra, updating the inverse Jacobian in a way that corresponds to
J := J + (y - J s / 2) s^T / ||s||^2
for the Jacobian (with s = dx, y = df). Apart from the factor 1/2, it's Broyden's good method. [1] One can also verify that the updated inverse Jacobian does not satisfy the quasi-Newton condition s = J^{-1} y, and that `broyden1_modified` doesn't generate the same sequence as `broyden1`.
Hence, I'd like to remove this routine, unless there's some literature that shows that the above works better than Broyden's method; Ondrej, do you agree?
I agree.
.. [1] http://en.wikipedia.org/wiki/Broyden%27s_method http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
* Also, which articles were used as reference for the non-Quasi-Newton algorithms:
- `broyden_modified`. This appears to be a bit exotic, and it uses several magic constants (`w0`, `wl`) whose meaning is not clear to me.
A reference would be helpful here, also for the user who has to choose the parameters as appropriate for his/her specific problem.
You can use my master thesis: http://ondrej.certik.cz/site_media/cookbook/master.pdf pages 27-31. Everything is explained in there, plus references given to the original literature.
- `broyden_generalized`, `anderson`, `anderson2`. These appear to be variants of Anderson mixing, so probably we only want at most one of these. Also, broyden_generalized() == anderson(w0=0), am I correct?
Yes and no, see my master thesis for details and references. There are some subtle differences. But those methods have never worked for me well.
`anderson` and `anderson2` don't appear to function equivalently, and I suspect the update formula in the latter is wrong, since this algorithm can't solve any of the test problems. Do you have a reference for this?
Yes, see my master thesis, pages above, it's the reference [4].
Is there a rule how `w0` should be chosen for some specific problem?
Yes, page 31 of my master thesis. But broyden2 or 3, or excitingmixing works much better for my problems.
- `excitingmixing`. A reference would be useful, to clarify the heuristics used.
master thesis, equation 3.28. It's the mixing used in the code exciting: http://exciting.sourceforge.net/ I just noticed I forgot to put a reference to that code into my thesis. :(
- `linearmixing`. I'm not sure that Scipy should advocate this method :)
Please don't remove this. Sometimes it's the method that works the best in electronic structure calculations, because broyden2 and 3 sometimes fail. Also it's good to check that it converges to the same thing as broyden (or other) methods.
Linearmixing and excitingmixing also seem to require something from the objective function, possibly that the eigenvalues of its Jacobian have the opposite sign than `alpha`. For example, neither of them can find a solution for the equation ``I x = 0`` where ``I`` is the identity matrix (test function F2). So, I'm a bit tempted to remove also these routines, as it seems they probably are not too useful for general problems.
Please don't removed those routines, they are essential for electronic structure calculations, as they are very robust, fast and doesn't require any memory. It just works for many problems.
* The code in there is still a bit naive implementation of the inexact (Quasi-)Newton method, and one could add eg. add trust-region handling or try to guarantee that the line search direction is always a decrease direction for a merit function. (I don't know if it's possible to do the latter unless one has a matrix representation of the Jacobian approximation.) So, I suspect there are problems for which eg. MINPACK code will find solutions, but for which the nonlin.py code fails.
Yes, maybe they could be improved. nonlin.py is made for cases, where you cannot afford to store the full dense Jacobian NxN matrix in the memory.
* One could add more algorithms suitable for large-scale problems; for example some limited-memory Broyden methods (eg. [1])
I think in the references in my thesis, the broyden3 is sometimes called the limited-memory Broyden method. However, I checked the equations in [1] in your email and they need to do a singular-value-decomposition, so it seems there is yet another approach to this. So yes.
or Secant-Krylov methods [2].
Indeed, that would be very helpful.
.. [1] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps .. [2] D. A. Knoll and D. E. Keyes. Jacobian free Newton-Krylov methods. Journal of Computational Physics, 20(2):357–397, 2004.
I have implementations for both types of algorithms that could possibly go in after some polishing.
Excellent, looking forward! I'll test it on my problems in electronic structure if that works. Let me know what you think about my comments. Basically, scipy should have as many (working and well tested) algorithms as possible, with the same interface, so that one can change the algorithms and see which works for the particular problem. I am glad that you are also using those methods, so that we can work on this together. Ondrej
Hi Ondrej, Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
Quick comment (I'll post a more thorough reply later on). The "good" and "bad" Broyden's method are names referring to specific ways to update the Jacobian (at least these were the names I learned here in a univ. course), cf. also [1]; they do not really refer to goodness or badness of the methods, and definitely not to quality of implementation. (If you meant I had mislabeled one of these, please correct me.) .. [1] http://en.wikipedia.org/wiki/Broyden%27s_method Cheers, Pauli
On Mon, Dec 8, 2008 at 7:11 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi Ondrej,
Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
Quick comment (I'll post a more thorough reply later on). The "good" and "bad" Broyden's method are names referring to specific ways to update the Jacobian (at least these were the names I learned here in a univ. course), cf. also [1]; they do not really refer to goodness or badness of the methods, and definitely not to quality of implementation. (If you meant I had mislabeled one of these, please correct me.)
Ah, ok --- that wiki didn't exist yet when I wrote this. I only knew first Broyden method and a second Broyden method. Well, still I think it's weird to call something that works well by BadBrodyen, but if that's what people are used to, then ok. Do you have some good reference of this, besides wiki? And in any case, all of this should be explained in the docstrings. Ondrej
On Mon, Dec 8, 2008 at 12:42, Ondrej Certik <ondrej@certik.cz> wrote:
On Mon, Dec 8, 2008 at 7:11 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi Ondrej,
Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
Quick comment (I'll post a more thorough reply later on). The "good" and "bad" Broyden's method are names referring to specific ways to update the Jacobian (at least these were the names I learned here in a univ. course), cf. also [1]; they do not really refer to goodness or badness of the methods, and definitely not to quality of implementation. (If you meant I had mislabeled one of these, please correct me.)
Ah, ok --- that wiki didn't exist yet when I wrote this. I only knew first Broyden method and a second Broyden method. Well, still I think it's weird to call something that works well by BadBrodyen, but if that's what people are used to, then ok. Do you have some good reference of this, besides wiki?
And in any case, all of this should be explained in the docstrings.
If there's an alternate set of names, I would suggest going with those. "Bad" and "Good" are simply going to confuse people. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Mon, Dec 8, 2008 at 8:38 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Dec 8, 2008 at 12:42, Ondrej Certik <ondrej@certik.cz> wrote:
On Mon, Dec 8, 2008 at 7:11 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi Ondrej,
Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
Quick comment (I'll post a more thorough reply later on). The "good" and "bad" Broyden's method are names referring to specific ways to update the Jacobian (at least these were the names I learned here in a univ. course), cf. also [1]; they do not really refer to goodness or badness of the methods, and definitely not to quality of implementation. (If you meant I had mislabeled one of these, please correct me.)
Ah, ok --- that wiki didn't exist yet when I wrote this. I only knew first Broyden method and a second Broyden method. Well, still I think it's weird to call something that works well by BadBrodyen, but if that's what people are used to, then ok. Do you have some good reference of this, besides wiki?
And in any case, all of this should be explained in the docstrings.
If there's an alternate set of names, I would suggest going with those. "Bad" and "Good" are simply going to confuse people.
In fact, I would suggest to use or even invent alternate names, rather than to use Bad and Good, it's really confusing. Ondrej
On Tue, Dec 9, 2008 at 4:51 AM, Ondrej Certik <ondrej@certik.cz> wrote:
On Mon, Dec 8, 2008 at 8:38 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Dec 8, 2008 at 12:42, Ondrej Certik <ondrej@certik.cz> wrote:
On Mon, Dec 8, 2008 at 7:11 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi Ondrej,
Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
Quick comment (I'll post a more thorough reply later on). The "good" and "bad" Broyden's method are names referring to specific ways to update the Jacobian (at least these were the names I learned here in a univ. course), cf. also [1]; they do not really refer to goodness or badness of the methods, and definitely not to quality of implementation. (If you meant I had mislabeled one of these, please correct me.)
Ah, ok --- that wiki didn't exist yet when I wrote this. I only knew first Broyden method and a second Broyden method. Well, still I think it's weird to call something that works well by BadBrodyen, but if that's what people are used to, then ok. Do you have some good reference of this, besides wiki?
And in any case, all of this should be explained in the docstrings.
If there's an alternate set of names, I would suggest going with those. "Bad" and "Good" are simply going to confuse people.
In fact, I would suggest to use or even invent alternate names, rather than to use Bad and Good, it's really confusing.
This page seems to suggest that "bad Broyden" is used to refer to the inverse update using Sherman Morrison, which is what the Wikipedia calls "good Broyden". http://www.math.buffalo.edu/~pitman/courses/mth437/na3/node3.html There's also a paper called "On The Discovery of the "Good Broyden" method by Broyden himself which I presume would be the final authority on which method is which: http://www.springerlink.com/content/cvvygkeyv8blm96r/ (Needs Springer access to download ... which I don't have.) --bb
This page seems to suggest that "bad Broyden" is used to refer to the inverse update using Sherman Morrison, which is what the Wikipedia calls "good Broyden". http://www.math.buffalo.edu/~pitman/courses/mth437/na3/node3.html
There's also a paper called "On The Discovery of the "Good Broyden" method by Broyden himself which I presume would be the final authority on which method is which: http://www.springerlink.com/content/cvvygkeyv8blm96r/ (Needs Springer access to download ... which I don't have.)
Bill, thanks for the tip. I sent you the article offlist. Pauli --- do you have any comments to the discussion? After we agree on a better naming, I'll then try to use your branch at github and help you polish it so that both my and your codes run well and then we can push it to scipy. What do you think? Ondrej
Mon, 15 Dec 2008 16:19:29 +0100, Ondrej Certik wrote: [clip]
Pauli --- do you have any comments to the discussion? After we agree on a better naming, I'll then try to use your branch at github and help you polish it so that both my and your codes run well and then we can push it to scipy. What do you think?
I'm not actually sure how this goes. Is Scipy trunk frozen until 0.7.0? At least there's no branch for 0.7, only the beta1 tag, so I'd assume the trunk is frozen for the moment. Anyway, for the time being github will do, though I'm not sure how well git-svn with all its frequent rebasing will work in collaboration. We can of course hold off rebasing the branch (and only pulling) until the code gets in Scipy trunk, if git starts to get in the way. Btw, I'm not sure how much I manage to work on this during the next few weeks, it's holidays soon, and I probably don't do much coding then. Unless I get extremely bored :) Anyway, we're not in a hurry with this. Cheers, Pauli
On Wed, Dec 17, 2008 at 1:06 PM, Pauli Virtanen <pav@iki.fi> wrote:
I'm not actually sure how this goes. Is Scipy trunk frozen until 0.7.0? At least there's no branch for 0.7, only the beta1 tag, so I'd assume the trunk is frozen for the moment.
Yes, the trunk is "frozen' until we branch for 0.70. The only thing that should be committed to the trunk at this point is documentation improvements, new tests, or essential or trivial bug-fixes that have been discussed on the list. I hope that we can resolve the remaining issues relating to NR licensing by the end of the weekend. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/
Wed, 17 Dec 2008 13:28:26 -0800, Jarrod Millman wrote: [clip]
Yes, the trunk is "frozen' until we branch for 0.70. The only thing that should be committed to the trunk at this point is documentation improvements, new tests, or essential or trivial bug-fixes that have been discussed on the list.
Ok, I'll explain then also this: r5268 In the interp1d n-d cleanup I by mistake broke linear interpolation of complex values. This fixes it. 1D spline interpolation has never worked for complex-valued data, due to _fitpack.bisplev being written in C, so I changed interp1d to raise error in __init__ rather than later on in __call__. Tests included, naturally. -- Pauli Virtanen
On Wed, Dec 17, 2008 at 1:39 PM, Pauli Virtanen <pav@iki.fi> wrote:
Ok, I'll explain then also this: r5268
In the interp1d n-d cleanup I by mistake broke linear interpolation of complex values. This fixes it.
1D spline interpolation has never worked for complex-valued data, due to _fitpack.bisplev being written in C, so I changed interp1d to raise error in __init__ rather than later on in __call__.
Tests included, naturally.
Sounds good. Thanks, -- Jarrod Millman Computational Infrastructure for Research Labs 10 Giannini Hall, UC Berkeley phone: 510.643.4014 http://cirl.berkeley.edu/
On Wed, Dec 17, 2008 at 10:39 PM, Pauli Virtanen <pav@iki.fi> wrote:
1D spline interpolation has never worked for complex-valued data, due to _fitpack.bisplev being written in C, [...]
Could this be addressed by calling bisplev twice, once for the real part and once for the imaginary part?
Pauli Virtanen
-- Tom Grydeland <Tom.Grydeland@(gmail.com)>
Thu, 18 Dec 2008 08:36:17 +0100, Tom Grydeland wrote:
On Wed, Dec 17, 2008 at 10:39 PM, Pauli Virtanen <pav@iki.fi> wrote:
1D spline interpolation has never worked for complex-valued data, due to _fitpack.bisplev being written in C, [...]
Could this be addressed by calling bisplev twice, once for the real part and once for the imaginary part?
Excellent point, it's linear in the data, so we can do this. Committed as r5275, with tests. This is a rather trivial change. -- Pauli Virtanen
Hi, Sorry for the delay -- busy times... First, a short summary of what I think it would be good to: - Remove broyden1_modified - Remove broyden_generalized; the code is identical to anderson with w0=0. - Remove broyden_modified; if I understand Eyert [1] correctly, this method is expected to be always inferior to (extended) Anderson mixing. So I don't see a real need for retaining this in Scipy. Also, I don't know how to check that it works correctly. - Remove anderson2, because it does not work for any of the test problems. Since `anderson` seems to work, I don't think it's wise to try to get this second implementation up to speed. - Change broyden1 to update the inverse Jacobian using Sherman-Morrison. No need for inverting J on every iteration. - Convert broyden3 to a simple limited-memory method. Rename it, too. - Change the API so that there's a single nonlinear solver function, which takes an option that determines which Jacobian approximation to use. - Maybe add a reverse-communication interface, see below. .. [1] V. Eyert, J. Comp. Phys, 124, 271 (1996). Here comes the rest: Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav@iki.fi> wrote: [clip] First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
This is probably not going to be in 0.7.0, even though the release was postponed. The API might still need a bit of thinking, and it's probably best to test this thoroughly. The difference in Broyden's good/first and bad/second method is to my understanding the one mentioned in Wikipedia: both methods can be written as updates to the inverse Jacobian because the update is rank-1 so that the Sherman-Morrison formula applies. The difference can be seen if one looks at the domains (right-hand vector) of the rank-1 inverse updates: Broyden's "good" method uses J_{n-1}^{-T} dx, whereas the "bad" method uses df. These two vectors are not equal, even though both updates satisfy the quasi-Newton condition. [clip]
- Rewrote docstrings.
in general +1, but I don't understand what all the implementations actually do. I suggest to also port my comments from the module docstring into the respective classes. I can help you with that, after we agree on other things below.
I'm not sure how much we can say about relative efficiency of the various methods. We can cite some references on this, but I'd rather not say much based on a couple of test problems. [clip]
* I'd like to remove `broyden2` as the actual Jacobian approximation in this appears to be the same as in `broyden3`, and there does not appear to be much difference in the work involved in the two.
Ondrej, since you wrote the original code, do you think there is a reason to keep both?
I think it is, at least for tutorial purposes and also as an easy way to check that all is fine. Because there may be some numericall differences, also for a lot of iterations, the memory consumption of broyden3 is not limited (is it?). Broyden2 just stores the NxN dense matrix (that can of course be big), but that's it.
Well, I think having two implementations of the same method does not really make sense, and scipy.optimize should contain only production-quality code. But yes, the memory consumption of broyden3 and broyden2 are not exactly the same, although neither is a real limited-memory method. See below. [clip]
* Also, which articles were used as reference for the non-Quasi-Newton algorithms:
- `broyden_modified`. This appears to be a bit exotic, and it uses several magic constants (`w0`, `wl`) whose meaning is not clear to me.
A reference would be helpful here, also for the user who has to choose the parameters as appropriate for his/her specific problem.
You can use my master thesis:
http://ondrej.certik.cz/site_media/cookbook/master.pdf
pages 27-31. Everything is explained in there, plus references given to the original literature.
Looking at Eyert's paper, it seems to me that there's no need to retain this method.
- `broyden_generalized`, `anderson`, `anderson2`. These appear to be variants of Anderson mixing, so probably we only want at most one of these. Also, broyden_generalized() == anderson(w0=0), am I correct?
Yes and no, see my master thesis for details and references. There are some subtle differences. But those methods have never worked for me well.
`anderson` and `anderson2` don't appear to function equivalently, and I suspect the update formula in the latter is wrong, since this algorithm can't solve any of the test problems. Do you have a reference for this?
Yes, see my master thesis, pages above, it's the reference [4].
Ok, anderson2 and anderson should be exactly equivalent. This probably means that there's a bug in anderson2, or maybe the default parameters are somehow strange. Of the tree, I'd retain only `anderson`. [clip]
- `excitingmixing`. A reference would be useful, to clarify the heuristics used. [clip] - `linearmixing`. I'm not sure that Scipy should advocate this method :) [clip] Linearmixing and excitingmixing also seem to require something from the objective function, possibly that the eigenvalues of its Jacobian have the opposite sign than `alpha`. For example, neither of them can find a solution for the equation ``I x = 0`` where ``I`` is the identity matrix (test function F2). So, I'm a bit tempted to remove also these routines, as it seems they probably are not too useful for general problems.
Please don't removed those routines, they are essential for electronic structure calculations, as they are very robust, fast and doesn't require any memory. It just works for many problems.
These are pretty straightforward methods, so maybe they can stay. But the docstrings should clearly say that these are intended only for specialized problems. I wonder if electronic structure (DFT) calculations are a bit of a special case. Ie. are the eigenvalues of the Jacobian of the Kohn-Sham functional F[n] all negative (or less than 1)? [clip]
* One could add more algorithms suitable for large-scale problems; for example some limited-memory Broyden methods (eg. [1])
I think in the references in my thesis, the broyden3 is sometimes called the limited-memory Broyden method. However, I checked the equations in [1] in your email and they need to do a singular-value-decomposition, so it seems there is yet another approach to this. So yes.
It's a part of a limited-memory method, but not the whole story, because the present implementation does not actually throw any data away. So, it's an "unlimited memory" Broyden method :) But it would be very easy (= 3 lines of code) to make it a simple limited-memory method, just by throwing away iterates older than some M. The singular value reduction can be implemented later. There are probably many different schemes proposed in the literature how the reduction can be done... [clip]
Let me know what you think about my comments. Basically, scipy should have as many (working and well tested) algorithms as possible, with the same interface, so that one can change the algorithms and see which works for the particular problem.
Yes, but this must be balanced against maintenance + testing costs. I want to be sure that the code I contribute works, and I wouldn't like to spend too much time working on unproven methods. So I'd rather include only methods for which there is reason to believe that they work well.
I am glad that you are also using those methods, so that we can work on this together.
It'd be very useful if you could test this code, including the line search, on a real large-scale problem. At present, I'm not sure if it makes too many iterations compared to its payoff. Also, we might need to implement a reverse communication interface, something on the lines of iterator = Solver(x0) for x in iterator: iterator.feed(func(x)) if iterator.converged(): break else: raise RuntimeError("Didn't converge") For expensive self-consistent iterations, this often feels the natural way to use the nonlinear solver. -- Pauli Virtanen
On Wed, Dec 17, 2008 at 9:55 PM, Pauli Virtanen <pav@iki.fi> wrote:
Hi,
Sorry for the delay -- busy times...
First, a short summary of what I think it would be good to:
- Remove broyden1_modified
- Remove broyden_generalized; the code is identical to anderson with w0=0.
- Remove broyden_modified; if I understand Eyert [1] correctly, this method is expected to be always inferior to (extended) Anderson mixing. So I don't see a real need for retaining this in Scipy. Also, I don't know how to check that it works correctly.
- Remove anderson2, because it does not work for any of the test problems. Since `anderson` seems to work, I don't think it's wise to try to get this second implementation up to speed.
- Change broyden1 to update the inverse Jacobian using Sherman-Morrison. No need for inverting J on every iteration.
- Convert broyden3 to a simple limited-memory method. Rename it, too.
- Change the API so that there's a single nonlinear solver function, which takes an option that determines which Jacobian approximation to use.
- Maybe add a reverse-communication interface, see below.
.. [1] V. Eyert, J. Comp. Phys, 124, 271 (1996).
Here comes the rest:
Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav@iki.fi> wrote: [clip] First let me apologize for taking me so long to reply. I wrote this code in the first place and I am happy that Pauli has rewritten it. I agree with the general direction, but I think this change should not go into 0.7.0, as it changes the interface and it is not well tested yet. Also, you renamed all the working broyden implementations that I use as BadBroyden, so I suggest to name them GoodBroyden, more below.
This is probably not going to be in 0.7.0, even though the release was postponed. The API might still need a bit of thinking, and it's probably best to test this thoroughly.
The difference in Broyden's good/first and bad/second method is to my understanding the one mentioned in Wikipedia: both methods can be written as updates to the inverse Jacobian because the update is rank-1 so that the Sherman-Morrison formula applies. The difference can be seen if one looks at the domains (right-hand vector) of the rank-1 inverse updates: Broyden's "good" method uses J_{n-1}^{-T} dx, whereas the "bad" method uses df. These two vectors are not equal, even though both updates satisfy the quasi-Newton condition.
[clip]
- Rewrote docstrings.
in general +1, but I don't understand what all the implementations actually do. I suggest to also port my comments from the module docstring into the respective classes. I can help you with that, after we agree on other things below.
I'm not sure how much we can say about relative efficiency of the various methods. We can cite some references on this, but I'd rather not say much based on a couple of test problems.
[clip]
* I'd like to remove `broyden2` as the actual Jacobian approximation in this appears to be the same as in `broyden3`, and there does not appear to be much difference in the work involved in the two.
Ondrej, since you wrote the original code, do you think there is a reason to keep both?
I think it is, at least for tutorial purposes and also as an easy way to check that all is fine. Because there may be some numericall differences, also for a lot of iterations, the memory consumption of broyden3 is not limited (is it?). Broyden2 just stores the NxN dense matrix (that can of course be big), but that's it.
Well, I think having two implementations of the same method does not really make sense, and scipy.optimize should contain only production-quality code.
But yes, the memory consumption of broyden3 and broyden2 are not exactly the same, although neither is a real limited-memory method. See below.
[clip]
* Also, which articles were used as reference for the non-Quasi-Newton algorithms:
- `broyden_modified`. This appears to be a bit exotic, and it uses several magic constants (`w0`, `wl`) whose meaning is not clear to me.
A reference would be helpful here, also for the user who has to choose the parameters as appropriate for his/her specific problem.
You can use my master thesis:
http://ondrej.certik.cz/site_media/cookbook/master.pdf
pages 27-31. Everything is explained in there, plus references given to the original literature.
Looking at Eyert's paper, it seems to me that there's no need to retain this method.
- `broyden_generalized`, `anderson`, `anderson2`. These appear to be variants of Anderson mixing, so probably we only want at most one of these. Also, broyden_generalized() == anderson(w0=0), am I correct?
Yes and no, see my master thesis for details and references. There are some subtle differences. But those methods have never worked for me well.
`anderson` and `anderson2` don't appear to function equivalently, and I suspect the update formula in the latter is wrong, since this algorithm can't solve any of the test problems. Do you have a reference for this?
Yes, see my master thesis, pages above, it's the reference [4].
Ok, anderson2 and anderson should be exactly equivalent. This probably means that there's a bug in anderson2, or maybe the default parameters are somehow strange. Of the tree, I'd retain only `anderson`.
[clip]
- `excitingmixing`. A reference would be useful, to clarify the heuristics used. [clip] - `linearmixing`. I'm not sure that Scipy should advocate this method :) [clip] Linearmixing and excitingmixing also seem to require something from the objective function, possibly that the eigenvalues of its Jacobian have the opposite sign than `alpha`. For example, neither of them can find a solution for the equation ``I x = 0`` where ``I`` is the identity matrix (test function F2). So, I'm a bit tempted to remove also these routines, as it seems they probably are not too useful for general problems.
Please don't removed those routines, they are essential for electronic structure calculations, as they are very robust, fast and doesn't require any memory. It just works for many problems.
These are pretty straightforward methods, so maybe they can stay. But the docstrings should clearly say that these are intended only for specialized problems.
I wonder if electronic structure (DFT) calculations are a bit of a special case. Ie. are the eigenvalues of the Jacobian of the Kohn-Sham functional F[n] all negative (or less than 1)?
[clip]
* One could add more algorithms suitable for large-scale problems; for example some limited-memory Broyden methods (eg. [1])
I think in the references in my thesis, the broyden3 is sometimes called the limited-memory Broyden method. However, I checked the equations in [1] in your email and they need to do a singular-value-decomposition, so it seems there is yet another approach to this. So yes.
It's a part of a limited-memory method, but not the whole story, because the present implementation does not actually throw any data away. So, it's an "unlimited memory" Broyden method :)
But it would be very easy (= 3 lines of code) to make it a simple limited-memory method, just by throwing away iterates older than some M. The singular value reduction can be implemented later. There are probably many different schemes proposed in the literature how the reduction can be done...
[clip]
Let me know what you think about my comments. Basically, scipy should have as many (working and well tested) algorithms as possible, with the same interface, so that one can change the algorithms and see which works for the particular problem.
Yes, but this must be balanced against maintenance + testing costs. I want to be sure that the code I contribute works, and I wouldn't like to spend too much time working on unproven methods. So I'd rather include only methods for which there is reason to believe that they work well.
Yes, the only thing that I really care is that the methods that are known to work well for me are there. And I am willing to maintain them.
I am glad that you are also using those methods, so that we can work on this together.
It'd be very useful if you could test this code, including the line search, on a real large-scale problem. At present, I'm not sure if it makes too many iterations compared to its payoff.
Also, we might need to implement a reverse communication interface, something on the lines of
iterator = Solver(x0) for x in iterator: iterator.feed(func(x)) if iterator.converged(): break else: raise RuntimeError("Didn't converge")
For expensive self-consistent iterations, this often feels the natural way to use the nonlinear solver.
Ok, I'll test it. So far I only have a one atomic solver, so that's not a large scale problem, but if I put for example 20000 points in the radial mesh, then I can test the nonlin methods pretty well (e.g. Jacobian being 20000x20000 if we stored it fully, e.g. something like 1.5GB). Right now I am working on a 2D solver and here it will show clearly which method works the best. You can follow my progress here: http://spilka.math.unr.edu/projects/hermes2d/wiki/Projects/SchroedingerEquat... So since you'll be busy in the next couple weeks, I think I'll know soon which methods I need and which not. Thanks for all the comments above -- I'll think about it and probably ask you soon to help me implement or at least test some of those methods. :) Thanks, Ondrej
The difference in Broyden's good/first and bad/second method is to my understanding the one mentioned in Wikipedia: both methods can be written as updates to the inverse Jacobian because the update is rank-1 so that the Sherman-Morrison formula applies. The difference can be seen if one looks at the domains (right-hand vector) of the rank-1 inverse updates: Broyden's "good" method uses J_{n-1}^{-T} dx, whereas the "bad" method uses df. These two vectors are not equal, even though both updates satisfy the quasi-Newton condition.
Here's what it says in that paper "On the discovery of the 'good Broyden' method" by Broyden B[i] = B[i-1] + (y[i-1] - B[i-1]s[i-1]) * ( s[i-1].T / ( s[i-1].T * s[i-1] )) """ The formula is referred to as "good" due to its better numerical performance relative to another formula that I also presented in [3], which has become to be known as the "bad Broyden" update. See [5] for a discussion of these two updates and their relations to the DFP and BFGS updates. """ Selected references: [3] Broyden, C.G. (1965): A class of methods for solving nonlinear simultaneous equations. Math. Comp. 19, 577–593 [5] Dennis, J.E., Jr., Schnabel, R.B. (1981): A new derivation of symmetric positive definite secant updates. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M., eds., Nonlinear Programming 4, 1980. Academic Press, New York, NY, 167–199 --bb
participants (13)
-
Anne Archibald -
Benny Malengier -
Bill Baxter -
Charles R Harris -
David Cournapeau -
David Cournapeau -
Gideon Simpson -
Jarrod Millman -
Ondrej Certik -
Pauli Virtanen -
Robert Kern -
Stéfan van der Walt -
Tom Grydeland