tol, atol and rtol behavior of scipy.sparse.linalg solvers
Dear all,

Yet another mail about upcoming changes, so apologies for the seemingly repetitive messages lately. In [1], we started removing the FORTRAN code and rewrote all solvers in pure Python, for various reasons too long to enumerate here. As a long-standing issue, we also want to address the suboptimal way tolerances are provided to these solvers. As I wrote in [2], I'd like to raise it again to ask for feedback in case anyone has better ideas.

Currently, we have the following situation: there is a tol keyword and a strange-looking atol keyword. Internally, all inputs are passed to a helper function called get_atol: https://github.com/scipy/scipy/blob/maintenance/1.10.x/scipy/sparse/linalg/_...

1. If the user didn't provide atol, it first maps to None, which causes a warning, and internally atol becomes "legacy". This means:
   a. tol is checked against the residual norm of Ax - b (note that the matvec operation is costly). If the residual norm is already less than tol, we exit.
   b. Then norm(b) is checked against 0. Since tol*norm(b) won't do any good in that case, we return tol.
   c. Otherwise, in the typical case, we return tol*norm(b).
2. If the user passed "legacy" explicitly, everything is the same as above except the warning.
3. If atol is actually set to something, then we return max(atol, tol*norm(b)). Why max is used I don't understand. Typically, an atol, rtol pair is used as tol = atol + rtol*<some metric>. This is more or less what everybody expects from this pair, and their naming is chosen to reflect it. But I'm a bit lost in all the issues I could read.

What I propose is the following:

1. Make the default atol=0.
2. If atol is set to a string, fire a warning saying this is going away in version 1.13 (hence putting the current deprecation on a definite schedule, assuming I can make it into the 1.11 release). These are probably the users who wanted to keep the old behavior; document how the old behavior can be matched via atol and rtol.
3. Introduce a new keyword rtol and map the current tol to it. Set tol=None as the default to weed out those who actually set a tol value, and fire deprecation warnings telling them to use rtol.
4. Use the formula tol = atol + rtol*norm(b).

All feedback is welcome.

[1] : https://github.com/scipy/scipy/pull/18391
[2] : https://github.com/scipy/scipy/issues/15738
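A rough sketch of how the proposed keyword handling could look (the helper name `resolve_tolerance` and the warning messages are illustrative, not the actual SciPy code):

```python
import warnings

def resolve_tolerance(b_norm, atol=0.0, rtol=1e-5, tol=None):
    """Illustrative sketch of the proposed keyword handling, not SciPy's code."""
    if isinstance(atol, str):
        # e.g. the old atol="legacy" sentinel
        warnings.warn("string values for atol are deprecated", DeprecationWarning)
        atol = 0.0
    if tol is not None:
        # user still passes the old keyword; treat it as the new rtol
        warnings.warn("tol is deprecated, use rtol instead", DeprecationWarning)
        rtol = tol
    # proposed combined threshold for the residual norm ||b - A x||
    return atol + rtol * b_norm

print(resolve_tolerance(8.0, atol=0.5, rtol=0.25))  # 2.5
```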
On Mon, May 1, 2023, at 11:49, Ilhan Polat wrote:
If `atol` is actually set to something, then we return `max(atol, tol*norm(b))`. Why `max` is used I don't understand. Typically, `atol`, `rtol` pair is used as `tol = atol + rtol*<some metric>`. This is more or less what everybody expects from this pair and their naming is chosen to reflect this. But I'm a bit lost in all the issues I could read.
The first formulation — max(atol, rtol*norm(b)) — is what I'd expect. I.e., you use atol when answers are small, or rtol for larger answers. I'm a bit surprised to see them summed; is this standard? Stéfan
On Mon, May 1, 2023 at 7:22 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
The first formulation — max(atol, rtol*norm(b)) — is what I'd expect. I.e., you use atol when answers are small, or rtol for larger answers. I'm a bit surprised to see them summed; is this standard?
I concur. `max()` is the appropriate way to combine the two that I'd expect to see in almost all circumstances. `+` does have a history in our ecosystem, since we use `+` in `np.isclose()`, but I think that is a bit of a special case of comparing two numbers rather than the size of one number, and I've never been really convinced by the arguments for it. `math.isclose()` works like `max()` (though it is implemented with explicit `||` operations).

Julia's iterative solvers use `max()`:
https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl/blob/master/src/cg...

MATLAB's numerical integration routines use `max()`:
https://www.mathworks.com/help/matlab/ref/integral.html#btc_m8o-4

-- Robert Kern
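To make the "`max()` is an OR of the two conditions" point concrete, here is a tiny sketch (the function names are mine): a residual passes the `max()` threshold exactly when at least one of the two separate tolerance conditions holds.

```python
# max() combination is just the OR of the two separate conditions:
#   ||r|| <= atol   OR   ||r|| <= rtol * ||b||
def converged_max(res_norm, b_norm, atol, rtol):
    return res_norm <= max(atol, rtol * b_norm)

def converged_or(res_norm, b_norm, atol, rtol):
    return res_norm <= atol or res_norm <= rtol * b_norm

# The two formulations agree for any non-negative inputs:
for r in (0.0, 1e-9, 1e-4, 1.0):
    for bn in (0.0, 1e-3, 100.0):
        assert (converged_max(r, bn, atol=1e-8, rtol=1e-5)
                == converged_or(r, bn, atol=1e-8, rtol=1e-5))
print("max() and explicit OR agree on all cases")
```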
On Mon, May 1, 2023 at 8:17 PM Robert Kern <robert.kern@gmail.com> wrote:
I concur. `max()` is the appropriate way to combine the two that I'd expect to see in almost all circumstances. `+` does have a history in our ecosystem as we do use `+` in `np.isclose()`, though, but I think that is a bit of a special case of comparing two numbers rather than the size of one number, and I've never been really convinced by the arguments for it. `math.isclose()` works like `max()` (though it is implemented with explicit `||` operations).
Julia's iterative solvers use `max()`.
https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl/blob/master/src/cg...
MATLAB's numerical integration routines use `max()`.
https://www.mathworks.com/help/matlab/ref/integral.html#btc_m8o-4
That said, `+` is not at all uncommon outside of our own ecosystem either. Julia's Krylov solvers use it:
https://docs.juliahub.com/Krylov/0fcC3/0.5.5/solvers/

Here's a numerical analysis course that uses it:
http://web.mit.edu/10.001/Web/Tips/Converge.htm

I've never really seen any *comparison* between the two outside of some half-remembered arguments over `np.isclose()` early in its development. I think `max()` is the most justified and interpretable: it's just a way to code the ORing of the two conditions. If either the absolute tolerance is met OR the relative tolerance is met, then stop. With `+`, you get a weird mix of both that's close to the `max()` result in most sane circumstances but diverges in some cases. When the user makes really bad choices for the two tolerances, neither will work particularly well, but with `max()` it's easier to reason about why it doesn't work in that case.

I strongly suspect `+` gets its lineage either from C codes or from pre-F77 codes where there was no builtin `max()` to use.

-- Robert Kern
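For a quick numerical feel for how the two combinations compare (a sketch, with arbitrary illustrative numbers): they are essentially identical when one term dominates, and differ by at most a factor of two when the terms are comparable.

```python
def threshold_max(atol, rtol, b_norm):
    return max(atol, rtol * b_norm)

def threshold_sum(atol, rtol, b_norm):
    return atol + rtol * b_norm

# One term dominates: essentially identical thresholds.
print(threshold_max(1e-12, 1e-5, 1.0))  # 1e-05
print(threshold_sum(1e-12, 1e-5, 1.0))  # ~1.000000000001e-05
# Comparable terms: the sum is up to 2x looser than the max.
print(threshold_max(1e-5, 1e-5, 1.0))   # 1e-05
print(threshold_sum(1e-5, 1e-5, 1.0))   # 2e-05
```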
Interesting. I have different exposure from other fields for "composing" the combined tolerance value. There is no single right way, since both can arrive at the same value from the same input, just with a different mental model, and we don't need to go into max-plus algebra; it's too late here for that :) But it should be an intuitive way, since you can define arbitrary metrics. So if both of you feel that users would expect max, then never mind: max() it is.

On Tue, May 2, 2023 at 2:45 AM Robert Kern <robert.kern@gmail.com> wrote:
On Mon, May 1, 2023, at 17:44, Robert Kern wrote:
Here's some numerical analysis course that uses it:
This link also shows why a small atol is useful to guarantee convergence for |x| close to 0. You lose that feature when you set atol = 0, and while it probably depends on the application whether that's desirable or not, I bet that in most cases you'd either want it or it would do no harm. Stéfan
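A quick sketch of that failure mode: with atol=0 and b close to (or exactly) zero, the rtol-only part of the threshold collapses to zero, and the stopping criterion can become effectively unsatisfiable.

```python
def threshold(atol, rtol, b_norm):
    return max(atol, rtol * b_norm)

# atol=0 and norm(b)=0: only an exactly zero residual would ever "converge".
print(threshold(0.0, 1e-5, 0.0))    # 0.0
# A tiny nonzero atol keeps the criterion satisfiable near zero.
print(threshold(1e-12, 1e-5, 0.0))  # 1e-12
```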
On Mon, May 1, 2023 at 5:21 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
The first formulation — max(atol, rtol*norm(b)) — is what I'd expect. I.e., you use atol when answers are small, or rtol for larger answers. I'm a bit surprised to see them summed; is this standard?
IIRC, I first saw it in *Numerical Recipes* (1986). Chuck
On Mon, May 1, 2023 at 9:04 PM Charles R Harris <charlesr.harris@gmail.com> wrote:
IIRC, I first saw it in *Numerical Recipes* (1986).
To be fair, I *think* it ultimately derives from a principled combination of sources of numerical error. If your inputs have absolute roundoff error (i.e. rounded to the nearest 0.0001) and the floating point computation contributes `rtol` amount of relative roundoff error during the computation (usually something like `n_flops * eps`), then the total error should be about the sum of the two. In general, I still lean towards `max()`. The principled application of the sum depends on knowing the details of the computation and the data sources to properly scale each of the tolerances. I'm _pretty_ okay with understanding the details of these things, but in reality, I'll just stick with the defaults and tweak up or down when it doesn't work. And in that scenario, `max()` is more interpretable to me. -- Robert Kern
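As a back-of-the-envelope sketch of that error-budget argument (all numbers here are illustrative): two independent error sources, one absolute and one relative, roughly add, which is where the `atol + rtol*scale` form would come from.

```python
eps = 2.0 ** -52           # double-precision machine epsilon
n_flops = 1000             # rough operation count of the computation
input_quantum = 1e-4       # inputs rounded to the nearest 1e-4

value = 12.345
abs_err = input_quantum / 2           # worst-case input rounding error
rel_err = n_flops * eps * abs(value)  # crude bound on accumulated roundoff
total = abs_err + rel_err             # the "sum"-style combined budget

# The sum bounds both sources at once; the max of the two does not.
print(total >= abs_err and total >= rel_err)       # True
print(max(abs_err, rel_err) >= abs_err + rel_err)  # False
```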
This has turned out to be a really nice teaching moment for me. I was completely unaware of how prevalent max() is; I guess I spent my time sitting in a bubble of sum() in my own circles. In fact, that's the reason why I kept using atol in assert_allclose with rtol=0. for better error control, in hindsight a faulty line of thinking. In numerical integration schemes, you basically want rtol for the evolution, that is, tracking the approach from iteration to iteration (hence the name "relative"), and typically atol for the landing. But I can now see how max() is easier to reason about for the 2-norm based control. I might indeed vote for max(), but to be honest I'm more undecided now than yesterday.

_______________________________________________
SciPy-Dev mailing list -- scipy-dev@python.org
To unsubscribe send an email to scipy-dev-leave@python.org
https://mail.python.org/mailman3/lists/scipy-dev.python.org/
Member address: ilhanpolat@gmail.com
On Tue, May 2, 2023 at 1:06 PM Robert Kern <robert.kern@gmail.com> wrote:
To be fair, I *think* it ultimately derives from a principled combination of sources of numerical error. If your inputs have absolute roundoff error (i.e. rounded to the nearest 0.0001) and the floating point computation contributes `rtol` amount of relative roundoff error during the computation (usually something like `n_flops * eps`), then the total error should be about the sum of the two.
In general, I still lean towards `max()`. The principled application of the sum depends on knowing the details of the computation and the data sources to properly scale each of the tolerances. I'm _pretty_ okay with understanding the details of these things, but in reality, I'll just stick with the defaults and tweak up or down when it doesn't work. And in that scenario, `max()` is more interpretable to me.
I agree with this. In practice, the choice doesn't matter much. The sum version has the advantage of being smoother, but I suspect it was just an easy way to get something useful. Whether one or the other is easier to reason about depends on how much time you want to spend reasoning about it :)

I will note that a lot of the early numerical work was done either in Algol 68 (which died) or in Fortran. C wasn't a suitable language, not least because it only had doubles at a time when single-precision floats were about 4x faster, at least on the popular VAX series, and the VAX file system was record based (Fortran) rather than byte-stream based. Of course, there were a lot of different float formats before the IEEE standard came out.

Chuck
Just reaching out to give an update on the progress. The tolerance plan (with max() as the decision) is implemented in the PR (still in draft state). I am fixing a few final bugs in GMRES, but there are no blockers. I would really appreciate it if you gave it a go and engaged in the PR. I know these solvers from the past, but I haven't used them in production for quite a long time, hence I am wondering how usable they are. GMRES has a mind of its own due to many legacy behaviors piled up; I tried to bring along as much as I could, but it is a hot mess and I might have introduced some bugs, though the existing tests pass. Anyway, all feedback is most welcome.
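For anyone who wants a feel for the stopping rule that was settled on, here is a toy dense conjugate-gradient solver using max(atol, rtol*||b||) as the residual threshold. This is a pure-Python illustrative sketch, not code from the PR; the real solver is scipy.sparse.linalg.cg.

```python
import math

def cg(A, b, rtol=1e-8, atol=0.0, maxiter=1000):
    """Toy dense CG with the max(atol, rtol*||b||) stopping rule (sketch only)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual b - A@x with x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    stop = max(atol, rtol * math.sqrt(sum(bi * bi for bi in b)))
    for _ in range(maxiter):
        if math.sqrt(rs) <= stop:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]      # symmetric positive definite
b = [1.0, 2.0]
x = cg(A, b)
print(x)  # approximately [1/11, 7/11]
```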
participants (4)
- Charles R Harris
- Ilhan Polat
- Robert Kern
- Stefan van der Walt