Ralf Gommers wrote:
Hi Andrew, Thanks for bringing this up. On Wed, Jul 26, 2023 at 8:04 PM Knyazev, Andrew Andrew.Knyazev@ucdenver.edu wrote:
> For iterative solvers, performance is determined by the total execution time, which is the time per iteration times the number of iterations needed to reach the requested accuracy. The present benchmark does not check the accuracy, so it fails this goal too, i.e., it is quite useless and misleading. The property that, for a known problem, an iterative solver's code achieves a guaranteed accuracy in a known, fixed number of iterations is desirable but not a given, and must be checked and rechecked with every change to the code.

Measuring total runtime is for the benchmark suite, while measuring whether a given absolute/relative tolerance is achieved, or whether a solution is obtained at all in <= N iterations, is for the test suite. We have quite a few test cases like that; you can simply assert there with something like `solution.n_iter <= 25`. The test suite is run far more often and on a much wider range of platforms, so it is a much better place for that type of check.

Cheers,
Ralf
Ralf, thanks for your quick response! I agree with you in principle, of course. However, the specific benchmark I am editing in https://github.com/scipy/scipy/pull/18954/ (search ASV for "sakurai" and "mikota") has peculiarities that make the standard approach impractical:

- One of the solvers under comparison, eigsh, has an internal convergence check and fails if it does not converge. The other, lobpcg, only issues a warning if the requested accuracy is not reached, but is designed never to raise an error. The third solver is dense, so it supposedly never fails. An external assert is needed to make these solvers comparable in a benchmark.
- Some of the benchmark runs take a long time, because the matrix size needs to be large enough to make the benchmark meaningful. These runs are simply too long for the test suite, so their accuracy cannot be tested there.
- The actual benchmarks in this PR are currently tuned so that there are no failures with the current versions of the solvers benchmarked.

Practically speaking, we have a rather philosophical, "to be or not to be" discussion about whether a single-line assert that currently has no effect may be present in this benchmark in this specific PR, or should be disallowed as a matter of principle.
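[Editor's note: the "external assert" pattern Andrew describes can be sketched like this. A hypothetical example on a small diagonal stand-in matrix, not the PR's actual benchmark code: eigsh raises ArpackNoConvergence on failure, lobpcg merely warns, and a dense solve never fails, so the same post-hoc accuracy check is what makes the three comparable.]

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh, lobpcg, ArpackNoConvergence

# Diagonal stand-in matrix: smallest eigenvalues are exactly 1..k.
n = 200
A = diags(np.arange(1.0, n + 1.0)).tocsr()
k = 4
exact = np.arange(1.0, k + 1.0)

def check(w):
    # The shared external assert, applied after every solver.
    assert np.allclose(np.sort(w), exact, atol=1e-6)

# eigsh raises on non-convergence; translate that into the same failure mode.
try:
    w_arpack = eigsh(A, k=k, which="SA", return_eigenvectors=False)
except ArpackNoConvergence as e:
    raise AssertionError("eigsh failed to converge") from e
check(w_arpack)

# lobpcg would only warn on its own, so the assert does the real checking.
rng = np.random.default_rng(0)
X = rng.standard_normal((n, k))
w_lobpcg, _ = lobpcg(A, X, largest=False, tol=1e-8, maxiter=200)
check(w_lobpcg)

# Dense reference solve, held to the same standard.
w_dense = np.linalg.eigvalsh(A.toarray())[:k]
check(w_dense)
```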