Hi Hans,

On Mon, 11 Jan 2021 at 11:47, Hans Dembinski <hans.dembinski@gmail.com> wrote:
Hi Andrea,

> On 8. Jan 2021, at 10:20, Andrea Gavana <andrea.gavana@gmail.com> wrote:
>
>     long time no see :-) . I thought to start 2021 with a bit of a bang, to try and forget how bad 2020 has been... So I am happy to present you with a revamped version of the Global Optimization Benchmarks from my previous exercise in 2013.
>
> This new set of benchmarks pretty much superseeds - and greatly expands - the previous analysis that you can find at this location: http://infinity77.net/global_optimization/ .

thank you for sharing this. I was going to point out the "No Free Lunch Theorem" but you mention it yourself on your website, good.

Thank you for your comments. Before trying to answer your concerns, let me clarify a few things as it seems to me we are mixing two sets of results in the comments:

1. This page: http://infinity77.net/global_optimization/ represents my old work on global optimization (dated 2013), and I consider it now obsolete and superseded by the work at point (2)
2. This page: http://infinity77.net/go_2021/index.html (and subsequent pages) is now my latest reference. If you read the description (and in particular the section about "The Rules"), you will see that they are not quite the same as they were in point (1).
 

I have a few questions/comments:

- The exclusion of gradient-based solvers is unfortunate. It is of course up to you what you investigate, but gradient-based solvers are surely useful in practice. Not all real-life problems involve a (non-analytical) simulation.

I welcome suggestions on which *global* optimization algorithms are gradient-based and can be applied relatively easily using Python, NumPy, SciPy. Note that there is no mention of gradient-based or non gradient-based algorithms in the page http://infinity77.net/go_2021/index.html.
 

- "This effort stems from the fact that I got fed up with the current attitude of most mathematicians/numerical optimization experts, who tend to demonstrate the advantages of an algorithm based on “elapsed time” or “CPU time” or similar meaningless performance indicators." I don't know where you get that from, what are your sources? I have never seen an academic paper that used elapsed time or CPU time. The scientific papers I have read use the number of function evaluations to compare performance, which is a meaningful machine-independent performance measure if the total time is dominated by the time spend in the function, as it usually is.

While I agree that in recent years the shift to use functions evaluations instead of CPU time has greatly prevailed, that was not the case a while back. A few examples:

https://www.mat.univie.ac.at/~neum/ms/comparison.pdf
https://arxiv.org/pdf/1709.08242.pdf

But since that sentence seems to be annoying, I'll just remove it :-) .

 

- I feel uneasy about your performance measure. It is whether the solver finds the minimum value of the function (why not the location of the minimum? Isn't that usually of interest?) within some fixed tolerance in 2000 function evaluations.

I have both the function value and the location of the minimum. Of course they are both important, and of course I will use the location of the minimum to do further analysis. The point of the exercise is: I know where the global optimum (optima) lies, can the solver find it with a specific tolerance? Most benchmarks are designed like this. The 2,000 is not a hard limit anymore per se, as you will notice I have run all the 16 benchmarks with different stopping conditions in terms of maximum number of function evaluations. Specifically, all the benchmarks have been run 22 times, for each run limiting the function evaluations budget to 100, then 200, 300, 400, ..., 2000, 5000, 10000. Some of the benchmarks have been extended to 50,000 (http://infinity77.net/go_2021/globopt.html).
 
a) The maximum number of function evaluations that you use does not depend on the dimensionality of the problem, but it clearly should. The search space is larger in higher dimensional problems, so more evaluations are needed by any algorithm. That is also obvious from your results.

I agree, and this is why I have run all the benchmarks multiple times with varying budgets of function evaluations, and which is also why many benchmarks have a "Dimensionality Effect" chapter to look at what happens when the problem grows in size (http://infinity77.net/go_2021/gkls.html#size-dimensionality-effectshttp://infinity77.net/go_2021/mmtfg.html#size-dimensionality-effects , many others). That said, most of the problems I usually deal with are computationally expensive - so if one day my model has two parameters to tune and I allow the algorithm to have 500 function evaluations, that may take me one week to solve. If the month after it is decided that the model is better described by a function of 10 parameters, I am not going to ask for 2,500 simulations - it's going to take me a month and a half to get the results back. The solver will have to deal with a 10-parameter model in 500-600 evaluations anyway.
 
b) Instead of recording a binary outcome (success/failure to find the minimum in N evaluations), I think it would be more useful to record the number of evaluations until the minimum is reached and then give the mean or median number of function evaluations over many trials as well as the percentage of successful convergences. Algorithms can be ranked by robustness (the number of correctly solved problems) and convergence rate (the average/median number of function evaluations). Robustness and convergence rate may be anti-correlated. You are mixing the two in your performance measure, which makes it more difficult to interpret.

This is exactly what has been done for the SciPy Extended benchmark. http://infinity77.net/go_2021/scipy_extended.html#test-functions-general-solvers-performances tells you that this specific benchmark has been run considering for every benchmark function 100 random starting points, and the reported statistics (overall success, number of functions evaluations) refer to successful optimizations only. I haven't repeated the 100 random starting points for the other 15 benchmarks because it would take me forever to run them like that.
 

- http://infinity77.net/global_optimization/ does not list the same algorithms as the table in http://infinity77.net/go_2021/thebenchmarks.html#info-general-results

please see above: http://infinity77.net/global_optimization is old and should not be looked at anymore. 
 


- While I think we agree that CPU-time is not a useful means to compare algorithms, it is then quite surprising to see Fig 0.6 with the CPU times. I suppose the pure-Python implementations perform so badly because the benchmark functions are all rather small python functions which are quick to evaluate so that the time spend inside the solver does matter.

That's generally correct. It was just a way for me to say that some of the solvers are inherently slower than others, although for a real life problem it shouldn't matter so much.
 

- I see some an inconsistency between your declared goals and your benchmark. You say you only care about optimization of non-analytical functions in which the function computation involves a simulation, but then most of your benchmark functions are analytical functions. In other words, you do not measure the performance on non-analytical functions. It is quite possible that the ranking would look different if you used non-analytical functions in the benchmarks.

The problems in the benchmarks are analytical (or almost analytical, not all benchmarks are like that - did you take a look at http://infinity77.net/go_2021/huygens.html ? ) because the benchmarks are designed to be that way. In global optimization, a benchmark is designed to try and reproduce features that a real-life objective function may exhibit. In the event that I will ever manage to get a paper published on this exercise I will definitely include real-life problems in the analysis, as this is always my aim.

The usefulness of this set of benchmarks come from the fact that, after all this monster analysis with 16 different test suites, I can approach a new, real-life study by saying: "look, I don't know which solvers will be the best, but I will definitely start with MCS, SHGO or Dual Annealing"..
 

- I would prefer to read technical documents written in a more professional detached writing style and I think others would do, too.

That will probably follow if and when (if ever) a paper may be published about this. That said, I welcome, any time, corrections or modifications to the writings - I understand that some of the paragraphs I have written can be seen as less professional than needed, so I will be happy to change them.
 

Andrea.


Regards,
Hans
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev