On Wed, Mar 8, 2023 at 5:23 PM jiahao chen via SciPy-Dev <scipy-dev@python.org> wrote:
hello all!
   I have some problems using optimize.minimize(method='L-BFGS-B'). I want to fit a 2D Gaussian signal, and I have initial guesses for x0, ∑, a (the mean vector, the covariance matrix, and the weight). The loss is defined as the L2 norm of the difference between the calculated signal and the original image. But optimize.minimize forces me to pack the variables into a 1D array, so I have to write it as x_init = [x0[0], x0[1], ∑00, ∑01, ∑11, a]. These terms have totally different physical meanings, and their values may differ by orders of magnitude. So, as might be expected, no matter what initial values I used, the minimization stopped after only one iteration and the result was incorrect.
  I want to optimize 3 parameters simultaneously: the mean vector, the covariance matrix, and the weight. But optimize.minimize forces me to use a single flat variable. How should I organize my variables?

Unfortunately, unless you have a specialized solver for your problem, the art of transforming it into a well-behaved function over a Euclidean space is still left up to you.

Fortunately, you are in a position to compute pretty good estimators for your initial guesses (you can take means and covariances of the pixel coordinates weighted by the signal intensity and then do a linear least-squares fit to get the final amplitude `a`). First, I'd double-check whether that alone is good enough. Even if it isn't, those values will give you a good idea of the scale factors you can apply to each variable dimension to bring them all to roughly the same scale. You can also cast your problem in terms of deltas against your original guess. That is, if you have a function `f(x)` that works in the original space, you can instead optimize `def g(dx): return f(x0 + dx * scale_factors)`.
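
To make that concrete, here is an untested sketch of both ideas. The 2D Gaussian model, the moment-based guess, and the loss below are my assumptions about the shape of your problem, not your actual code:

import numpy as np
from scipy import optimize

def gaussian_image(x0, cov, a, shape):
    # Render a 2D Gaussian with mean x0, covariance cov, amplitude a.
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.stack([xx, yy], axis=-1).astype(float) - x0
    prec = np.linalg.inv(cov)
    quad = np.einsum('...i,ij,...j->...', d, prec, d)
    return a * np.exp(-0.5 * quad)

def moment_guess(image):
    # Intensity-weighted mean/covariance of the pixel coordinates, then a
    # linear least-squares amplitude against the unit-amplitude model.
    yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    w = np.clip(image.ravel().astype(float), 0.0, None)
    w /= w.sum()
    x0 = coords.T @ w
    d = coords - x0
    cov = (d * w[:, None]).T @ d
    unit = gaussian_image(x0, cov, 1.0, image.shape).ravel()
    a = float(unit @ image.ravel() / (unit @ unit))
    return x0, cov, a

def fit(image):
    x0, cov, a = moment_guess(image)
    p0 = np.array([x0[0], x0[1], cov[0, 0], cov[0, 1], cov[1, 1], a])
    # Use the magnitudes of the initial guess as per-dimension scale factors.
    scales = np.where(np.abs(p0) > 0, np.abs(p0), 1.0)

    def f(p):
        # L2 loss in the original, physically meaningful parameter space.
        mean = p[:2]
        c = np.array([[p[2], p[3]], [p[3], p[4]]])
        return np.sum((gaussian_image(mean, c, p[5], image.shape) - image) ** 2)

    def g(dp):
        # Optimize rescaled deltas against the initial guess instead.
        return f(p0 + dp * scales)

    res = optimize.minimize(g, np.zeros(6), method='L-BFGS-B')
    return p0 + res.x * scales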

When optimizing covariance matrices, I don't recommend just splatting out the elements as individual terms in the vector. While that sometimes works, it doesn't respect the structure of covariance matrices. You can try alternate representations like the marginal standard deviations and the correlation factor, or the eigenvalues and the rotation angle of the semimajor axis. That both helps avoid passing invalid covariance matrices through your Gaussian and can make for a nicer space to optimize in.
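
For example (again just a sketch, with names of my own choosing), you can optimize unconstrained variables and map them to a valid 2x2 covariance matrix inside the loss function:

import numpy as np

def cov_from_sigmas_rho(log_sx, log_sy, z):
    # Marginal standard deviations (kept positive via exp) and a
    # correlation factor squashed into (-1, 1) with tanh.
    sx, sy = np.exp(log_sx), np.exp(log_sy)
    rho = np.tanh(z)
    return np.array([[sx**2, rho * sx * sy],
                     [rho * sx * sy, sy**2]])

def cov_from_eigs_angle(log_l1, log_l2, theta):
    # Eigenvalues (kept positive via exp) and the rotation angle of
    # the principal axis of the ellipse.
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([np.exp(log_l1), np.exp(log_l2)]) @ R.T

Either way, every point the optimizer visits corresponds to a symmetric positive-definite matrix, so you never have to reject or repair an iterate.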

You can also try rotating through subsets of variables that go well together, holding the other variables fixed. So you might optimize the mean `x0` first, holding `∑` and `a` fixed, then hold the new `x0` and `a` fixed and optimize `∑`, then finish off with `a` (which is an easy linear least-squares problem). Keep going around that cycle until you meet some convergence criterion of your choice. This is an ad hoc optimization algorithm with no formal convergence guarantees, but when there are natural groupings, especially ones that make some of these sub-optimizations nearly trivial, it can be a worthwhile trick.
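
A sketch of that cycle, reusing the hypothetical `gaussian_image` helper from the earlier sketch and assuming a `loss(x0, cov, a)` function defined in the original parameter space (the stopping rule here is illustrative):

import numpy as np
from scipy import optimize

def cyclic_fit(image, loss, x0, cov, a, max_cycles=20, tol=1e-8):
    prev = loss(x0, cov, a)
    for _ in range(max_cycles):
        # 1. Optimize the mean, holding the covariance and amplitude fixed.
        res = optimize.minimize(lambda m: loss(m, cov, a), x0,
                                method='L-BFGS-B')
        x0 = res.x
        # 2. Optimize the covariance (packed as its upper triangle),
        #    holding the mean and amplitude fixed.
        c0 = np.array([cov[0, 0], cov[0, 1], cov[1, 1]])
        res = optimize.minimize(
            lambda c: loss(x0, np.array([[c[0], c[1]], [c[1], c[2]]]), a),
            c0, method='L-BFGS-B')
        cov = np.array([[res.x[0], res.x[1]], [res.x[1], res.x[2]]])
        # 3. Amplitude by linear least squares against the unit-amplitude model.
        unit = gaussian_image(x0, cov, 1.0, image.shape).ravel()
        a = float(unit @ image.ravel() / (unit @ unit))
        cur = loss(x0, cov, a)
        if abs(prev - cur) < tol * max(1.0, abs(prev)):
            break
        prev = cur
    return x0, cov, a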

--
Robert Kern