
On Wed, 28 Jul 2021 at 03:24, Sara Fridovich-Keil <sarafridov@gmail.com> wrote:
I have been using scipy.optimize.fmin_bfgs on some derivative-free benchmark problems, and noticed that whenever the Wolfe line search requires a directional derivative, the current implementation estimates the entire gradient via finite differencing, and then computes the directional derivative by taking the inner product of the gradient and the search direction. In my experiments, replacing this full gradient estimation with a single extra function evaluation that estimates the directional derivative directly is faster.
What I’d like to do is have an option for fprime to be either provided or not provided to the Wolfe line search. If the objective has a nice/cheap gradient then the current behavior is fine (passing the gradient function as fprime, and computing directional derivatives with an inner product), but if the objective is derivative-free then derphi should be computed with finite differencing along the search direction (just one extra function evaluation) instead of using fprime.
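The two ways of computing derphi described above can be sketched as follows. This is a minimal illustration, not SciPy's actual code: the helper names (`CountingFunction`, `derphi_via_gradient`, `derphi_direct`), the fixed forward-difference step `EPS`, and the quadratic test objective are all assumptions made for the example.

```python
import math

EPS = 2.0 ** -26  # assumed step: square root of double-precision machine epsilon


class CountingFunction:
    """Wrap an objective and count how many times it is evaluated."""

    def __init__(self, fun):
        self.fun = fun
        self.nfev = 0

    def __call__(self, x):
        self.nfev += 1
        return self.fun(x)


def derphi_via_gradient(f, xk, pk, eps=EPS):
    """Forward-difference the full gradient, then take its inner product with pk."""
    f0 = f(xk)
    grad = []
    for i in range(len(xk)):
        x = list(xk)
        x[i] += eps  # perturb one coordinate at a time
        grad.append((f(x) - f0) / eps)
    return sum(g * p for g, p in zip(grad, pk))


def derphi_direct(f, xk, pk, eps=EPS):
    """Forward-difference phi(alpha) = f(xk + alpha * pk) at alpha = 0."""
    f0 = f(xk)
    x = [xi + eps * pi for xi, pi in zip(xk, pk)]  # one step along pk
    return (f(x) - f0) / eps


def sum_of_squares(x):
    return sum(xi * xi for xi in x)


xk = [1.0, -2.0, 3.0, 0.5, -1.5]
pk = [-2.0 * xi for xi in xk]  # steepest-descent direction for this objective
exact = sum(2.0 * xi * pi for xi, pi in zip(xk, pk))  # analytic gradient . pk

f_grad = CountingFunction(sum_of_squares)
d_grad = derphi_via_gradient(f_grad, xk, pk)

f_direct = CountingFunction(sum_of_squares)
d_direct = derphi_direct(f_direct, xk, pk)

print("via gradient:", d_grad, "evaluations:", f_grad.nfev)
print("direct      :", d_direct, "evaluations:", f_direct.nfev)
print("exact       :", exact)
```

Both counts include the evaluation of f(xk) itself. Inside a line search that value is normally already available, so the extra cost is n evaluations for the gradient route and one for the direct route.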
Estimating gradients with finite differences relies on a good choice of step size. Good step defaults are automatically chosen by `optimize._numdiff.approx_derivative` when fprime is estimated numerically.
Estimating derphi with numerical differentiation along the search direction would first of all require a good step size along the search direction, `pk`. Whilst this may be ok if the parameter scaling, pk, and the derivatives are well chosen/well behaved (e.g. all the parameters are the same magnitude, etc), I'm concerned that there will be cases where it won't be as numerically accurate/stable as the existing behaviour. For example, a chosen step along pk may result in individual dx that aren't optimal from a numerical differentiation viewpoint. How would one know if a specific system was exhibiting that behaviour? I'd expect the current code to be more robust than your proposed alternative.
Having said that, I'm not an expert in this domain, so I'd be interested to hear what someone who is more expert than me has to say. Can you point to any literature that says that your proposed changes are generally acceptable?
Andrew.
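One way to see the step-size concern is to hold the step in alpha fixed and vary only the length of pk. The sketch below is a hypothetical illustration, not SciPy's code and not necessarily how the proposed change would choose its step: the fixed step `EPS`, the helper names, and the test objective are assumptions. A step rescaled by the norm of pk would remove this particular effect, though it would not by itself address parameters of very different magnitudes.

```python
import math

EPS = 2.0 ** -26  # assumed fixed step: square root of machine epsilon


def f(x):
    return math.exp(x[0]) + math.exp(x[1])


def derphi_via_gradient(f, xk, pk, eps=EPS):
    """Per-coordinate forward differences, then inner product with pk."""
    f0 = f(xk)
    total = 0.0
    for i, p in enumerate(pk):
        x = list(xk)
        x[i] += eps  # step size does not depend on pk
        total += (f(x) - f0) / eps * p
    return total


def derphi_direct(f, xk, pk, eps=EPS):
    """Single forward difference along pk with a fixed step in alpha."""
    x = [xi + eps * pi for xi, pi in zip(xk, pk)]  # dx scales with |pk|
    return (f(x) - f(xk)) / eps


xk = [1.0, 1.0]
direction = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

rel_errors = {}
for scale in (1e-6, 1.0, 1e6):
    pk = [scale * d for d in direction]
    exact = sum(math.exp(xi) * pi for xi, pi in zip(xk, pk))
    err_grad = abs(derphi_via_gradient(f, xk, pk) - exact) / abs(exact)
    err_direct = abs(derphi_direct(f, xk, pk) - exact) / abs(exact)
    rel_errors[scale] = (err_grad, err_direct)
    print(f"|pk| = {scale:g}: via gradient {err_grad:.1e}, direct {err_direct:.1e}")
```

With a very short pk the individual dx fall close to the floating-point resolution of xk, so round-off dominates. With a very long pk the dx are large enough for truncation error to dominate. The per-coordinate estimate is unaffected in both cases because its steps do not depend on pk.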