Conjugate gradients minimizer

Gareth McCaughan Gareth.McCaughan at pobox.com
Thu Mar 27 17:26:13 EST 2003


Carl Banks wrote:

[I'd said:]
> > I concur. If it turns out that the overhead from BFGS
> > *is* too much for some reason, you might also consider
> > something intermediate between conjugate gradient and
> > BFGS, such as L-BFGS.
> > 
> > But it's not true that BFGS requires inverting a matrix
> > at each step. The matrix you maintain is an approximation
> > to the *inverse* of the Hessian, and no inversion is
> > necessary.
> 
> Oh, silly me.  I probably said that but just intended to mean "you
> have to store an NxN matrix."  I never heard of L-BFGS, though.  I
> admit I got my optimization knowledge from 20-year-old texts.

Nothing very wrong with that; some 20-year-old optimization
books are very good :-). If you're in the market for something
newer, I can highly recommend Nocedal & Wright, which (inter
alia) covers L-BFGS. (Not surprising, since I think Nocedal
invented it.) The idea of L-BFGS is that instead of storing
the approximation to the inverse Hessian explicitly, you
store some information from each of the last <some-number>
iterations, which you use to reconstruct what the approximate
inverse Hessian would have been if those were the only
iterations that had ever happened. Choosing <some-number>
carefully lets you range from a scheme very like
conjugate gradient (if you only keep a couple of iterations'
worth of information) through to something equivalent[1] to
full BFGS (if you store as many iterations' worth as your
space has dimensions).


[1] Er, I'm not certain it's equivalent. If it's not, it's
    probably at any rate about equally good. My books on
    optimization are all at work, and I'm at home :-).
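
For the curious, the machinery behind all this is usually written
as the "two-loop recursion". Here's a rough sketch from memory
(untested, and all the names are my own invention): "pairs" holds
the last <some-number> (s, y) pairs, where s is the change in x and
y is the change in gradient over one iteration. The thing to notice
is that the product (approximate inverse Hessian) * (gradient) is
built up from those pairs directly, so no NxN matrix is ever stored
and nothing is ever inverted.

def dot(u, v):
    return sum([ui * vi for (ui, vi) in zip(u, v)])

def lbfgs_direction(grad, pairs):
    """Return r ~ (inverse Hessian) * grad, built from the stored
    (s, y) pairs; the search direction is then -r."""
    q = list(grad)
    saved = []
    # First loop: newest pair back to oldest.
    recent_first = list(pairs)
    recent_first.reverse()
    for (s, y) in recent_first:
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for (qi, yi) in zip(q, y)]
        saved.append((s, y, rho, alpha))
    # Initial scaling H0 = gamma * I, gamma taken from the newest pair.
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair forward to newest.
    saved.reverse()
    for (s, y, rho, alpha) in saved:
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for (ri, si) in zip(r, s)]
    return r

Keep only one or two pairs and the directions come out much like
conjugate gradient's; keep as many pairs as there are dimensions
and (modulo footnote [1]) you're back to full BFGS, still without
ever forming the matrix.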

-- 
Gareth McCaughan  Gareth.McCaughan at pobox.com
.sig under construc



