advice on stochastic(?) optimisation
Hi, I'm looking for a bit of guidance as to what sort of algorithm is most appropriate/efficient for finding the local maximum of a function (in 2 dimensions), where each function evaluation is 1) noisy and 2) expensive/slow to evaluate. I'd welcome any suggestions for where best to start investigating this (textbooks, references, websites or existing optimisation libraries). I've no background in this field at all. cheers, Bryan
I use several "flavors" of evolutionary optimization: particle swarm and variations of the genetic algorithm. Using Latin hypercube sampling to generate the initial population is highly recommended. There are several websites on the subject, and you can even find some of the algorithms coded in Python. Don't forget to use SWIG or f2py when you can. On Thu, Aug 28, 2008 at 10:23 AM, bryan cole <bryan.cole@teraview.com> wrote:
Hi,
I'm looking for a bit of guidance as to what sort of algorithm is most appropriate/efficient for finding the local maximum of a function (in 2 dimensions), where each function evaluation is 1) noisy and 2) expensive/slow to evaluate.
I'd welcome any suggestions for where best to start investigating this (textbooks, references, websites or existing optimisation libraries). I've no background in this field at all.
cheers, Bryan
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- Kimberly S. Artita Graduate Student, Engineering Science College of Engineering Southern Illinois University Carbondale Carbondale, Illinois 62901-6603 Office: ENGB 0044, Water Resources Research Lab Phone: (618)-528-0349 E-mail: kartita@gmail.com, kartita@siu.edu web: http://civil.engr.siu.edu/GraduateStudents/artita/index.html
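The Latin hypercube initialization recommended above can be sketched in a few lines. This is an illustrative stand-alone function, not taken from any particular library: each dimension of the search box is split into as many equal strata as there are population members, each stratum is sampled exactly once, and the strata are shuffled independently per dimension.

```python
import numpy as np

def latin_hypercube(n_points, bounds, rng=None):
    """Latin hypercube sample of n_points in the box given by bounds
    (a sequence of (low, high) pairs, one per dimension)."""
    rng = np.random.default_rng(rng)
    bounds = np.asarray(bounds, dtype=float)   # shape (dim, 2)
    dim = bounds.shape[0]
    # One uniform draw inside each of the n_points strata, per dimension.
    u = (rng.random((n_points, dim)) + np.arange(n_points)[:, None]) / n_points
    # Shuffle the strata independently in each dimension.
    for d in range(dim):
        rng.shuffle(u[:, d])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)

# Seed a population of 20 particles in the box [-5, 5] x [0, 10]:
pop = latin_hypercube(20, [(-5.0, 5.0), (0.0, 10.0)], rng=0)
```

Compared with plain uniform sampling, this guarantees the initial population covers every slice of each coordinate range, which is why it pairs well with particle swarm and genetic algorithms.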
On Thu, Aug 28, 2008 at 11:23 AM, bryan cole <bryan.cole@teraview.com> wrote:
I'm looking for a bit of guidance as to what sort of algorithm is most appropriate/efficient for finding the local maximum of a function (in 2 dimensions), where each function evaluation is 1) noisy and 2) expensive/slow to evaluate.
A few points that might influence the choice of a method are:
- Are there any constraints?
- Can you compute the first derivatives of your objective function (i.e., its gradient) and of your constraints?
- How about second derivatives?
- If not, do those functions have a known structure? Do you have access to a computer code that evaluates them, or are they essentially a "black box"?
I'd welcome any suggestions for where best to start investigating this (textbooks, references, websites or existing optimisation libraries). I've no background in this field at all.
For noisy optimization you might want to start with Tim Kelley's book and his implicit filtering method (the Fortran implementation is called IFFCO). See http://www4.ncsu.edu/~ctk/iffco.html — the Fortran code could be wrapped with f2py, and there is (apparently more up-to-date) Matlab code that could be converted to Python. For his book, see http://www.ec-securehost.com/SIAM/FR18.html

There is also a class of very successful methods called mesh-adaptive direct search. There is a C++ code at http://www.gerad.ca/NOMAD/ for problems which are essentially made up of black boxes. The advantage of such methods is that, despite the difficulty of the problems, they guarantee certain convergence properties. That's not to say that they always work, although they often do, but rather that when they don't, you can usually figure out why.

Dominique
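As a rough illustration of what a direct-search method does, here is a toy compass search — far simpler than IFFCO or NOMAD, and only a sketch: poll the axis directions around the current point, move on improvement, and halve the step (refine the "mesh") when every poll fails.

```python
import numpy as np

def compass_search_max(f, x0, step=1.0, min_step=1e-3, max_evals=500):
    """Toy 2-D compass (coordinate) search for maximization: poll the
    four axis directions; move on improvement, otherwise halve the
    step, until the step or the evaluation budget runs out."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for d in (np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                  np.array([0.0, 1.0]), np.array([0.0, -1.0])):
            cand = x + step * d
            fc = f(cand)
            evals += 1
            if fc > fx:              # accept the first improving poll point
                x, fx = cand, fc
                improved = True
                break
        if not improved:
            step *= 0.5              # every poll failed: refine the mesh
    return x, fx

# Smooth test problem: maximum of -(x-1)^2 - (y+2)^2 is at (1, -2).
xbest, fbest = compass_search_max(lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2,
                                  x0=[0.0, 0.0])
```

The mesh-adaptive methods mentioned above are considerably more sophisticated (randomized poll directions, convergence theory for nonsmooth problems), but the poll-then-refine skeleton is the same.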
On Fri, Aug 29, 2008 at 12:23 AM, bryan cole <bryan.cole@teraview.com> wrote:
Hi,
I'm looking for a bit of guidance as to what sort of algorithm is most appropriate/efficient for finding the local maximum of a function (in 2 dimensions), where each function evaluation is 1) noisy and 2) expensive/slow to evaluate.
Do you have a formula for the function f? If all you have is noisy observations of f(x), without knowing f, that's basically what stochastic approximation is about: there is a large literature on this kind of problem.

The first article is the one introducing the Robbins-Monro algorithm:

Robbins, H. and Monro, S. "A Stochastic Approximation Method." Ann. Math. Stat. 22, 400-407, 1951.

A recent book covering the field is the one by Kushner and Yin: http://www.springer.com/math/probability/book/978-0-387-00894-3?cm_mmc=Googl...

The problem with those algorithms is that they are hard to implement efficiently in Python, because of their recursive nature, and hence cannot be vectorized. If your function/observation is expensive to compute, that may not be a big problem, though.

cheers,

David
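A minimal sketch of the Robbins-Monro iteration on a toy 1-D problem (the linear h and the noise level are made up for illustration): given noisy observations Y_n of h(x_n), iterate x_{n+1} = x_n - a_n Y_n with gains a_n = 1/(n+1), which satisfy the classical conditions sum a_n = inf and sum a_n^2 < inf.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_h(x):
    """Noisy observation of h(x) = 2*(x - 3); the root is x* = 3."""
    return 2.0 * (x - 3.0) + rng.normal(scale=0.5)

# Robbins-Monro: x_{n+1} = x_n - a_n * Y_n, with a_n = 1/(n+1).
x = 0.0
for n in range(5000):
    x -= (1.0 / (n + 1)) * noisy_h(x)
```

Note the recursive structure David refers to: each iterate depends on the previous one, so the loop cannot be replaced by a single vectorized NumPy expression — but when each observation is an experiment, the loop overhead is irrelevant.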
On Fri, Aug 29, 2008 at 6:46 AM, David Cournapeau <cournape@gmail.com> wrote:
The problem with those algorithms is that they are hard to implement efficiently in Python, because of their recursive nature, and hence cannot be vectorized. If your function/observation is expensive to compute, that may not be a big problem, though.
I forgot a link to some implementation: http://leon.bottou.org/projects/sgd cheers, David
Firstly, thanks everyone for the responses. I think there are enough pointers here to get me started.
Do you have a formula for the function f? If all you have is noisy observations of f(x), without knowing f, that's basically what stochastic approximation is about: there is a large literature on this kind of problem.
In fact, this is an instrumentation optimisation problem: each function evaluation is an experimental measurement.
The first article is the one introducing Robbins-Monro algorithm:
Robbins, H. and Monro, S. "A Stochastic Approximation Method." Ann. Math. Stat. 22, 400-407, 1951.
A recent book covering the field is the Kushner and Yin book:
http://www.springer.com/math/probability/book/978-0-387-00894-3?cm_mmc=Googl...
"Stochastic approximation" seems to be just what I need. I'm reading up on it now...
The problem with those algorithms is that they are hard to implement efficiently in Python, because of their recursive nature, and hence cannot be vectorized. If your function/observation is expensive to compute, that may not be a big problem, though.
The expense of my "function" evaluations is so great (the maximum sample rate is ~15 measurements per second) that the Python overhead will be negligible. However, I hope I can exploit the fact that my function is quite slowly varying: it's something like a distorted 2D Gaussian, and it can be assumed there's only one maximum within the region bounds. The main problem is that the measurements are noisy, so attempts to estimate the function gradient are very error-prone.

This seems like it should be a common problem in experimental science / instrumentation, so there ought to be lots of info on this subject. I just didn't know what heading to search under.

cheers, Bryan
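For exactly this setting (a single smooth peak, measurements too noisy for naive finite differences), one standard tool from the stochastic-approximation literature is Spall's SPSA (simultaneous perturbation stochastic approximation): it estimates the gradient from just two noisy measurements per iteration, whatever the dimension. This is only a sketch under invented assumptions — the "instrument" below is a made-up distorted 2D Gaussian with additive noise, and the gains are illustrative, not tuned.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure(x):
    """Stand-in for one noisy instrument reading: a distorted 2D
    Gaussian with its single peak at (1.0, -0.5), plus noise."""
    clean = np.exp(-((x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2))
    return clean + rng.normal(scale=0.02)

def spsa_max(f, x0, iters=2000, a=0.2, c=0.2):
    """SPSA ascent: perturb all coordinates at once by a random +/-1
    vector, estimate the gradient from the two resulting measurements,
    and take a step with decaying gains."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        ak = a / k ** 0.602          # step-size gain (standard exponent)
        ck = c / k ** 0.101          # perturbation size (standard exponent)
        delta = rng.choice([-1.0, 1.0], size=x.shape)   # Rademacher draw
        # Two measurements give a gradient estimate for both coordinates.
        ghat = (f(x + ck * delta) - f(x - ck * delta)) / (2 * ck * delta)
        x += ak * ghat               # ascent step (we are maximizing)
    return x

xhat = spsa_max(measure, x0=[0.0, 0.0])
```

At ~15 measurements per second, 2000 iterations is a couple of minutes of instrument time; in practice one would stop far earlier, once the iterates settle within the noise floor.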
On 28-Aug-08, at 11:23 AM, bryan cole wrote:
I'm looking for a bit of guidance as to what sort of algorithm is most appropriate/efficient for finding the local maximum of a function (in 2 dimensions), where each function evaluation is 1) noisy and 2) expensive/slow to evaluate.
Noisy how, exactly? And do you have gradients (or approximate gradients)? Can you at least be guaranteed that the function you are evaluating is proportional (on average) to the true function?

There is a wide and deep literature on stochastic gradient descent, particularly in the context of neural networks. Here are some papers that you might find of interest:

Local Gain Adaptation in Stochastic Gradient Descent by N. Schraudolph: http://tinyurl.com/69xm45

A set of lecture notes by Leon Bottou on the subject: http://leon.bottou.org/papers/bottou-mlss-2004

In two dimensions, though, I doubt anything too complicated will be necessary.

David
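As a baseline for the gain-adaptation papers cited above, here is a minimal stochastic gradient ascent with the classic 1/t gain schedule. The quadratic objective and the noise level are invented for illustration; this is the plain method those papers improve upon.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_grad(x):
    """Noisy gradient of f(x, y) = -(x - 2)^2 - (y + 1)^2, as if each
    gradient measurement carried instrument noise.  Maximum at (2, -1)."""
    g = np.array([-2.0 * (x[0] - 2.0), -2.0 * (x[1] + 1.0)])
    return g + rng.normal(scale=0.3, size=2)

# Stochastic gradient ascent with the classic decaying 1/t gain:
# the shrinking steps average the noise away instead of chasing it.
x = np.zeros(2)
for t in range(1, 3001):
    x += (1.0 / t) * noisy_grad(x)
```

With a *fixed* step size the iterate would keep rattling around the peak at a noise-dependent radius; the decaying gain is what buys convergence, and the adaptive-gain schemes in the references tune it automatically.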
participants (5)
- bryan cole
- David Cournapeau
- David Warde-Farley
- Dominique Orban
- Kimberly Artita