Proposal for more generic optimizers (posted before on scipy-user)
Hi,

I have been migrating toward Python for a few weeks, but I cannot yet find in SciPy the tools I had to develop for my PhD. For instance, I cannot find an elegant way to save the sets of parameters tested during an optimization with the standard algorithms. What is more, I think the algorithms could be more generic.

What I did in C++, and I'd like your opinion about porting it to Python, was to define a standard optimizer with no iteration loop: iterate was a pure virtual method called by an optimize method. The iteration loop was then defined for a standard optimizer or a damped optimizer, and each tested set of parameters could be saved. The step to be taken was an instance of a class implementing a gradient step, a Newton step, etc., and the same went for the stopping criterion. The function was a class that defined value, gradient, hessian, ..., as needed. A simplified instruction could have been:

    Optimizer* optimizer = StandardOptimizer</* some more parameters, not relevant in Python */>(function, GradientStep(), SimpleCriterion(NbMaxIterations), step, saveParameters);
    optimizer->optimize();
    optimizer->getOptimalParameters();

The "step" argument was a constant by which the computed step was multiplied; by default, it was 1.

I know this kind of writing is not as clear and lightweight as the current interface, which Matlab uses as well. But perhaps this system can give more latitude to the user. If people want, I can try making a real Python example...

Matthieu

P.S.: sorry for the multiposting, I forgot that there are two SciPy mailing lists :(
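The design above could translate to Python roughly as follows. This is a minimal sketch: the class names (StandardOptimizer, GradientStep, SimpleCriterion) mirror the C++ snippet, but every body is an assumption of this sketch, not an existing SciPy API.

```python
# Illustrative sketch of the proposed decomposition: a criterion, a step,
# and an optimizer that only knows how to loop and record history.

class SimpleCriterion:
    """Stop after a fixed number of iterations."""
    def __init__(self, nb_max_iterations):
        self.nb_max_iterations = nb_max_iterations

    def __call__(self, iteration, values, points):
        return iteration >= self.nb_max_iterations

class GradientStep:
    """Steepest-descent direction: minus the gradient."""
    def __call__(self, function, point):
        return [-g for g in function.gradient(point)]

class StandardOptimizer:
    """new point = old point + step_size * step; saves every tested point."""
    def __init__(self, function, step, criterion, step_size=1.0, save=None):
        self.function = function
        self.step = step
        self.criterion = criterion
        self.step_size = step_size
        self.history = save if save is not None else []

    def optimize(self, start):
        point = list(start)
        iteration, values, points = 0, [], []
        while not self.criterion(iteration, values, points):
            direction = self.step(self.function, point)
            point = [p + self.step_size * d for p, d in zip(point, direction)]
            values.append(self.function(point))
            points.append(point)
            self.history.append(point)
            iteration += 1
        return point

class Quadratic:
    """f(x) = sum(x_i^2), minimum at the origin."""
    def __call__(self, point):
        return sum(x * x for x in point)
    def gradient(self, point):
        return [2 * x for x in point]

optimizer = StandardOptimizer(Quadratic(), GradientStep(),
                              SimpleCriterion(50), step_size=0.1)
optimum = optimizer.optimize([3.0, -2.0])
```

Here the criterion only counts iterations; the step and the criterion can be swapped independently, which is the point of the design.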
Hi,

Here is a little proposal for the simple optimizer; I intend to make a damped one if the structure I propose is OK. What is in the package:
- Rosenbrock is the Rosenbrock function, with gradient and hessian methods: the example.
- Optimizer is the core optimizer, the skeleton.
- StandardOptimizer is the standard optimizer (not very complicated in fact), with six optimization examples.
- Criterions is a file with three simple convergence criteria: monotony, relative error and absolute error. More complex ones can be created.
- GradientStep is a class that computes the gradient step of a function at a specific point.
- NewtonStep is the same as the latter, but with a Newton step.
- NoAppendList is an empty list, not derived from list, but it could be if needed. The goal was to be able to save every set of parameters if needed, by passing a list or another container to Optimizer.

Some may wonder why the step is a class and not a function. It could be a function, as I use functors, but I want a class so that stateful steps can be used as well, such as a Levenberg-Marquardt one.

Now, it is not very complicated, just a bunch of classes that are really simple, but if this kind of structure is interesting, I'd like some comments so that it can be made more Pythonic.

Matthieu

2007/2/27, Matthieu Brucher <matthieu.brucher@gmail.com>:
On Thu, 8 Mar 2007, Matthieu Brucher apparently wrote:
Here is a little proposal for the simple optimizer; I intend to make a damped one if the structure I propose is OK. What is in the package:
- Rosenbrock is the Rosenbrock function, with gradient and hessian methods: the example.
- Optimizer is the core optimizer, the skeleton.
- StandardOptimizer is the standard optimizer (not very complicated in fact), with six optimization examples.
- Criterions is a file with three simple convergence criteria: monotony, relative error and absolute error. More complex ones can be created.
- GradientStep is a class that computes the gradient step of a function at a specific point.
- NewtonStep is the same as the latter, but with a Newton step.
- NoAppendList is an empty list, not derived from list, but it could be if needed. The goal was to be able to save every set of parameters if needed, by passing a list or another container to Optimizer.
Hard to say since there is no interface description, but a couple reactions...
- isolate the examples (probably you already did), perhaps in a subdirectory
- possibly package the step classes together
- don't introduce the NoAppendList class unless it is really needed, and it doesn't seem to be. The Optimizer can just create a standard container when needed to keep track of as much of the optimization history as might be desired. (Thinking about an interface to express various desires might be useful.)
- rename Criterions, perhaps to Criteria or ConvergeTest

Cheers,
Alan Isaac
Thanks for the comments!

Hard to say since there is no interface description,

In fact, the interface description is in the files, but I can put a global description here after I use your comments to enhance the code :)

- isolate the examples (probably you already did) perhaps in a subdirectory

Will be done.

- possibly package the step classes together

This can be done, also for the optimizer types and the criteria, as the goal is to be able to use every optimizer with every step and every criterion.

- don't introduce the NoAppendList class unless it is really needed, and it doesn't seem to be. The Optimizer can just create a standard container when needed to keep track of as much of the optimization history as might be desired. (Thinking about an interface to express various desires might be useful.)

The purpose of this list was to avoid a test inside the iteration loop. In the first idea I had, a test was made to know whether the parameters were to be saved or not. I'm torn between the two solutions.

- rename Criterions, perhaps to Criteria or ConvergeTest

Yes, sorry, I did not use the right translation. OK, I'll work on the update.

Matthieu
On Thu, 8 Mar 2007, Matthieu Brucher apparently wrote:
The purpose of this list was to avoid a test inside the iteration loop. In the first idea I had, a test was made to know whether the parameters were to be saved or not. I'm torn between the two solutions.
You can skip a test inside the loop and use a default method that is always called.

    def record_history(self, etcetera): pass

Allow alternative functions to be passed by the user as part of the Optimizer object initialization, so that overriding is not necessary. Of course the 'etcetera' part is important... I can imagine wanting lots of pieces of the history.

Cheers,
Alan Isaac
You can skip a test inside the loop and use a default method that is always called.
def record_history(self,etcetera): pass
Allow alternative functions to be passed by the user as part of the Optimizer object initialization so that overriding is not necessary.
Of course the 'etcetera' part is important ... I can imagine wanting lots of pieces of the history.
OK, I see what you mean, and it is far more generic this way, so I will take this road. For the etcetera argument, I suppose a **parameter is a good choice, and passing every variable of the loop to record_history is what you mean by "I can imagine wanting lots of pieces of the history." My last question before returning to work would be where to find the coding conventions for SciPy; I did not find them on the website, but I must have missed something... Matthieu
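A sketch of what this hook could look like: the name record_history comes from Alan's message, while the rest (the dummy iterate, the **kwargs signature) is an assumption of the sketch.

```python
# A no-op recording hook called on every iteration, replaceable at
# construction time, so no test is needed inside the loop.

class Optimizer:
    def __init__(self, function, record_history=None):
        self.function = function
        if record_history is not None:
            # a user-supplied callable shadows the no-op default method
            self.record_history = record_history

    def record_history(self, **kwargs):
        pass  # default: keep nothing

    def optimize(self, point, iterations=10):
        for iteration in range(iterations):
            point = [x - 0.2 * x for x in point]     # dummy iterate
            self.record_history(iteration=iteration, point=point,
                                value=self.function(point))
        return point

history = []
opt = Optimizer(lambda p: sum(x * x for x in p),
                record_history=lambda **kw: history.append(kw["point"]))
final = opt.optimize([1.0, 2.0])
```

The user decides which pieces of the history to keep by inspecting the keyword arguments; the optimizer always makes the same call.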
On Thu, 8 Mar 2007, Matthieu Brucher apparently wrote:
For the etcetera argument, I suppose a **parameter is a good choice
It is the obvious choice, but I am not sure what the best approach will be. Presumably implementation will be revealing.
and passing every variable of the loop to record_history is what you mean by "I can imagine wanting lots of pieces of the history."
Yes. Cheers, Alan Isaac
For the etcetera argument, I suppose a **parameter is a good choice
It is the obvious choice, but I am not sure what the best approach will be. Presumably implementation will be revealing.
Some of the arguments cannot be decided beforehand. For instance, for a damped optimizer, which sets of parameters should be saved? Every one, including each set tested during the minimization of the function, or only the one at the end of the loop? I'll make a simple proposal for the interface with the modifications I added since your email. Matthieu
Here is my new proposal. The interface is separated into three modules:
- Criteria contains the convergence criteria:
  - MonotonyCriterion, constructed with an iteration limit and an error level
  - RelativeValueCriterion, constructed with an iteration limit and an error level
  - AbsoluteValueCriterion, constructed with an iteration limit and an error level
  I think the names are self-explanatory. The interface of these criteria is a simple __call__ method that takes the current number of iterations, the last values of the cost function and the corresponding points.
- Step contains the steps the optimizer can take in the process. Their interface is simple: a __call__ method with a cost function as an argument, as well as the point at which the step must be computed.
  - GradientStep requires that the cost function implement the gradient method
  - NewtonStep requires that the cost function implement the gradient and hessian methods
- Optimizer contains the optimizer skeleton as well as a standard optimizer:
  - Optimizer implements the optimize method, which calls iterate until the criterion is satisfied. Parameters are the cost function and the criterion; it can take a record argument that will be called on each iteration with the information of the step (point, value, iteration, step, ...) and a stepSize parameter, a factor that multiplies the step, useful for taking small steps in a steepest/gradient descent.
  - StandardOptimizer implements the standard optimizer, that is, the new point is the last point plus the step. The additional arguments are an instance of a step (GradientStep or NewtonStep at this point) and... the starting point. Perhaps these arguments should be moved into Optimizer.
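For concreteness, one of the criteria described above could look like this; the class name follows the proposal, the body is an assumed implementation.

```python
# The criteria interface: __call__ receives the iteration count, the
# recorded cost values, and the corresponding points, and returns True
# when the optimization should stop.

class AbsoluteValueCriterion:
    """Stop when the cost change falls below an absolute error level."""
    def __init__(self, iteration_limit, error_level):
        self.iteration_limit = iteration_limit
        self.error_level = error_level

    def __call__(self, iteration, values, points):
        if iteration >= self.iteration_limit:
            return True
        if len(values) < 2:
            return False            # not enough history to compare yet
        return abs(values[-1] - values[-2]) < self.error_level

criterion = AbsoluteValueCriterion(iteration_limit=100, error_level=1e-6)
```

Because the criterion sees the whole recorded history, relative-error or monotony variants only differ in the last comparison.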
A cost function to be optimized must/can have:
- a __call__ method with a point as argument
- a gradient method with the same argument, if needed
- a hessian method, same argument, if needed

Other steps I use in my research need additional methods, but for a simple proposal, there is no need for them. Some examples are provided in the Rosenbrock.py file; it is the Rosenbrock function, with __call__, gradient and hessian methods. Then 6 optimizations are made; some converge to the real minimum, others don't because of the choice of criterion - the MonotonyCriterion is not very useful here, but for a damped or a stochastic optimizer, it is a pertinent choice. AppendList.py is an example for the "record" parameter; it saves only the points used in the optimization.

2007/3/8, Matthieu Brucher <matthieu.brucher@gmail.com>:
I don't really have time to look at this for the next week, but a couple quick comments.

1. Instead of::

    if 'stepSize' in kwargs:
        self.stepSize = kwargs['stepSize']
    else:
        self.stepSize = 1.

I prefer this idiom::

    self.stepSize = kwargs.get('stepSize', 1.)

2. All optimizers should have a maxiter attribute, even if you wish to set a large default. This needs corresponding changes in ``optimize``.

3. It seems like ``AppendList`` is an odd and specific object. I'd stick it in the example file.

4. I understand that you want an object that provides the function, gradient, and hessian. But when you make a class for these, it is full of (effectively) class functions, which suggests just using a module. I suspect there is a design issue to think about here. This might (??) go so far as to raise questions about the usefulness of the bundling.

Cheers,
Alan Isaac
2007/3/11, Alan G Isaac <aisaac@american.edu>:
I don't really have time to look at this for the next week, but a couple quick comments.
Thanks for the comments, I hope other people will help me with it :)

1. Instead of::

    if 'stepSize' in kwargs:
        self.stepSize = kwargs['stepSize']
    else:
        self.stepSize = 1.

I prefer this idiom::

    self.stepSize = kwargs.get('stepSize', 1.)

Yes, true, I'll make the changes.

2. All optimizers should have a maxiter attribute, even if you wish to set a large default. This needs corresponding changes in ``optimize``.

OK, it can be done; in fact, in the C++ implementation I use, maxiter is a variable of the optimizer, not of the criterion.

3. It seems like ``AppendList`` is an odd and specific object. I'd stick it in the example file.

Yes, it can be put there; it was there for modularization.

4. I understand that you want an object that provides the function, gradient, and hessian. But when you make a class for these, it is full of (effectively) class functions, which suggests just using a module.

It's not only a module, it is a real class, with a state. For instance, an approximation function can need a set of points that will be stored in the class, and a module is not enough to describe it - a simple linear approximation with a robust cost function, for instance.

I suspect there is a design issue to think about here. This might (??) go so far as to raise questions about the usefulness of the bundling.

Perhaps a more precise example of the usefulness is needed?

Matthieu
4. I understand that you want an object that provides
the function, gradient, and hessian. But when you make a class for these, it is full of (effectively) class functions, which suggests just using a module.
On Sun, 11 Mar 2007, Matthieu Brucher apparently wrote:
It's not only a module, it is a real class, with a state. For instance, an approximation function can need a set of points that will be stored in the class, and a module is not enough to describe it - a simple linear approximation with a robust cost function, for instance.
This seems to be a different question? One question is the question of optimizer design: should it take as an argument an object that provides a certain mix of services, or should it instead take as arguments the functions providing those services? I am not sure, just exploring it. I am used to optimizers that take a function, gradient procedure, and hessian procedure as arguments. I am just asking whether *requiring* these to be bundled is the right thing to do. This design would not mean that I cannot pass as arguments methods of some object. (I think I am responding to your objection here.) Note that requiring bundling imposes an interface requirement on the bundling object. This is not true if I just provide the functions/methods as arguments.
Perhaps a more precise example of the usefullness is needed ?
Perhaps so. Cheers, Alan Isaac
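The two interfaces under discussion can be put side by side; a small sketch with illustrative names, showing that an optimizer taking separate callables can still accept the methods of a bundled object.

```python
# Unbundled interface: the optimizer takes plain callables.
def minimize(f, gradient, x0, step_size=0.1, iterations=50):
    """Plain steepest descent on separate function/gradient callables."""
    x = list(x0)
    for _ in range(iterations):
        x = [xi - step_size * gi for xi, gi in zip(x, gradient(x))]
    return x

# Bundled interface: one object provides value and gradient.
class Quadratic:
    def __call__(self, x):
        return sum(xi * xi for xi in x)
    def gradient(self, x):
        return [2 * xi for xi in x]

# Pass the functions directly...
x1 = minimize(lambda x: sum(xi * xi for xi in x),
              lambda x: [2 * xi for xi in x], [3.0])
# ...or pass the methods of a bundled object, as Alan notes is possible.
cost = Quadratic()
x2 = minimize(cost, cost.gradient, [3.0])
```

Both calls run the same iteration; the disagreement is only about which signature the optimizer should require.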
Hi again,

This seems to be a different question?

One question is the question of optimizer design: should it take as an argument an object that provides a certain mix of services, or should it instead take as arguments the functions providing those services. I am not sure, just exploring it.

From an object-oriented point of view, I think that the "thing" most capable of knowing how to optimize itself is the function object. It knows the context better than several functions/objects, each responsible for the value, the gradient, ...

I am used to optimizers that take a function, gradient procedure, and hessian procedure as arguments. I am just asking whether *requiring* these to be bundled is the right thing to do.

I find it far more convenient this way. Here, there are only gradient and hessian, but one could imagine adding hessianp, which returns hessian*gradient; if this method is available, it is automatically used. In my research, I use partial gradients - only a part of the gradient, for some parameters, is computed - and if I had to pass them to the optimizer each time, it would be very cumbersome, mainly because I use several optimizers sequentially. It is also written more concisely: every argument in the constructor can be used by the methods, ...

This design would not mean that I cannot pass as arguments methods of some object. (I think I am responding to your objection here.)

If you want to pass a method of another object to the optimizer, I suppose you have a design problem. How can a method of another object know better than a method of the object being optimized? You could still do this by assigning the gradient method to your custom object, but as I said, it would be very awkward and not advised from a strictly computer-science point of view. I really would want to see a precise example as well :D

Note that requiring bundling imposes an interface requirement on the bundling object. This is not true if I just provide the functions/methods as arguments.

Yes, that's true, but with functions you still have to provide them; the only extra requirement of the bundled solution is that every gradient or hessian must have the same name. Not much, knowing that you get a more concise module.
Enclosed is a simple example, only with a gradient; deriving the hessian was not useful for the example. It is a class that allows the estimation of the parameters (a, b) in y = ax + b, knowing y and x. The cost function is based on the Geman-McClure "distance", so it has an additional parameter - and it does not converge efficiently. If I had to pass the gradient as a parameter, how could I efficiently pass y, x and the GM parameter to the function? - it really is a question, BTW. Here, I can't forget a parameter; they are all needed at construction, so it is less error-prone, more robust, as every call is made with the same parameters, and more modular, as my cost function is really one block. I know that this is not something we, as scientists, are accustomed to, but as Python is object-oriented, we could use this. It was not possible with Matlab, and I suppose that this impacted a lot how we want to use optimizers; something new is always more difficult to apprehend and use. Matthieu P.S.: I did not change the code at the moment, but I'm thinking about the most efficient way to do it :)
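For readers without the attachment, the described example might look like the following sketch: a stateful cost-function class for fitting y = a*x + b under the Geman-McClure distance rho(r) = r^2 / (sigma^2 + r^2). The data and the GM parameter are stored at construction, so every call sees the same state; exact names and the descent loop are assumptions of this sketch.

```python
class GemanMcClureLinearFit:
    """Robust linear fit: cost(a, b) = sum rho(a*x_i + b - y_i)."""
    def __init__(self, x, y, sigma):
        self.x, self.y, self.sigma2 = x, y, sigma * sigma

    def residuals(self, params):
        a, b = params
        return [a * xi + b - yi for xi, yi in zip(self.x, self.y)]

    def __call__(self, params):
        return sum(r * r / (self.sigma2 + r * r)
                   for r in self.residuals(params))

    def gradient(self, params):
        # d rho / d r = 2 * r * sigma^2 / (sigma^2 + r^2)^2
        da = db = 0.0
        for xi, r in zip(self.x, self.residuals(params)):
            drho = 2.0 * r * self.sigma2 / (self.sigma2 + r * r) ** 2
            da += drho * xi
            db += drho
        return [da, db]

# Data generated from y = 2x + 1; the optimizer only ever sees `cost`.
xs = [0.0, 1.0, 2.0, 3.0]
cost = GemanMcClureLinearFit(xs, [2 * xi + 1 for xi in xs], sigma=1.0)
params = [0.0, 0.0]
for _ in range(5000):                        # plain gradient descent
    g = cost.gradient(params)
    params = [p - 0.01 * gi for p, gi in zip(params, g)]
```

As the post notes, the saturating rho makes convergence slow; the point here is only that x, y and sigma travel with the object instead of being threaded through every call.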
Hi, I didn't have the time to make the changes Alan proposed, but I would like some other advice... The goal of my proposal is to have something better than the Matlab Optimization Toolbox, at least for the simplest optimizations, a domain where Matlab does not follow the literature - for instance, the conjugate-gradient method seems not to use the Wolfe conditions for convergence... And the structure of an optimizer is designed to be more modular, so implementing a new "optimizer" does not imply writing everything from scratch. Everything cannot be thought of in advance, but some things can. Matthieu
Hello Matthieu Brucher, Alan Isaac and other developers,

(excuse my bad English)

This is a last-year postgraduate from the Institute of Cybernetics, Ukrainian National Academy of Sciences, optimization department. I'm interested in the way you intend to continue optimization routine development in Python, and I'm following this thread in the forum. Although I'm a member of all 3 mailing lists related to scipy/numpy, my messages somehow return "waiting for moderator approval" and nothing gets published. That's why I decided to add your emails to the addresses for more safety.

I have 3 years of experience of optimization in MATLAB, and some experience with TOMLAB (tomopt.com). I wrote the toolbox "OpenOpt", which can run in both MATLAB and Octave: http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=13115&objectType=file There are 6 solvers currently: 2 nonsmooth local ones (from our department) and 4 global ones (connected from other GPL sources). There is also nonSmoothSolve, an fsolve equivalent for nonsmooth funcs (no guarantee for non-convex funcs), and an example of comparison is included. The key feature is a TOMLAB-like interface, like this:

    prob = ooAssign(objFun, x0, .....<optional params>)
    r = ooRun(prob, solver)

In TOMLAB you must write in a strict manner, like

    prob = nlpAssign(objFun, x0, [],[],[],A,b,Aeq,beq,[],[],[], [],[],f0,[],...)

so I decided to replace it by string assignment:

    prob = nlpAssign(objFun, x0, 'A', A, 'beq', beq, 'Aeq', Aeq)

or

    prob = nlpAssign(objFun, x0, 'A=[1 2 3]; b=2; TolFun=1e-5; TolCon=1e-4; doPlot=1')

etc. Then params may be assigned directly:

    prob.parallel.df = 1; % use parallel calculation of the numerical (sub)gradient via MATLAB dfeval()
    prob.doPlot = false;
    prob.fPattern = ...
    prob.cPattern = ... % patterns of dependence of the i-th constraint on x(j)
    prob.hPattern = ...
    prob.check.dc = 1; % check user-supplied gradient

etc. Some time later I encountered things that do not allow effective further development (first of all, passing by copy, not by reference), and now I'm rewriting it in Python (about 20-25% is done for now). I've got some experience, and things in the Python version will be organized in a better way. For example:

    prob = NLP(myObjFun, x0, TolFun=1e-5, TolCon=1e-4, TolGrad=1e-6, TolX=1e-4, MaxIter=1e4, MaxFunEvals=1e5, MaxTime=1e8, MaxCPUTime=1e8, IterPrint=1)
    # or prob = LP(...), prob = NSP(...) - nonSmoothProblem, prob = QP(...), etc.
    prob.run()  # or maybe r = prob.run()

I intend to connect some unconstrained solvers from scipy.optimize. All people in my department are open-source followers; we have much optimization-related software (most is for nonsmooth funcs and network problems, which we have researched since 1965), but almost all of it is Fortran-written. I intend to apply for GSoC support, but the only scipy-related person I found in http://wiki.python.org/moin/SummerOfCode/Mentors is Jarrod Millman <http://wiki.python.org/moin/JarrodMillman>. And as far as I understood from conversations with some people from the PSF, this year in GSoC they are interested first of all in the Python core, so the chances of getting support are very low. However, if you can help me in any way, please inform me. There are also some chances of getting direct Google support, but last year only 15 students succeeded. I will continue my work in any case.

WBR, Dmitrey.

Matthieu Brucher writes:
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-dev
dmitrey wrote:
All people in my department are open-source followers; we have much optimization-related software (most is for nonsmooth funcs and network problems, which we have researched since 1965), but almost all of it is Fortran-written. I intend to apply for GSoC support, but the only scipy-related person I found in http://wiki.python.org/moin/SummerOfCode/Mentors is Jarrod Millman <http://wiki.python.org/moin/JarrodMillman>. And as far as I understood from conversations with some people from the PSF, this year in GSoC they are interested first of all in the Python core, so the chances of getting support are very low. However, if you can help me in any way, please inform me.
The Space Telescope Science Institute is also a mentoring organization this year, and they have offered to take some scipy projects, too. http://code.google.com/soc/stsci/about.html http://www-int.stsci.edu/~aconti/GSoC.htm I look forward to seeing what you come up with. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, 22 Mar 2007, Matthieu Brucher wrote:
the structure of an optimizer is designed to be more modular, so implementing a new "optimizer" does not imply writing everything from scratch. Everything cannot be thought of in advance, but some things can.
We certainly agree on this. My main expressed concern was about the interface. Now I have another possibly crazy thought, just to explore. Right now you require an object that implements methods with certain names, which is OK but I think not perfect. How about making all arguments optional, and allowing passing either such an object or the needed components? Finally, to accommodate existing objects with a different name structure, allow passing a dictionary of attribute names. fwiw, Alan
Right now you require an object that implements methods with certain names, which is OK but I think not perfect. Here is a possibly crazy thought, just to explore. How about making all arguments optional, and allowing passing either such an object or the needed components? Finally, to accommodate existing objects with a different name structure, allow passing a dictionary of attribute names.
Indeed, that could be a solution. The only question remaining is how to use the dictionary; perhaps by creating a fake object with the correct interface? Matthieu
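One possible shape for that fake object; all names here are illustrative assumptions, not part of the proposal.

```python
# Normalize a bundled object, separate callables, or a dictionary of
# attribute names into one adapter with the expected interface.

class FunctionAdapter:
    def __init__(self, function=None, gradient=None, names=None, obj=None):
        if obj is not None and names is not None:
            # map existing attribute names, e.g. {'gradient': 'grad'}
            function = obj
            gradient = getattr(obj, names.get('gradient', 'gradient'))
        elif obj is not None:
            # standard bundled object: use the expected names directly
            function, gradient = obj, obj.gradient
        self._function, self.gradient = function, gradient

    def __call__(self, point):
        return self._function(point)

class LegacyCost:
    """An existing object with a different name structure."""
    def __call__(self, p):
        return sum(x * x for x in p)
    def grad(self, p):
        return [2 * x for x in p]

adapted = FunctionAdapter(obj=LegacyCost(), names={'gradient': 'grad'})
```

The optimizer then only ever sees the adapter, so its internals stay unchanged whichever calling convention the user prefers.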
On Thursday 22 March 2007 13:37:59 Matthieu Brucher wrote:
Right now you require an object that implements methods with certain names, which is OK but I think not perfect. Here is a possibly crazy thought, just to explore. How about making all arguments optional, and allowing passing either such an object or the needed components? Finally, to accommodate existing objects with a different name structure, allow passing a dictionary of attribute names.
Indeed, that could be a solution. The only question remaining is how to use the dictionary, perhaps creating a fake object with the correct interface
Just 2c: I just ported the loess package to numpy (the package is available in the scipy SVN sandbox as 'pyloess'). The loess part has a C-based structure that looks like what you're trying to reproduce. Basically, the package implements a new class, loess. The attributes of a loess object are themselves classes: one for inputs, one for outputs, and several others controlling different aspects of the estimation. The __init__ method of a loess object requires two mandatory arguments for the independent variables (x) and the observed response (y). These arguments are used to initialize the inputs subclass. The __init__ method also accepts optional parameters that modify the values of the different parameters. However, one can also modify specific arguments directly. The outputs section is created when a new loess object is instantiated, but with empty arrays. The arrays are filled when the .fit() method of the loess object is called. Maybe you would like to check the tests/test_pyloess.py file to get a better idea. I'm currently updating the docs and writing a small example.
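The pattern Pierre describes (attributes that are themselves objects, outputs filled by .fit()) can be sketched generically; note this is not the actual pyloess API, just an illustration of the structure.

```python
# Attributes-as-objects: mandatory x, y feed an `inputs` sub-object,
# `outputs` starts empty and is only filled when .fit() is called.

class _Inputs:
    def __init__(self, x, y):
        self.x, self.y = list(x), list(y)

class _Outputs:
    def __init__(self):
        self.fitted_values = []     # empty until .fit() is called

class Model:
    def __init__(self, x, y, **options):
        self.inputs = _Inputs(x, y)
        self.outputs = _Outputs()
        # optional parameters set at init, but also modifiable afterwards
        self.span = options.get('span', 0.75)

    def fit(self):
        # placeholder estimation: a plain mean, enough to show the flow
        mean = sum(self.inputs.y) / len(self.inputs.y)
        self.outputs.fitted_values = [mean] * len(self.inputs.y)
        return self

m = Model([1, 2, 3], [2.0, 4.0, 6.0])
m.fit()
```

The estimation itself is a stand-in; what carries over to the optimizer discussion is the separation of inputs, controls and outputs into sub-objects.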
Thanks for this intel ;) I think that what I want to do is more general than what LOESS proposes, as it is not only about modelling. But there surely is stuff to use :) - and on another topic, there are some good ideas in LOESS that I could use in my PhD thesis - Matthieu 2007/3/22, Pierre GM <pgmdevlist@gmail.com>:
On Thursday 22 March 2007 15:19:55 Matthieu Brucher wrote:
Thanks for this intel ;) I think that what I want to do is more general than what LOESS proposes, as it is not only about modelling. But there surely is stuff to use :)
No doubt about that. There was a need for lowess/stl/loess at one point, and I already had parts done; I just had to patch the rest. The info was given just as an example of implementation. The Python part was strongly dependent on the underlying C code, hence the multiplication of subclasses. I considered simplifying the implementation, but realized it already had everything I needed in a pretty neat form.
Alan wrote:
Right now you require an object that implements methods with certain names, which is ok but I think not perfect. Here is a possibly crazy thought, just to explore. How about making all arguments optional, and allowing passing either such an object or the needed components? Finally, to accmmodate existing objects with a different name structure, allow passing a dictionary of attribute names.
On Thu, 22 Mar 2007, Matthieu Brucher apparently wrote:
Indeed, that could be a solution. The only question remaining is how to use the dictionary; perhaps creating a fake object with the correct interface?
OK, I see why you want that approach. (So that you can still pass a single object around in your optimizer module.) Yes, that seems right... This seems to bundle naturally with a specific optimizer? If so, the class definition should reside in the StandardOptimizer module. Cheers, Alan Isaac PS For readability, I think Optimizer should define a "virtual" iterate method. E.g., def iterate(self): return NotImplemented
OK, I see why you want that approach. (So that you can still pass a single object around in your optimizer module.) Yes, that seems right...
Exactly :) This seems to bundle naturally with a specific optimizer? I'm not an expert in optimization, but I attended several classes/seminars on the subject, and at least the usual simple optimizers - the standard optimizer, all damped approaches, and all the others that use a step and a criterion test - use this interface, with a lot of the usual steps - gradient, every conjugate-gradient variant, (quasi-)Newton - or criteria. I even suppose it can do very well in semi-quadratic optimization with very little change, but I have to finish some work before I can read some books on the subject and begin implementing it in Python. If so, the class definition should reside in the StandardOptimizer module.
Cheers, Alan Isaac
PS For readability, I think Optimizer should define a "virtual" iterate method. E.g., def iterate(self): return NotImplemented
Yes, it seems better. Thanks for the opinion! Matthieu
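A rough Python sketch of the skeleton discussed above, with the "virtual" iterate method Alan suggests; every class and method name here is illustrative, not an existing SciPy interface:

```python
class Optimizer:
    """Core skeleton: owns the iteration loop and the parameter history."""
    def __init__(self, function, step, criterion, record=None):
        self.function = function
        self.step = step            # computes the step to take at a point
        self.criterion = criterion  # decides when to stop
        self.record = record if record is not None else []

    def iterate(self, x):
        """'Virtual' method: concrete optimizers must override this."""
        return NotImplemented

    def optimize(self, x0):
        x = x0
        iteration = 0
        while not self.criterion(iteration, x):
            x = self.iterate(x)
            self.record.append(x)   # save every tested set of parameters
            iteration += 1
        return x

class StandardOptimizer(Optimizer):
    """Plain descent: new point = old point + step."""
    def iterate(self, x):
        return x + self.step(self.function, x)

# Demo: minimise f(x) = x**2 with a fixed-factor gradient step.
class Quadratic:
    def __call__(self, x):
        return x * x
    def gradient(self, x):
        return 2.0 * x

gradient_step = lambda f, x: -0.1 * f.gradient(x)   # x_new = 0.8 * x
stop_after = lambda iteration, x: iteration >= 100  # simplest criterion

opt = StandardOptimizer(Quadratic(), gradient_step, stop_after)
xmin = opt.optimize(3.0)
```

Passing a list (or any container) as `record` is what lets every tested parameter set be saved, as in the original C++ design.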
A little update of my proposal:
- each step can be updated after each iteration; this will be enhanced so that everything computed in the iteration is passed on, in case it is needed to update the step. That could be useful for approximated steps
- added a simple Damped optimizer: it tries to take a step, and if the cost is higher than before, half a step is tested, ...
- a function object is created if the function argument is not passed (takes the arg 'fun' as the cost function, 'gradient' for the gradient, ...). Some safeguards must still be implemented.
I was thinking of the limits of this architecture:
- definitely all quasi-Newton optimizers can be ported to this framework, as well as all semi-quadratic ones
- constrained optimization will not, unless it is modified so that it can, but as I do not use such optimizers in my PhD thesis, I do not know them well enough
But even the simplex/polytope optimizer (fmin) can be expressed in the framework - it is useless though, as it would be slower - and can take advantage of the different stopping criteria. BTW, I used some parts of this framework in an EM algorithm with an AIC-based optimizer on top. As I said in another thread, I'm in favour of fine-grained modules, even if some wrappers can provide simple optimization procedures. Matthieu 2007/3/26, Matthieu Brucher <matthieu.brucher@gmail.com>:
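The damped behaviour described above (take a step; if the cost went up, test half the step, and so on) can be sketched in a few lines; the function name and arguments are illustrative:

```python
def damped_step(f, x, step, max_halvings=20):
    """Try the full step; if the cost is higher than before, halve it."""
    current_cost = f(x)
    for _ in range(max_halvings):
        candidate = x + step
        if f(candidate) < current_cost:
            return candidate
        step = step / 2.0       # cost went up: damp the step
    return x                    # no acceptable step found, stay put

# Demo on f(x) = x**2 with a deliberately oversized step.
f = lambda x: x * x
x_new = damped_step(f, 3.0, -10.0)  # full step overshoots to -7, half-step lands at -2
```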
A new proposal... I refactored the code for the line search part; it is now another module. The damped optimizer of the last proposal is now a damped line search; by default, no line search is performed at all. Matthieu 2007/4/13, Matthieu Brucher <matthieu.brucher@gmail.com>:
I started a discussion page on the Trac for design ideas etc. about modular optimization. Right now I am just adding questions I have about things. As it becomes more coherent, we can bring these questions/ideas to the list for comments. http://projects.scipy.org/scipy/scipy/wiki/DevelopmentIdeas/ModularOptimization Michael. P.S. What is the best way to share code ideas on the wiki? Small bits work well inline, but for larger chunks it would be nice to be able to attach files. Unfortunately none of the wikis deal well with attached files (no versioning and sometimes no way of modifying the file). On 13 Apr 2007, at 9:14 AM, Matthieu Brucher wrote:
A new proposal... I refactored the code for the line search part, it is now a another module. The damped optimizer of the last proposal is now a damped line search, by default no line search is performed at all.
Matthieu
Good point ;) I could make those changes safe for the f or func part... Using an object to optimize is, for me, better than a collection of functions, although a collection of functions can be made into an object if needed. For the interface, I suppose that assembling an optimizer is not something everybody will want to do; that's why some optimizers are proposed out of the box, in MatLab toolboxes for instance. But allowing an optimizer to be customized rapidly can be a real advantage over all other optimization packages. One of the members of the lab I'm studying in told me that he did not see that such modularization was pertinent. He used, for his application (warping an image), a Levenberg-Marquardt optimizer with constraints, where the line search was performed with interval analysis. Until some days ago, I thought that he was right, that only some optimizers can be expressed in "my" framework. Now, I think that even his optimization could be expressed, and if he wanted to modify something in the optimizer, it would be much simpler with this architecture, in Python, than what he has now, in C. He made some stuff very specific to his function, as a lot of people would want to do but couldn't with a fixed interface like MatLab's; in fact, a lot could be expressed in terms of a specific step, a specific line search, a specific criterion and a specific function/set of parameters. Until some time ago, I thought that modules with criteria, steps and optimizers would be enough; now I think I missed the fact that a lot of optimizers share the line search, and that it should be another module. I'm writing some other test functions (shamelessly taken from _Engineering Optimization_ by Rao), with other line searches and steps; I'll keep you posted. Matthieu 2007/4/14, Michael McNeil Forbes <mforbes@physics.ubc.ca>:
On 14 Apr 2007, at 8:59 AM, Matthieu Brucher wrote:
Good point ;) I could make those changes safe for the f or func part... Using an object to optimize is, for me, better than a collection of functions, although a collection of functions can be made into an object if needed.
For the interface, I suppose that assembling an optimizer is not something everybody will want to do; that's why some optimizers are proposed out of the box, in MatLab toolboxes for instance. But allowing an optimizer to be customized rapidly can be a real advantage over all other optimization packages.
And one can easily make convenience functions which take standard arguments and package them internally. I think that the interface should be flexible enough to allow users to just call the optimizers with a few standard arguments like they are used to, but allow users to "build" more complicated/more customized optimizers as they need. Also, it would be nice if an optimizer could be "tuned" to a particular problem (i.e. have a piece of code that tries several algorithms and parameter values to see which is fastest.)
One of the members of the lab I'm studying in told me that he did not see that such modularization was pertinent. He used, for his application (warping an image), a Levenberg-Marquardt optimizer with constraints, where the line search was performed with interval analysis. Until some days ago, I thought that he was right, that only some optimizers can be expressed in "my" framework. Now, I think that even his optimization could be expressed, and if he wanted to modify something in the optimizer, it would be much simpler with this architecture, in Python, than what he has now, in C. He made some stuff very specific to his function, as a lot of people would want to do but couldn't with a fixed interface like MatLab's; in fact, a lot could be expressed in terms of a specific step, a specific line search, a specific criterion and a specific function/set of parameters.
Until some time ago, I thought that modules with criteria, steps and optimizers would be enough; now I think I missed the fact that a lot of optimizers share the line search, and that it should be another module.
My immediate goal is to try and get the interface and module structure well defined so that I know where to put the pieces of my Broyden code when I rip it apart. One question about coupling: useful criteria for globally convergent algorithms include testing the gradients and/or curvature of the function. In the Broyden algorithm, for example, these would be maintained by the "step" object, but the criteria object would need to access them. Likewise, if a "function" object can compute its own derivatives, then the "criterion" object should access them from there. Any ideas on how to deal with these couplings? Perhaps the "function" object should maintain all the state (approximate Jacobians etc.). Michael.
And one can easily make convenience functions which take standard arguments and package them internally. I think that the interface should be flexible enough to allow users to just call the optimizers with a few standard arguments like they are used to, but allow users to "build" more complicated/more customized optimizers as they need. Also, it would be nice if an optimizer could be "tuned" to a particular problem (i.e. have a piece of code that tries several algorithms and parameter values to see which is fastest.)
Exactly.
My immediate goal is to try and get the interface and module structure well defined so that I know where to put the pieces of my Broyden code when I rip it apart.
I can help you with it :)
One question about coupling: Useful criteria for globally convergent
algorithms include testing the gradients and/or curvature of the function.
Simple, the criterion takes, for the moment, the current iteration number, the former value, the current value, and the same for the parameters. It can be modified to add the gradient if needed - I think the step would be a better choice?
In the Broyden algorithm, for example, these would be
maintained by the "step" object,
but the criteria object would need
to access these.
Access what exactly?
Likewise, if a "function" object can compute its
own derivatives, then the "ceriterion" object should access it from there.
I don't think that the criterion needs to access this, because it would mean it knows more than it should, from an object-oriented point of view, but this can be discussed :)
Any ideas on how to deal with these couplings? Perhaps the
"function" object should maintain all the state (approximate jacobians etc.).
I don't think so: the function provides methods to compute the gradient, the hessian, ... but only the step object knows what to do with them: approximate a hessian, what was already approximated, ... A step object is associated with one optimizer; a function object can be optimized several times. If it had a state, it couldn't be used with several optimizers without reinitializing it, and that is not intuitive enough. I've been thinking about a correct architecture for several months now, and this is what I think is a good one:
- a function to optimize that provides some methods to compute the cost, the gradient, the hessian, ... only basic stuff
- an object that is responsible for the optimization, the glue between all modules -> the optimizer
- an object that tells whether the optimization has converged. It needs the current iteration number, the several last values, the parameters, perhaps other things, but these should be discussed
- an object that computes a new step; it takes a function to optimize and can have a state - to compute an approximate hessian or inverse hessian
- a line search that can find a new candidate - section method, damped method, no method at all, with a state (Marquardt), ...
With these five objects, I _think_ every unconstrained method can be expressed. For the constraints, I suppose the step and the line search should be adapted, but no other module needs to be touched. I implemented the golden and Fibonacci sections; it's pretty straightforward to add other line searches or steps, and I'll try to add some before I submit it on Trac. Matthieu
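The five objects listed above can be wired together in a few dozen lines. The following is only a sketch of the proposed decomposition - all class names and signatures are assumptions, not an agreed interface - minimising a quadratic with a gradient step and a trivial constant-factor line search:

```python
import numpy as np

class Function:                       # 1. the function to optimize
    def __call__(self, x):
        return float(np.dot(x, x))
    def gradient(self, x):
        return 2.0 * x

class AbsoluteCriterion:              # 2. the convergence test
    def __init__(self, tol, max_iter):
        self.tol, self.max_iter = tol, max_iter
    def __call__(self, iteration, old_value, new_value):
        return iteration >= self.max_iter or abs(old_value - new_value) <= self.tol

class GradientStep:                   # 3. the step computation (may hold state)
    def __call__(self, function, x):
        return -function.gradient(x)

class FixedLineSearch:                # 4. the line search (here, a constant factor)
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, function, x, step):
        return x + self.factor * step

class Optimizer:                      # 5. the glue between all modules
    def __init__(self, function, step, line_search, criterion):
        self.function, self.step = function, step
        self.line_search, self.criterion = line_search, criterion
    def optimize(self, x0):
        x = np.asarray(x0, dtype=float)
        value, iteration = self.function(x), 0
        while True:
            x = self.line_search(self.function, x, self.step(self.function, x))
            new_value = self.function(x)
            iteration += 1
            if self.criterion(iteration, value, new_value):
                return x
            value = new_value

opt = Optimizer(Function(), GradientStep(), FixedLineSearch(0.1),
                AbsoluteCriterion(1e-12, 500))
xmin = opt.optimize([3.0, -2.0])
```

Swapping in a Newton step, a golden-section line search, or a different criterion would only mean passing different objects to the same Optimizer.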
On 14 Apr 2007, at 1:51 PM, Matthieu Brucher wrote:
One question about coupling: Useful criteria for globally convergent algorithms include testing the gradients and/or curvature of the function.
Simple, the criterion takes, for the moment, the current iteration number, the former value, the current value, and the same for the parameters. It can be modified to add the gradient if needed - I think the step would be a better choice?
In the Broyden algorithm, for example, these would be maintained by the "step" object,
but the criteria object would need to access these.
Access what exactly ?
"these == gradient/hessian information" The criterion needs access to this information, but the question is: who serves it? If the "function" can compute these, then it should naturally serve this information. With the Broyden method, you suggest that the "step" would serve this information. Thus, there are two objects (depending on the choice of method) that maintain and provide gradient information. After thinking about this some more, I am beginning to like the idea that only the "function" object be responsible for the Jacobian. If the function can compute the Jacobian directly: great, use a newton- like method. If it can't, then do its best to approximate it (i.e. the "Broyden" part of the algorithm would be encoded in the function object rather than the step object." The "function" object alone then serves up information about the value of the function at a given point, as well as the gradient and hessian at that point (either exact or approximate) to the criterion, step, and any other objects that need it.
Likewise, if a "function" object can compute its own derivatives, then the "criterion" object should access them from there.
I don't think that the criterion needs to access this, because it would mean it knows more than it should, from an object-oriented point of view, but this can be discussed :)
Certain termination criteria need access to the derivatives to make sure that they terminate. It would query the function object for this information. Other criteria may need to query the "step" object to find out the size of the previous steps. The "criterion" should not maintain any of these internally, just rely on the values served by the other objects: this does not break the encapsulation, it just couples the objects more tightly, but sophisticated criteria need this coupling.
Any ideas on how to deal with these couplings? Perhaps the "function" object should maintain all the state (approximate jacobians etc.).
I don't think so, the function provides methods to compute gradient, hessian, ... but only the step object knows what to do with it : approximate a hessian, what was already approximated, ... A step object is associated with one optimizer, a function object can be optimized several times. If it has a state, it couldn't be used with several optimizers without reinitializing it, and it is not intuitive enough.
The "function" object maintains all the information known about the function: how to compute the function, how to compute/approximate derivatives etc. If the user does not supply code for directly computing derivatives, but wants to use an optimization method that makes use of gradient information, then the function object should do its best to provide approximate information. The essence behind the Broyden methods is to approximate the Jacobian information in a clever and cheap way. I really think the natural place for this is in the "function" object, not the "step".
I've thinking about a correct architecture for several months now, and that is what I think is a good one : - a function to optimize that provides some method to compute the cost, the gradient, hessian, ... only basic stuff - an object that is responsible for the optimization, the glue between all modules -> optimizer - an object that tells if the optimization has converged. It needs the current iteration number, several last values, parameters, perhaps other things, but these things should be discussed - an object that computes a new step, takes a function to optimize, can have a state - to compute approximate hessian or inverse hessian - a line search that can find a new candidate - section method, damped method, no method at all, with a state (Marquardt), ...
With these five objects, I _think_ every unconstrained method can be expressed. For the constraints, I suppose the step and the line search should be adapted, but no other module needs to be touched.
Please describe how you think the Broyden root-finding method would fit within this scheme. Which object would maintain the state of the approximate Jacobian? Michael.
I suspected it would become more problematic to decouple everything, but not that soon :)
"these == gradient/hessian information"
The criterion needs access to this information, but the question is: who serves it? If the "function" can compute these, then it should naturally serve this information. With the Broyden method, you suggest that the "step" would serve this information. Thus, there are two objects (depending on the choice of method) that maintain and provide gradient information.
I'll look in the literature for the Broyden method; if I see the whole algorithm, I think I'll be able to answer your questions ;)
After thinking about this some more, I am beginning to like the idea
that only the "function" object be responsible for the Jacobian. If the function can compute the Jacobian directly: great, use a Newton-like method. If it can't, then do its best to approximate it (i.e. the "Broyden" part of the algorithm would be encoded in the function object rather than the step object).
I think that if the function knows, on its own, how to compute the Jacobian, the hessian, ..., it should provide them. When it does not, it shouldn't be the man sitting at the computer who modifies his function to add a Broyden algorithm to the function object. He should only tell the optimizer, by using another module, that the function does not compute the Jacobian. Which module? That is a question for later. The goal of this is to have a clean architecture, and adding to the function a way to compute something that does not depend on the function, but on the step, is not a good thing.
The "function" object alone then serves up information about the
value of the function at a given point, as well as the gradient and hessian at that point (either exact or approximate) to the criterion, step, and any other objects that need it.
I'm OK with it, as long as it is not an approximation algorithm based on the gradient, ..., that computes for instance the hessian. Such an approximation algorithm is generic, and as such it should be put in another module or in a function superclass.
I don't think that the criterion needs to access this, because it
would mean it knows more than it should, from an object-oriented point of view, but this can be discussed :)
Certain termination criteria need access to the derivatives to make sure that they terminate. It would query the function object for this information. Other criteria may need to query the "step" object to find out the size of the previous steps.
The step is not the right one; it is the line search object's goal to find the correct step size, and such intel is given back to the optimizer core, because there everything is saved - everything should be saved with a call to recordHistory. What could be done is that every object - step or line search - returns, along with its result - the result being the step, the new candidate, ... - a dictionary with such values. In that case, the criterion can choose what it needs directly from it.
The "criterion" should
not maintain any of these internally, just rely on the values served by the other objects: this does not break the encapsulation, it just couples the objects more tightly, but sophisticated criteria need this coupling.
For the moment, the state is not in the criterion; one cannot know how many times it could be called inside an optimizer. This state is maintained by the optimizer itself - it contains the last 2 values and the last 2 sets of parameters - but I suppose that if we have the new candidate, the step and its size, those can be removed, and the criterion chooses what it needs from the dictionary.
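The dictionary idea could look like this: each iteration the optimizer assembles a state dict, and every criterion picks only the keys it cares about, staying stateless itself. All names here are illustrative assumptions:

```python
class RelativeValueCriterion:
    """Stops when the relative change of the cost is below a tolerance."""
    def __init__(self, tol):
        self.tol = tol
    def __call__(self, state):
        old, new = state['old_value'], state['new_value']
        return abs(old - new) <= self.tol * abs(old)

class IterationCriterion:
    """Stops after a fixed number of iterations."""
    def __init__(self, max_iter):
        self.max_iter = max_iter
    def __call__(self, state):
        return state['iteration'] >= self.max_iter

# The optimizer would fill this dict each iteration; criteria only read it.
state = {'iteration': 12, 'old_value': 2.0, 'new_value': 1.999999,
         'parameters': None, 'step_size': 1e-3}
converged = RelativeValueCriterion(1e-5)(state)
```

Adding a gradient-norm criterion would then only mean having the step or function put a 'gradient' key into the same dict.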
I don't think so, the function provides methods to compute
gradient, hessian, ... but only the step object knows what to do with it : approximate a hessian, what was already approximated, ... A step object is associated with one optimizer, a function object can be optimized several times. If it has a state, it couldn't be used with several optimizers without reinitializing it, and it is not intuitive enough.
The "function" object maintains all the information known about the function: how to compute the function, how to compute/approximate derivatives etc. If the user does not supply code for directly computing derivatives, but wants to use an optimization method that makes use of gradient information, then the function object should do its best to provide approximate information. The essence behind the Broyden methods is to approximate the Jacobian information in a clever and cheap way.
That would mean that it can have a state, and I really do not support this approach. Broyden's method _is_ a way to get a step from a function that does not provide some information - the Jacobian, for instance - so it is not a function thing, it is a step mode.
I really think the natural place for this is in the "function"
object, not the "step".
With these five objects, I _think_ every unconstrained method can be expressed. For the constraints, I suppose the step and the line search should be adapted, but no other module needs to be touched.
Please describe how you think the Broyden root-finding method would fit within this scheme. Which object would maintain the state of the approximate Jacobian?
No problem, I'll check in my two optimization books this evening, provided I have enough time - I'm a little late on some important work projects :| - Thanks for your patience and your will to create generic optimizers :) Matthieu
I'll look in the literature for the Broyden method; if I see the whole algorithm, I think I'll be able to answer your questions ;)
As I understand the Broyden update, the whole trick is that you don't need the precise Jacobian and it is subsequently approximated at every iteration. So all that is needed (and in my problems actually all that is available) is the function value. Ondrej
2007/4/16, Ondrej Certik <ondrej@certik.cz>:
I'll look in the literature for the Broyden method; if I see the whole algorithm, I think I'll be able to answer your questions ;)
As I understand the Broyden update, the whole trick is that you don't need the precise Jacobian and it is subsequently approximated at every iteration. So all that is needed (and in my problems actually all that is available) is the function value.
Ondrej
That's what I understood from the discussion - well, every quasi-Newton algorithm does this to some extent - but I'll check and propose a full example. Matthieu
On 16 Apr 2007, at 9:07 AM, Matthieu Brucher wrote:
I'll look in the literature for the Broyden method; if I see the whole algorithm, I think I'll be able to answer your questions ;)
Basically, an approximate Jacobian is used to determine the step direction and/or size (depending on the "step" module etc.). The key to the Broyden approach is that the information about F(x+dx) is used to update the approximate Jacobian (think multidimensional secant method) in a clever way, without any additional function evaluations (there is not a unique way to do this, and some choices work better than others). Thus, think of Broyden methods as quasi-Newton methods but with a cheap and very approximate Jacobian (hence, one usually uses a robust line search method to make sure that one is always descending).
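The update described above is the rank-one "good Broyden" correction, J <- J + ((dF - J dx) dx^T) / (dx^T dx). A toy root-finding sketch - not production code, and the function name is an assumption - makes the "no extra evaluations" point concrete:

```python
import numpy as np

def broyden_solve(F, x0, tol=1e-10, max_iter=100):
    """Root finding with Broyden's 'good' method: the approximate Jacobian J
    is corrected after every step using only already computed F values."""
    x = np.asarray(x0, dtype=float)
    J = np.eye(len(x))              # crude initial guess for the Jacobian
    Fx = F(x)
    for _ in range(max_iter):
        dx = np.linalg.solve(J, -Fx)        # quasi-Newton step
        x_new = x + dx
        F_new = F(x_new)
        if np.linalg.norm(F_new) < tol:
            return x_new
        dF = F_new - Fx
        # Rank-one secant update: reuses F_new, no extra function evaluations.
        J += np.outer(dF - np.dot(J, dx), dx) / np.dot(dx, dx)
        x, Fx = x_new, F_new
    return x

# Demo: solve x0 + x1 = 3, x0 - x1 = 1 (root at (2, 1)).
F = lambda x: np.array([x[0] + x[1] - 3.0, x[0] - x[1] - 1.0])
root = broyden_solve(F, [0.0, 0.0])
```

Note that only function values are ever requested from F, which is exactly Ondrej's point: no precise Jacobian is needed.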
After thinking about this some more, I am beginning to like the idea that only the "function" object be responsible for the Jacobian. If the function can compute the Jacobian directly: great, use a newton- like method. If it can't, then do its best to approximate it (i.e. the "Broyden" part of the algorithm would be encoded in the function object rather than the step object."
I think that if the function knows, on its own, how to compute the Jacobian, the hessian, ..., it should provide them. When it does not, it shouldn't be the man sitting at the computer who modifies his function to add a Broyden algorithm to the function object. He should only tell the optimizer, by using another module, that the function does not compute the Jacobian. Which module? That is a question for later. The goal of this is to have a clean architecture, and adding to the function a way to compute something that does not depend on the function, but on the step, is not a good thing.
My view is that the person sitting at the computer does one of the following things:
F1 = Function(f) F2 = Function(f,opts) F3 = Function(f,df,ddf,opts) etc.
In the first case, the object F1 can compute f(x), and will use finite differences or some more complicated method to compute derivatives df(x) and ddf(x) if required by the optimization algorithm. In F2, the user provides options that specify how to do these computations (for example: step size h, should a centred difference be used? Perhaps the function is cheap and a Richardson extrapolation should be used for higher accuracy. If f is analytic and supports complex arguments, then the difference step should be h=eps*1j. Maybe f has been implemented using an automatic differentiation library, etc. Just throwing out ideas here...). In the third case, the user has explicitly provided functions to compute the Jacobian etc., so these will be used (unless the user specifies otherwise). In any case, all of the functors F1, F2 and F3 can be passed to various "optimizers". There would be a set of modules behind the interface provided by Function() that implement these various techniques for computing and/or estimating the derivatives, including the Broyden method. The user sitting at the computer does nothing other than select, from a set of options (opts), what methods he wants the library to use. Note, the user could pass explicit things to Function() too, like a custom function that computes numerical derivatives.
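The F1/F3 idea might be sketched as follows; the class name, argument names and defaults are all assumptions, and only the simplest fallback (centred finite differences) is shown:

```python
import numpy as np

class Function:
    """Serves f and its derivatives: uses user-supplied ones when given,
    falls back to centred finite differences otherwise."""
    def __init__(self, f, df=None, h=1e-6):
        self.f, self._df, self.h = f, df, h

    def __call__(self, x):
        return self.f(x)

    def gradient(self, x):
        if self._df is not None:        # F3-style: exact derivative provided
            return self._df(x)
        x = np.asarray(x, dtype=float)  # F1-style: finite-difference fallback
        grad = np.empty_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = self.h
            grad[i] = (self.f(x + e) - self.f(x - e)) / (2.0 * self.h)
        return grad

f = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2
F1 = Function(f)                                                   # estimated
F3 = Function(f, df=lambda x: np.array([2.0 * x[0], 6.0 * x[1]]))  # exact
```

Either object exposes the same gradient() interface, so the optimizer, step and criterion never need to know which variant they were handed.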
The "function" object alone then serves up information about the value of the function at a given point, as well as the gradient and hessian at that point (either exact or approximate) to the criterion, step, and any other objects that need it.
I'm OK with it, as long as it is not an approximation algorithm based on the gradient, ..., that computes for instance the hessian. Such an approximation algorithm is generic, and as such it should be put in another module or in a function superclass.
A Function "superclass" is what I had in mind.
... Certain termination criteria need access to the derivatives to make sure that they terminate. It would query the function object for this information. Other criteria may need to query the "step" object to find out the size of the previous steps.
The step is not the right one; it is the line search object's goal to find the correct step size, and such intel is given back to the optimizer core, because there everything is saved - everything should be saved with a call to recordHistory. What could be done is that every object - step or line search - returns, along with its result - the result being the step, the new candidate, ... - a dictionary with such values. In that case, the criterion can choose what it needs directly from it.
Yes, it seems that the optimizer should maintain information about the history. The question I have is about the flow of information: I imagine that the criterion object should be able to query the optimization object for the information that it needs. We should define an interface of things that the optimizer can serve up to the various components. This interface can be extended as required to support more sophisticated algorithms.
The "criterion" should not maintain any of these internally, just rely on the values served by the other objects: this does not break the encapsulation, it just couples the objects more tightly, but sophisticated criteria need this coupling.
For the moment, the state was not in the criterion; one cannot know how many times it could be called inside an optimizer. This state is maintained by the optimizer itself - it contains the last 2 values and the last 2 sets of parameters -, but I suppose that if we have the new candidate, the step and its size, those can be removed, and so the criterion chooses what it needs from the dictionary.
I don't think so; the function provides methods to compute the gradient, hessian, ..., but only the step object knows what to do with them: approximate a hessian, know what was already approximated, ... A step object is associated with one optimizer; a function object can be optimized several times. If it had a state, it couldn't be used with several optimizers without reinitializing it, and that is not intuitive enough.
The "function" object maintains all the information known about the function: how to compute the function, how to compute/approximate derivatives etc. If the user does not supply code for directly computing derivatives, but wants to use an optimization method that makes use of gradient information, then the function object should do its best to provide approximate information. The essence behind the Broyden methods is to approximate the Jacobian information in a clever and cheap way.
That would mean that it can have a state, and I really do not support this approach. The Broyden method _is_ a way to get a step from a function that does not provide some information - the Jacobian, for instance -, so it is not a function thing, it is a step mode.
I disagree. I think of the Broyden algorithm as a way of maintaining the Jacobian. The way to get the step is independent of this, though it may use the Jacobian information to help it. The Broyden part of the algorithm is solely to approximate the Jacobian cheaply.
... Please describe how you think the Broyden root-finding method would fit within this scheme. Which object would maintain the state of the approximate Jacobian?
No problem, I'll check my two optimization books this evening, provided I have enough time - I'm a little late on some important work projects :| -
Maybe I will see things differently when you do this, but I am pretty convinced right now that the Function() object is the best place for the Broyden part of the algorithm. Michael. P.S. No hurry. I might also disappear from time to time when busy;-)
Basically, an approximate Jacobian is used to determine the step direction and/or size (depending on the "step" module etc.)
The key to the Broyden approach is that the information about F(x+dx) is used to update the approximate Jacobian (think multidimensional secant method) in a clever way without any additional function evaluations (there is not a unique way to do this and some choices work better than others).
Thus, think of Broyden methods as quasi-Newton methods but with a cheap and very approximate Jacobian (hence, one usually uses a robust line search method to make sure that one is always descending).
I read some docs; it seems Broyden is a class of different steps, am I wrong? And it tries to approximate the Hessian of the function. My view is that the person sitting at the computer does one of the
following things:
F1 = Function(f)
F2 = Function(f, opts)
F3 = Function(f, df, ddf, opts)
etc.
OK, that is not what a function is :) A function is the set of f, df, ddf, but not with the options. What you are exposing is the construction of an optimizer ;) Did you see the code in my different proposals ? In fact, you have a function class - for instance the Rosenbrock class - that defines several methods, like gradient, hessian, ... without a real state - a real state being something other than, for instance, the number of dimensions for the Rosenbrock function, the points that need to be approximated, ...; a real state is something that depends on the subsequent calls to the functor, the gradient, ... - so that this function can be reused efficiently. Then you use an instance of this class to be optimized. You choose your step mode with its parameters, like gradient, conjugate gradient, Newton, Quasi-Newton, ... You choose your line search with its own parameters - tolerance, ... - like section methods, interpolation methods, ... Finally, you choose your stopping criterion. Then you write something like:

optimizer = StandardOptimizer(function = myFunction, step = myStep, .......)
optimizer.optimize()

That is a modular design, and that is why some basic functions must be provided so that people who don't care about the underlying design really do not have to care. Then if someone wants a specific, non-standard optimizer, one just has to select the wanted modules - for instance, a conjugate gradient with a golden-section line search and a relative value criterion instead of a Fibonacci search and an absolute criterion -. It can be more cumbersome at the start, but once some modules are made, assembling them will be easier, and tests will be more fun :)

In this first case, the object F1 can compute f(x), and will use finite differences or some more complicated method to compute the derivatives df(x) and ddf(x) if required by the optimization algorithm. In F2, the user provides options that specify how to do these computations (for example: the step size h, should a centred difference be used? Perhaps the function is cheap and a Richardson extrapolation should be used for higher accuracy. If f is analytic and supports complex arguments, then the difference step should be h=eps*1j. Maybe f has been implemented using an automatic differentiation library etc. Just throwing out ideas here...)
Yes, that's exactly an optimizer, not a function ;)

A Function "superclass" is what I had in mind.

As I said, that would make the function a stateful function, so this instance must not be shared among optimizers, and so it is more error prone :(

Yes, it seems that the optimizer should maintain information about the history. The question I have is about the flow of information: I imagine that the criterion object should be able to query the optimization object for the information that it needs. We should define an interface of things that the optimizer can serve up to the various components. This interface can be extended as required to support more sophisticated algorithms.
If each module returns a tuple with the result and the set of parameters it used, and if the criterion gets all these sets in a dictionary, it will be able to cope with more specific kinds of steps or line searches. For the communication between the optimizer and the modules, the only communication is 'I want this and this' from the optimizer; the rest is defined when instantiating the different modules. For instance, the line search doesn't need to know how the step is computed, and in fact neither does the optimizer. The step knows the function and knows what it needs from it; if that is not provided, the optimization should fail.
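The tuple-plus-dictionary protocol described here could look something like this minimal sketch; the class names and dictionary keys are made up for illustration, not part of the actual proposal:

```python
import numpy as np

class GradientStep:
    """Sketch: a step module returning (result, info-dict)."""
    def __call__(self, function, x, state):
        direction = -np.asarray(function.gradient(x), dtype=float)
        # Report what was used so the criterion can pick what it needs.
        return direction, {'gradient': -direction}

class RelativeValueCriterion:
    """Sketch: a criterion that reads only the keys it cares about."""
    def __init__(self, tol):
        self.tol = tol
    def __call__(self, state):
        old, new = state.get('old_value'), state.get('new_value')
        if old is None:
            return False  # not enough history yet
        return abs(new - old) <= self.tol * abs(old)
```

The criterion never queries the step directly; it only reads the shared dictionary, which is what keeps the modules decoupled.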
That would mean that it can have a state, I really do not support
this approach. The Broyden method _is_ a way to get a step from a function that does not provide some information - the Jacobian, for instance -, so it is not a function thing, it is a step mode.
I disagree. I think of the Broyden algorithm as a way of maintaining the Jacobian. The way to get the step is independent of this, though it may use the Jacobian information to help it. The Broyden part of the algorithm is solely to approximate the Jacobian cheaply.
The Broyden algorithm is a step; the literature states it as a step, nothing else. Think of 3D computer graphics. Some graphic cards - the function - provide some functions, others don't. When they are needed, the driver - the step or the line search - provides a software approach - an approximation - but does not modify the GPU to add the functions. Here, it is exactly the same: a step is an adapter that provides a step, but never modifies the function.

Maybe I will see things differently when you do this, but I am pretty convinced right now that the Function() object is the best place for the Broyden part of the algorithm.
And now ? :) For instance, the conjugate gradient must remember the last step. It is exactly like the Broyden algorithm, only simpler, but it has a state. If the last gradient were stored inside the function and this function were optimized again with another starting point, the first step would be wrong... If I have some time, I'll program the FR (Fletcher-Reeves) conjugate-gradient step so that you can see how it is made ;)

Michael.
P.S. No hurry. I might also disappear from time to time when busy;-)
Me too ;) Matthieu
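The stateful Fletcher-Reeves step mentioned above could be sketched roughly like this; the call interface (a `function` object with a `gradient` method) is an assumption taken from the discussion:

```python
import numpy as np

class FRConjugateGradientStep:
    """Sketch of a stateful Fletcher-Reeves conjugate-gradient step.

    The state (previous gradient and direction) lives in the step
    object, not in the function, which is the point argued above:
    reusing this instance on a fresh optimization would be wrong, but
    the function object stays reusable.
    """
    def __init__(self):
        self.old_gradient = None
        self.old_direction = None

    def __call__(self, function, x):
        g = np.asarray(function.gradient(x), dtype=float)
        if self.old_gradient is None:
            d = -g  # first step: plain steepest descent
        else:
            # Fletcher-Reeves beta = ||g_new||^2 / ||g_old||^2
            beta = g.dot(g) / self.old_gradient.dot(self.old_gradient)
            d = -g + beta * self.old_direction
        self.old_gradient, self.old_direction = g, d
        return d
```

Because the state is per-optimizer, a new optimization simply constructs a new step object.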
I am thinking of root finding G(x) = 0 right now rather than optimization (I have not thought about the optimization problem yet). The way the Broyden root finder works is that you start with an approximate Jacobian J0 (you could start with the identity, for example). So, start with an approximate J0 at position x0:

G0 = G(x0)
dx = -inv(J0)*G0  # This is a quasi-Newton step. Use your favourite step method here...
x1 = x0 + dx
G1 = G(x1)

Now the Broyden part comes in. It computes a new approximate Jacobian J1 at position x1 using J0, dG and dx such that

J1*dx = dG

This is the secant condition. There are many ways to satisfy it in multiple dimensions, and the various Broyden methods choose one of them. The most common is the BFGS choice, but there are other choices with different convergence properties. Now start over with J0 = J1 and x0 = x1 and repeat until convergence is met.

Michael.

P.S. More comments to follow.
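The procedure Michael describes can be sketched in Python roughly as follows; the function signature and the identity-Jacobian default are assumptions, and the update shown is Broyden's "good" rank-one update, one of the choices he mentions:

```python
import numpy as np

def broyden_root(G, x0, J0=None, tol=1e-10, maxiter=100):
    """Sketch of the Broyden root finder described above.

    Starts from an approximate Jacobian (identity by default) and
    updates it with a rank-one correction satisfying the secant
    condition J1 @ dx = dG, with no extra function evaluations.
    """
    x = np.asarray(x0, dtype=float)
    J = np.eye(x.size) if J0 is None else np.asarray(J0, dtype=float)
    Gx = np.asarray(G(x), dtype=float)
    for _ in range(maxiter):
        dx = -np.linalg.solve(J, Gx)   # quasi-Newton step
        x = x + dx
        Gnew = np.asarray(G(x), dtype=float)
        if np.linalg.norm(Gnew) < tol:
            return x
        dG = Gnew - Gx
        # Broyden's "good" update: J1 dx = dG, minimal change to J.
        J = J + np.outer(dG - J @ dx, dx) / dx.dot(dx)
        Gx = Gnew
    return x
```

Note this is the bare method with no line search; as noted earlier in the thread, a robust line search is usually wrapped around the step in practice.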
You might want to check my email (a couple of days ago) about the Broyden methods, together with Python code and tests. I didn't have time to implement it in scipy yet, but you can already use it. Ondrej
I'll check them today, thanks for the tests too. Matthieu
Okay, I think we are thinking similar things with different terminology: I think you are saying that only one object should maintain state (your "optimizer") (I was originally sharing the state, which I agree can cause problems). If so, I agree, but to me it seems that object should be called a "partially optimized function". I think of an "optimizer" as something which modifies state rather than something that maintains state. I am thinking of code like:

roughly_locate_minimum = Optimizer(criterion=extremelyWeakCriterion, step=slowRobustStep, ...)
find_precise_minimum = Optimizer(criterion=preciseCriterion, step=fasterStep, ...)

f = Rosenbrock(...)
x0 = ...

f_min = OptimizedFunction(f, x0)
f_min.optimize(optimizer=roughly_locate_minimum)
f_min.optimize(optimizer=find_precise_minimum)

# OR (this reads better to me, but the functions should return copies
# of f_min, so may not be desirable for performance reasons)
f_min = roughly_locate_minimum(f_min)
f_min = find_precise_minimum(f_min)

# Then one can query f_min for results:
print f_min.x    # Best current approximation to the optimum
print f_min.f
print f_min.err  # Estimated error
print f_min.df
# etc...

The f_min object keeps track of all state, can be passed from one optimizer to another, etc. In my mind, it is simply an object that has accumulated information about a function. The idea I have in mind is that f is extremely expensive to compute, thus the object with state f_min accumulates more and more information as it goes along. Ultimately this information could be used in many ways, for example:

- f_min could keep track of roughly how long it takes to compute f(x), thus providing estimates of the time required to complete a calculation.
- f_min could keep track of values and use interpolation to provide fast guesses, etc.

Does this mesh with your idea of an "optimizer"?
I think it is strictly equivalent, but looking at the line of code "optimizer.optimize()" is much less useful to me than "f_min.optimize(optimizer=...)". What would your ideal "user" code look like for the above use-case? I will try to flesh out a more detailed structure for the OptimizedFunction class, Michael. On 16 Apr 2007, at 12:27 PM, Matthieu Brucher wrote:
... OK, that is not what a function is :) A function is the set of f, df, ddf but not with the options. What you are exposing is the construction of an optimizer ;)
Did you see the code in my different proposals ? In fact, you have a function class - for instance the Rosenbrock class - that defines several methods, like gradient, hessian, ... without a real state - a real state being something other than, for instance, the number of dimensions for the Rosenbrock function, the points that need to be approximated, ...; a real state is something that depends on the subsequent calls to the functor, the gradient, ... - so that this function can be reused efficiently. Then you use an instance of this class to be optimized. You choose your step mode with its parameters, like gradient, conjugate gradient, Newton, Quasi-Newton, ... You choose your line search with its own parameters - tolerance, ... - like section methods, interpolation methods, ... Finally, you choose your stopping criterion. Then you write something like:

optimizer = StandardOptimizer(function = myFunction, step = myStep, .......)
optimizer.optimize()
That is a modular design, and that is why some basic functions must be provided so that people who don't care about the underlying design really do not have to care. Then if someone wants a specific, non-standard optimizer, one just has to select the wanted modules - for instance, a conjugate gradient with a golden-section line search and a relative value criterion instead of a Fibonacci search and an absolute criterion -.
It can be more cumbersome at the start, but once some modules are made, assembling them will be easier, and tests will be more fun :) ...
2007/4/18, Michael McNeil Forbes <mforbes@physics.ubc.ca>:
Okay, I think we are thinking similar things with different terminology:
Yes, I think that too. I think you are saying that only one object should maintain state
(your "optimizer") (I was originally sharing the state which I agree can cause problems). If so, I agree, but to me it seems that object should be called a "partially optimized function". I think of an "optimizer" as something which modifies state rather than something that maintains state.
Well, in fact the real object that has a state is the step; the optimizer could have a state, but I do not currently use that approach. I'll think about the dependencies that keeping the state only in the optimizer would lead to. That would mean that each call to the step, the criterion or the line search would take another parameter, the state of the optimizer. Let's say it's a dict. Each object would take and modify some values, and in fact that is what I pass to the recordHistory function - I think I'll rename it record_history to be compliant with the scipy coding standard -, so there is not much trouble in doing this.

I am thinking of code like:
roughly_locate_minimum = Optimizer(criterion=extremelyWeakCriterion, step=slowRobustStep, ...)
find_precise_minimum = Optimizer(criterion=preciseCriterion, step=fasterStep, ...)

f = Rosenbrock(...)
x0 = ...

f_min = OptimizedFunction(f, x0)
f_min.optimize(optimizer=roughly_locate_minimum)
f_min.optimize(optimizer=find_precise_minimum)

# OR (this reads better to me, but the functions should return copies
# of f_min, so may not be desirable for performance reasons)
f_min = roughly_locate_minimum(f_min)
f_min = find_precise_minimum(f_min)

# Then one can query f_min for results:
print f_min.x    # Best current approximation to the optimum
print f_min.f
print f_min.err  # Estimated error
print f_min.df
# etc...
The f_min object keeps track of all state, can be passed from one optimizer to another, etc. In my mind, it is simply an object that has accumulated information about a function.
You mean you would want to possibly share the state between optimizers ? The idea I have in
mind is that f is extremely expensive to compute, thus the object with state f_min accumulates more and more information as it goes along.
Well, a part of this is done by recordHistory, but I don't think that saving the whole state at every iteration in f_min is a good idea from a memory point of view. Why not save only the last state, with every needed value? For instance, the old and new values, the old and new parameters, the old and new step, the new gradient, ... I think the number of iterations should be there as well ;)

Ultimately this information could be used in many ways, for
example:
- f_min could keep track of roughly how long it takes to compute f(x), thus providing estimates of the time required to complete a calculation.
- f_min could keep track of values and use interpolation to provide fast guesses, etc.
Does this mesh with your idea of an "optimizer"? I think it is strictly equivalent, but looking at the line of code "optimizer.optimize()" is much less useful to me than "f_min.optimize(optimizer=...)".
What would your ideal "user" code look like for the above use-case?
Well, not exactly, it would be almost like it. I do not know what you want to put in OptimizedFunction and what its role is exactly. My ideal code is straightforward - in fact it is more or less the current code - :

f = Rosenbrock(...)
x0 = ...

roughly_locate_minimum_optimizer = StandardOptimizer(function = f,
    x0 = x0,
    step = Step.SomeStep(...),
    lineSearch = LineSearch.InexactLineSearch(...),
    criterion = Criterion.SomeCriterion(...),
    record = SomeRecordingFunctionIfNeeded)
local_minimum = roughly_locate_minimum_optimizer.optimize()

precisely_locate_minimum_optimizer = StandardOptimizer(function = f,
    x0 = local_minimum,
    step = Step.SomeOtherStep(...),
    lineSearch = LineSearch.ExactLineSearch(...),
    criterion = Criterion.SomeOtherCriterion(...),
    record = SomeRecordingFunctionIfNeeded)
minimum = precisely_locate_minimum_optimizer.optimize()

Using the OptimizedFunction to save a state shared by different optimizers that do not save the same things could lead to side effects that are difficult to track. What could be done, as I said, is to output the state at the end of the optimizer, and perhaps allow the user to give it to a new optimizer, which would take only the part it needs. I will try to flesh out a more detailed structure for the
OptimizedFunction class, Michael.
I'm looking forward to seeing this ;) Matthieu
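For reference, the kind of StandardOptimizer loop discussed in this exchange might be sketched like this; every name and keyword here is either drawn from the discussion or invented for illustration, not an existing SciPy API:

```python
class StandardOptimizer:
    """Rough sketch of the modular optimizer loop under discussion.

    The optimizer owns the loop and the state dictionary; the step,
    line search and criterion are pluggable modules, as proposed.
    """
    def __init__(self, function, x0, step, criterion,
                 line_search=None, record=None):
        self.function = function
        self.x = x0
        self.step = step
        self.criterion = criterion
        self.line_search = line_search
        self.record = record

    def optimize(self):
        state = {'iteration': 0, 'old_value': None,
                 'new_value': self.function(self.x)}
        while not self.criterion(state):
            direction = self.step(self.function, self.x)
            # The line search, if any, chooses the step length.
            alpha = (self.line_search(self.function, self.x, direction)
                     if self.line_search else 1.0)
            self.x = self.x + alpha * direction
            state['old_value'] = state['new_value']
            state['new_value'] = self.function(self.x)
            state['iteration'] += 1
            if self.record:
                self.record(dict(state), self.x)
        return self.x
```

Swapping in a different step, line search or criterion reassembles the optimizer without touching the loop, which is the modularity being argued for.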
On Apr 14, 2007, at 11:37 AM, Michael McNeil Forbes wrote:
P.S. What is the best way to share code ideas on the wiki? Small bits work well inline, but for larger chunks it would be nice to be able to attach files. Unfortunately, none of the wikis deals well with attached files (no versioning and sometimes no way of modifying the file).
In Trac, you can check things into the associated repository and then use source: and diff: links.
participants (9)
- Alan G Isaac
- Alan Isaac
- dmitrey
- Jonathan Guyer
- Matthieu Brucher
- Michael McNeil Forbes
- Ondrej Certik
- Pierre GM
- Robert Kern