Normalization for optimization in Python
Dear all,

During optimization it often helps to normalize the input parameters so that they are on the same order of magnitude, which can greatly improve convergence. For example, if we want to minimize f(x) and a reasonable initial guess is x0 = [1e3, 1e-4], it may help to rescale x0[0] and x0[1] to about the same order of magnitude (often O(1)).

My question: I have been using scipy.optimize, specifically the L-BFGS-B algorithm. Do I need to do this normalization manually by wrapping my function, or does the algorithm already do it for me?

Thank you!

-Shawn

--
Yuxiang "Shawn" Wang
Gerling Research Lab
University of Virginia
yw5aj@virginia.edu
+1 (434) 284-0836
https://sites.google.com/a/virginia.edu/yw5aj/
Shawn,

I use scipy.optimize.fmin_l_bfgs_b extensively, and I always explicitly normalize the input before passing it to the optimizer. I usually do it by writing my function as g(x, norm) == f([x[0]*norm[0], x[1]*norm[1], ...]) and calling the optimizer as fmin_l_bfgs_b(func=g, x0=[1., 1.], args=(norm,)) -- note that args must be a tuple, hence the trailing comma. You can achieve rigorous convergence by multiplying norm by the result of one optimization and iterating, but convergence behavior far from a minimum may depend strongly on both your initial guess and your initial normalization factors.

Eric
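Eric's wrapper can be sketched as a minimal runnable example; the quadratic objective and the normalization constants below are invented purely for illustration:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Hypothetical badly scaled objective with its minimum at x = [1e3, 1e-4].
def f(x):
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

# Wrapper that works in normalized coordinates: g(x, norm) == f(x * norm).
def g(x, norm):
    return f(np.asarray(x) * norm)

norm = np.array([1e3, 1e-4])   # rough magnitudes of the expected solution

x_scaled, fval, info = fmin_l_bfgs_b(
    func=g,
    x0=[1.5, 0.5],             # O(1) starting point in scaled space
    args=(norm,),              # trailing comma: args must be a tuple
    approx_grad=True,          # let the optimizer difference g numerically
)

x = x_scaled * norm            # map the result back to the original units
```

Iterating as Eric describes would mean setting norm to the returned x after one run and restarting from an all-ones x0.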
Hi Eric,

Great! Thanks a lot for the confirmation. That is really helpful.

-Shawn
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
Hi, Eric.

I have a problem with my Jacobian matrices being non-homogeneous because they are derived from variables with different units, such as when combining rotation and translation parameters. Besides scaling the initial value by a factor, do you have an example of how to scale the pertinent Jacobian matrices?

Thanks in advance!

Carlos
Well, I guess I am not Eric, but I did start this whole conversation :)

Question: if your Jacobian is computed numerically (as the minimize() function does automatically), shouldn't it already be normalized, provided we normalized the independent variable being passed in?

Shawn
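For what it's worth, the chain rule says yes: if g(s) = f(s * norm), then dg/ds = (df/dx) * norm, so a finite-difference gradient of the wrapped function approximates exactly that rescaled gradient. A quick check with an invented objective and invented scales:

```python
import numpy as np
from scipy.optimize import approx_fprime

# Hypothetical objective with parameters on very different scales.
def f(x):
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

norm = np.array([1e3, 1e-4])
g = lambda s: f(s * norm)        # wrapped function in O(1) scaled variables

s0 = np.array([1.5, 0.5])
eps = np.sqrt(np.finfo(float).eps)

grad_scaled = approx_fprime(s0, g, eps)               # what minimize() differences
grad_orig = approx_fprime(s0 * norm, f, eps * norm)   # per-component step in x

# The numerical gradient of the wrapped function is the rescaled one:
# grad_scaled ~= grad_orig * norm.
```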
I'm providing the Jacobian functions to be used in the optimization; it's faster that way. I found a couple of articles that do some normalization of Jacobian matrices for non-homogeneous parameters. Any examples with scipy would be appreciated. I'm trying to grasp the normalization methods for the Jacobian, and they seem to amount to a simple diagonal matrix multiplication.
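That diagonal-matrix interpretation can be sketched with scipy.optimize.least_squares: if the scaled parameters are s with x = s * norm, the chain rule gives J_s = J_x @ diag(norm). The residual model and scale factors below are invented for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

norm = np.array([1e3, 1e-4])   # typical magnitudes of the two parameters
                               # (e.g. a translation and a rotation)

def residuals(x):
    # Hypothetical residuals with their root at x = [1e3, 1e-4].
    return np.array([x[0] / 1e3 - 1.0, x[1] / 1e-4 - 1.0])

def jac(x):
    # Analytic Jacobian dr_i/dx_j in the original units.
    return np.array([[1e-3, 0.0],
                     [0.0, 1e4]])

# Optimize in scaled variables s, where x = s * norm:
res_s = lambda s: residuals(s * norm)
jac_s = lambda s: jac(s * norm) @ np.diag(norm)   # chain rule: J_s = J_x . diag(norm)

sol = least_squares(res_s, x0=[0.5, 2.0], jac=jac_s)
x = sol.x * norm               # back to original units
```

Note that least_squares also accepts an x_scale argument that applies this kind of diagonal scaling internally, which avoids wrapping by hand.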
Scaling factors can be error prone when you have to propagate them. Usually the safest way is to blame it on the units: in your case, define the rotation angles in radians and choose units in which the typical displacement is 1 (assuming the rotations are large, i.e. around 1 rad). This technique also works for other mathematical problems, like differential equations or linear systems.

/David.
Thanks, David. However, you can still run into biased convergence even when all the variables are in the same units: for example, when one parameter translates in the range of 0 to 10 mm and others translate in the hundreds of mm. I'm searching the web for more about this, but I would appreciate it if you could refer me to a source where they employ unit-based scaling (normalization) of Jacobian matrices for use within optimization.

Carlos
That is still not a problem: 10 mm and 500 mm are, more or less, the same amount. The whole purpose of scaling is to minimise numerical inaccuracies when adding or subtracting numbers, but since the resolution of a double is about 1e-15, you need roughly nine orders of magnitude of difference before you get accuracies comparable with a perfectly scaled float (1e-6) (waving my hands violently here). Note that, due to the nature of optimisation problems, the numerical noise in most "simple" functions will be swamped by the procedure, unless you are interested in an extremely accurate result on a finely crafted function. You can check this in your case by comparing runs with and without the scaling, without providing the gradient, for simplicity.
This is usually called "natural units" or "nondimensionalisation"; you will perhaps find more relevant hits in the context of differential equations. I don't know of a reference off the top of my head, but this should be covered in most numerical analysis texts.

/David.
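David's with/without-scaling experiment can be sketched like this; the badly scaled objective is invented, and the gradient is left to the optimizer's finite differences:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective with parameters on very different scales,
# minimum at x = [1e3, 1e-4].
def f(x):
    return (x[0] / 1e3 - 1.0) ** 2 + (x[1] / 1e-4 - 1.0) ** 2

norm = np.array([1e3, 1e-4])

raw = minimize(f, x0=[500.0, 5e-5], method="L-BFGS-B")
scaled = minimize(lambda s: f(s * norm), x0=[0.5, 0.5], method="L-BFGS-B")

err_raw = np.abs(raw.x - [1e3, 1e-4])
err_scaled = np.abs(scaled.x * norm - [1e3, 1e-4])
```

On a function this simple both runs usually get close; the raw version, however, takes its finite-difference steps in badly mismatched units, which is where the difference tends to show on noisier objectives.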
participants (4)
- cjaramillo
- Daπid
- Eric Hermes
- Yuxiang Wang