Amir Beck’s INTRODUCTION TO NONLINEAR OPTIMIZATION: Theory, Algorithms, and Applications with MATLAB, Exercise 5.2
INTRODUCTION TO NONLINEAR OPTIMIZATION: Theory, Algorithms, and Applications with MATLAB. Amir Beck. 2014
This post mainly covers the MATLAB part of item (ii).
Item (i): https://blog.csdn.net/IYXUAN/article/details/121454946
Objectives
- Understand the gradient method with backtracking, the damped Newton method, and the hybrid Newton method;
- Learn to use MATLAB to solve an optimization problem with these three methods;
- Understand the strengths and weaknesses of the three methods and be able to explain the observed results.
Environment
MATLAB R2021a
Problem Statement
Consider the Freudenstein and Roth test function
$$f(x) = f_1(x)^2 + f_2(x)^2, \qquad x \in \mathbb{R}^2,$$
where
$$f_1(x) = -13 + x_1 + \big((5 - x_2)x_2 - 2\big)x_2\,,$$
$$f_2(x) = -29 + x_1 + \big((x_2 + 1)x_2 - 14\big)x_2\,.$$
Use MATLAB to employ the following three methods on the problem of minimizing $f$:
- the gradient method with backtracking and parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$;
- the hybrid Newton’s method with parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$;
- the damped Gauss–Newton method with a backtracking line search strategy with parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$.
All the algorithms should use the stopping criterion $\|\nabla f(x)\| \le \varepsilon$, $\varepsilon = 10^{-5}$. Each algorithm should be employed four times on the following four starting points: $(-50, 7)^T$, $(20, 7)^T$, $(20, -18)^T$, $(5, -10)^T$. For each of the four starting points, compare the number of iterations and the point to which each method converged. If a method did not converge, explain why.
Algorithm Descriptions
gradient_method_backtracking
Description of the gradient method with a backtracking line search:
Set the tolerance $\epsilon$;
Initialize $x_0 \in \mathbb{R}^n$;
FOR k = 0, 1, 2, …
- pick the descent direction $d_k = -g(x_k)$;
- pick a step size $t_k$ such that $f(x_k + t_k d_k) < f(x_k)$;
- set $x_{k+1} = x_k + t_k d_k$;
- IF $\|g(x_{k+1})\| \le \epsilon$ THEN STOP and OUTPUT $x_{k+1}$.
In step 2 of the loop (step-size selection), a constant-step gradient method uses a fixed $t_k = \bar{t}$. With a backtracking (inexact) line search, the step size starts at $t_k = s$, $s > 0$, and is shrunk via $t_k \leftarrow \beta t_k$, $\beta \in (0,1)$, until the sufficient-decrease condition $f(x_k) - f(x_k + t_k d_k) \ge -\alpha t_k \nabla f(x_k)^T d_k$, $\alpha \in (0,1)$, is satisfied. One can show that this inequality holds for all sufficiently small $t > 0$, so the inner loop always terminates; a minimal sketch of the inner loop follows.
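As a minimal illustration of this backtracking rule (a sketch only, assuming a current iterate `x` and the handles `f` and `g` defined later in the driver script, with the same parameters `s`, `alpha`, `beta`), the inner loop reads:

```matlab
% Backtracking line search along the direction d = -grad (minimal sketch)
grad = g(x);                  % gradient at the current iterate
t = s;                        % start from the initial step size s
% shrink t until the sufficient-decrease condition holds
while f(x) - f(x - t*grad) < alpha*t*norm(grad)^2
    t = beta*t;
end
x = x - t*grad;               % accept the step
```

This is the same inner loop used inside `gradient_method_backtracking` below.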
newton_backtracking
Description of Newton’s method with backtracking:
Newton’s method with backtracking is the same as the gradient method above, except that in step 1 the descent direction is replaced by the Newton direction
$$d_k = -\left(\nabla^2 f(x_k)\right)^{-1} \nabla f(x_k).$$
newton_hybrid
Description of the hybrid Newton method:
When choosing the direction $d_k$: if the Hessian $\nabla^2 f(x_k)$ is not positive definite, fall back to the backtracking gradient step, i.e. $d_k = -\nabla f(x_k)$; if $\nabla^2 f(x_k)$ is positive definite, take $d_k$ to be the Newton direction. Positive definiteness of the Hessian is tested with a Cholesky factorization, as sketched below.
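In MATLAB, the two-output form of `chol` performs this test without raising an error (`p == 0` exactly when the matrix is positive definite), and the returned factor can be reused to solve the Newton system. A minimal sketch of the direction selection, mirroring what `newton_hybrid` does below (assuming handles `g` and `h` and an iterate `x`):

```matlab
% Direction selection for the hybrid Newton method (minimal sketch)
gval = g(x);                      % gradient at the current iterate
hval = h(x);                      % Hessian at the current iterate
[L, p] = chol(hval, 'lower');     % p == 0  <=>  hval is positive definite
if p == 0
    d = L' \ (L \ gval);          % Newton direction: solves hval*d = gval
else
    d = gval;                     % fall back to the gradient direction
end
% the update is then x = x - t*d, with t chosen by backtracking
```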
Procedure
Computing $f(x)$, $g(x)$, and $h(x)$
$$f(x) = 2x_1^2 + 12x_1x_2^2 - 32x_1x_2 - 84x_1 + 2x_2^6 - 8x_2^5 + 2x_2^4 - 80x_2^3 + 12x_2^2 + 864x_2 + 1010$$
$$g(x) = \left(12x_2^2 - 32x_2 + 4x_1 - 84,\; 24x_2 - 32x_1 + 24x_1x_2 - 240x_2^2 + 8x_2^3 - 40x_2^4 + 12x_2^5 + 864\right)^T$$
$$h(x) = \begin{pmatrix} 4 & 24x_2 - 32 \\ 24x_2 - 32 & 60x_2^4 - 160x_2^3 + 24x_2^2 - 480x_2 + 24x_1 + 24 \end{pmatrix}$$
clc;clear
syms f1 f2 x1 x2 f g h
f1=-13+x1+((5-x2)*x2-2)*x2;
f2=-29+x1+((x2+1)*x2-14)*x2;
f=expand(f1^2+f2^2)
% f = 2*x1^2 + 12*x1*x2^2 - 32*x1*x2 - 84*x1 + 2*x2^6 - 8*x2^5 + 2*x2^4 - 80*x2^3 + 12*x2^2 + 864*x2 + 1010
g=gradient(f)
% g = [12*x2^2 - 32*x2 + 4*x1 - 84; 24*x2 - 32*x1 + 24*x1*x2 - 240*x2^2 + 8*x2^3 - 40*x2^4 + 12*x2^5 + 864]
h=hessian(f)
% h = [4, 24*x2 - 32; 24*x2 - 32, 60*x2^4 - 160*x2^3 + 24*x2^2 - 480*x2 + 24*x1 + 24]
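The symbolic results can also be converted directly into the numeric function handles used later, which avoids retyping the expanded expressions by hand. A small sketch using `matlabFunction` from the Symbolic Math Toolbox (the handle names `f_h`, `g_h`, `h_h` are placeholders of mine):

```matlab
% Convert the symbolic objective, gradient and Hessian into function handles
xv = [x1; x2];                            % treat (x1, x2) as a single vector argument
f_h = matlabFunction(f, 'Vars', {xv});    % e.g. f_h([5; 4]) evaluates f at (5, 4)
g_h = matlabFunction(g, 'Vars', {xv});
h_h = matlabFunction(h, 'Vars', {xv});
```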
Contour and surface plots of the function
Figure 1. Contour and surface plots of $f(x)$ around the global optimal solution $x^* = (5, 4)$.
Figure 2. Contour and surface plots of $f(x)$ around the local optimal solution $x^* = (11.4128, -0.8968)$.
clc;clear
% local optimality
x1 = linspace(11.40,11.42,50);
x2 = linspace(-0.897,-0.8965,50);
% global optimality
% x1 = linspace(4.9,5.1,50);
% x2 = linspace(3.9,4.1,50);
[X1,X2] = meshgrid(x1,x2);
f = 2*X1.^2 + 12*X1.*X2.^2 - 32*X1.*X2 - 84*X1 + 2*X2.^6 - 8*X2.^5 ...
    + 2*X2.^4 - 80*X2.^3 + 12*X2.^2 + 864*X2 + 1010;
createfigure(X1,X2,f)
%% Generated by the surfc figure export:
function createfigure(xdata1, ydata1, zdata1)
%CREATEFIGURE(xdata1, ydata1, zdata1)
%  XDATA1: surface xdata
%  YDATA1: surface ydata
%  ZDATA1: surface zdata
%  Auto-generated by MATLAB on 08-Dec-2021 15:44:51

% Create figure
figure1 = figure;

% Create axes
axes1 = axes('Parent',figure1);
hold(axes1,'on');

% Create surf
surf(xdata1,ydata1,zdata1,'Parent',axes1);

% Create contour
contour(xdata1,ydata1,zdata1,'ZLocation','zmin');

view(axes1,[-113 13]);
grid(axes1,'on');
axis(axes1,'tight');
hold(axes1,'off');

% Create colorbar
colorbar(axes1);
Running the methods
clc;clear
%% init
s = 1;
alpha = .5;
beta = .5;
epsilon = 1e-5;
p1 = [-50;7];
p2 = [20;7];
p3 = [20;-18];
p4 = [5;-10];
f = @(x) 2*x(1)^2 + 12*x(1)*x(2)^2 - 32*x(1)*x(2) - 84*x(1) ...
    + 2*x(2)^6 - 8*x(2)^5 + 2*x(2)^4 - 80*x(2)^3 + 12*x(2)^2 ...
    + 864*x(2) + 1010;
g = @(x) [12*x(2)^2 - 32*x(2) + 4*x(1) - 84; 24*x(2) - 32*x(1) ...
    + 24*x(1)*x(2) - 240*x(2)^2 + 8*x(2)^3 - 40*x(2)^4 + 12*x(2)^5 + 864];
h = @(x) [4, 24*x(2) - 32; 24*x(2) - 32, 60*x(2)^4 - 160*x(2)^3 ...
    + 24*x(2)^2 - 480*x(2) + 24*x(1) + 24];

%% call func
% gradient_method_backtracking
[x_gb1,fun_val_gb1] = gradient_method_backtracking(f,g,p1,s,alpha,...
beta,epsilon);
[x_gb2,fun_val_gb2] = gradient_method_backtracking(f,g,p2,s,alpha,...
beta,epsilon);
[x_gb3,fun_val_gb3] = gradient_method_backtracking(f,g,p3,s,alpha,...
beta,epsilon);
[x_gb4,fun_val_gb4] = gradient_method_backtracking(f,g,p4,s,alpha,...
beta,epsilon);

% newton_backtracking
x_nb1 = newton_backtracking(f,g,h,p1,alpha,beta,epsilon);
x_nb2 = newton_backtracking(f,g,h,p2,alpha,beta,epsilon);
x_nb3 = newton_backtracking(f,g,h,p3,alpha,beta,epsilon);
x_nb4 = newton_backtracking(f,g,h,p4,alpha,beta,epsilon);

% newton_hybrid
x_nh1 = newton_hybrid(f,g,h,p1,alpha,beta,epsilon);
x_nh2 = newton_hybrid(f,g,h,p2,alpha,beta,epsilon);
x_nh3 = newton_hybrid(f,g,h,p3,alpha,beta,epsilon);
x_nh4 = newton_hybrid(f,g,h,p4,alpha,beta,epsilon);
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function x=newton_backtracking(f,g,h,x0,alpha,beta,epsilon)
% Newton’s method with backtracking
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
d=hval\gval;
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while (f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    d=hval\gval;
end
if (iter==10000)
    fprintf('did not converge\n')
end
function x=newton_hybrid(f,g,h,x0,alpha,beta,epsilon)
% Hybrid Newton’s method
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
[L,p]=chol(hval,'lower');
if (p==0)
    d=L'\(L\gval);
else
    d=gval;
end
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while (f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    [L,p]=chol(hval,'lower');
    if (p==0)
        d=L'\(L\gval);
    else
        d=gval;
    end
end
if (iter==10000)
    fprintf('did not converge\n')
end
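The third item of the exercise asks for the damped Gauss–Newton method, which is not implemented above. A minimal sketch under the same parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$ could look as follows; the residual vector $F = (f_1, f_2)^T$ and its Jacobian are written out from the definitions of $f_1$ and $f_2$, the function name `damped_gauss_newton` is mine, and the gradient handle `g` from the driver script is reused for the stopping rule:

```matlab
function x = damped_gauss_newton(F,J,g,x0,s,alpha,beta,epsilon)
% Damped Gauss-Newton method with a backtracking line search (sketch)
% F ... residual vector [f1; f2]      J ... Jacobian of F
% g ... gradient of f = ||F||^2       x0 .. initial point
x = x0;
gval = g(x);
iter = 0;
while (norm(gval) > epsilon) && (iter < 10000)
    iter = iter + 1;
    Jval = J(x);
    Fval = F(x);
    d = (Jval'*Jval) \ (Jval'*Fval);   % Gauss-Newton direction (the step is x - t*d)
    t = s;
    fx = Fval'*Fval;                   % current objective value f(x) = ||F(x)||^2
    while sum(F(x - t*d).^2) > fx - alpha*t*gval'*d
        t = beta*t;                    % backtracking
    end
    x = x - t*d;
    gval = g(x);
    fprintf('iter= %2d f(x)=%10.10f\n', iter, sum(F(x).^2));
end
if (iter == 10000)
    fprintf('did not converge\n')
end
```

It could then be called from the driver script in the same way as the other methods, e.g.:

```matlab
F = @(x) [-13 + x(1) + ((5 - x(2))*x(2) - 2)*x(2); ...
          -29 + x(1) + ((x(2) + 1)*x(2) - 14)*x(2)];
J = @(x) [1, 10*x(2) - 3*x(2)^2 - 2; ...
          1, 3*x(2)^2 + 2*x(2) - 14];
x_gn1 = damped_gauss_newton(F,J,g,p1,s,alpha,beta,epsilon);
```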
Results and Analysis
$p_1 = (-50, 7)^T$, $p_2 = (20, 7)^T$, $p_3 = (20, -18)^T$, $p_4 = (5, -10)^T$
gradient_method_backtracking
| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | 0.000000 | - |
| p2 | (5.0, 4.0) | 0.000000 | - |
| p3 | (11.4128, -0.8968) | 48.984254 | - |
| p4 | (5.0, 4.0) | 0.000000 | - |
# p1
iter_number = 1 norm_grad = 16886.999242 fun_val = 11336.705024
iter_number = 2 norm_grad = 5600.459983 fun_val = 5803.984203
iter_number = 3 norm_grad = 1056.831987 fun_val = 4723.717473
iter_number = 4 norm_grad = 434.668354 fun_val = 4676.341028
iter_number = 5 norm_grad = 181.792854 fun_val = 4664.456052
iter_number = 6 norm_grad = 481.845230 fun_val = 4645.697772
iter_number = 7 norm_grad = 1450.431648 fun_val = 2784.316562
iter_number = 8 norm_grad = 626.401853 fun_val = 2597.516663
iter_number = 9 norm_grad = 220.402369 fun_val = 2565.305713
iter_number = 10 norm_grad = 116.512243 fun_val = 2561.054079
......
iter_number = 20849 norm_grad = 0.000092 fun_val = 0.000000
iter_number = 20850 norm_grad = 0.000092 fun_val = 0.000000
iter_number = 20851 norm_grad = 0.000092 fun_val = 0.000000
iter_number = 20852 norm_grad = 0.000092 fun_val = 0.000000
......

# p2
iter_number = 1 norm_grad = 20020.154130 fun_val = 11955.275024
iter_number = 2 norm_grad = 8107.335439 fun_val = 3744.085975
iter_number = 3 norm_grad = 2993.085982 fun_val = 1129.842099
iter_number = 4 norm_grad = 949.370487 fun_val = 445.778197
iter_number = 5 norm_grad = 192.946640 fun_val = 320.464548
iter_number = 6 norm_grad = 86.882053 fun_val = 313.996463
iter_number = 7 norm_grad = 41.236184 fun_val = 311.880377
iter_number = 8 norm_grad = 223.087207 fun_val = 295.899225
iter_number = 9 norm_grad = 91.972368 fun_val = 287.479235
iter_number = 10 norm_grad = 40.293830 fun_val = 285.264643
......
iter_number = 13961 norm_grad = 0.000151 fun_val = 0.000000
iter_number = 13962 norm_grad = 0.000151 fun_val = 0.000000
iter_number = 13963 norm_grad = 0.000151 fun_val = 0.000000
iter_number = 13964 norm_grad = 0.000151 fun_val = 0.000000
iter_number = 13965 norm_grad = 0.000151 fun_val = 0.000000
......

# p3
iter_number = 1 norm_grad = 10459534.883496 fun_val = 26901796.557585
iter_number = 2 norm_grad = 4328861.498929 fun_val = 9350549.111728
iter_number = 3 norm_grad = 1815010.486957 fun_val = 3306985.630845
iter_number = 4 norm_grad = 764085.159510 fun_val = 1178877.486706
iter_number = 5 norm_grad = 322588.854922 fun_val = 423840.683140
iter_number = 6 norm_grad = 136745.858345 fun_val = 154327.967236
iter_number = 7 norm_grad = 58355.714510 fun_val = 57260.204151
iter_number = 8 norm_grad = 25173.561268 fun_val = 21780.708212
iter_number = 9 norm_grad = 11034.676282 fun_val = 8504.776429
iter_number = 10 norm_grad = 4932.916085 fun_val = 3367.298679
......
iter_number = 11759 norm_grad = 0.000034 fun_val = 48.984254
iter_number = 11760 norm_grad = 0.000034 fun_val = 48.984254
iter_number = 11761 norm_grad = 0.000034 fun_val = 48.984254
iter_number = 11762 norm_grad = 0.000034 fun_val = 48.984254
iter_number = 11763 norm_grad = 0.000034 fun_val = 48.984254
iter_number = 11764 norm_grad = 0.000034 fun_val = 48.984254
......

# p4
iter_number = 1 norm_grad = 740484.580250 fun_val = 1125740.309089
iter_number = 2 norm_grad = 318800.867908 fun_val = 410873.876977
iter_number = 3 norm_grad = 135283.737154 fun_val = 147433.167712
iter_number = 4 norm_grad = 57286.724183 fun_val = 52647.763355
iter_number = 5 norm_grad = 24273.902315 fun_val = 18647.021114
iter_number = 6 norm_grad = 10264.843110 fun_val = 6442.533586
iter_number = 7 norm_grad = 4279.439838 fun_val = 2094.562330
iter_number = 8 norm_grad = 1694.354839 fun_val = 604.979471
iter_number = 9 norm_grad = 568.486420 fun_val = 158.093768
iter_number = 10 norm_grad = 98.759692 fun_val = 69.674565
......
iter_number = 6072 norm_grad = 0.000107 fun_val = 0.000000
iter_number = 6073 norm_grad = 0.000107 fun_val = 0.000000
iter_number = 6074 norm_grad = 0.000107 fun_val = 0.000000
iter_number = 6075 norm_grad = 0.000107 fun_val = 0.000000
iter_number = 6076 norm_grad = 0.000107 fun_val = 0.000000
......
newton_backtracking
| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | -0.000000 | 8 |
| p2 | (5.0, 4.0) | 0.000000 | 11 |
| p3 | (11.4128, -0.8968) | 48.984254 | 16 |
| p4 | (11.4128, -0.8968) | 48.984254 | 13 |
# p1
iter= 1 f(x)=17489.2000310639
iter= 2 f(x)=3536.1276294897
iter= 3 f(x)=556.9651095482
iter= 4 f(x)=50.4029326519
iter= 5 f(x)=1.2248344195
iter= 6 f(x)=0.0013284549
iter= 7 f(x)=0.0000000018
iter= 8 f(x)=-0.0000000000

# p2
iter= 1 f(x)=18114.8519548468
iter= 2 f(x)=3676.4191962662
iter= 3 f(x)=583.8090713510
iter= 4 f(x)=53.8298627069
iter= 5 f(x)=1.3684607619
iter= 6 f(x)=0.0016459572
iter= 7 f(x)=0.0000000027
iter= 8 f(x)=0.0000000007
iter= 9 f(x)=0.0000000002
iter= 10 f(x)=0.0000000000
iter= 11 f(x)=0.0000000000

# p3
iter= 1 f(x)=21356364.5665758252
iter= 2 f(x)=5561714.7070109081
iter= 3 f(x)=1444932.1368662491
iter= 4 f(x)=374422.3650773597
iter= 5 f(x)=96843.5106842914
iter= 6 f(x)=25078.5606496656
iter= 7 f(x)=6556.5446423615
iter= 8 f(x)=1764.6276022580
iter= 9 f(x)=509.7515417783
iter= 10 f(x)=171.5021447876
iter= 11 f(x)=76.9343325176
iter= 12 f(x)=52.3964690617
iter= 13 f(x)=49.0499328705
iter= 14 f(x)=48.9842770185
iter= 15 f(x)=48.9842536792
iter= 16 f(x)=48.9842536792

# p4
iter= 1 f(x)=695503.8833055196
iter= 2 f(x)=180005.8157903731
iter= 3 f(x)=46559.5630006172
iter= 4 f(x)=12100.5787363822
iter= 5 f(x)=3202.4785180644
iter= 6 f(x)=889.1211707798
iter= 7 f(x)=275.2063763915
iter= 8 f(x)=106.0983438392
iter= 9 f(x)=59.2021422471
iter= 10 f(x)=49.5512377643
iter= 11 f(x)=48.9860167506
iter= 12 f(x)=48.9842536959
iter= 13 f(x)=48.9842536792
newton_hybrid
| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | -0.000000 | 8 |
| p2 | (5.0, 4.0) | 0.000000 | 11 |
| p3 | (11.4128, -0.8968) | 48.984254 | 16 |
| p4 | (11.4128, -0.8968) | 48.984254 | 13 |
# p1
iter= 1 f(x)=17489.2000310639
iter= 2 f(x)=3536.1276294897
iter= 3 f(x)=556.9651095482
iter= 4 f(x)=50.4029326519
iter= 5 f(x)=1.2248344195
iter= 6 f(x)=0.0013284549
iter= 7 f(x)=0.0000000018
iter= 8 f(x)=-0.0000000000

# p2
iter= 1 f(x)=18114.8519548468
iter= 2 f(x)=3676.4191962662
iter= 3 f(x)=583.8090713510
iter= 4 f(x)=53.8298627069
iter= 5 f(x)=1.3684607619
iter= 6 f(x)=0.0016459572
iter= 7 f(x)=0.0000000027
iter= 8 f(x)=0.0000000007
iter= 9 f(x)=0.0000000002
iter= 10 f(x)=0.0000000000
iter= 11 f(x)=0.0000000000

# p3
iter= 1 f(x)=21356364.5665758252
iter= 2 f(x)=5561714.7070109081
iter= 3 f(x)=1444932.1368662491
iter= 4 f(x)=374422.3650773597
iter= 5 f(x)=96843.5106842914
iter= 6 f(x)=25078.5606496656
iter= 7 f(x)=6556.5446423615
iter= 8 f(x)=1764.6276022580
iter= 9 f(x)=509.7515417783
iter= 10 f(x)=171.5021447876
iter= 11 f(x)=76.9343325176
iter= 12 f(x)=52.3964690617
iter= 13 f(x)=49.0499328705
iter= 14 f(x)=48.9842770185
iter= 15 f(x)=48.9842536792
iter= 16 f(x)=48.9842536792

# p4
iter= 1 f(x)=695503.8833055196
iter= 2 f(x)=180005.8157903731
iter= 3 f(x)=46559.5630006172
iter= 4 f(x)=12100.5787363822
iter= 5 f(x)=3202.4785180644
iter= 6 f(x)=889.1211707798
iter= 7 f(x)=275.2063763915
iter= 8 f(x)=106.0983438392
iter= 9 f(x)=59.2021422471
iter= 10 f(x)=49.5512377643
iter= 11 f(x)=48.9860167506
iter= 12 f(x)=48.9842536959
iter= 13 f(x)=48.9842536792
Analysis
The gradient method with backtracking (gradient_method_backtracking) performs poorly on $f(x)$: from all four starting points (p1–p4) it fails to satisfy the stopping criterion $\|\nabla f(x)\| \le \epsilon$, $\epsilon = 10^{-5}$, so its iteration count grows without bound. Nevertheless, as the iterations accumulate, both the iterates and the function value $fun\_val$ do converge. From the starting points p1, p2 and p4 the method approaches $\bar{x} \to (5.0, 4.0)$ with $fun\_val \to 0.0$, which is clearly the global minimizer of $f(x)$; from p3 it converges to the local minimizer instead. In every run, however, the gradient norm eventually plateaus just above the tolerance (about $9.2 \times 10^{-5}$ for p1, $1.5 \times 10^{-4}$ for p2, $3.4 \times 10^{-5}$ for p3 and $1.1 \times 10^{-4}$ for p4) and then stays constant for thousands of iterations: near the minimizer the backtracking search can no longer find a step that produces a measurable decrease in double precision, so the iterates effectively stop moving and $\|\nabla f(x_k)\|$ never drops below $10^{-5}$. In short, the gradient method gets close to a minimizer (global or local) but cannot drive the gradient norm down to the required tolerance, which is why it never terminates.
Newton’s method with backtracking (newton_backtracking) and the hybrid Newton method (newton_hybrid) give nearly identical results, and both converge well from all four starting points. From p1 and p2 both methods reach the global minimizer $(5.0, 4.0)$ in about 10 iterations; from p3 and p4 they reach the local minimizer $(11.4128, -0.8968)$ within 16 iterations. The iteration logs of the two methods are identical at every starting point, which indicates that the Hessian $\nabla^2 f(x_k)$ was positive definite at every iterate visited: the Cholesky test in newton_hybrid therefore always selected the Newton direction, so on these runs the hybrid method coincides with Newton’s method with backtracking.
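To support this observation, the positive definiteness of the Hessian can be checked directly at the two limit points (reusing the handle `h` from the driver script above); since both points are strict local minimizers of $f$, both calls should report strictly positive eigenvalues:

```matlab
% Positive definiteness of the Hessian at the two limit points (quick check)
eig(h([5; 4]))                 % Hessian eigenvalues at the global minimizer (5, 4)
eig(h([11.4128; -0.8968]))     % Hessian eigenvalues at the local minimizer
```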
In summary, with the stopping criterion $\|\nabla f(x)\| \le \epsilon$, $\epsilon = 10^{-5}$, the Newton-type methods need far fewer iterations on $f(x)$ than the gradient method. For both approaches, the choice of starting point determines which stationary point is reached, so a poor starting point can lead to a local rather than the global minimizer.
©️ Sylvan Ding ❤️