高级优化理论与方法(二)
- 上节回顾
- Constrained
- Unconstrained
- FONC
- SONC
- example
- 这节课的内容
- SOSC
- 定理叙述
- 证明
- 例子
- One-dimensional Search Methods
- Iterative Method
- Golden Section Search
- Method
- Issues
- 方法推理
- 算法描述
- Time
- Example
- Fibonacci Method
- Bisection Method
- Newton Method
- Example
- Secant Method
- Bracketing
- 总结
上节回顾
Constrained
m i n f ( x ) s . t . x ∈ Ω min f(x)\\ s.t. x\in \Omega minf(x)s.t.x∈Ω
Unconstrained
m i n f ( x ) min f(x) minf(x)
FONC
x ∗ x^* x∗ is optimal, ∀ d , ∇ f ( x ∗ ) T d ≥ 0 \forall d, \nabla f(x^*)^Td \geq 0 ∀d,∇f(x∗)Td≥0
(interior) ∇ f ( x ∗ ) = 0 \nabla f(x^*)=0 ∇f(x∗)=0
SONC
x ∗ x^* x∗ local optimal, ∀ d , d T ∇ F ( x ) T d ≥ 0 \forall d, d^T\nabla F(x)^Td \geq 0 ∀d,dT∇F(x)Td≥0
(interior) ∇ f ( x ∗ ) = 0 , F ( x ∗ ) ≥ 0 \nabla f(x^*)=0,F(x^*)\geq0 ∇f(x∗)=0,F(x∗)≥0
example
m i n f ( x 1 , x 2 ) = x 1 2 − x 2 2 min f(x_1,x_2)=x_1^2-x_2^2 minf(x1,x2)=x12−x22
x ∗ = [ 0 , 0 ] T x^*=[0,0]^T x∗=[0,0]T
∇ f ( x ) = [ 2 x 1 , − 2 x 2 ] T \nabla f(x)=[2x_1,-2x_2]^T ∇f(x)=[2x1,−2x2]T
∇ f ( x ∗ ) = [ 0 , 0 ] T \nabla f(x^*)=[0,0]^T ∇f(x∗)=[0,0]T
H ( x ) = [ 2 0 0 − 2 ] > 0 H(x)=\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}>0 H(x)=[200−2]>0
d 1 = [ 1 , 0 ] T d_1=[1,0]^T d1=[1,0]T
d 1 T F ( x ∗ ) d 1 = [ 2 , 0 ] [ 1 , 0 ] T = 2 > 0 d_1^TF(x^*)d_1=[2,0][1,0]^T=2>0 d1TF(x∗)d1=[2,0][1,0]T=2>0
d 2 = [ 0 , 1 ] T d_2=[0,1]^T d2=[0,1]T
d 2 T F ( x ∗ ) d 2 = − 2 < 0 d_2^TF(x^*)d_2=-2<0 d2TF(x∗)d2=−2<0
根据SONC, [ 0 , 0 ] T [0,0]^T [0,0]T not local minimizer.
这节课的内容
SOSC
定理叙述
【Second-order Sufficient Condition]
Let f ∈ C 2 f\in C^2 f∈C2 be defined on a region in which x ∗ x^* x∗ is an interior point.Suppose that:
① ∇ f ( x ∗ ) = 0 \nabla f(x^*)=0 ∇f(x∗)=0
② F ( x ∗ ) > 0 F(x^*)>0 F(x∗)>0
Then, x ∗ x^* x∗ is a strict local minimizer of f. ∀ x ∈ N ϵ ( x ∗ ) : f ( x ∗ ) < f ( x ) \forall x\in N_{\epsilon}(x^*):f(x^*)<f(x) ∀x∈Nϵ(x∗):f(x∗)<f(x)
注:对于无约束优化问题,我们只能给出一些充分条件或者必要条件,充要条件是数学界的一个公开问题,目前还没有答案。
证明
证:
f ∈ C 2 ⇒ F ( x ∗ ) = F ( x ∗ ) T f \in C^2 \Rightarrow F(x^*)=F(x^*)^T f∈C2⇒F(x∗)=F(x∗)T
(由Clairaut’s Theorem and Schwarz’s Therem, ∀ i , j ∈ [ 1 , n ] , ∂ 2 f ( x ∗ ) ∂ x i ∂ x j = ∂ 2 f ( x ∗ ) ∂ x j ∂ x i \forall i,j \in [1,n],\frac{\partial^2 f(x^*)}{\partial x_i \partial x_j}=\frac{\partial^2 f(x^*)}{\partial x_j \partial x_i} ∀i,j∈[1,n],∂xi∂xj∂2f(x∗)=∂xj∂xi∂2f(x∗))
Rayleigh’s Inequality:for a P ∈ R n × n P \in \mathbb{R}^{n \times n} P∈Rn×n,symmetric, positive definite:
λ m i n ( P ) ∣ ∣ x ∣ ∣ 2 ≤ x T P x ≤ λ m a x ( P ) ∣ ∣ x ∣ ∣ 2 \lambda_{min}(P)||x||^2\leq x^TPx \leq \lambda_{max}(P)||x||^2 λmin(P)∣∣x∣∣2≤xTPx≤λmax(P)∣∣x∣∣2
where λ m i n ( P ) \lambda_{min}(P) λmin(P) and λ m a x ( P ) \lambda_{max}(P) λmax(P) are the minmal and maximal eigenvalue value of P, respectively.
a symmetric matrix is positive definite ⇔ \Leftrightarrow ⇔all its eigenvalues are positive.
∵ d T F ( x ∗ ) d ≥ λ m i n ( F ( x ∗ ) ) ∣ ∣ d ∣ ∣ 2 > 0 \because d^TF(x^*)d \geq \lambda_{min}(F(x^*))||d||^2>0 ∵dTF(x∗)d≥λmin(F(x∗))∣∣d∣∣2>0
∴ f ( x ∗ + d ) − f ( x ∗ ) = 1 2 d T F ( x ∗ ) d + o ( ∣ ∣ d ∣ ∣ 2 ) > 0 \therefore f(x^*+d)-f(x^*)=\frac{1}{2}d^TF(x^*)d+o(||d||^2)>0 ∴f(x∗+d)−f(x∗)=21dTF(x∗)d+o(∣∣d∣∣2)>0
例子
f ( x ) = x 1 2 + x 2 2 f(x)=x_1^2+x_2^2 f(x)=x12+x22
∇ f ( x ) = [ 2 x 1 , 2 x 2 ] T \nabla f(x)=[2x_1,2x_2]^T ∇f(x)=[2x1,2x2]T
H ( x ) = [ 2 0 0 2 ] > 0 H(x)=\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}>0 H(x)=[2002]>0
x ∗ = [ 0 , 0 ] T x^*=[0,0]^T x∗=[0,0]T
One-dimensional Search Methods
Iterative Method
Iterative Method意为迭代算法。此处算法用algorithm其实不太严谨,因为要设计到算法的复杂度证明、正确性证明、能否停止等等的算法严谨性问题,而method这个词则不用考虑这么多。迭代意为由某个初始点出发,找一些方向,往某些方向更新的过程。
Golden Section Search
Assume f: unimodular on [ a 0 , b 0 ] [a_0,b_0] [a0,b0] (only one minimizer in [ a 0 , b 0 ] [a_0,b_0] [a0,b0])
Basic Idea: “Narrow Down”
Binary Search does not work out.
Pick two instead of one points.
Method
input: a 0 , b 0 , f , ϵ a_0,b_0,f,\epsilon a0,b0,f,ϵ
1. i = 0 i=0 i=0
2.while b i − a i ≥ ϵ b_i-a_i\geq \epsilon bi−ai≥ϵ do
3.Pick x < y x<y x<y from [a_i,b_i]
4.If f ( x ) < f ( y ) f(x)<f(y) f(x)<f(y) then a i + 1 = a i , b i + 1 = y a_{i+1}=a_i,b_{i+1}=y ai+1=ai,bi+1=y;
else b i + 1 = b i , a i + 1 = x b_{i+1}=b_i,a_{i+1}=x bi+1=bi,ai+1=x
5.i++
6.END while
Issues
1.# while-loop
2.# computation of f ( ⋅ ) f(\cdot) f(⋅)
方法推理
W.O.L.G.(Without Loss of Generality)
Assume b 0 − a 0 = 1 b_0-a_0=1 b0−a0=1
a 1 − a 0 = b 1 − b 0 = ρ < 1 2 a_1-a_0=b_1-b_0=\rho<\frac{1}{2} a1−a0=b1−b0=ρ<21
∀ i : b i + 1 − a i + 1 = ( 1 − ρ ) ( b i − a i ) \forall i: b_{i+1}-a_{i+1}=(1-\rho)(b_i-a_i) ∀i:bi+1−ai+1=(1−ρ)(bi−ai)
b 1 − a 1 = 1 − 2 ρ b_1-a_1=1-2\rho b1−a1=1−2ρ
b 1 − a 1 = ρ ( b 1 − a 0 ) = ρ ( 1 − ρ ) ⇒ 1 − 2 ρ = ρ − ρ 2 ⇒ ρ 2 − 3 ρ + 1 = 0 b_1-a_1=\rho(b_1-a_0)=\rho(1-\rho) \Rightarrow 1-2\rho=\rho-\rho^2 \Rightarrow \rho^2-3\rho+1=0 b1−a1=ρ(b1−a0)=ρ(1−ρ)⇒1−2ρ=ρ−ρ2⇒ρ2−3ρ+1=0
ρ 1 = 3 + 5 2 > 1 2 \rho_1=\frac{3+\sqrt{5}}{2}>\frac{1}{2} ρ1=23+5>21(舍去), ρ 2 = 3 − 5 2 < 1 2 \rho_2=\frac{3-\sqrt{5}}{2}<\frac{1}{2} ρ2=23−5<21
算法描述
1.compile b 1 = a 0 + ( 1 − ρ ) ( b 0 − a 0 ) , a 1 = a 0 + ρ ( b 0 − a 0 ) , f ( a 1 ) , f ( b 1 ) b_1=a_0+(1-\rho)(b_0-a_0),a_1=a_0+\rho(b_0-a_0),f(a_1),f(b_1) b1=a0+(1−ρ)(b0−a0),a1=a0+ρ(b0−a0),f(a1),f(b1)
2.i=0
3.while b i − a i ≥ ϵ b_i-a_i\geq \epsilon bi−ai≥ϵ do
if f ( a i + 1 ) < f ( b i + 1 ) f(a_{i+1})<f(b_{i+1}) f(ai+1)<f(bi+1) then
b i + 2 = a i + 1 , a i + 2 = a i + ρ ( b i + 1 − a i ) , a i + 1 = a i b_{i+2}=a_{i+1},a_{i+2}=a_i+\rho(b_{i+1}-a_i),a_{i+1}=a_i bi+2=ai+1,ai+2=ai+ρ(bi+1−ai),ai+1=ai
else
a i + 2 = b i + 1 , b i + 2 = b i − ρ ( b i − a i + 1 ) , b i + 1 = b i a_{i+2}=b_{i+1},b_{i+2}=b_i-\rho(b_i-a_{i+1}),b_{i+1}=b_i ai+2=bi+1,bi+2=bi−ρ(bi−ai+1),bi+1=bi
4.i++
5.END while
Time
1.While-Loop: time of f ( ⋅ ) f(\cdot) f(⋅)+O(1)
2.Loop: ( 1 − ρ ) N ( b 0 − a 0 ) < ϵ (1-\rho)^N(b_0-a_0)<\epsilon (1−ρ)N(b0−a0)<ϵ
N= a r g m i n ( l o g 1 − ρ ϵ b 0 − a 0 ) argmin(log_{1-\rho}\frac{\epsilon}{b_0-a_0}) argmin(log1−ρb0−a0ϵ)
Example
ϵ = 0.3 \epsilon=0.3 ϵ=0.3
f ( x ) = x 4 − 14 x 3 + 60 x 2 − 70 x f(x)=x^4-14x^3+60x^2-70x f(x)=x4−14x3+60x2−70x
[0,2]
( 1 − ρ ) N < 0.3 2 = 0.15 ⇒ N = 4 (1-\rho)^N<\frac{0.3}{2}=0.15\Rightarrow N=4 (1−ρ)N<20.3=0.15⇒N=4
1. a 1 = a 0 + ρ ( b 0 − a 0 ) = 0.7633 a_1=a_0+\rho(b_0-a_0)=0.7633 a1=a0+ρ(b0−a0)=0.7633
b 1 = a 0 + ( 1 − ρ ) ( b 0 − a 0 ) = 1.236 b_1=a_0+(1-\rho)(b_0-a_0)=1.236 b1=a0+(1−ρ)(b0−a0)=1.236
f ( a 1 ) = − 24.36 f(a_1)=-24.36 f(a1)=−24.36
f ( b 1 ) = − 18.96 f(b_1)=-18.96 f(b1)=−18.96
2.[0,1.236]
b 2 = a 1 = 0.7639 b_2=a_1=0.7639 b2=a1=0.7639
a 1 = a 0 + ρ ( 1.236 − 0 ) = 0.4721 a_1=a_0+\rho(1.236-0)=0.4721 a1=a0+ρ(1.236−0)=0.4721
f ( b 2 ) = − 24.36 f(b_2)=-24.36 f(b2)=−24.36
KaTeX parse error: Expected 'EOF', got '}' at position 6: f(a_2}̲=-21.10
3.[0.4721,1.236]
a 3 = b 2 = 0.7639 a_3=b_2=0.7639 a3=b2=0.7639
b 3 = a 2 + ( 1 − ρ ) ( 1.236 − 0.4721 ) = 0.9443 b_3=a_2+(1-\rho)(1.236-0.4721)=0.9443 b3=a2+(1−ρ)(1.236−0.4721)=0.9443
f ( a 3 ) = − 24.36 f(a_3)=-24.36 f(a3)=−24.36
f ( b 3 ) = − 23.59 f(b_3)=-23.59 f(b3)=−23.59
4.[0.4721,0.9443]
b 4 = a 3 = 0.7639 b_4=a_3=0.7639 b4=a3=0.7639
a 4 = 0.4721 + ρ ( 0.7443 − 0.4721 ) = 0.6525 a_4=0.4721+\rho(0.7443-0.4721)=0.6525 a4=0.4721+ρ(0.7443−0.4721)=0.6525
f ( b 4 ) = − 24.36 f(b_4)=-24.36 f(b4)=−24.36
$f(a_4)=-23.86
5.[0.6525,09443]
0.9443 − 0.6525 < 0.3 = ϵ 0.9443-0.6525<0.3=\epsilon 0.9443−0.6525<0.3=ϵ
算法终止
Fibonacci Method
事实上,每一轮的 ρ \rho ρ不一定要固定,也可以变化。假设 ρ \rho ρ会变化,我们来推导一下每一轮之间 ρ \rho ρ的关系。
ρ 1 ( 1 − ρ 0 ) = 1 − 2 ρ 0 \rho_1(1-\rho_0)=1-2\rho_0 ρ1(1−ρ0)=1−2ρ0
ρ k + 1 ( 1 − ρ k ) = 1 − 2 ρ k \rho_{k+1}(1-\rho_k)=1-2\rho_k ρk+1(1−ρk)=1−2ρk
ρ k + 1 = 1 − ρ k 1 − ρ k \rho_{k+1}=1-\frac{\rho_k}{1-\rho_k} ρk+1=1−1−ρkρk
问题转化为
min ( 1 − ρ 0 ) ( 1 − ρ 1 ) ⋯ ( 1 − ρ k ) (1-\rho_0)(1-\rho_1)\cdots (1-\rho_k) (1−ρ0)(1−ρ1)⋯(1−ρk)
s.t. ρ k + 1 = 1 − ρ k 1 − ρ k \rho_{k+1}=1-\frac{\rho_k}{1-\rho_k} ρk+1=1−1−ρkρk
结论为 ρ 0 = 1 − F N F N + 1 , ρ N − 1 = 1 − F 1 F 2 \rho_0=1-\frac{F_N}{F_{N+1}},\rho_{N-1}=1-\frac{F_1}{F_2} ρ0=1−FN+1FN,ρN−1=1−F2F1
F k F_k Fk为Fibonacci数列的第 k k k项, F 0 = 0 , F 1 = 1 , F k + 2 = F k + F k + 1 F_0=0,F_1=1,F_{k+2}=F_k+F_{k+1} F0=0,F1=1,Fk+2=Fk+Fk+1
注:用该方法来做比黄金分割法要快。
Bisection Method
Assume:f: unimodular on [ a 0 , b 0 ] [a_0,b_0] [a0,b0], f continuously differentiable.
f ′ ( c ) < 0 : [ c , b 0 ] f'(c)<0:[c,b_0] f′(c)<0:[c,b0]
f ′ ( c ) > 0 : [ a 0 , c ] f'(c)>0:[a_0,c] f′(c)>0:[a0,c]
f ′ ( c ) = 0 : f'(c)=0: f′(c)=0:return c c c
( 1 2 ) N < ϵ (\frac{1}{2})^N<\epsilon (21)N<ϵ
Newton Method
Assume: f ∈ C 2 ⇒ x ∗ ∈ [ a , b ] : f ′ ( x ∗ ) = 0 f \in C^2\Rightarrow x^*\in [a,b]: f'(x^*)=0 f∈C2⇒x∗∈[a,b]:f′(x∗)=0
x k + 1 = x k − f ( x k ) f ′ ( x k ) x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)} xk+1=xk−f′(xk)f(xk)或 x k + 1 = x k − f ′ ( x k ) f ′ ′ ( x k ) x_{k+1}=x_k-\frac{f'(x_k)}{f''(x_k)} xk+1=xk−f′′(xk)f′(xk)
该方法只有在初始点选的比较好的时候才管用,若初始点选的不好,可能产生振荡不收敛的问题。
Example
f ( x ) = 1 2 x 2 − s i n x f(x)=\frac{1}{2}x^2-sinx f(x)=21x2−sinx
x 0 = 0.5 x_0=0.5 x0=0.5
ϵ = 1 0 − 5 \epsilon=10^{-5} ϵ=10−5
f ′ ( x ) = x − c o s x f'(x)=x-cosx f′(x)=x−cosx
f ′ ′ ( x ) = 1 + s i n x f''(x)=1+sinx f′′(x)=1+sinx
x 1 = 0.5 − 0.5 − c o s 0 , 5 1 + s i n 0.5 = 0.7552 x_1=0.5-\frac{0.5-cos0,5}{1+sin0.5}=0.7552 x1=0.5−1+sin0.50.5−cos0,5=0.7552
x 2 = 0.7391 x_2=0.7391 x2=0.7391
x 3 = 0.7390 x_3=0.7390 x3=0.7390
x 4 = 0.7390 x_4=0.7390 x4=0.7390
Secant Method
secant意为切线。
f ∈ C 1 f \in C^1 f∈C1
f ′ ′ ≈ f ′ ( x k + 1 ) − f ′ ( x k ) x k + 1 − x k f''\approx\frac{f'(x_{k+1})-f'(x_k)}{x_{k+1}-x_k} f′′≈xk+1−xkf′(xk+1)−f′(xk)
x k + 1 = x k − f ′ ( x k ) ( x k − x k − 1 ) f ′ ( x k ) − f ′ ( x k − 1 ) x_{k+1}=x_k-\frac{f'(x_k)(x_k-x_{k-1})}{f'(x_k)-f'(x_{k-1})} xk+1=xk−f′(xk)−f′(xk−1)f′(xk)(xk−xk−1)
Bracketing
Find the initial a 0 , b 0 a_0,b_0 a0,b0
Suffice: a 0 , c , b 0 ← f ( a 0 ) > f ( c ) , f ( b 0 ) > f ( c ) a_0,c,b_0\leftarrow f(a_0)>f(c),f(b_0)>f(c) a0,c,b0←f(a0)>f(c),f(b0)>f(c)
该方法用于求得一个理想的区间,然后使用其它算法来做,但在实际应用中比较少见,且不太好用。
总结
本节课先回顾了FONC和SONC这两个找最值点的必要条件,然后给出了SOSC这个找最值点的充分条件。虽然看上去比较简单,但是关于无约束优化的定理目前也只发展到这种程度。目前数学界还没有找出一个充分必要条件。然后介绍了一维搜索方法中的迭代方法。重点介绍了黄金分割法,简略介绍了斐波那契法、二分法、牛顿法、割线法等方法。