Advanced Optimization Theory and Methods (1)
- Foreword
- Basic Concepts
- The Concept of Optimization
- Definition
- Notation
- Neighborhood
- Local Optimum
- Terms:
- Feasible Direction
- Definition
- Interior Point
- Extreme Point
- Boundary
- Derivatives
- First-Order Derivative
- Second-Order Derivative
- Example
- Directional Derivative
- Example
- Unconstrained Optimization
- FONC
- Proof
- Corollary
- Proof of the Corollary
- Example
- SONC
- Proof
- Corollary
- Summary
Foreword
This is a new series. I am taking a course called Advanced Optimization Theory and Methods this semester, and since I have to take notes anyway, I decided to make them electronic, which is how this series came about. The course meets once a week, so I will roughly keep a once-a-week update pace, with content going from easy to hard.
Our instructor's slides and board work are all in English. For simplicity, I keep the board work as it is; since my own wording is limited, I occasionally add annotations of my own alongside it. Such remarks are my additions rather than the instructor's.
Since these are lecture notes, they may contain small errors or places that are not entirely rigorous. Please bear with me and feel free to point them out.
Basic Concepts
The Concept of Optimization
Definition
Def: Given a function $f: A \rightarrow \mathbb{R}$, where $A \subseteq \mathbb{R}^n$,
we seek $x_0 \in A$ such that
$$\begin{cases} f(x_0)\leq f(x),\ \forall x \in A & \text{(minimizer)}\\ f(x_0)\geq f(x),\ \forall x \in A & \text{(maximizer)} \end{cases}$$
Note: throughout this series, unless stated otherwise, we minimize by default.
$f$: objective function
$A$: constraint set
Notation
An optimization problem is usually written compactly as
min/max $f(x)$
subject to $x \in A$
Neighborhood
Neighborhood of $x^* \in \mathbb{R}^n$ for $\epsilon>0$: $N_{\epsilon}(x^*)=\{x\in \mathbb{R}^n:\ \|x-x^*\|\leq\epsilon\}$
Local Optimum
$x^*$ is a local optimizer if $\exists\,\epsilon>0$ such that $\forall x\in N_{\epsilon}(x^*),\ f(x)\geq f(x^*)$.
Terms:
$A$: feasible set
$x \in A$: feasible solution/vector
$x^*$: local/global optimal (feasible) solution
$f(x^*)=\min_{x\in A}f(x)$, $x^*=\arg\min_{x \in A}f(x)$
Feasible Direction
Definition
Def: feasible direction
A vector $d\in \mathbb{R}^n$ ($d\neq 0$) is a feasible direction at $x\in A\subseteq \mathbb{R}^n$ if $\exists\,\alpha_0>0$ such that $\forall \alpha\in[0,\alpha_0]$, $x+\alpha d\in A$.
Note: for a direction $d\in \mathbb{R}^n$, we assume $\|d\|=1$ by default.
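Since the definition only asks whether $x+\alpha d$ stays in $A$ for all small $\alpha\geq 0$, it is easy to probe numerically. Below is a minimal sketch (my addition, not from the lecture): `is_feasible_direction` is a helper name I made up, and the nonnegative orthant is used as a hypothetical feasible set.

```python
import numpy as np

def is_feasible_direction(x, d, in_set, alpha0=1e-6, samples=20):
    """Numerically test whether d is a feasible direction at x:
    x + alpha*d should remain in the set for all alpha in [0, alpha0]."""
    return all(in_set(x + a * d) for a in np.linspace(0.0, alpha0, samples))

# Hypothetical feasible set: the nonnegative orthant A = {x : x >= 0}.
nonneg = lambda x: bool(np.all(x >= 0))

x = np.array([0.0, 3.0])                                         # a boundary point of A
print(is_feasible_direction(x, np.array([1.0, 0.0]), nonneg))    # True
print(is_feasible_direction(x, np.array([-1.0, 0.0]), nonneg))   # False
```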
Interior Point
interior point: a point at which every direction is a feasible direction.
Extreme Point
extreme point: a point at which some directions are feasible and others are not.
Boundary
boundary: the set of extreme points.
Derivatives
First-Order Derivative
Def: First-Order Derivative
$f:\mathbb{R}^n\rightarrow\mathbb{R}$, $x=[x_1,x_2,\cdots,x_n]^T\in \mathbb{R}^n$
$$Df\stackrel{\Delta}{=}\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right]$$
Gradient: $\nabla f=(Df)^T$
Note: vectors are column vectors by default, so $x$ is a column vector and so is the gradient.
Second-Order Derivative
Second-Order Derivative: Hessian Matrix
$$F(x)=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
Note: the Hessian matrix is also commonly written as $H(x)$.
Example
$f(x_1,x_2)=5x_1+8x_2+x_1x_2-x_1^2-2x_2^2$
$Df(x)=(\nabla f(x))^T=[5+x_2-2x_1,\ 8+x_1-4x_2]$
$$H(x)=\begin{bmatrix} -2 & 1 \\ 1 & -4 \end{bmatrix}$$
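As a sanity check (my addition, assuming SymPy is available), the gradient and Hessian above can be reproduced symbolically:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 5*x1 + 8*x2 + x1*x2 - x1**2 - 2*x2**2

grad = sp.Matrix([f]).jacobian([x1, x2]).T  # column gradient, i.e. (Df)^T
H = sp.hessian(f, (x1, x2))                 # Hessian matrix

print(grad)  # gradient entries: 5 + x2 - 2*x1 and 8 + x1 - 4*x2
print(H)     # Matrix([[-2, 1], [1, -4]])
```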
Directional Derivative
Let $x=x_0+\alpha d$. The directional derivative of $f$ at $x_0$ along $d$ is
$$\frac{\partial f}{\partial d}(x_0)=\left.\frac{d}{d\alpha}f(x_0+\alpha d)\right|_{\alpha=0}=d^T\nabla f(x_0)=\langle \nabla f(x_0),d\rangle$$
$\langle\cdot,\cdot\rangle$ denotes the inner product of two vectors.
Note: one way to read this formula is that along the direction $d$, $f(x_0+\alpha d)$ is just a function of $\alpha$.
Example
$f(x)=x_1x_2x_3$, $d=\left[\frac{1}{2},\frac{1}{2},\frac{1}{\sqrt{2}}\right]^T$
$$\frac{\partial f}{\partial d}(x)=\nabla f(x)^T d=[x_2x_3,\ x_1x_3,\ x_1x_2]\begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{\sqrt{2}} \end{bmatrix}=\frac{x_2x_3+x_1x_3+\sqrt{2}\,x_1x_2}{2}$$
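A quick numerical cross-check (my addition): $d^T\nabla f(x_0)$ should match a finite-difference approximation of $\frac{d}{d\alpha}f(x_0+\alpha d)$ at $\alpha=0$. The test point $x_0=[1,2,3]^T$ below is an arbitrary assumption.

```python
import numpy as np

f = lambda x: x[0] * x[1] * x[2]
grad = lambda x: np.array([x[1]*x[2], x[0]*x[2], x[0]*x[1]])

x0 = np.array([1.0, 2.0, 3.0])             # arbitrary test point (assumption)
d = np.array([0.5, 0.5, 1.0/np.sqrt(2)])   # unit direction from the example

analytic = d @ grad(x0)                    # d^T * grad f(x0)
eps = 1e-6
numeric = (f(x0 + eps*d) - f(x0)) / eps    # forward finite difference

print(analytic, numeric)  # the two values should agree to several digits
```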
Unconstrained Optimization
FONC
First-Order Necessary Condition (FONC):
Theorem: Let $\Omega$ be a subset of $\mathbb{R}^n$ and $f \in C^1$ a real-valued function on $\Omega$. If $x^*$ is a local minimizer of $f$ over $\Omega$, then for any feasible direction $d$ at $x^*$, we have $d^T\nabla f(x^*)\geq 0$.
In words: at a local minimizer of a continuously differentiable function, the directional derivative along every feasible direction is nonnegative.
Note: $f\in C^1$ means $f$ is continuously differentiable, i.e., its first-order partial derivatives exist and are continuous;
$f\in C^2$ means the second-order partial derivatives of $f$ exist and are continuous.
Proof
Pick an arbitrary feasible direction $d$ at $x^*$.
Define $x(\alpha)=x^*+\alpha d$ for $\alpha\geq 0$; note $x(0)=x^*$.
Let $\phi(\alpha)=f(x(\alpha))$.
Taylor's Theorem: $f(x^*+\alpha d)-f(x^*)=\phi(\alpha)-\phi(0)=\phi'(0)\alpha+o(\alpha)$
$\because x^*$ is a local minimizer
$\therefore \phi'(0)\alpha+o(\alpha)\geq 0$ for all sufficiently small $\alpha>0$
$\therefore \phi'(0)\geq -\frac{o(\alpha)}{\alpha}\rightarrow 0$ as $\alpha\rightarrow 0^+$
$\therefore \phi'(0)\geq 0$
$\therefore \phi'(0)=d^T\nabla f(x^*)\geq 0$
Corollary
Corollary (interior point): If $x^*$ is an interior point and a local minimizer, then $\nabla f(x^*)=0$.
Proof of the Corollary
Since $x^*$ is an interior point, both $d$ and $-d$ are feasible directions for every $d$, so
$$\forall d\in \mathbb{R}^n,\ \begin{cases} d^T\nabla f(x^*)\geq 0\\ -d^T\nabla f(x^*)\geq 0 \end{cases}\Rightarrow \nabla f(x^*)=0$$
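For interior candidates, the corollary reduces the search to solving $\nabla f(x)=0$. As an illustration (my addition), the sketch below solves this for the quadratic from the earlier gradient/Hessian example; keep in mind that $\nabla f=0$ is only necessary — for that particular $f$ the Hessian is negative definite, so the stationary point found here is in fact a maximizer, not a minimizer.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 5*x1 + 8*x2 + x1*x2 - x1**2 - 2*x2**2   # quadratic from the earlier example

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, [x1, x2])       # solve grad f(x) = 0
print(stationary)  # {x1: 4, x2: 3} -- the unique stationary point
```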
Example
min $x_1^2+0.5x_2^2+3x_2+4.5$
s.t. $x_1, x_2\geq 0$
$\Omega=\{x \mid x_1\geq 0,\ x_2\geq 0\}$
$\nabla f(x)=[2x_1,\ x_2+3]^T$
① $x^*=[1,3]^T$, $\nabla f(x^*)=[2,6]^T$
Write $d=[d_1,d_2]^T$.
$d^T\nabla f(x^*)=2d_1+6d_2$
Taking $d_1=0$, $d_2=-1$ makes this negative, so $[1,3]^T$ is not a minimizer.
② $x^*=[0,3]^T$, $\nabla f(x^*)=[0,6]^T$
Here feasibility requires $d_1\geq 0$, while $d_2$ may have either sign.
Taking $d_1=0$, $d_2=-1$ gives $d^T\nabla f(x^*)=-6<0$, so $[0,3]^T$ is not a minimizer.
③ $x^*=[1,0]^T$, $\nabla f(x^*)=[2,3]^T$
Here $d_2\geq 0$, while $d_1$ may have either sign.
Taking $d_1=-1$, $d_2=0$ gives $d^T\nabla f(x^*)=-2<0$, so $[1,0]^T$ is not a minimizer.
④ $x^*=[0,0]^T$, $\nabla f(x^*)=[0,3]^T$
Here $d_1\geq 0$ and $d_2\geq 0$.
$d^T\nabla f(x^*)=3d_2\geq 0$
So $[0,0]^T$ satisfies the FONC and may be a minimizer; a direct check confirms that it is indeed the minimizer (see the numerical check below).
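To back up that conclusion numerically (my addition, assuming SciPy is available), a bound-constrained solver applied to the same problem converges to $[0,0]^T$ with objective value $4.5$:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 0.5*x[1]**2 + 3*x[1] + 4.5

# Bound-constrained minimization over x1 >= 0, x2 >= 0 (L-BFGS-B handles bounds).
res = minimize(f, x0=np.array([1.0, 1.0]), bounds=[(0, None), (0, None)])
print(res.x, res.fun)  # approximately [0. 0.] and 4.5
```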
SONC
Second-Order Necessary Condition (SONC):
Theorem: Let $\Omega$ be a subset of $\mathbb{R}^n$, $f\in C^2$ a real-valued function on $\Omega$, $x^*$ a local minimizer of $f$ over $\Omega$, and $d$ a feasible direction at $x^*$. If $d^T\nabla f(x^*)=0$, then $d^T F(x^*)\,d\geq 0$, where $F(x)$ is the Hessian of $f$.
Proof
Pick a feasible direction $d$ at $x^*$ with $d^T\nabla f(x^*)=0$.
Define $x(\alpha)=x^*+\alpha d$ and $\phi(\alpha)=f(x(\alpha))$.
Taylor's Theorem: $\phi(\alpha)=\phi(0)+\phi'(0)\alpha+\frac{\alpha^2}{2}\phi''(0)+o(\alpha^2)$
$\because d^T\nabla f(x^*)=0$
$\therefore \phi'(0)=0$
$\therefore \phi(\alpha)-\phi(0)=\frac{\alpha^2}{2}\phi''(0)+o(\alpha^2)\geq 0$ for all sufficiently small $\alpha>0$, since $x^*$ is a local minimizer
$\therefore \phi''(0)\geq -\frac{2\,o(\alpha^2)}{\alpha^2}\rightarrow 0$ as $\alpha\rightarrow 0^+$
$\therefore \phi''(0)\geq 0$
$\therefore d^TF(x^*)\,d=\phi''(0)\geq 0$
Corollary
Corollary (interior point):
If $x^*$ is an interior point and a local minimizer, then $\forall d\in \mathbb{R}^n:\ d^TF(x^*)\,d\geq 0$.
Note: this says $F(x^*)$ is positive semidefinite, i.e., $F(x^*)\geq 0$.
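In practice, positive semidefiniteness is easiest to check through eigenvalues. As an illustration (my addition), the constant Hessian of the quadratic from the earlier example fails the test, so that function has no interior local minimizer:

```python
import numpy as np

F = np.array([[-2.0,  1.0],
              [ 1.0, -4.0]])        # Hessian of the earlier quadratic example

eigvals = np.linalg.eigvalsh(F)     # eigenvalues of a symmetric matrix
print(eigvals)                      # roughly [-4.41, -1.59]
print(bool(np.all(eigvals >= 0)))   # False -> F is not positive semidefinite
```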
Summary
This lecture covered the most basic concepts of optimization theory along with the relevant notions of derivatives, and then presented a few of the most fundamental theorems of unconstrained optimization. So far we have two necessary conditions for optimality, FONC and SONC; they are useful for ruling candidate points out, but not well suited to actually locating optimizers. The next lecture will introduce a sufficient condition, SOSC, together with some methods for finding optimizers.