Advanced Optimization Theory and Methods (13)
- Non-linear Constrained Optimization
- Case 1
- Definition
- Example 1
- Example 2
- Necessary/Sufficient Conditions
- Definition
- Example
- Theorem
- FONC(Lagrange's Condition)
- 2-Dimensional
- Summary:
- Lagrange's Theorem[FONC]
- Lagrange's Function
- Example 1
- Example 2
- Example 3
- SONC
- SOSC
- Example 1
- Example 2
- Case 2
- Definition
- KKT-Theorem(FONC)
- Example 1
- Example 2
- Summary
Non-linear Constrained Optimization
$\min\ f(x)$
s.t. $h(x)=0$
$g(x)\leq 0$
$x\in \mathbb{R}^n,\ f: \mathbb{R}^n\rightarrow \mathbb{R},\ h:\mathbb{R}^n\rightarrow \mathbb{R}^m,\ g:\mathbb{R}^n\rightarrow \mathbb{R}^p$
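Note: what distinguishes a nonlinear optimization problem from a linear one is that the objective function (or at least one constraint function) is nonlinear.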
Case 1
$\min\ f(x)$
s.t. $h(x)=0$
$h:\mathbb{R}^n\rightarrow \mathbb{R}^m,\ h\in C^1$ (continuously differentiable)
Definition
Def: Let $x^*$ satisfy $h_1(x^*)=0,\cdots,h_m(x^*)=0$. $x^*$ is a regular point if $\nabla h_1(x^*),\cdots,\nabla h_m(x^*)$ are linearly independent.
Jacobian: $Dh(x^*)=\begin{bmatrix} Dh_1(x^*)\\ Dh_2(x^*)\\ \vdots\\ Dh_m(x^*) \end{bmatrix}=\begin{bmatrix} \nabla h_1(x^*)^T\\ \nabla h_2(x^*)^T\\ \vdots\\ \nabla h_m(x^*)^T \end{bmatrix}\in\mathbb{R}^{m\times n}$, so $x^*$ is regular iff $\operatorname{rank}Dh(x^*)=m$.
Def: Surface: $S=\{x\in\mathbb{R}^n:h_1(x)=0,\cdots,h_m(x)=0\}$
Example 1
$n=3,\ m=1,\ h(x)=x_2-x_3^2$
$Dh(x)=[0,1,-2x_3]$
$\forall x\in\mathbb{R}^3:\ Dh(x)\neq 0$, so every point of $S$ is regular.
$S=\{x:x_2-x_3^2=0\}$
Example 2
$h_1(x)=x_1,\ h_2(x)=x_2-x_3^2$
$Dh(x)=\begin{bmatrix} 1&0&0\\ 0&1&-2x_3 \end{bmatrix}$
$S=\{x:x_1=0,\ x_2-x_3^2=0\}$
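As a quick numerical illustration of regularity in Example 2, the sketch below tests whether $\nabla h_1(x)$ and $\nabla h_2(x)$ are linearly independent at a few points of $S$. It relies on the fact that two vectors in $\mathbb{R}^3$ are independent exactly when their cross product is nonzero. The helper names (`grad_h1`, `grad_h2`, `cross`, `is_regular`) and the sample points are assumptions made for this illustration only.

```python
# Sketch: regularity check for h1(x) = x1, h2(x) = x2 - x3**2 in R^3.
# Two vectors in R^3 are linearly independent iff their cross product is nonzero.

def grad_h1(x):
    return (1.0, 0.0, 0.0)

def grad_h2(x):
    return (0.0, 1.0, -2.0 * x[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def is_regular(x, tol=1e-12):
    # x is assumed to lie on S; regular means the gradients are independent
    c = cross(grad_h1(x), grad_h2(x))
    return max(abs(ci) for ci in c) > tol

# A few sample points on S = {x : x1 = 0, x2 = x3**2} (hypothetical choices)
points_on_S = [(0.0, t * t, t) for t in (-2.0, 0.0, 0.5, 3.0)]
all_regular = all(is_regular(p) for p in points_on_S)
print(all_regular)
```

The cross product here is $(0, 2x_3, 1)$, whose last component never vanishes, which is why every point passes the check.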
Necessary/Sufficient Conditions
Recall the conditions for the unconstrained problem, where $F(x^*)$ denotes the Hessian of $f$ at $x^*$:
FONC: $x^*$ local minimizer $\Rightarrow \nabla f(x^*)=0$
SONC: $x^*$ local minimizer $\Rightarrow \nabla f(x^*)=0$ and $\forall y:y^TF(x^*)y\geq 0$
SOSC: (1) $\nabla f(x^*)=0$ and (2) $\forall y\neq 0:y^TF(x^*)y>0$ $\Rightarrow x^*$ is a strict local minimizer
Definition
Def: A curve $C$ on a surface $S$ is a set of points $\{x(t)\in S:t\in(a,b)\}$, where $x(t):\mathbb{R}\rightarrow \mathbb{R}^n$ is a continuous function.
The curve is differentiable if $\dot{x}(t)=\frac{dx}{dt}(t)=\begin{bmatrix} \dot{x}_1(t)\\ \dot{x}_2(t)\\ \vdots\\ \dot{x}_n(t) \end{bmatrix}$ exists for all $t\in (a,b)$.
It is twice differentiable if $\ddot{x}(t)=\frac{d^2x}{dt^2}(t)=\begin{bmatrix} \ddot{x}_1(t)\\ \ddot{x}_2(t)\\ \vdots\\ \ddot{x}_n(t) \end{bmatrix}$ exists for all $t\in (a,b)$.
Def: The tangent space at $x^*\in S=\{x\in\mathbb{R}^n:h(x)=0\}$ is the set $T(x^*)=\{y:Dh(x^*)y=0\}$
Example
$S=\{x\in \mathbb{R}^3: h_1(x)=x_1=0,\ h_2(x)=x_1-x_2=0\}$
$Dh(x)=\begin{bmatrix} 1&0&0\\ 1&-1&0 \end{bmatrix}$
The two rows are linearly independent, so all points of $S$ are regular.
$T(x)=\{y:\nabla h_1(x)^Ty=0,\nabla h_2(x)^Ty=0\}=\{[0,0,\alpha]^T:\alpha\in\mathbb{R}\}$, i.e. the $x_3$-axis
Theorem
Thm: Let $x^*$ be a regular point and let $T(x^*)$ be the tangent space at $x^*$. Then: $y\in T(x^*)\Leftrightarrow \exists$ a differentiable curve on $S$ passing through $x^*$ with derivative $y$ at $x^*$.
FONC(Lagrange’s Condition)
2-Dimensional
$h: \mathbb{R}^2\rightarrow \mathbb{R}$
Let $x^*=[x_1^*,x_2^*]^T,\ h(x^*)=0$
Assume $\nabla h(x^*)\neq 0$
Let $x(t):\mathbb{R} \rightarrow \mathbb{R}^2$ be a continuously differentiable curve on $S=\{x:h(x)=0\}$:
$x(t)=\begin{bmatrix} x_1(t)\\ x_2(t) \end{bmatrix},\ t\in(a,b),\ x^*=x(t^*)$
$\because \forall t\in (a,b): h(x(t))=0$
$\therefore \forall t: \frac{d}{dt}h(x(t))=\nabla h(x(t))^T\dot{x}(t)=0$
$\therefore \nabla h(x^*)$ is orthogonal to $\dot{x}(t^*)$
Assume $x^*=x(t^*)$ is a minimizer of $f(x)$ on $S=\{x:h(x)=0\}$
Define $\phi(t)=f(x(t))\stackrel{FONC}{\Rightarrow} \frac{d\phi}{dt}(t^*)=0$
$0=\frac{d}{dt}\phi(t^*)=\nabla f(x(t^*))^T\dot{x}(t^*)=\nabla f(x^*)^T\dot{x}(t^*)$
$\Rightarrow \nabla f(x^*)$ is orthogonal to $\dot{x}(t^*)$
Both gradients lie in $\mathbb{R}^2$ and are orthogonal to the same tangent vector $\dot{x}(t^*)\neq 0$, so they are parallel: $\nabla f(x^*)=\lambda \nabla h(x^*)$
Summary:
Let $x^*$ be a minimizer of $f:\mathbb{R}^2\rightarrow \mathbb{R}$ subject to $h(x)=0$, $h:\mathbb{R}^2\rightarrow \mathbb{R}$. Then $\nabla h(x^*)$ and $\nabla f(x^*)$ are parallel.
$\Rightarrow$ If $\nabla h(x^*)\neq 0$, then $\exists \lambda^*$ s.t. $\nabla f(x^*)+\lambda^*\nabla h(x^*)=0$
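The orthogonality argument can be checked numerically on a small problem. The sketch below uses a hypothetical example that is not part of the notes: minimize $f(x)=x_1+x_2$ on the unit circle $h(x)=x_1^2+x_2^2-1=0$, parametrized by the curve $x(t)=[\cos t,\sin t]^T$, whose minimizer is reached at $t^*=5\pi/4$. It evaluates both inner products with $\dot{x}(t^*)$ and then recovers $\lambda^*$ from $\nabla f(x^*)+\lambda^*\nabla h(x^*)=0$.

```python
import math

# Hypothetical problem (not from the notes): min x1 + x2  s.t.  x1^2 + x2^2 - 1 = 0.
def grad_f(x):
    return (1.0, 1.0)

def grad_h(x):
    return (2.0 * x[0], 2.0 * x[1])

def curve(t):
    # x(t) stays on S = {x : h(x) = 0}
    return (math.cos(t), math.sin(t))

def curve_dot(t):
    # derivative of the curve with respect to t
    return (-math.sin(t), math.cos(t))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

t_star = 5.0 * math.pi / 4.0           # minimizer x* = (-1/sqrt(2), -1/sqrt(2))
x_star = curve(t_star)
tangent = curve_dot(t_star)

h_orth = dot(grad_h(x_star), tangent)  # grad h(x*) . x'(t*), expected near 0
f_orth = dot(grad_f(x_star), tangent)  # grad f(x*) . x'(t*), expected near 0

# Multiplier from the first component of grad f + lambda * grad h = 0
lam = -grad_f(x_star)[0] / grad_h(x_star)[0]
residual = tuple(gf + lam * gh
                 for gf, gh in zip(grad_f(x_star), grad_h(x_star)))
print(h_orth, f_orth, lam, residual)
```

Up to rounding, both inner products vanish and the residual is zero with $\lambda^*=1/\sqrt{2}$, matching the parallel-gradient picture.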
Lagrange’s Theorem[FONC]
Let $x^*$ be a local minimizer of $f:\mathbb{R}^n\rightarrow\mathbb{R}$ subject to $h(x)=0$, $h:\mathbb{R}^n\rightarrow\mathbb{R}^m$, $m\leq n$. Assume $x^*$ is regular. Then $\exists \lambda^*\in \mathbb{R}^m$ s.t. $Df(x^*)+{\lambda^*}^TDh(x^*)=0$
Lagrange’s Function
Lagrange’s function: $l:\mathbb{R}^n\times\mathbb{R}^m\rightarrow \mathbb{R}$
$l(x,\lambda)=f(x)+\lambda^Th(x)$
Applying the unconstrained FONC to $l(x,\lambda)$ reproduces Lagrange’s condition together with the constraint:
$Dl(x^*,\lambda^*)=0\Rightarrow \begin{cases} D_xl(x^*,\lambda^*)=Df(x^*)+{\lambda^*}^TDh(x^*)=0\\ D_{\lambda}l(x^*,\lambda^*)=h(x^*)^T=0 \end{cases}$
Example 1
Given a cuboid with surface area $A$, find its maximum volume.
$\max\ x_1x_2x_3$
s.t. $x_1x_2+x_2x_3+x_1x_3=\frac{A}{2}\ (A>0)$
$f(x)=-x_1x_2x_3,\ h(x)=x_1x_2+x_2x_3+x_1x_3-\frac{A}{2}$
$\nabla f(x)=[-x_2x_3,-x_1x_3,-x_1x_2]^T$
$\nabla h(x)=[x_2+x_3,x_1+x_3,x_1+x_2]^T$
All feasible solutions are regular.
$\lambda\in\mathbb{R}$
$\begin{cases} \nabla f(x)+\lambda \nabla h(x)=0\\ h(x)=0 \end{cases}\Rightarrow \begin{cases} x_2x_3-\lambda(x_2+x_3)=0\\ x_1x_3-\lambda(x_1+x_3)=0\\ x_1x_2-\lambda(x_1+x_2)=0\\ x_1x_2+x_2x_3+x_1x_3-\frac{A}{2}=0 \end{cases}$
The maximum is attained at $x_1=x_2=x_3=\sqrt{\frac{A}{6}}$ (a cube), with $\lambda=\frac{1}{2}\sqrt{\frac{A}{6}}$.
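The cube solution can be plugged back into Lagrange's condition as a sanity check. The sketch below evaluates the residuals of $\nabla f(x)+\lambda\nabla h(x)=0$ and $h(x)=0$ for an assumed surface area $A=6$ (chosen only so that the side length is 1), with $\lambda=x_1/2$ taken from the first equation when all sides are equal. The function name `lagrange_residuals` is made up for this illustration.

```python
import math

def lagrange_residuals(x, lam, A):
    """Residuals of grad f + lam * grad h = 0 and h = 0 for the cuboid problem."""
    x1, x2, x3 = x
    grad_f = (-x2 * x3, -x1 * x3, -x1 * x2)
    grad_h = (x2 + x3, x1 + x3, x1 + x2)
    stationarity = tuple(gf + lam * gh for gf, gh in zip(grad_f, grad_h))
    feasibility = x1 * x2 + x2 * x3 + x1 * x3 - A / 2.0
    return stationarity, feasibility

A = 6.0                      # assumed surface area, chosen so the side is 1
side = math.sqrt(A / 6.0)
lam = side / 2.0             # from x^2 - lam * 2x = 0 with x1 = x2 = x3 = x
stationarity, feasibility = lagrange_residuals((side, side, side), lam, A)
volume = side ** 3

# A non-cubic cuboid with the same surface area, for comparison
other = (2.0, 0.5, 0.8)      # 2*0.5 + 0.5*0.8 + 2*0.8 = 3 = A/2
other_volume = other[0] * other[1] * other[2]
print(stationarity, feasibility, volume, other_volume)
```

The comparison cuboid has the same surface area but a smaller volume, which is consistent with the cube being the maximizer.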
Example 2
$f(x)=x_1^2+x_2^2,\ h(x)=x_1^2+2x_2^2-1$
$\nabla f(x)=\begin{bmatrix} 2x_1\\ 2x_2 \end{bmatrix},\ \nabla h(x)=\begin{bmatrix} 2x_1\\ 4x_2 \end{bmatrix}$
All feasible solutions are regular.
$\begin{cases} \nabla f(x)+\lambda \nabla h(x)=0\\ h(x)=0 \end{cases}\Rightarrow \begin{cases} 2x_1+2\lambda x_1=0\\ 2x_2+4\lambda x_2=0\\ x_1^2+2x_2^2=1 \end{cases}$
From the first equation, either $x_1=0$ or $\lambda=-1$
$\lambda=-1\Rightarrow\begin{cases} x_1=\pm 1\\ x_2=0 \end{cases}$
$x_1=0\Rightarrow\begin{cases} \lambda=-\frac{1}{2}\\ x_2=\pm \frac{1}{\sqrt{2}} \end{cases}$
$f(\begin{bmatrix} 1\\ 0 \end{bmatrix})=f(\begin{bmatrix} -1\\ 0 \end{bmatrix})=1$
$f(\begin{bmatrix} 0\\ \frac{1}{\sqrt{2}} \end{bmatrix})=f(\begin{bmatrix} 0\\ -\frac{1}{\sqrt{2}} \end{bmatrix})=\frac{1}{2}$
The minimum value $\frac{1}{2}$ is attained at $x_1=0,\ x_2=\pm \frac{1}{\sqrt{2}}$.
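The four candidate points of this example can be verified mechanically. The following sketch evaluates the stationarity residual $\nabla f(x)+\lambda\nabla h(x)$, the constraint $h(x)$ and the objective $f(x)$ at each candidate, then compares the objective values. Variable and function names are illustrative choices rather than anything prescribed by the notes.

```python
import math

def f(x):
    return x[0] ** 2 + x[1] ** 2

def h(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2 - 1.0

def stationarity(x, lam):
    # grad f(x) + lam * grad h(x)
    return (2.0 * x[0] + lam * 2.0 * x[0],
            2.0 * x[1] + lam * 4.0 * x[1])

r = 1.0 / math.sqrt(2.0)
# (point, multiplier) pairs obtained from the case analysis
candidates = [((1.0, 0.0), -1.0), ((-1.0, 0.0), -1.0),
              ((0.0, r), -0.5), ((0.0, -r), -0.5)]

for x, lam in candidates:
    print(x, lam, stationarity(x, lam), h(x), f(x))

best_value = min(f(x) for x, _ in candidates)
worst_value = max(f(x) for x, _ in candidates)
```

All four pairs satisfy Lagrange's condition, so the condition alone does not separate minimizers from other stationary points; comparing objective values (or a second-order test) does.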
Example 3
$\min\ -x^TQx$
s.t. $x^TPx=1$
$P,Q>0$ (positive definite), $P^T=P,\ Q^T=Q$
$f(x)=-x^TQx,\ h(x)=x^TPx-1$
$l(x,\lambda)=-x^TQx+\lambda(x^TPx-1)$
$D_xl(x,\lambda)=-2x^TQ+2\lambda x^TP=0\Rightarrow (\lambda P-Q)x=0\Rightarrow P^{-1}Qx=\lambda x\Rightarrow \lambda,x$ are an eigenvalue and eigenvector of $P^{-1}Q$
$D_{\lambda}l(x,\lambda)=x^TPx-1=0$
$Qx=\lambda Px$
$\Rightarrow x^TQx=\lambda x^TPx$
$\Rightarrow x^TQx=\lambda$
$\Rightarrow \lambda^*$: maximal eigenvalue of $P^{-1}Q$, since minimizing $-x^TQx=-\lambda$ requires the largest $\lambda$
SONC
Assume $f:\mathbb{R}^n\rightarrow \mathbb{R},\ h:\mathbb{R}^n\rightarrow \mathbb{R}^m$ are twice continuously differentiable.
$l(x,\lambda)=f(x)+\lambda^Th(x)=f(x)+\lambda_1h_1(x)+\cdots+\lambda_mh_m(x)$
$L(x,\lambda)=F(x)+\lambda_1H_1(x)+\cdots+\lambda_mH_m(x)$, where $F$ is the Hessian of $f$ and $H_i$ is the Hessian of $h_i$, so $L$ is the Hessian of $l$ with respect to $x$.
Thm(SONC): Let $x^*$ be a local minimizer of $f:\mathbb{R}^n\rightarrow \mathbb{R}$ subject to $h(x)=0$, $h:\mathbb{R}^n\rightarrow \mathbb{R}^m$, $m\leq n$, $f,h\in C^2$, and assume $x^*$ is regular. Then $\exists \lambda^*\in \mathbb{R}^m$ s.t. $\begin{cases} Df(x^*)+{\lambda^*}^TDh(x^*)=0\\ \forall y\in T(x^*)=\{y:Dh(x^*)y=0\}:y^TL(x^*,\lambda^*)y\geq 0 \end{cases}$
SOSC
Let $f,h\in C^2$. If $\exists x^*\in\mathbb{R}^n$ with $h(x^*)=0$ and $\lambda^*\in \mathbb{R}^m$ s.t.
- $Df(x^*)+{\lambda^*}^TDh(x^*)=0$
- $\forall y\in T(x^*),\ y\neq 0:y^TL(x^*,\lambda^*)y>0$
then $x^*$ is a strict local minimizer of $f(x)$ subject to $h(x)=0$
Example 1
$\max\ x^TQx$
s.t. $x^TPx=1$
$Q=\begin{bmatrix} 4&0\\ 0&1 \end{bmatrix},\ P=\begin{bmatrix} 2&0\\ 0&1 \end{bmatrix}$
$P^{-1}Q=\begin{bmatrix} 2&0\\ 0&1 \end{bmatrix}$
$\Rightarrow \lambda_1=2,\ \lambda_2=1$
$\Rightarrow \lambda^*=2$
$\Rightarrow x^*=[\frac{1}{\sqrt{2}},0]^T$ or $x^*=[-\frac{1}{\sqrt{2}},0]^T$
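One way to confirm that this $x^*$ satisfies the SOSC is to evaluate the conditions directly. The sketch below assumes the minimization form $f(x)=-x^TQx$, $h(x)=x^TPx-1$ with $l(x,\lambda)=f(x)+\lambda h(x)$, so that the Hessian in $x$ is $L=2(\lambda P-Q)$ and the tangent space is $\{y: 2{x^*}^TPy=0\}$. The tangent direction $y=[0,1]^T$ and the helper names are choices made for this illustration.

```python
import math

Q = ((4.0, 0.0), (0.0, 1.0))
P = ((2.0, 0.0), (0.0, 1.0))

def matvec(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def quad(M, v):
    # quadratic form v^T M v
    w = matvec(M, v)
    return v[0] * w[0] + v[1] * w[1]

lam = 2.0                                  # largest eigenvalue of P^{-1} Q
x_star = (1.0 / math.sqrt(2.0), 0.0)

# FONC: (lam * P - Q) x* = 0 and x*^T P x* = 1
Px, Qx = matvec(P, x_star), matvec(Q, x_star)
fonc = tuple(lam * p - q for p, q in zip(Px, Qx))
constraint = quad(P, x_star) - 1.0

# Hessian in x of l(x, lam) = -x^T Q x + lam * (x^T P x - 1): L = 2 (lam P - Q)
L = tuple(tuple(2.0 * (lam * P[i][j] - Q[i][j]) for j in range(2))
          for i in range(2))

# Tangent space: Dh(x*) y = 2 x*^T P y = 0, spanned here by y = (0, 1)
y = (0.0, 1.0)
tangency = 2.0 * (Px[0] * y[0] + Px[1] * y[1])
curvature = quad(L, y)                     # y^T L y, must be > 0 for SOSC
print(fonc, constraint, tangency, curvature)
```

Here $L=\operatorname{diag}(0,2)$ is only positive semidefinite on all of $\mathbb{R}^2$, yet it is positive on the tangent space, which is all the SOSC requires.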
Example 2
Consider $\min\ \frac{1}{2}x^TQx$
s.t. $Ax=b$
$Q>0,\ Q=Q^T,\ A\in\mathbb{R}^{m\times n},\ m\leq n,\ b\in\mathbb{R}^m,\ \operatorname{rank}A=m$
$l(x,\lambda)=\frac{1}{2}x^TQx+\lambda^T(b-Ax)$
$D_xl(x,\lambda)=x^TQ-\lambda^TA=0$
$\Rightarrow x=Q^{-1}A^T\lambda$
$\Rightarrow b=Ax=AQ^{-1}A^T\lambda$
$\Rightarrow \lambda=(AQ^{-1}A^T)^{-1}b$
$\Rightarrow x=Q^{-1}A^T(AQ^{-1}A^T)^{-1}b$
$L(x,\lambda)=Q>0$, so the SOSC holds and this $x$ is a strict local minimizer.
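The closed-form solution can be evaluated without forming any explicit inverse, by solving linear systems instead. The sketch below is a minimal pure-Python version under that assumption: `solve` is a small Gaussian-elimination helper written for this example, `equality_qp` follows the formulas $\lambda=(AQ^{-1}A^T)^{-1}b$ and $x=Q^{-1}A^T\lambda$, and the data `Q`, `A`, `b` are hypothetical.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def equality_qp(Q, A, b):
    """Minimize 0.5 x^T Q x s.t. A x = b via x = Q^{-1} A^T (A Q^{-1} A^T)^{-1} b."""
    m, n = len(A), len(Q)
    # W[i] = Q^{-1} a_i, where a_i is row i of A (a column of A^T)
    W = [solve(Q, A[i]) for i in range(m)]
    # S = A Q^{-1} A^T, entry (i, j) = a_i^T Q^{-1} a_j
    S = [[sum(A[i][k] * W[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]
    lam = solve(S, b)
    x = [sum(W[j][k] * lam[j] for j in range(m)) for k in range(n)]
    return x, lam

Q = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical data
A = [[1.0, 1.0]]
b = [3.0]
x, lam = equality_qp(Q, A, b)
print(x, lam)
```

For this data the formulas give $\lambda=2$ and $x=[1,2]^T$, which satisfies $Ax=b$ and $x^TQ-\lambda^TA=0$.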
Case 2
$\min\ f(x)$
s.t. $h(x)=0$
$g(x)\leq 0$
$f:\mathbb{R}^n\rightarrow \mathbb{R}$
$h:\mathbb{R}^n\rightarrow \mathbb{R}^m,\ m\leq n$
$g:\mathbb{R}^n\rightarrow \mathbb{R}^p$
Definition
Def: An inequality constraint $g_j(x)\leq 0$ is called active at $x^*$ if $g_j(x^*)=0$; otherwise, it is inactive.
Def: Let $x^*$ satisfy $h(x^*)=0$ and $g(x^*)\leq 0$, and let $J(x^*)=\{j: g_j(x^*)=0\}$ be the index set of active constraints. $x^*$ is called regular if $\nabla h_i(x^*)$ for all $1\leq i\leq m$ and $\nabla g_j(x^*)$ for all $j\in J(x^*)$ are linearly independent.
KKT-Theorem(FONC)
Let $f,h,g\in C^1$, and let $x^*$ be a regular point and a local minimizer of $f(x)$ subject to $h(x)=0$ and $g(x)\leq 0$. Then there exist $\lambda^*\in\mathbb{R}^m$ and $\mu^*\in\mathbb{R}^p$ s.t.
- $\mu^*\geq 0$
- $Df(x^*)+{\lambda^*}^TDh(x^*)+{\mu^*}^TDg(x^*)=0$
- ${\mu^*}^Tg(x^*)=0$
Example 1
$\min\ -\frac{400R}{(10+R)^2}$
s.t. $-R\leq 0$
$\nabla f(R)=-\frac{400(10-R)}{(10+R)^3}$
$\begin{cases} \mu^*\geq 0\\ Df(x^*)+{\lambda^*}^TDh(x^*)+{\mu^*}^TDg(x^*)=0\\ {\mu^*}^T g(x^*)=0\\ g(x^*)\leq 0\\ h(x^*)=0 \end{cases}$
With $g(R)=-R$, $Dg(R)=-1$ and no equality constraint: $\Rightarrow \begin{cases} \mu\geq 0\\ -\frac{400(10-R)}{(10+R)^3}-\mu=0\\ \mu R=0\\ R\geq 0 \end{cases}$
If $\mu>0$, then $R=0,\ \mu=-4$, which contradicts $\mu>0$ (✕)
If $\mu=0\Rightarrow R=10$ (✓)
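The two complementarity cases can also be checked in code. The sketch below encodes the four KKT conditions of this example in a hypothetical helper `kkt_holds` and tests the candidate pairs $(R,\mu)$ produced by each case; the tolerance value is an arbitrary choice.

```python
def df(R):
    # derivative of f(R) = -400 R / (10 + R)^2
    return -400.0 * (10.0 - R) / (10.0 + R) ** 3

def kkt_holds(R, mu, tol=1e-9):
    """KKT for min f(R) s.t. g(R) = -R <= 0, where Dg = -1."""
    return (mu >= -tol                      # dual feasibility: mu >= 0
            and abs(df(R) - mu) <= tol      # stationarity: f'(R) + mu * (-1) = 0
            and abs(mu * R) <= tol          # complementary slackness: mu * R = 0
            and R >= -tol)                  # primal feasibility: R >= 0

# Case mu > 0 forces R = 0; stationarity then fixes mu = f'(0)
case_active = (0.0, df(0.0))
# Case mu = 0 requires f'(R) = 0, i.e. R = 10
case_inactive = (10.0, 0.0)

print(case_active, kkt_holds(*case_active))
print(case_inactive, kkt_holds(*case_inactive))
```

The active case yields $\mu=-4$ and fails dual feasibility, while $R=10$ with $\mu=0$ passes every condition.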
Example 2
$\min\ -\frac{4000}{(10+R)^2}$
s.t. $-R\leq 0$
$\nabla f(R)=\frac{8000}{(10+R)^3}$
KKT: $\begin{cases} \mu\geq 0\\ \frac{8000}{(10+R)^3}-\mu=0\\ \mu R=0\\ R\geq 0 \end{cases}$
$\mu=0\Rightarrow$ no solution, since $\frac{8000}{(10+R)^3}\neq 0$ (✕)
$\mu>0\Rightarrow R=0,\ \mu=8$ (✓)
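As a rough cross-check of this boundary solution, the sketch below computes $\mu=f'(0)$ from the stationarity condition and compares it with a brute-force search over a grid of feasible $R$ values. The grid range $[0,50]$ and step are arbitrary assumptions, and a grid search is only an illustration, not a proof of optimality.

```python
def f(R):
    return -4000.0 / (10.0 + R) ** 2

def df(R):
    # derivative of f
    return 8000.0 / (10.0 + R) ** 3

mu = df(0.0)                                # stationarity f'(R) - mu = 0 at R = 0
grid = [0.01 * k for k in range(0, 5001)]   # feasible R values in [0, 50]
best_R = min(grid, key=f)                   # grid point with the smallest f
print(mu, best_R, f(best_R))
```

Because $f'(R)>0$ for all $R\geq 0$, the objective increases with $R$, so the constraint $R\geq 0$ is active at the minimizer and the multiplier is strictly positive.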
Summary
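This lecture introduced nonlinear constrained optimization. Depending on the constraints, the problem was split into two cases: the first has only equality constraints, and the second has both equality and inequality constraints. For the first case, the focus was on Lagrange's condition, which was derived in the two-dimensional setting. Since Lagrange's condition is a first-order necessary condition (FONC), the method of Lagrange multipliers for finding extrema was introduced next. The second-order necessary condition (SONC) and the second-order sufficient condition (SOSC) were then presented briefly. Finally, the second case was considered and the KKT conditions were given.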