[Legged Robots] Part 4: SUSTech Advanced Robot Control Course, CH11 Basics of Optimization

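This article is for study purposes only.
References:
Bilibili: CLEAR_LAB
Author's related post (to be updated): Kinematics
Course instructor:
Prof. Wei Zhang
Course link: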
https://www.wzhanglab.site/teaching/mee-5114-advanced-control-for-robotics/

SUSTech Advanced Robot Control Course, Ch11 Basics of Optimization

1. Motivation
2. Some Linear Algebra
- 2.1 Real Symmetric Matrices
- 2.2 Positive Semidefinite Matrices
3. Sets and Functions
- 3.1 Affine Sets and Functions
- 3.2 Quadratic Sets and Functions
- 3.3 Convex Set
- 3.4 Cone
- 3.5 Positive Semidefinite Cone
- 3.6 Operations that Preserve Convexity
- 3.7 Convex Function
- 3.8 How to Check Whether a Function Is Convex?
- 3.9 Examples of Convex Functions
4. Short Introduction to Optimization
- 4.1 Nonlinear Optimization Problems
- 4.2 Lagrangian
  - 4.2.1 Lagrangian Dual Problems
  - 4.2.2 Duality Theorems
- 4.3 General Optimality Conditions
- 4.4 KKT Conditions
5. Linear Program
6. Quadratic Program

1. Motivation

Optimization is arguably the most important tool in modern engineering.

Robotics:

Differential Inverse Kinematics
Dynamics: ABA (the most efficient dynamics algorithm) and LQR
Motion planning
Whole-body control: formulated as a quadratic program
SLAM
Perception

Machine Learning

Linear regression
Support vector machine
Deep learning: minimize a 'loss' function

Other domains

Checking system stability: SDP
Compressive sensing
Fourier transform: least-squares problem

Roughly speaking, most engineering problems (finding a better design, ensuring certain properties of the solution, developing an algorithm) can be formulated as optimization / optimal control problems.
Our goals:

Basic knowledge/key concepts of optimization theory
Formulate / reformulate optimization problems
Become educated users of tools/packages

2. Some Linear Algebra

2.1 Real Symmetric Matrices

$\mathcal{S}^n\subset\mathbb{R}^{n\times n}$: the set of real symmetric $n\times n$ matrices, $A\in\mathcal{S}^n\Leftrightarrow A^{\mathrm{T}}=A$

All eigenvalues are real, and $A$ is diagonalizable (important)

There exists a full set of orthogonal eigenvectors: for $A\in\mathcal{S}^n$, $A=T\varLambda T^{-1}$ with $T$ a nonsingular matrix

Spectral decomposition: if $A\in\mathcal{S}^n$, then $A=Q\varLambda Q^{-1}=Q\varLambda Q^{\mathrm{T}}$, where $\varLambda$ is diagonal (holding the eigenvalues of $A$) and $Q$ is orthogonal (a real unitary matrix), i.e. $Q^{\mathrm{T}}Q=E$. Writing $Q=\left[ q_1,\ldots,q_n \right]$ with $q_i$ the $i$-th column of $Q$, this means $q_i^{\mathrm{T}}q_j=\begin{cases} 1 & i=j\\ 0 & \text{otherwise}\\ \end{cases}$, so $\left\{ q_i \right\}$ is orthonormal
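The spectral decomposition can be checked numerically. The sketch below assumes NumPy (the notes themselves contain no code) and a hypothetical 3×3 symmetric matrix; `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors as the columns of `Q`.

```python
import numpy as np

# Hypothetical real symmetric matrix, used only for illustration
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh: real eigenvalues (ascending) and orthonormal eigenvectors as columns of Q
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

# Q^T Q = E (identity): the columns q_i are orthonormal
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
# A = Q Λ Q^T, i.e. Q^{-1} = Q^T
print(np.allclose(Q @ Lam @ Q.T, A))     # True
```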

2.2 Positive Semidefinite Matrices

$A\in \mathcal{S} ^n$ is called positive semidefinite(PSD), denoted by $A\succeq 0$ , if $x^{\mathrm{T}}Ax\geqslant 0,\forall x\in \mathbb{R} ^n$

$A\in\mathcal{S}^n$ is called positive definite (PD), denoted by $A\succ 0$, if $x^{\mathrm{T}}Ax>0$ for all nonzero $x\in\mathbb{R}^n$

$\mathcal{S} _{+}^{n}$ : set of all PSD (symmetric) matrices

$\mathcal{S} _{++}^{n}$ : set of all PD (symmetric) matrices

PSD or PD matrices can also be defined for non-symmetric matrices : e.g. $\left[ \begin{matrix} 1& 1\\ -1& 1\\ \end{matrix} \right] \Rightarrow x^{\mathrm{T}}\left[ \begin{matrix} 1& 1\\ -1& 1\\ \end{matrix} \right] x={x_1}^2+{x_2}^2$

We assume PSD and PD are symmetric (unless otherwise noted)

Notation: $A\succeq B$ (resp. $A\succ B$) means $A-B\in\mathcal{S}_{+}^{n}$ (resp. $A-B\in\mathcal{S}_{++}^{n}$). This defines a partial order on $\mathcal{S}^n$: two matrices need not be comparable, i.e. it is possible to have both $A\nsucceq B$ and $B\nsucceq A$
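To see that $\succeq$ is only a partial order, here is a small NumPy check with two diagonal matrices that are not comparable. It is a sketch: the helper `is_psd` and its tolerance are illustrative choices.

```python
import numpy as np

def is_psd(M, tol=1e-12):
    """Check M ⪰ 0 for a symmetric M via its smallest eigenvalue."""
    return np.linalg.eigvalsh(M).min() >= -tol

A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])

print(is_psd(A - B))  # False: A - B = diag(1, -1)
print(is_psd(B - A))  # False: B - A = diag(-1, 1)
```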

Other equivalent definitions for symmetric PSD matrices :

All $2^n-1$ principal minors of $A$ are nonnegative
All eigenvalues of $A$ are nonnegative
There exists a factorization $A=B^{\mathrm{T}}B$

Other equivalent definitions for symmetric PD matrices :

All $n$ leading principal minors of $A$ are positive (Sylvester's criterion)
All eigenvalues of $A$ are strictly positive
There exists a factorization $A=B^{\mathrm{T}}B$ with $B$ square and nonsingular
If $A\succ 0$, $A=Q\varLambda Q^{\mathrm{T}}=Q\varLambda^{\frac{1}{2}}\varLambda^{\frac{1}{2}}Q^{\mathrm{T}}=B^{\mathrm{T}}B$ with $B=\varLambda^{\frac{1}{2}}Q^{\mathrm{T}}$
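The equivalent PSD/PD tests above can be tried side by side in a short NumPy sketch. The helper names are hypothetical, and floating-point determinants are fine for tiny examples but are not a robust test in general; eigenvalue or Cholesky checks are preferred. The matrix `C` shows why the PSD test requires all principal minors, not only the leading ones.

```python
import itertools
import numpy as np

def principal_minors(M):
    """All 2^n - 1 principal minors: det(M[S, S]) for every nonempty index set S."""
    n = M.shape[0]
    return [np.linalg.det(M[np.ix_(S, S)])
            for k in range(1, n + 1)
            for S in itertools.combinations(range(n), k)]

def leading_principal_minors(M):
    """The n leading principal minors: determinants of the top-left k x k blocks."""
    return [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]

def has_cholesky(M):
    """M ≻ 0 iff a Cholesky factor L with M = L L^T exists (take B = L^T)."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # PD: eigenvalues 1 and 3
C = np.array([[0.0, 0.0], [0.0, -1.0]])    # not PSD

print(np.linalg.eigvalsh(A).min() > 0)                  # True
print(all(m > 0 for m in leading_principal_minors(A)))  # True
print(has_cholesky(A), has_cholesky(C))                 # True False

# Leading minors of C are 0 and 0 (both >= 0), yet C is not PSD:
# the PSD test needs all principal minors, and C[1, 1] = -1 < 0
print(all(m >= 0 for m in leading_principal_minors(C)))  # True
print(all(m >= 0 for m in principal_minors(C)))          # False
```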

Useful facts :

If $T$ is nonsingular (it need not be orthogonal), $A\succ 0\Leftrightarrow T^{\mathrm{T}}AT\succ 0$ and $A\succeq 0\Leftrightarrow T^{\mathrm{T}}AT\succeq 0$
Recall: $TAT^{-1}$ is a similarity transformation and $T^{\mathrm{T}}AT$ is a congruence transformation; $\mathcal{S}_{+}^{n}$ and $\mathcal{S}_{++}^{n}$ are invariant under congruence transformations
Inner product on $\mathbb{R}^{m\times n}$: $\langle A,B\rangle=\mathrm{tr}(A^{\mathrm{T}}B)=A\cdot B$
$\forall A,B\in\mathbb{R}^{m\times n}$: $\mathrm{tr}(A^{\mathrm{T}}B)=\sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}$. The angle between $A$ and $B$ is given by $\cos\theta=\frac{\langle A,B\rangle}{\sqrt{\langle A,A\rangle\langle B,B\rangle}}$, so $A\perp B\Leftrightarrow\mathrm{tr}(A^{\mathrm{T}}B)=0$, and $\mathrm{tr}(A^{\mathrm{T}}B)>0$ means the angle is acute
For $A,B\in\mathcal{S}_{+}^{n}$, $\mathrm{tr}(AB)\geqslant 0$: since $A$ is symmetric, $\langle A,B\rangle=\mathrm{tr}(A^{\mathrm{T}}B)=\mathrm{tr}(AB)=\mathrm{tr}(A^{1/2}BA^{1/2})\geqslant 0$, because $A^{1/2}BA^{1/2}\succeq 0$
For any symmetric $A\in\mathcal{S}^n$, $\lambda_{\min}(A)\geqslant\mu\Leftrightarrow A\succeq\mu E$ and $\lambda_{\max}(A)\leqslant\beta\Leftrightarrow A\preceq\beta E$, where $E$ is the identity (easy proof)
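These facts can be sanity-checked numerically, assuming NumPy and randomly generated matrices. This is an illustration only, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(A, B):
    """Trace inner product <A, B> = tr(A^T B) = sum_ij A_ij B_ij."""
    return np.trace(A.T @ B)

A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))
print(np.isclose(inner(A, B), np.sum(A * B)))   # True

# Random PSD matrices built as M^T M
M1, M2 = rng.standard_normal((2, 4, 4))
P1, P2 = M1.T @ M1, M2.T @ M2
print(np.trace(P1 @ P2) >= 0)                   # True

# lambda_min(S) >= mu  <=>  S - mu*E is PSD (E = identity)
S = (M1 + M1.T) / 2
mu = np.linalg.eigvalsh(S).min()
print(np.linalg.eigvalsh(S - mu * np.eye(4)).min() >= -1e-10)         # True
print(np.linalg.eigvalsh(S - (mu + 0.1) * np.eye(4)).min() >= -1e-10)  # False
```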

3. Sets and Functions

3.1 Affine Sets and Functions

Linear mapping : $f\left( x+y \right) =f\left( x \right) +f\left( y \right) ,f\left( \alpha x \right) =\alpha f\left( x \right)$ , for any $x, y$ in some vector space , and $\alpha \in \mathbb{R}$

Examples:

$f\left( x \right) =Ax,x\in \mathbb{R} ^3,A\in SO\left( 3 \right)$
$f(x)=\int x(\tau)\,d\tau$, for all integrable functions $x(\cdot)$
$E(x)$: the expectation of a random variable/vector $x$, $E(x)=\int x\,p(x)\,dx$ with $p$ its probability density
$f\left( x \right) =tr\left( x \right) ,x\in \mathbb{R} ^{n\times n}$

Affine mapping : $f\left( x \right)$ is an affine mapping of $x$ if $g\left( x \right) =f\left( x \right) -f\left( x_0 \right)$ is a linear mapping for some fixed $x_0$

Finite-dimensional representation of an affine function: $f(x)=Ax+b$, since $g(x)=f(x)-f(0)=Ax+b-b=Ax$ is linear

Homogeneous representation in $\mathbb{R} ^n$ : $f\left( x \right) =Ax+b\Leftrightarrow \hat{f}\left( x \right) =\hat{A}\hat{x},\hat{A}=\left[ \begin{matrix} A& b\\ 0& 1\\ \end{matrix} \right] ,\hat{x}=\left[ \begin{array}{c} x\\ 1\\ \end{array} \right]$
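Here is a sketch of the homogeneous representation, assuming NumPy, with a made-up rotation angle and offset. A 2D rotation plus translation becomes a single 3×3 matrix, the same construction used for homogeneous transforms in robotics.

```python
import numpy as np

# Hypothetical affine map f(x) = A x + b on R^2 (a rotation plus a translation)
theta = np.pi / 4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
b = np.array([1.0, 2.0])

# Homogeneous matrix A_hat = [[A, b], [0, 1]]
A_hat = np.block([[A, b[:, None]],
                  [np.zeros((1, 2)), np.ones((1, 1))]])

x = np.array([3.0, -1.0])
x_hat = np.append(x, 1.0)

# A_hat x_hat reproduces [A x + b; 1]
print(np.allclose(A_hat @ x_hat, np.append(A @ x + b, 1.0)))  # True
```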

Linear and affine are often used interchangeably

Linear/affine sets: sets described by affine functions, e.g. $\{x:f(x)=0\}$ or $\{x:f(x)\leqslant 0\}$ for an affine mapping $f$

Line/hyperplane: $a^{\mathrm{T}}x=b$
Picking any $x_0$ with $a^{\mathrm{T}}x_0=b$, we get $a^{\mathrm{T}}x=b\Leftrightarrow a^{\mathrm{T}}(x-x_0)=0$: the hyperplane passes through $x_0$ and has normal vector $a$
Half space: $a^{\mathrm{T}}x\leqslant b\Leftrightarrow a^{\mathrm{T}}(x-x_0)\leqslant 0$
Polyhedron: $Hx\leqslant h$, with $H\in\mathbb{R}^{m\times n},x\in\mathbb{R}^n,h\in\mathbb{R}^m$
$\begin{bmatrix} H_1^{\mathrm{T}}\\ \vdots\\ H_m^{\mathrm{T}} \end{bmatrix}x\leqslant\begin{bmatrix} h_1\\ \vdots\\ h_m \end{bmatrix}$ imposes $m$ inequalities $H_i^{\mathrm{T}}x\leqslant h_i$, each defining a half space
For a matrix variable $X\in\mathbb{R}^{n\times n}$, $\mathrm{tr}(AX)\leqslant 0$ for a given constant matrix $A\in\mathbb{R}^{n\times n}$ is a halfspace in $\mathbb{R}^{n\times n}$
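Checking membership in a polyhedron means checking every half-space row. Here is a minimal sketch, assuming NumPy and a hypothetical unit box $\{x:-1\leqslant x_i\leqslant 1\}$ written as $Hx\leqslant h$ with $m=4$.

```python
import numpy as np

# Hypothetical polyhedron: the box [-1, 1]^2 written as H x <= h (m = 4 half spaces)
H = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
h = np.ones(4)

def in_polyhedron(x, H, h, tol=1e-12):
    """x is in {x : H x <= h} iff every row inequality H_i^T x <= h_i holds."""
    return bool(np.all(H @ x <= h + tol))

print(in_polyhedron(np.array([0.5, -0.2]), H, h))  # True
print(in_polyhedron(np.array([1.5, 0.0]), H, h))   # False
```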

3.2 Quadratic Sets and Functions

Quadratic functions in $\mathbb{R} ^n$ : $f\left( x \right) =x^{\mathrm{T}}Ax+b^{\mathrm{T}}x+c,x=\left[ \begin{array}{c} x_1\\ \vdots\\ x_{\mathrm{n}}\\ \end{array} \right] ,f:\mathbb{R} ^n\rightarrow \mathbb{R}$

Quadratic functions (homogeneous form): with $\hat{x}=\begin{bmatrix} x\\ 1 \end{bmatrix}$, $\hat{f}(x)=\hat{x}^{\mathrm{T}}\hat{A}\hat{x}=\begin{bmatrix} x\\ 1 \end{bmatrix}^{\mathrm{T}}\begin{bmatrix} A & \frac{b}{2}\\ \frac{b^{\mathrm{T}}}{2} & c \end{bmatrix}\begin{bmatrix} x\\ 1 \end{bmatrix}=f(x)$. Nonnegativity can be read off this matrix: $f(x)\geqslant 0,\forall x\in\mathbb{R}^n\Leftrightarrow\hat{A}\succeq 0$. For a pure quadratic form $f(x)=x^{\mathrm{T}}Ax$, $f$ is PSD ($f(x)\geqslant 0$ for all $x$) iff $A\in\mathcal{S}_{+}^{n}$, and PD ($f(x)>0$ for all $x\ne 0$, with $f(0)=0$) iff $A\in\mathcal{S}_{++}^{n}$
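The homogeneous form can be verified numerically. The sketch assumes NumPy and hypothetical values for $A$, $b$, $c$. It builds $\hat{A}$ with off-diagonal blocks $b/2$ and $b^{\mathrm{T}}/2$ and compares $\hat{x}^{\mathrm{T}}\hat{A}\hat{x}$ with $f(x)$.

```python
import numpy as np

# Hypothetical data for f(x) = x^T A x + b^T x + c on R^2
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
c = 3.0

def f(x):
    return x @ A @ x + b @ x + c

# Homogeneous matrix A_hat = [[A, b/2], [b^T/2, c]] (symmetric)
A_hat = np.block([[A, b[:, None] / 2],
                  [b[None, :] / 2, np.array([[c]])]])

x = np.array([0.3, -1.2])
x_hat = np.append(x, 1.0)
print(np.isclose(x_hat @ A_hat @ x_hat, f(x)))  # True
```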

Quadratic sets : $\left\{ x\in \mathbb{R} ^n:f\left( x \right) \leqslant 0 \right\}$ for some quadratic function $f$
eg1: Ball: $\left\{ x\in\mathbb{R}^n:\left\| x-x_c \right\|^2\leqslant r_c^2 \right\}$, i.e. $f(x)=(x-x_c)^{\mathrm{T}}(x-x_c)-r_c^2\leqslant 0$
eg2: Ellipsoid: $\left\{ x\in\mathbb{R}^n:(x-x_c)^{\mathrm{T}}P^{-1}(x-x_c)\leqslant 1 \right\}$, $P\in\mathcal{S}_{++}^{n}$

3.3 Convex Set

Convex set: a set $S$ is convex if the line segment between any two of its points stays in the set:
$x_1,x_2\in S\Rightarrow\alpha x_1+(1-\alpha)x_2\in S,\forall\alpha\in[0,1]$, i.e. $\alpha_1x_1+\alpha_2x_2\in S$ whenever $\alpha_1+\alpha_2=1,\alpha_1\geqslant 0,\alpha_2\geqslant 0$

Such a point $\alpha_1x_1+\alpha_2x_2$ is called a convex combination of $x_1,x_2$

Convex combination of $x_1,...,x_{\mathrm{k}}$ :
$\left\{ \alpha _1x_1+\alpha _2x_2+...+\alpha _{\mathrm{k}}x_{\mathrm{k}}:\alpha _{\mathrm{i}}\geqslant 0,\sum_i{\alpha _{\mathrm{i}}}=1 \right\}$

Convex hull: $\overline{co}\left\{ S \right\}$, the set of all convex combinations of points in $S$
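A convex combination is a weighted average whose weights lie on the probability simplex. The sketch below assumes NumPy and hypothetical points. It draws random weights with `Generator.dirichlet` and confirms that the resulting point stays inside a convex set (a triangle) containing the original points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical points x_1..x_k in R^2 (stored as rows); all lie in the
# triangle {x >= 0, x1 + x2 <= 2}, which is a convex set
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])

# Random weights alpha_i >= 0 with sum alpha_i = 1
alpha = rng.dirichlet(np.ones(len(X)))
z = alpha @ X   # convex combination sum_i alpha_i x_i

# Any convex combination of points of a convex set stays in the set
print(np.all(z >= 0) and z.sum() <= 2 + 1e-12)   # True
```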

3.4 Cone

A set $S$ is called a cone if $\lambda >0,x\in S\Rightarrow \lambda x\in S$
Conic combination of $x_1$ and $x_2$: $x=\alpha_1x_1+\alpha_2x_2$, $\alpha_1\geqslant 0,\alpha_2\geqslant 0$; more generally, $cone(x_1,\ldots,x_k)=\left\{ \sum_i\alpha_ix_i:\alpha_i\geqslant 0 \right\}$

Convex cone:

a cone that is convex
equivalently, a set that contains all conic combinations of its points

3.5 Positive Semidefinite Cone

The set of positive semidefinite matrices, $\mathcal{S}_{+}^{n}$, is a convex cone and is referred to as the positive semidefinite (PSD) cone. It is a cone: $A\in\mathcal{S}_{+}^{n},\lambda\geqslant 0\Rightarrow x^{\mathrm{T}}(\lambda A)x=\lambda x^{\mathrm{T}}Ax\geqslant 0\Rightarrow\lambda A\in\mathcal{S}_{+}^{n}$
It is convex, by definition: pick arbitrary $A,B\in\mathcal{S}_{+}^{n}$; then $\alpha A+(1-\alpha)B\in\mathcal{S}_{+}^{n}$ for $\alpha\in[0,1]$, since $x^{\mathrm{T}}(\alpha A+(1-\alpha)B)x=\alpha x^{\mathrm{T}}Ax+(1-\alpha)x^{\mathrm{T}}Bx\geqslant 0$

Recall that if $A,B\in\mathcal{S}_{+}^{n}$, then $\mathrm{tr}(AB)\geqslant 0$, so the angle between any two elements is at most $90^\circ$. This indicates that the cone $\mathcal{S}_{+}^{n}$ is acute.
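The cone and convexity arguments above can be spot-checked numerically. This sketch assumes NumPy and random PSD matrices of the form $M^{\mathrm{T}}M$. The weights are arbitrary nonnegative values.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_psd(n):
    """A random PSD matrix of the form M^T M."""
    M = rng.standard_normal((n, n))
    return M.T @ M

A, B = random_psd(3), random_psd(3)
a1, a2 = 0.7, 2.5                 # any nonnegative weights (conic combination)
C = a1 * A + a2 * B

print(np.linalg.eigvalsh(C).min() >= -1e-10)   # True: C stays in the PSD cone
print(np.trace(A @ B) >= 0)                     # True: the cone is acute
```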

For $x_1,x_2\in\mathbb{R}^n$:
$\alpha_1x_1+\alpha_2x_2$ with arbitrary $\alpha_1,\alpha_2$: linear combination
$\alpha_1x_1+\alpha_2x_2$ with $\alpha_1\geqslant 0,\alpha_2\geqslant 0$: conic combination
$\alpha_1x_1+\alpha_2x_2$ with $\alpha_1\geqslant 0,\alpha_2\geqslant 0,\alpha_1+\alpha_2=1$: convex combination

3.6 Operations that Preserve Convexity

The intersection of any (possibly infinite) number of convex sets is convex
eg: polyhedron, the intersection of half spaces: $H_1^{\mathrm{T}}x\leqslant h_1,H_2^{\mathrm{T}}x\leqslant h_2\Leftrightarrow\begin{bmatrix} H_1^{\mathrm{T}}\\ H_2^{\mathrm{T}} \end{bmatrix}x\leqslant\begin{bmatrix} h_1\\ h_2 \end{bmatrix}$
eg: PSD cone, the intersection over all $x\in\mathbb{R}^n$ of the half spaces $\left\{ A\in\mathcal{S}^n:x^{\mathrm{T}}Ax\geqslant 0 \right\}$

Affine mapping $f:\mathbb{R} ^n\rightarrow \mathbb{R} ^m$ (i.e. $f\left( x \right) =Ax+b$ )

$f\left( X \right) =\left\{ f\left( x \right) :x\in X \right\}$ is convex whenever $X\subseteq \mathbb{R} ^n$ is convex
e.g.: Ellipsoid: $E_1=\left\{ x\in\mathbb{R}^n:(x-x_c)^{\mathrm{T}}P^{-1}(x-x_c)\leqslant 1 \right\}$ or, equivalently, $E_2=\left\{ x_c+Au:\left\| u \right\|_2\leqslant 1 \right\}$ with $A=P^{1/2}$, the image of the unit ball under an affine mapping
$f^{-1}(Y)=\left\{ x\in\mathbb{R}^n:f(x)\in Y \right\}$ is convex whenever $Y\subseteq\mathbb{R}^m$ is convex
e.g. $\left\{ x:Ax\leqslant b \right\}=f^{-1}(\mathbb{R}_{+}^{m})$ with $f(x)=b-Ax$, where $\mathbb{R}_{+}^{m}$ is the nonnegative orthant
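The equivalence $E_1=E_2$ with $A=P^{1/2}$ follows from $(x-x_c)^{\mathrm{T}}P^{-1}(x-x_c)=u^{\mathrm{T}}u$ when $x=x_c+P^{1/2}u$. Below is a sampling check, assuming NumPy and hypothetical $x_c$ and $P$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ellipsoid data: center x_c and shape matrix P ≻ 0
x_c = np.array([1.0, -2.0])
P = np.array([[4.0, 1.0], [1.0, 2.0]])

# Symmetric square root P^{1/2} = Q Λ^{1/2} Q^T from the spectral decomposition
w, Q = np.linalg.eigh(P)
P_half = Q @ np.diag(np.sqrt(w)) @ Q.T

# Sample points u in the unit ball ||u||_2 <= 1 (scale down any with norm > 1)
u = rng.standard_normal((1000, 2))
u /= np.maximum(1.0, np.linalg.norm(u, axis=1))[:, None]
x = x_c + u @ P_half.T        # E_2 form: x = x_c + P^{1/2} u, one point per row

# Membership in the E_1 form: (x - x_c)^T P^{-1} (x - x_c) <= 1
d = x - x_c
vals = np.einsum('ij,ij->i', d @ np.linalg.inv(P), d)
print(np.all(vals <= 1 + 1e-9))   # True
```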

3.7 Convex Function

Consider a finite dimensional vector space $\chi$ . Let $\mathcal{D} \subset \chi$ be convex

Definition 1 (Convex Function)
A function $f:\mathcal{D} \rightarrow \mathbb{R}$ is called convex if
$f\left( \alpha x_1+\left( 1-\alpha \right) x_2 \right) \leqslant \alpha f\left( x_1 \right) +\left( 1-\alpha \right) f\left( x_2 \right) ,\forall x_1,x_2\in \mathcal{D} ,\forall \alpha \in \left[ 0,1 \right]$

$f:\mathcal{D} \rightarrow \mathbb{R}$ is called strictly convex if
$f(\alpha x_1+(1-\alpha)x_2)<\alpha f(x_1)+(1-\alpha)f(x_2),\forall x_1\ne x_2\in\mathcal{D},\forall\alpha\in(0,1)$
$f:\mathcal{D} \rightarrow \mathbb{R}$ is called concave if $- f$ is convex
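Using the definition directly, a randomized search can look for a counterexample to the convexity inequality. This is a heuristic sketch that assumes NumPy and uses a hypothetical helper `find_convexity_violation`. Finding a violation proves non-convexity, but finding none is only evidence, never a proof.

```python
import numpy as np

rng = np.random.default_rng(4)

def find_convexity_violation(f, dim, trials=10_000, scale=3.0, tol=1e-9):
    """Randomly search for x1, x2, a violating
    f(a x1 + (1-a) x2) <= a f(x1) + (1-a) f(x2).
    Returns a counterexample (x1, x2, a) or None."""
    for _ in range(trials):
        x1, x2 = scale * rng.standard_normal((2, dim))
        a = rng.uniform()
        lhs = f(a * x1 + (1 - a) * x2)
        rhs = a * f(x1) + (1 - a) * f(x2)
        if lhs > rhs + tol:
            return x1, x2, a
    return None

print(find_convexity_violation(lambda x: np.sum(x**2), 2) is None)     # True (convex)
print(find_convexity_violation(lambda x: np.sin(x).sum(), 1) is None)  # False (not convex)
```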

3.8 How to Check Whether a Function Is Convex?

Directly use the definition

First-order condition: if $f$ is differentiable over an open set that contains $\mathcal{D}$, then $f$ is convex over $\mathcal{D}$ iff (if and only if) it stays above its first-order Taylor approximation at every point:
$f(z)\geqslant f(x)+\nabla f(x)^{\mathrm{T}}(z-x),\forall x,z\in\mathcal{D}$
Second-order condition: suppose $f$ is twice differentiable over an open set that contains $\mathcal{D}$; then $f$ is convex over $\mathcal{D}$ iff
$\nabla^2f(x)\succeq 0,\forall x\in\mathcal{D}$
(concave iff $\nabla^2f(x)\preceq 0$)
Many other conditions and tricks exist
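The first- and second-order conditions can be spot-checked for a function whose gradient and Hessian are known in closed form. The sketch assumes NumPy and uses $f(x)=e^{a^{\mathrm{T}}x}$ with a hypothetical vector $a$. For this function, $\nabla f(x)=e^{a^{\mathrm{T}}x}a$ and $\nabla^2f(x)=e^{a^{\mathrm{T}}x}aa^{\mathrm{T}}\succeq 0$. Random sampling only illustrates the conditions; it does not prove them.

```python
import numpy as np

rng = np.random.default_rng(5)
a = np.array([1.0, -2.0])          # hypothetical fixed vector

def f(x):
    return np.exp(a @ x)

def grad(x):
    return np.exp(a @ x) * a                   # ∇f(x) = e^{a^T x} a

def hess(x):
    return np.exp(a @ x) * np.outer(a, a)      # ∇²f(x) = e^{a^T x} a a^T

first_order_ok = True
second_order_ok = True
for _ in range(1000):
    x, z = rng.standard_normal((2, 2))
    tol = 1e-9 * (1.0 + f(x))                  # relative tolerance for rounding
    # First-order condition: f(z) >= f(x) + ∇f(x)^T (z - x)
    first_order_ok &= bool(f(z) >= f(x) + grad(x) @ (z - x) - tol)
    # Second-order condition: ∇²f(x) is PSD (smallest eigenvalue >= 0)
    second_order_ok &= bool(np.linalg.eigvalsh(hess(x)).min() >= -tol)

print(first_order_ok, second_order_ok)         # True True
```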

3.9 Example of Convex Functions

In general, affine functions are both convex and concave
e.g.: $f(x)=a^{\mathrm{T}}x+b,x\in\mathbb{R}^n$
e.g.: $f(X)=\mathrm{tr}(A^{\mathrm{T}}X)+c=\sum_{i=1}^m\sum_{j=1}^n A_{ij}X_{ij}+c,X\in\mathbb{R}^{m\times n}$
$f:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}$ is an affine function of the matrix $X$

Quadratic functions: $f(x)=x^{\mathrm{T}}Qx+b^{\mathrm{T}}x+c$ (with $Q$ symmetric) is convex iff $Q\succeq 0$
using the second-order condition: $\nabla^2f(x)=\begin{bmatrix} \frac{\partial^2f}{\partial x_1\partial x_1} & \frac{\partial^2f}{\partial x_1\partial x_2} & \cdots\\ \frac{\partial^2f}{\partial x_2\partial x_1} & \frac{\partial^2f}{\partial x_2\partial x_2} & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix}=2Q$, so $\nabla^2f(x)\succeq 0\Leftrightarrow Q\succeq 0$

All norms are convex
e.g.: in $\mathbb{R}^n$: $f(x)=\left\| x \right\|_p=\left( \sum_{i=1}^n\left| x_i \right|^p \right)^{1/p}$ for $p\geqslant 1$, and $\left\| x \right\|_{\infty}=\max_k\left| x_k \right|$
e.g. : in $\mathbb{R} ^{m\times n}$ : $f\left( X \right) =\left\| X \right\| _2=\sigma _{\max}$
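Here is a short NumPy sketch of these norms. NumPy is an assumption, and `p_norm` is a hypothetical helper. The sketch also checks that the matrix 2-norm equals the largest singular value, and it evaluates one sampled instance of the convexity inequality.

```python
import numpy as np

rng = np.random.default_rng(6)

def p_norm(x, p):
    """||x||_p = (sum |x_i|^p)^(1/p), a norm for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
print(np.isclose(p_norm(x, 1), np.linalg.norm(x, 1)))    # True
print(np.isclose(p_norm(x, 3), np.linalg.norm(x, 3)))    # True
print(np.linalg.norm(x, np.inf) == np.max(np.abs(x)))    # True

# Spectral norm ||X||_2 equals the largest singular value sigma_max
X = rng.standard_normal((4, 3))
print(np.isclose(np.linalg.norm(X, 2), np.linalg.svd(X, compute_uv=False)[0]))  # True

# Convexity follows from the triangle inequality plus homogeneity:
# ||a x + (1-a) y|| <= a ||x|| + (1-a) ||y||
y, a = rng.standard_normal(3), 0.3
print(p_norm(a * x + (1 - a) * y, 3) <= a * p_norm(x, 3) + (1 - a) * p_norm(y, 3) + 1e-12)  # True
```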

An affine mapping (with nonnegative scaling) of a convex function is still convex
e.g.: suppose $f(x)$ is convex; then $g(x)=af(x)+b$ is also convex for any $a\geqslant 0$ (for $a<0$ it is concave)

Pointwise maximum of convex functions is convex
e.g. : suppose $f_1\left( x \right) ,f_2\left( x \right)$ are convex $\Rightarrow$ $g\left( x \right) =\max \left\{ f_1\left( x \right) ,f_2\left( x \right) \right\}$ is convex
e.g.: suppose $f(x,\theta)$ is convex in $x$ for each $\theta\in[1,2]$; then $g(x)=\max_{\theta\in[1,2]}\left\{ f(x,\theta) \right\}$ is convex. For example, $f(x,\theta)=\theta x+b\Rightarrow g(x)=\max_{\theta\in[1,2]}\left\{ \theta x+b \right\}=\max\left\{ x,2x \right\}+b$

Pointwise minimum of concave functions is concave, e.g. $S(x)=\min_{\theta\in[1,2]}\left\{ \theta x+b \right\}=\min\left\{ x,2x \right\}+b$ is concave
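The $\theta$ example can be evaluated on a grid. This sketch assumes NumPy, a hypothetical $b=0.5$, and a finite grid standing in for $[1,2]$. The maximum and minimum over $\theta$ reduce to $\max\{x,2x\}+b$ and $\min\{x,2x\}+b$ because $\theta x+b$ is linear in $\theta$, so its extremes sit at the endpoints.

```python
import numpy as np

b = 0.5                                 # hypothetical offset
thetas = np.linspace(1.0, 2.0, 101)     # finite grid over theta in [1, 2]

def g(x):
    """Pointwise max over theta of the affine functions theta*x + b (convex in x)."""
    return np.max(thetas * x + b)

def s(x):
    """Pointwise min over theta of theta*x + b (concave in x)."""
    return np.min(thetas * x + b)

for x in [-2.0, 0.0, 1.5]:
    assert np.isclose(g(x), max(x, 2 * x) + b)
    assert np.isclose(s(x), min(x, 2 * x) + b)

# Midpoint check: convex g lies below its chord, concave s lies above its chord
print(g(0.0) <= 0.5 * g(-1.0) + 0.5 * g(1.0))   # True
print(s(0.0) >= 0.5 * s(-1.0) + 0.5 * s(1.0))   # True
```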

4. Short Introduction to Optimization

4.1 Nonlinear Optimization Problems

Nonlinear optimization: primal problem
minimize: $f_0(x)$, the cost function $f_0:\mathbb{R}^n\rightarrow\mathbb{R}$, $x=\begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix}\in\mathbb{R}^n$
subject to: $f_i(x)\leqslant 0,i=1,\cdots,m$, $h_j(x)=0,j=1,\cdots,q$; the constraint set is $C=\left\{ x\in\mathbb{R}^n:f_i(x)\leqslant 0,h_j(x)=0 \right\}$, and if $x\in C$, then $x$ is called feasible

Decision variable $x\in\mathbb{R}^n$, domain $\mathcal{D}$; this problem is referred to as the primal problem

Optimal value: $p^*$

The problem is called a convex optimization problem if $f_0,\ldots,f_m$ are convex and $h_1,\ldots,h_q$ are affine, which means the objective function $f_0$ is convex and the constraint set is convex

Convex optimization problems can typically be solved efficiently

Categories:
Linear/affine objective + linear/affine constraints: Linear Program (LP)
Convex quadratic objective + linear/affine constraints: Quadratic Program (QP)
Convex quadratic objective + quadratic constraints: Quadratically Constrained Quadratic Program (QCQP), hard to solve in general (when the quadratic constraints are nonconvex)
How to find optimal solutions?
Optimality condition for unconstrained problems (first-order optimality condition): if $x^*$ is a local minimizer of a differentiable $f$, then $\nabla f(x^*)=0$ (from the Taylor expansion)
For convex problems, this condition guarantees that $x^*$ is a global minimizer
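For an unconstrained convex quadratic, the condition $\nabla f(x^*)=0$ is a linear system, and gradient descent converges to the same point. Here is a sketch under assumed data: NumPy, a hypothetical $Q\succ 0$ and $q$, and a fixed step $1/\lambda_{\max}(Q)$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical convex quadratic f(x) = 1/2 x^T Q x + q^T x with Q ≻ 0
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])

def f(x):
    return 0.5 * x @ Q @ x + q @ x

def grad(x):
    return Q @ x + q

# Plain gradient descent with fixed step 1/L, L = lambda_max(Q)
x = np.zeros(2)
step = 1.0 / np.linalg.eigvalsh(Q).max()
for _ in range(500):
    x = x - step * grad(x)

x_star = np.linalg.solve(Q, -q)      # stationary point: ∇f(x*) = Q x* + q = 0
print(np.allclose(x, x_star))        # True
# Convexity: the stationary point is a global minimizer
print(all(f(x_star) <= f(x_star + d) for d in rng.standard_normal((100, 2))))  # True
```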

Question: what about constrained optimization?

4.2 Lagrangian

Associated Lagrangian: $L:\mathcal{D}\times\mathbb{R}^m\times\mathbb{R}^q\rightarrow\mathbb{R}$
$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^m\lambda_if_i(x)+\sum_{j=1}^q\nu_jh_j(x),\lambda_i\geqslant 0,\nu_j\in\mathbb{R}$
a weighted sum of the objective and constraint functions
$\lambda_i$: Lagrange multiplier associated with $f_i(x)\leqslant 0$
$\nu_j$: Lagrange multiplier associated with $h_j(x)=0$ (unrestricted in sign)

4.2.1 Lagrangian Dual Problems

Lagrange dual function: $g:\mathbb{R}^m\times\mathbb{R}^q\rightarrow\mathbb{R}\cup\left\{ -\infty \right\}$
$g(\lambda,\nu)=\inf_{x\in\mathcal{D}}L(x,\lambda,\nu)=\inf_{x\in\mathcal{D}}\left\{ f_0(x)+\sum_{i=1}^m\lambda_if_i(x)+\sum_{j=1}^q\nu_jh_j(x) \right\}$

$g$ is concave (always true, regardless of whether the primal problem is convex, because it is a pointwise infimum of functions that are affine in $(\lambda,\nu)$), and it can be $-\infty$ for some $\lambda,\nu$
Lower bound property: if $\lambda\succeq 0$ (elementwise), then $g(\lambda,\nu)\leqslant p^*$
Proof: let $\tilde{x}$ be an arbitrary feasible primal point and $\lambda\succeq 0$. Since $\lambda_if_i(\tilde{x})\leqslant 0$ and $h_j(\tilde{x})=0$, we have $f_0(\tilde{x})\geqslant L(\tilde{x},\lambda,\nu)\geqslant\inf_{x\in\mathcal{D}}L(x,\lambda,\nu)=g(\lambda,\nu)$, hence $\min_{\tilde{x}\ \text{feasible}}f_0(\tilde{x})\geqslant g(\lambda,\nu)$
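The lower bound property can be seen on a one-dimensional toy problem, chosen (hypothetically) so that everything is available in closed form: minimize $x^2$ subject to $1-x\leqslant 0$, with $p^*=1$. The sketch assumes NumPy and evaluates $g(\lambda)=\lambda-\lambda^2/4$ on a grid of $\lambda\geqslant 0$.

```python
import numpy as np

# Hypothetical toy problem: minimize f_0(x) = x^2 subject to f_1(x) = 1 - x <= 0.
# The optimum is x = 1, so p* = 1.
p_star = 1.0

def dual_function(lam):
    """g(lam) = inf_x {x^2 + lam * (1 - x)}.
    The inner problem is unconstrained and convex in x; setting 2x - lam = 0
    gives x = lam / 2, hence g(lam) = lam - lam^2 / 4."""
    x = lam / 2.0
    return x**2 + lam * (1.0 - x)

lams = np.linspace(0.0, 5.0, 501)        # dual feasible multipliers lam >= 0
g_vals = dual_function(lams)

print(np.all(g_vals <= p_star + 1e-12))  # True: each g(lam) is a lower bound on p*
d_star = g_vals.max()
print(np.isclose(d_star, p_star))        # True: the best bound (at lam = 2) equals p*
```

This toy problem is convex and strictly feasible, so the best lower bound is tight here. For nonconvex problems it generally is not.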

Lagrangian dual problem:
maximize: $g(\lambda,\nu)$
subject to: $\lambda\succeq 0$
$\Leftrightarrow$ equivalently, the convex optimization problem
minimize: $-g(\lambda,\nu)$
subject to: $-\lambda\preceq 0$

Find the best lower bound on $p^*$ using the Lagrange dual function

The dual problem is a convex optimization problem even when the primal is nonconvex

The optimal value is denoted $d^*$

$(\lambda,\nu)$ is called dual feasible if $\lambda\succeq 0$ and $(\lambda,\nu)\in dom(g)$

The dual is often simplified by making the implicit constraint $(\lambda,\nu)\in dom(g)$ explicit

Example: see Section 5

4.2.2 Duality Theorems

Weak duality: $d^*\leqslant p^*$
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems
Strong duality: $d^*=p^*$
not true in general, but typically holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater's constraint qualification: the (convex) primal problem is strictly feasible, i.e. there exists a feasible $x$ with $f_i(x)<0$ for all $i$

4.3 General Optimality Conditions

For general optimization problem:
minimize : $f_0\left( x \right)$
subject to : $f_{\mathrm{i}}\left( x \right) \leqslant 0,i=1,\cdots ,m,h_{\mathrm{j}}\left( x \right) =0,j=1,\cdots ,q$

General optimality conditions: if strong duality holds, then $(x^*,\lambda^*,\nu^*)$ is primal-dual optimal $\Leftrightarrow$

$x^*=\arg\min_xL(x,\lambda^*,\nu^*)$: Lagrange optimality
$\lambda_i^*f_i(x^*)=0,\forall i$: complementarity
$f_i(x^*)\leqslant 0,h_j(x^*)=0,\forall i,j$: primal feasibility
$\lambda_i^*\geqslant 0,\forall i$: dual feasibility

Proof of necessity
Assume $x^*$ and $(\lambda^*,\nu^*)$ are primal-dual optimal solutions with zero duality gap:

$f_0(x^*)=g(\lambda^*,\nu^*)=\min_{x\in\mathcal{D}}\left( f_0(x)+\sum_{i=1}^m\lambda_i^*f_i(x)+\sum_{j=1}^q\nu_j^*h_j(x) \right)\leqslant f_0(x^*)+\sum_{i=1}^m\lambda_i^*f_i(x^*)+\sum_{j=1}^q\nu_j^*h_j(x^*)\leqslant f_0(x^*)$

where the last inequality uses $\lambda_i^*\geqslant 0$, $f_i(x^*)\leqslant 0$ and $h_j(x^*)=0$. Therefore, all inequalities are actually equalities

The first inequality holding with equality $\Rightarrow x^*=\arg\min_xL(x,\lambda^*,\nu^*)$

The second inequality holding with equality $\Rightarrow\sum_i\lambda_i^*f_i(x^*)=0$; since every term is $\leqslant 0$, each $\lambda_i^*f_i(x^*)=0$, which is the complementarity condition

Proof of sufficiency
Assume $(x^*,\lambda^*,\nu^*)$ satisfies the optimality conditions:
$g(\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m\lambda_i^*f_i(x^*)+\sum_{j=1}^q\nu_j^*h_j(x^*)=f_0(x^*)$

The first equality is by Lagrange optimality, and the second equality is due to complementarity and primal feasibility ($h_j(x^*)=0$)

Therefore, the duality gap is zero. Since $x^*$ is primal feasible and $(\lambda^*,\nu^*)$ is dual feasible, weak duality gives $g(\lambda^*,\nu^*)\leqslant d^*\leqslant p^*\leqslant f_0(x^*)$, so all of these are equal and $(x^*,\lambda^*,\nu^*)$ is a primal-dual optimal solution

4.4 KKT Conditions

For a convex optimization problem (with differentiable $f_i$ and $h_j$):
minimize: $f_0(x)$
subject to: $f_i(x)\leqslant 0,i=1,\cdots,m$, $h_j(x)=0,j=1,\cdots,q$

Suppose the duality gap is zero; then $(x^*,\lambda^*,\nu^*)$ is primal-dual optimal if and only if it satisfies the Karush-Kuhn-Tucker (KKT) conditions

$\frac{\partial L}{\partial x}(x^*,\lambda^*,\nu^*)=0$: stationarity
$\lambda_i^*f_i(x^*)=0,\forall i$: complementarity
$f_i(x^*)\leqslant 0,h_j(x^*)=0,\forall i,j$: primal feasibility
$\lambda_i^*\geqslant 0,\forall i$: dual feasibility
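For a convex QP with only equality constraints, the KKT conditions reduce to one linear system: there are no $\lambda_i$ and no complementarity conditions. Below is a minimal NumPy sketch with hypothetical data, using the same sign convention $L=f_0+\nu^{\mathrm{T}}(Ax-b)$. It is not a general solver for problems with inequality constraints.

```python
import numpy as np

# Hypothetical convex QP: minimize 1/2 x^T Q x + q^T x  subject to  A x = b
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# KKT with equality constraints only:
#   stationarity:        Q x + q + A^T nu = 0
#   primal feasibility:  A x = b
n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

print(np.allclose(Q @ x_star + q + A.T @ nu_star, 0))  # True: stationarity
print(np.allclose(A @ x_star, b))                       # True: primal feasibility
```

By hand, this instance gives $x^*=(1/3,2/3)$ and $\nu^*=4/3$.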

5. Linear Program

Primal formulation
minimize: $c^{\mathrm{T}}x$
subject to: $Ax=b,x\geqslant 0$

Lagrangian: $L(x,\lambda,\nu)=c^{\mathrm{T}}x+\lambda^{\mathrm{T}}(-x)+\nu^{\mathrm{T}}(Ax-b)$
$\Rightarrow g(\lambda,\nu)=\inf_{x\in\mathbb{R}^n}\left\{ (c^{\mathrm{T}}-\lambda^{\mathrm{T}}+\nu^{\mathrm{T}}A)x-\nu^{\mathrm{T}}b \right\}=\begin{cases} -\infty & \text{if } c^{\mathrm{T}}-\lambda^{\mathrm{T}}+\nu^{\mathrm{T}}A\ne 0\\ -b^{\mathrm{T}}\nu & \text{if } c^{\mathrm{T}}-\lambda^{\mathrm{T}}+\nu^{\mathrm{T}}A=0 \end{cases}$
$\Rightarrow\max_{\lambda,\nu}g(\lambda,\nu)$, subject to: $\lambda\geqslant 0,c^{\mathrm{T}}-\lambda^{\mathrm{T}}+\nu^{\mathrm{T}}A=0$

Eliminating $\lambda=c+A^{\mathrm{T}}\nu\geqslant 0$ gives its dual:
maximize: $-b^{\mathrm{T}}\nu$
subject to: $A^{\mathrm{T}}\nu+c\geqslant 0$

The primal has $n$ variables, $q$ equality constraints and $n$ inequality constraints; the dual has $q$ variables and $n$ inequality constraints
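The primal and dual above can be checked on a tiny LP whose solution is easy to find by hand. The sketch assumes NumPy and hypothetical data $A=[1\ 1]$, $b=1$, $c=[1,2]^{\mathrm{T}}$. For this instance, $x^*=(1,0)$ and $\nu^*=-1$, so $p^*=d^*=1$.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical small LP in the primal form above:
#   minimize c^T x  subject to  A x = b, x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# Candidate optimal pair, worked out by hand for this instance
x_star = np.array([1.0, 0.0])    # primal: all weight on the cheaper coordinate
nu_star = np.array([-1.0])       # dual: maximize -nu s.t. nu + 1 >= 0, nu + 2 >= 0
lam_star = c + A.T @ nu_star     # eliminated multiplier, must satisfy lambda >= 0

assert np.allclose(A @ x_star, b) and np.all(x_star >= 0)   # primal feasible
assert np.all(lam_star >= 0)                                 # dual feasible

p_val, d_val = c @ x_star, -b @ nu_star
print(p_val, d_val)              # 1.0 1.0 -> zero duality gap
print(lam_star * x_star)         # complementarity: lambda_i * x_i = 0 for each i

# Weak duality on random feasible points: c^T x >= -b^T nu
for _ in range(100):
    t = rng.uniform()
    x = np.array([t, 1.0 - t])                  # feasible: x >= 0, x_1 + x_2 = 1
    nu = np.array([-1.0 + rng.exponential()])  # feasible: nu >= -1
    assert c @ x >= -b @ nu
```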

6. Quadratic Program

Unconstrained Quadratic Program : Least Squares

minimize : $J\left( x \right) =\frac{1}{2}x^{\mathrm{T}}Qx+q^{\mathrm{T}}x+q_0$
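The problem is convex iff $Q\succeq 0$ ($Q$ symmetric)
When $J$ is convex (for simplicity, assume $Q\succ 0$), it can be written as a least-squares cost: $J(x)=\frac{1}{2}\left\| Q^{\frac{1}{2}}x-y \right\|^2+c$ with $y=-Q^{-\frac{1}{2}}q$ and $c=q_0-\frac{1}{2}q^{\mathrm{T}}Q^{-1}q$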
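The completed-square form can be verified numerically. The sketch assumes NumPy and random data with $Q\succ 0$, and it computes $Q^{1/2}$ from the spectral decomposition. It also confirms that the least-squares solution of $Q^{1/2}x=y$ matches $x^*=-Q^{-1}q$.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical data with Q ≻ 0
M = rng.standard_normal((3, 3))
Q = M.T @ M + np.eye(3)        # symmetric positive definite
q = rng.standard_normal(3)
q0 = 0.7

def J(x):
    return 0.5 * x @ Q @ x + q @ x + q0

# Q^{1/2} and Q^{-1/2} from the spectral decomposition Q = U diag(w) U^T
w, U = np.linalg.eigh(Q)
Q_half = U @ np.diag(np.sqrt(w)) @ U.T
Q_half_inv = U @ np.diag(1.0 / np.sqrt(w)) @ U.T

y = -Q_half_inv @ q
c = q0 - 0.5 * q @ np.linalg.solve(Q, q)

x = rng.standard_normal(3)
print(np.isclose(J(x), 0.5 * np.sum((Q_half @ x - y) ** 2) + c))   # True

# The minimizer solves Q x = -q, i.e. the least-squares problem min ||Q^{1/2} x - y||
x_ls = np.linalg.lstsq(Q_half, y, rcond=None)[0]
print(np.allclose(x_ls, np.linalg.solve(Q, -q)))                    # True
```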