1. Another way to derive the SVM
Instead of starting from the separating plane, we directly postulate the two boundaries of the margin:
$$
\begin{aligned}
\text{Plus-plane} &= \{\boldsymbol{x} : \boldsymbol{w}\cdot\boldsymbol{x} + b = +1\} \\
\text{Minus-plane} &= \{\boldsymbol{x} : \boldsymbol{w}\cdot\boldsymbol{x} + b = -1\}
\end{aligned}
$$
The margin is then simply the distance between these two planes.
Recall:
Given two parallel lines with equations
$$
ax + by + c_1 = 0
$$
and
$$
ax + by + c_2 = 0,
$$
the distance between them is given by
$$
d = \frac{\left|c_2 - c_1\right|}{\sqrt{a^2 + b^2}}.
$$
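Plugging the two margin planes into this formula (they can be written as $\boldsymbol{w}\cdot\boldsymbol{x} + (b-1) = 0$ and $\boldsymbol{w}\cdot\boldsymbol{x} + (b+1) = 0$, so $c_1 = b-1$, $c_2 = b+1$, and $\sqrt{a^2+b^2}$ becomes $\|\boldsymbol{w}\|$ in higher dimensions) gives the margin width:
$$
\text{Margin} = \frac{|(b+1) - (b-1)|}{\|\boldsymbol{w}\|} = \frac{2}{\|\boldsymbol{w}\|}.
$$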
This gives:
Maximize $\dfrac{2}{\|\mathbf{w}\|}$ such that
$$
\begin{aligned}
& \text{for } y_i = +1: \quad \mathbf{w}^T \mathbf{x}_i + b \geq +1 \\
& \text{for } y_i = -1: \quad \mathbf{w}^T \mathbf{x}_i + b \leq -1
\end{aligned}
$$
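Maximizing the margin is the same as minimizing the squared norm of $\mathbf{w}$:
$$
\max_{\mathbf{w},b}\; \frac{2}{\|\mathbf{w}\|}
\;\Longleftrightarrow\;
\min_{\mathbf{w},b}\; \|\mathbf{w}\|
\;\Longleftrightarrow\;
\min_{\mathbf{w},b}\; \|\mathbf{w}\|^2 = \min_{\mathbf{w},b}\; \sum_{i=1}^{d} w_i^2 .
$$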
Going one step further, we have
$$
\begin{aligned}
& \underset{\mathbf{w},\,b}{\operatorname{argmin}} \sum_{i=1}^{d} w_i^2 \\
& \text{subject to } \forall \mathbf{x}_i \in D: \; y_i\left(\mathbf{x}_i \cdot \mathbf{w} + b\right) \geq 1
\end{aligned}
$$
The model is exactly the same as before.
2. Quadratic Programming (QP)
A quadratic programming problem looks like this:
$$
\text{Find} \quad \underset{\mathbf{u}}{\arg\max} \quad c + \mathbf{d}^T \mathbf{u} + \frac{\mathbf{u}^T R\, \mathbf{u}}{2}
$$
subject to $n$ inequality constraints
$$
\begin{gathered}
a_{11} u_1 + a_{12} u_2 + \ldots + a_{1m} u_m \leq b_1 \\
a_{21} u_1 + a_{22} u_2 + \ldots + a_{2m} u_m \leq b_2 \\
\vdots \\
a_{n1} u_1 + a_{n2} u_2 + \ldots + a_{nm} u_m \leq b_n
\end{gathered}
$$
and $e$ equality constraints
$$
\begin{gathered}
a_{(n+1)1} u_1 + a_{(n+1)2} u_2 + \ldots + a_{(n+1)m} u_m = b_{(n+1)} \\
a_{(n+2)1} u_1 + a_{(n+2)2} u_2 + \ldots + a_{(n+2)m} u_m = b_{(n+2)} \\
\vdots \\
a_{(n+e)1} u_1 + a_{(n+e)2} u_2 + \ldots + a_{(n+e)m} u_m = b_{(n+e)}
\end{gathered}
$$
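As a toy illustration of how such a QP is handed to an off-the-shelf solver, here is a minimal sketch assuming the `cvxopt` package. Note that `cvxopt.solvers.qp` minimizes rather than maximizes, so the signs of $R$ and $\mathbf{d}$ are flipped when mapping the problem above onto it; the toy numbers below are my own.

```python
from cvxopt import matrix, solvers

# cvxopt's solvers.qp minimizes (1/2) u^T P u + q^T u  s.t.  G u <= h, A u = b,
# so maximizing  c + d^T u + (1/2) u^T R u  means passing P = -R and q = -d
# (the constant c does not affect the argmax).
# Toy problem (my own numbers): maximize  u1 - u1^2 - u2^2  s.t.  u1 + u2 <= 1.
P = matrix([[2.0, 0.0], [0.0, 2.0]])   # P = -R, with R = -2*I
q = matrix([-1.0, 0.0])                # q = -d, with d = (1, 0)
G = matrix([[1.0], [1.0]])             # single inequality row: u1 + u2 <= 1
h = matrix([1.0])
sol = solvers.qp(P, q, G, h)
print(sol["x"])                        # optimum is roughly u = (0.5, 0)
```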
The problem our linear SVM needs to solve is
$$
\begin{aligned}
& \left\{\vec{w}^*, b^*\right\} = \min_{\vec{w},\,b} \sum_i w_i^2 \\
& \text{subject to } y_i\left(\vec{w} \cdot \vec{x}_i + b\right) \geq 1 \text{ for all training data } \left(\vec{x}_i, y_i\right)
\end{aligned}
$$
which, in essence, is exactly a QP problem:
$$
\left\{\vec{w}^*, b^*\right\} = \underset{\vec{w},\,b}{\operatorname{argmax}} \left\{ 0 + \vec{0} \cdot \vec{w} - \vec{w}^T \mathbf{I}_n \vec{w} \right\}
$$
subject to
$$
\begin{aligned}
& y_1\left(\vec{w} \cdot \vec{x}_1 + b\right) \geq 1 \\
& y_2\left(\vec{w} \cdot \vec{x}_2 + b\right) \geq 1 \\
& \ldots \\
& y_N\left(\vec{w} \cdot \vec{x}_N + b\right) \geq 1
\end{aligned}
$$
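Here is a minimal sketch of solving that primal QP directly, again assuming `cvxopt` and linearly separable training data (otherwise the QP is infeasible). The function name `hard_margin_svm` and the variable layout $\mathbf{u} = [\mathbf{w}, b]$ are my own choices for the illustration.

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    """Solve the hard-margin SVM primal as a QP.
    X: (N, d) numpy array of samples; y: (N,) numpy array of labels in {-1, +1}.
    The QP variable is u = [w_1, ..., w_d, b]."""
    N, d = X.shape
    # Objective: minimize (1/2) u^T P u with P = 2*I on the w block,
    # so (1/2) u^T P u = sum_i w_i^2; the b entry gets a tiny value
    # only to keep P positive definite for the solver.
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = 2.0 * np.eye(d)
    P[d, d] = 1e-8
    q = np.zeros(d + 1)
    # Constraints y_i (w . x_i + b) >= 1, rewritten in the form G u <= h:
    #   -y_i * [x_i, 1] . u <= -1
    G = -y[:, None] * np.hstack([X, np.ones((N, 1))])
    h = -np.ones(N)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol["x"]).ravel()
    return u[:d], u[d]          # w, b
```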
3. Soft Margin SVM
The hard-margin SVM requires the samples to be linearly separable (look at its constraints). Which raises the question: what if the samples are not linearly separable?
In that case we want a large margin while at the same time keeping the classification loss as small as possible, so the problem becomes:
Minimize
$$
\boldsymbol{w} \cdot \boldsymbol{w} + C \cdot (\#\text{train errors})
$$
This creates two problems. First, this is no longer a QP problem, and QP problems have mature, well-studied solution methods. Second, there is a hyperparameter C, also called the tradeoff parameter: a larger C says you care more about keeping the classification error small, while a smaller C says you care more about keeping the margin large, so choosing C becomes an art in itself.
So instead we model the error as the distance from a misclassified point to the separating plane:
$$
\begin{aligned}
& \left\{\vec{w}^*, b^*\right\} = \min_{\vec{w},\,b} \sum_{i=1}^{d} w_i^2 + c \sum_{j=1}^{N} \varepsilon_j \\
& y_1\left(\vec{w} \cdot \vec{x}_1 + b\right) \geq 1 - \varepsilon_1 \\
& y_2\left(\vec{w} \cdot \vec{x}_2 + b\right) \geq 1 - \varepsilon_2 \\
& \ldots \\
& y_N\left(\vec{w} \cdot \vec{x}_N + b\right) \geq 1 - \varepsilon_N
\end{aligned}
$$
What if $\varepsilon_i < 0$? That is something we do not want: when a sample is classified correctly, we want its loss to be $0$, otherwise a confidently correct point could take a negative slack and offset the penalty of a misclassified one. So we additionally require $\varepsilon_i \geq 0$:
$$
\begin{aligned}
& \left\{\vec{w}^*, b^*\right\} = \min_{\vec{w},\,b} \sum_{i=1}^{d} w_i^2 + c \sum_{j=1}^{N} \varepsilon_j \\
& y_1\left(\vec{w} \cdot \vec{x}_1 + b\right) \geq 1 - \varepsilon_1, \quad \varepsilon_1 \geq 0 \\
& y_2\left(\vec{w} \cdot \vec{x}_2 + b\right) \geq 1 - \varepsilon_2, \quad \varepsilon_2 \geq 0 \\
& \ldots \\
& y_N\left(\vec{w} \cdot \vec{x}_N + b\right) \geq 1 - \varepsilon_N, \quad \varepsilon_N \geq 0
\end{aligned}
$$
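To see the tradeoff parameter at work without writing a solver, here is a small sketch (my own example) using scikit-learn's `SVC` with a linear kernel, which solves essentially this soft-margin problem (its objective is $\tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_j \varepsilon_j$, so its `C` matches the $c$ above only up to a constant factor):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data (my own numbers); the last point is a "noisy" -1 sample
# sitting inside the +1 region, so the data are not linearly separable.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 1.5],
              [3.0, 3.0], [4.0, 4.0], [3.5, 4.5],
              [3.5, 3.5]])
y = np.array([-1, -1, -1, +1, +1, +1, -1])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:>6}: margin width = {2.0 / np.linalg.norm(w):.3f}, "
          f"train accuracy = {clf.score(X, y):.2f}")
# Smaller C favours a wide margin and tolerates the noisy point;
# larger C shrinks the margin while trying to reduce the training error.
```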
Imagine a sample that is misclassified outrageously badly, infinitely far from the decision boundary: its corresponding loss is also infinite. This is why we say that the SVM is sensitive to noise.
The loss function of the SVM is the hinge loss:
$$
\operatorname{hinge}(x) = \max(1 - x, 0)
$$
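In code, the soft-margin objective can be written directly in terms of the hinge loss, since at the optimum each slack $\varepsilon_j$ equals $\operatorname{hinge}\!\big(y_j(\vec{w}\cdot\vec{x}_j + b)\big)$. A minimal numpy sketch (function names are mine):

```python
import numpy as np

def hinge(z):
    # hinge(z) = max(1 - z, 0), applied elementwise
    return np.maximum(1.0 - z, 0.0)

def soft_margin_objective(w, b, X, y, c):
    # sum_i w_i^2  +  c * sum_j hinge(y_j * (w . x_j + b));
    # at the optimum, each slack eps_j equals hinge(y_j * (w . x_j + b)).
    margins = y * (X @ w + b)
    return np.dot(w, w) + c * np.sum(hinge(margins))
```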