1. 背景
数学小白一枚,看推理过程需要很多时间。好在有大神们源码和DS帮忙,教程里的推理过程才能勉强拼凑一二。
* 留意: 推导过程中X都是向量组表达: shape(feature, sample_n); 和numpy中的默认矩阵正好相反。
2. PCA / KPCA
PCA | KPCA(Linear Kernel) |
详细推理基本过程找教程。(详细步骤我也推不出来,数学太菜) 大概过程: 1. 求最小|X-XWWt|^2 时的W 2. 通过trace的性质,等价于求trace(AtA) 3. 最后推导出:需要最大化XXtW=lambdaW,又要降低维度; 所以计算比例lamda中由大到小排序,保留满足一定阈值的前n个特征值和对应的特征向量(就是W)。 输出: 降维Xd= X@Wd 代码很简单. | 详细推理基本过程找教程。(详细步骤我也推不出来,数学太菜) 大概过程: 1. 巧妙的设了一个A=XW/sqrt(lambda), K=XtX 2. 通过推导KA=lamdaA,W=XtA/sqrt(lamda) * 大模型解释的A为什么要这么设 输出: 降维Xd= X@Wd = lambda * sqrt(lamda) 代码相对复杂一些。运行的结果和PCA一样的。 |
PCA
import numpy as np
from sklearn.datasets import load_digits, load_iris
from sklearn.decomposition import KernelPCA#global round float to scale 2
np.set_printoptions(precision=2, suppress=True)X, _ = load_iris(return_X_y=True)X=X[:5]
print(X)#========================================================
#PCA
# 1. W= XtX's eigVec (responding to max eigVal)
# 2. X_rec=X@W@W.T
#========================================================
eVals, eVecs=np.linalg.eig(X.T@X)
print(np.allclose(X.T@X, eVecs@np.diag(eVals)@eVecs.T))
print('val',eVals)
print('val',eVals[:2])
print('vec',eVecs)
print('vec',eVecs[:2])W=eVecs.T[:2].Tprint("W",W)
X_rec=X@W@W.T
print(X_rec)
print(X)
print(np.linalg.norm(X - X_rec))
print(np.var(X - X_rec))
KPCA
import numpy as np
from sklearn.datasets import load_digits, load_iris
from sklearn.decomposition import KernelPCA#global round float to scale 2
np.set_printoptions(precision=2, suppress=True)X, _ = load_iris(return_X_y=True)# X=X[:5]
# XMean=np.mean(X)
# X=X-XMean
print(X)#========================================================
# KPCA
# set: A= XW/sqrt(lambda)
# based on PCA's conclusion: XtXW=lambda W //由于W有约束, WtW=1 单位正交向量组
# >>> W=XtXW/lambda = XtA/sqrt(lambda) //WtW == 1 == AtXXtA/lambda = AtKA/lambda = At lambda A/lambda = lambda/lambda AtA = 1
# >>> 同时:XXtXW=lambda XW >>> KA*sqrt(lambda) = lambda A*sqrt(lambda) >>> KA = lambda A //设A时XW/n(任意值),这个公式都成立;但按上面的设定,可以保证W单位正交。
#
# 1. W = XtA/sqrt(lambda) (A is eigVec of X@X.T)
# 2. X_rec=X@W@W.T
#========================================================
# create a callable kernel PCA object
# transformer = KernelPCA(n_components=2, kernel='linear')
# X_transformed = transformer.fit_transform(X)
eVals, eVecs=np.linalg.eig(X@X.T)
print(np.allclose(X@X.T, eVecs@np.diag(eVals)@eVecs.T))print('val',eVals)
print('val',eVals[:2])
print('vec',eVecs)
print('vec',eVecs[:2])# W = XtA/sqrt(lambda)
W=X.T@eVecs.T[:2].T@np.linalg.pinv(np.sqrt(np.diag(eVals[:2])))# X_hat = XW = XXtA/sqrt(lambda)= KA/sqrt(lambda) = lambda A/sqrt(lambda) = A*sqrt(lambda)
# 这就是源码中直接用 A*sqrt(lambda) 返回X_transformed的原因:
#<code>
# no need to use the kernel to transform X, use shortcut expression
# X_transformed = self.eigenvectors_ * np.sqrt(self.eigenvalues_)
#</code># print(X_transformed.shape)
# print(X_transformed)
#
# W=X.T@transformer.eigenvectors_
# print(transformer.eigenvectors_.shape)
# print(transformer.eigenvectors_)print("W",W)
X_rec=X@W@W.T
print(X_rec)
print(X)
print(np.linalg.norm(X - X_rec))
print(np.var(X - X_rec))
参考:
《Python机器学习》