Straight to the code:
import numpy as np
from numpy import linalg as lina

x = np.random.rand(10, 5)          # generate a random sample matrix
x -= x.mean(axis=0)                # center each column; see note 1
C = x.T.dot(x)                     # scatter (unnormalized covariance) matrix
lam, v = lina.eig(C)               # eigendecomposition; the columns of v are the eigenvectors
new_index = np.argsort(lam)[::-1]  # sort eigenvalues in descending order; see note 2
A = -v[:, new_index]               # transformation matrix A (the sign flip only negates eigenvectors, to match sklearn)
w = x.dot(A)                       # features after the transformation
r = lam[new_index] / lam.sum()     # contribution ratio of every component
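Since the columns of A are orthonormal eigenvectors, the transform above is a pure rotation and loses no information: `w.dot(A.T)` recovers the centered x exactly. A minimal sketch verifying this (it uses `eigh`, the eigensolver suited to the symmetric matrix C, and a fixed seed for reproducibility; neither appears in the original):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 5))
x -= x.mean(axis=0)

C = x.T @ x
lam, v = np.linalg.eigh(C)         # eigh: guaranteed real results for a symmetric matrix
order = np.argsort(lam)[::-1]      # descending eigenvalues
A = v[:, order]                    # the sign flip is optional: eigenvectors are defined up to sign
w = x @ A

# A is orthogonal (A @ A.T == I), so the transform is invertible.
assert np.allclose(A @ A.T, np.eye(5))
assert np.allclose(w @ A.T, x)
```

This is why PCA "loses" information only when you keep fewer than all the components.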
Test it:
w[:, :2]  # first 2 of the new features
>>> array([[-0.3939524518, -0.4184678305],
           [-0.5907434013,  0.2033346207],
           [-0.4585388051, -0.111367225 ],
           [ 0.4552495673, -0.0405062598],
           [-0.2335902798, -0.4260334862],
           [ 0.4523182771,  0.039755097 ],
           [ 0.0902288594,  0.1869543779],
           [ 0.089419155 ,  0.7656098218],
           [ 0.7645053936, -0.3353675658],
           [-0.1748963144,  0.1360884499]])
r  # contribution ratio of every component
>>> array([0.4026073116, 0.2589988934, 0.2088275432, 0.0902665298,
           0.0392997221])
Compare with the scikit-learn implementation:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(x)  # x is still the same x as above
pca.explained_variance_ratio_
>>> array([0.4026073116, 0.2589988934])  # contribution ratios of the first 2 components (identical)
pca.transform(x)  # dimensionality-reducing transform (identical)
>>> array([[-0.3939524518, -0.4184678305],
           [-0.5907434013,  0.2033346207],
           [-0.4585388051, -0.111367225 ],
           [ 0.4552495673, -0.0405062598],
           [-0.2335902798, -0.4260334862],
           [ 0.4523182771,  0.039755097 ],
           [ 0.0902288594,  0.1869543779],
           [ 0.089419155 ,  0.7656098218],
           [ 0.7645053936, -0.3353675658],
           [-0.1748963144,  0.1360884499]])
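One caveat when comparing against scikit-learn: eigenvectors are only defined up to sign, so individual columns of w can come out negated relative to `pca.transform(x)` (which is why the manual code flips the sign of v). A robustness sketch that aligns each column's sign before comparing (the seed and the use of `eigh` are assumptions, not from the original):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.random((10, 5))
x -= x.mean(axis=0)

lam, v = np.linalg.eigh(x.T @ x)
w = x @ v[:, np.argsort(lam)[::-1]]       # manual PCA scores

w_sk = PCA(n_components=2).fit_transform(x)

# Both +v and -v are valid eigenvectors, so align signs column by column.
for j in range(2):
    s = np.sign(w[:, j] @ w_sk[:, j])
    assert np.allclose(s * w[:, j], w_sk[:, j])
```

With the sign alignment, the comparison no longer depends on an ad-hoc global minus sign.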
Compute the difference between the transformed features:
w1 = w[:, 0]
w2 = pca.transform(x)[:, 0]  # note: x, not c
((w1-w2)**2).sum()
>>> 2.2980196096428498e-30
And the difference in the contribution ratios:
pca.explained_variance_ratio_ - r[:2]
>>> array([ 1.1102230246e-16, -1.6653345369e-16])
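Both residuals sit at the scale of double-precision machine epsilon, i.e. the two implementations agree to the last representable bit. A quick way to see that scale (this check is an illustration, not from the original post):

```python
import numpy as np

# Unit roundoff for float64: per-element differences around 1e-16
# are indistinguishable from exact agreement.
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16
```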
For the detailed notes, see: https://blog.csdn.net/cauchy7203/article/details/107421996