pdist, squareform
- 1. pdist, squareform usage example
- 2. Implementing pdist, squareform with matrix arithmetic
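The distance library scipy.spatial.distance provides two functions, pdist and squareform. pdist computes the distance between every pair of samples (Euclidean by default), and squareform arranges those pairwise distances into a square matrix.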
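(An aside)
SciPy: built on NumPy; provides function libraries that compute results directly, and wraps some higher-level abstractions and physical models.
NumPy: stores and processes large matrices far more efficiently than Python's own nested list structure; its core is implemented in C.
Pandas: a tool built on NumPy, created for data analysis tasks.
Reference: https://www.jianshu.com/p/32cb09d84487
(Back to the topic)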
1. pdist, squareform usage example
pdist and squareform operate on NumPy arrays:
>>> import numpy as np
>>> from scipy.spatial.distance import pdist, squareform
>>> x = np.array([[1,1,1],[2,2,2],[4,4,4]])  # three 3-dimensional samples: x1=[1,1,1], x2=[2,2,2], x3=[4,4,4]
>>> Dis = pdist(x)
>>> Dis  # d(x1,x2)=sqrt(3), d(x1,x3)=sqrt(27), d(x2,x3)=sqrt(12)
array([1.73205081, 5.19615242, 3.46410162])
>>> D = squareform(Dis)
>>> D
array([[0.        , 1.73205081, 5.19615242],   # d(x1,x1), d(x1,x2), d(x1,x3)
       [1.73205081, 0.        , 3.46410162],   # d(x2,x1), d(x2,x2), d(x2,x3)
       [5.19615242, 3.46410162, 0.        ]])  # d(x3,x1), d(x3,x2), d(x3,x3)
Because a distance metric is symmetric, i.e. $d(x_1,x_2)=d(x_2,x_1)$, the matrix above is a symmetric matrix.
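The symmetry claim can be checked directly. The following is a minimal sketch (the variable names `condensed`, `square` and `round_trip` are illustrative, not from the original). It verifies that the square matrix is symmetric with a zero diagonal, and that passing the square matrix back to squareform recovers the condensed vector.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])

condensed = pdist(x)            # length n*(n-1)/2 = 3: d(x1,x2), d(x1,x3), d(x2,x3)
square = squareform(condensed)  # 3x3 matrix of pairwise distances

is_symmetric = np.allclose(square, square.T)
zero_diagonal = np.allclose(np.diag(square), 0.0)

# squareform also converts in the other direction: square matrix -> condensed vector
round_trip = squareform(square)

print(is_symmetric, zero_diagonal, np.allclose(round_trip, condensed))
```

Only the upper triangle is stored in the condensed form, which is why three samples produce three distances.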
2. Implementing pdist, squareform with matrix arithmetic
Take three 3-dimensional samples: x1=[1,1,1], x2=[2,2,2], x3=[4,4,4]. The square matrix of distances between the samples is:
$$D=\begin{bmatrix} d(x_1,x_1) & d(x_1,x_2) & d(x_1,x_3)\\ d(x_2,x_1) & d(x_2,x_2) & d(x_2,x_3)\\ d(x_3,x_1) & d(x_3,x_2) & d(x_3,x_3)\end{bmatrix}$$
Treating each sample as a row vector, the squared Euclidean distance expands as:
$$d(x,y)^2=(x-y)(x-y)^T=xx^T+yy^T-2xy^T$$
So the matrix of squared distances, written $D_2$ here, is:
$$D_2=\begin{bmatrix} x_1x_1^T+x_1x_1^T-2x_1x_1^T & x_1x_1^T+x_2x_2^T-2x_1x_2^T & x_1x_1^T+x_3x_3^T-2x_1x_3^T\\ x_2x_2^T+x_1x_1^T-2x_2x_1^T & x_2x_2^T+x_2x_2^T-2x_2x_2^T & x_2x_2^T+x_3x_3^T-2x_2x_3^T\\ x_3x_3^T+x_1x_1^T-2x_3x_1^T & x_3x_3^T+x_2x_2^T-2x_3x_2^T & x_3x_3^T+x_3x_3^T-2x_3x_3^T\end{bmatrix}$$
$$=\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T \end{bmatrix}+\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T \end{bmatrix}^T-2\begin{bmatrix} x_1x_1^T & x_1x_2^T & x_1x_3^T\\ x_2x_1^T & x_2x_2^T & x_2x_3^T\\ x_3x_1^T & x_3x_2^T & x_3x_3^T\end{bmatrix}$$
The first matrix,
$$\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T \end{bmatrix}$$
is obtained by multiplying the sample matrix by itself element-wise, summing the entries of each sample, and replicating the resulting vector of squared norms.
The last matrix is the product of the sample matrix with its own transpose:
$$\begin{bmatrix} x_1x_1^T & x_1x_2^T & x_1x_3^T\\ x_2x_1^T & x_2x_2^T & x_2x_3^T\\ x_3x_1^T & x_3x_2^T & x_3x_3^T\end{bmatrix}=\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix}^T$$
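As a quick numerical check of the expansion $xx^T+yy^T-2xy^T$ for the squared distance, here is a worked example using the samples $x_1=[1,1,1]$ and $x_3=[4,4,4]$ from above:

$$x_1x_1^T=3,\qquad x_3x_3^T=48,\qquad x_1x_3^T=12$$
$$d(x_1,x_3)^2=3+48-2\cdot 12=27,\qquad d(x_1,x_3)=\sqrt{27}\approx 5.196$$

This agrees with the value 5.19615242 that pdist returns for this pair.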
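Implementation:
import numpy as np
X = np.array([[1,1,1],[2,2,2],[4,4,4]])  # same three samples as in section 1
X2 = (X*X).sum(1) * np.ones([3,3])  # squared norms x_i x_i^T, replicated across the rows
XXT = np.matmul(X, X.T)  # inner products x_i x_j^T
D2 = X2 + X2.T - 2*XXT  # matrix of squared distances
D = np.sqrt(D2)  # matrix of distances
print(D)
# output
[[0.         1.73205081 5.19615242]
 [1.73205081 0.         3.46410162]
 [5.19615242 3.46410162 0.        ]]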
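The same computation can be written for any number of samples. The following is a minimal sketch, not the original author's code: the function name `pairwise_euclidean` is hypothetical, broadcasting replaces the explicit `np.ones` replication, and the clipping step is an added assumption to guard against tiny negative values caused by floating-point rounding.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pairwise_euclidean(X):
    """Return the (n, n) Euclidean distance matrix for the rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    sq_norms = (X * X).sum(axis=1)  # x_i x_i^T for every sample, shape (n,)
    gram = X @ X.T                  # x_i x_j^T for every pair, shape (n, n)
    # Broadcasting a column and a row of squared norms gives x_i x_i^T + x_j x_j^T.
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    # Rounding can leave tiny negative entries; clip them before the square root.
    np.maximum(D2, 0.0, out=D2)
    return np.sqrt(D2)


X = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])
D = pairwise_euclidean(X)
matches_scipy = np.allclose(D, squareform(pdist(X)))
print(matches_scipy)
```

Comparing against `squareform(pdist(X))` is a convenient way to confirm that the matrix formulation and the SciPy functions agree on a given input.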
**Tip:** The matrix above is a distance matrix. In practice, check whether you need the squared distance or the distance itself.
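If the squared distance is what you need, pdist can return it directly through its `metric` argument. A small sketch, with illustrative variable names `dist` and `sq_dist`:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])

dist = squareform(pdist(x))                           # Euclidean distances (the default metric)
sq_dist = squareform(pdist(x, metric="sqeuclidean"))  # squared Euclidean distances

print(sq_dist)
print(np.allclose(sq_dist, dist ** 2))
```

This avoids taking a square root only to square the result again.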