I'm trying to follow the Wikipedia article on latent semantic indexing in Python, using the following code:
documentTermMatrix = array([[ 0.,1.,0.,1.],[ 0.,0.],[ 1.,0.]])
u,s,vt = linalg.svd(documentTermMatrix,full_matrices=False)
sigma = diag(s)
## remove extra dimensions...
numberOfDimensions = 4
for i in range(4,len(sigma) -1):
    sigma[i][i] = 0
queryVector = array([[ 0.],# same as first column in documentTermMatrix
[ 0.],[ 0.],[ 1.],[ 1.]])
How the math says it should work:
dtMatrixToQueryAgainst = dot(u,dot(s,vt))
queryVector = dot(inv(s),dot(transpose(u),queryVector))
similarityToFirst = cosineDistance(queryVector,dtMatrixToQueryAgainst[:,0])
# gives 'matrices are not aligned' error. should be 1 because they're the same
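One thing that may be worth checking for the error above is the shape of what `linalg.svd` returns. The following is only a minimal sketch, assuming NumPy and a made-up 5x7 matrix (the values are hypothetical and are not the matrix from this post); it prints the shapes of `u`, `s` and `vt` and compares a product built from `s` with one built from `diag(s)`.

```python
import numpy as np

# Made-up 5x7 matrix (hypothetical values, not the matrix from the post)
rng = np.random.default_rng(0)
A = rng.random((5, 7))

u, s, vt = np.linalg.svd(A, full_matrices=False)
print(u.shape, s.shape, vt.shape)  # (5, 5) (5,) (5, 7)

# s is a 1-D array of singular values, so dot(s, vt) collapses to a 1-D vector
print(np.dot(s, vt).shape)  # (7,)

# Building the diagonal matrix first gives the usual U * Sigma * V^T product
sigma = np.diag(s)
reconstructed = np.dot(u, np.dot(sigma, vt))
print(np.allclose(reconstructed, A))  # True
```

If the shapes look like this, then `dot(u, dot(s, vt))` multiplies a 5x5 matrix by a length-7 vector, which would explain an alignment error, and `inv(s)` would be applied to a 1-D array rather than to a square matrix.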
What does work, with math that looks incorrect (from here):
dtMatrixToQueryAgainst = dot(s,vt)
queryVector = dot(transpose(u),queryVector)
similarityToFirst = cosineDistance(queryVector,dtMatrixToQueryAgainst[:,0])
# gives 1, which is correct
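Why does the latter route work and the first not, when everything I can find about the math of LSA shows the first as correct? I feel like I'm missing something obvious...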
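
For comparison, here is a small self-contained sketch of both routes side by side. It rests on several assumptions: the matrix is made up (random values with a fixed seed, not the one above), `cosine_similarity` is a hypothetical stand-in for the `cosineDistance` helper that is not shown in this post, and the diagonal matrix `sigma` is used in place of the 1-D `s`. Under those assumptions, each route compares the query against the document representation that lives in the same space: the columns of `vt` when the query is scaled by the inverse of `sigma`, and the columns of `sigma @ vt` when it is not.

```python
import numpy as np

def cosine_similarity(a, b):
    """Hypothetical helper: cosine of the angle between two vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 5x7 term-document matrix (hypothetical values)
rng = np.random.default_rng(0)
A = rng.random((5, 7))

u, s, vt = np.linalg.svd(A, full_matrices=False)
sigma = np.diag(s)  # 2-D diagonal matrix built from the 1-D singular values

# Query identical to the first document (first column), kept as a column vector
query = A[:, [0]]

# Route 1: fold the query in with inv(sigma) * U^T, compare with columns of vt
q1 = np.dot(np.linalg.inv(sigma), np.dot(u.T, query))
sim1 = cosine_similarity(q1, vt[:, 0])

# Route 2: project with U^T only, compare with columns of sigma * vt
q2 = np.dot(u.T, query)
sim2 = cosine_similarity(q2, np.dot(sigma, vt)[:, 0])

print(sim1, sim2)  # both should be approximately 1, up to rounding
```

In this sketch the two routes agree because the documents are scaled the same way as the query in each case; comparing a folded-in query against columns of the reconstructed matrix `u @ sigma @ vt` would mix two different spaces.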