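What is the Pearson correlation coefficient?
The Pearson correlation coefficient, introduced by Karl Pearson, is a statistic that measures the strength of the linear relationship between two variables. Its value lies between -1 and 1, where 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear correlation.
Formula: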
$$r = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \overline{y})^2}}$$
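As a minimal sketch of the formula above, the following Python function computes r directly from the definition, subtracting the sample means of x and y before summing. The function name `pearson` and the sample inputs are illustrative assumptions, not part of the original article.

```python
import math


def pearson(xs, ys):
    """Pearson's r computed directly from the definition (mean-centred sums)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / denom


print(pearson([1, 2, 3], [2, 4, 6]))                 # 1.0: points on a rising line
print(pearson([1, 2, 3], [6, 4, 2]))                 # -1.0: points on a falling line
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # 0.0: y = x^2, no linear trend
```

The last call shows why 0 only means "no linear correlation": y is fully determined by x, yet r is 0. Raising an error for a constant variable is one possible choice; returning 0 in that case, as a similarity score might, is another.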
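A simple implementation
Suppose a travel platform stores users' ratings of scenic spots. By computing the similarity between users, it can recommend to each user the spots that similar users rated highly.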
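The implementations that follow do not subtract the means explicitly. They accumulate the sums of the ratings (`sum1`, `sum2`), the sums of squares, and the sum of products (`pSum`), then apply an algebraically equivalent form of the formula. A short derivation, taking x_i and y_i to be the two users' ratings of the n spots they have both rated and substituting the mean of x as (1/n) times the sum of the x_i (likewise for y):

```latex
\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})
  = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \Big(\sum_{i=1}^{n} x_i\Big) \Big(\sum_{i=1}^{n} y_i\Big)

\sum_{i=1}^{n} (x_i - \overline{x})^2
  = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big(\sum_{i=1}^{n} x_i\Big)^2

r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {\sqrt{\Big(\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \big(\sum_{i=1}^{n} x_i\big)^2\Big)
                \Big(\sum_{i=1}^{n} y_i^2 - \frac{1}{n} \big(\sum_{i=1}^{n} y_i\big)^2\Big)}}
```

This form needs only one pass over the data. For small integer ratings it is accurate enough, though with large values the subtractions can lose floating-point precision.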
Python implementation:
# Test data as a dict: each key is a user id, and each value is another dict
# that maps a spot id to that user's rating of the spot.
data = {
    "user1": {"spot1": 5, "spot2": 3, "spot3": 2},
    "user2": {"spot1": 5, "spot2": 1, "spot3": 2, "spot4": 4, "spot5": 5},
}


def cal_similarity(user1, user2):
    # 1. Collect the spots both users have rated, so only comparable ratings are processed.
    common_spots = set(data[user1]).intersection(data[user2])
    n = len(common_spots)
    if n == 0:
        return 0
    # 2. Sum of each user's ratings over the common spots.
    #    Equivalent loop form:
    #        sum1 = 0
    #        for spot in common_spots:
    #            sum1 += data[user1][spot]
    sum1 = sum(data[user1][spot] for spot in common_spots)
    sum2 = sum(data[user2][spot] for spot in common_spots)
    # 3. Sum of the squared ratings.
    sumsq1 = sum(pow(data[user1][spot], 2) for spot in common_spots)
    sumsq2 = sum(pow(data[user2][spot], 2) for spot in common_spots)
    # 4. Sum of the products of the two users' ratings.
    pSum = sum(data[user1][spot] * data[user2][spot] for spot in common_spots)
    # 5. Pearson coefficient.
    num = pSum - (sum1 * sum2) / n
    den = ((sumsq1 - pow(sum1, 2) / n) * (sumsq2 - pow(sum2, 2) / n)) ** 0.5
    if den == 0:
        # At least one user gave the same rating to every common spot; r is undefined.
        return 0
    r = round(num / den, 2)
    return r


user1 = 'user1'
user2 = 'user2'
similarity = cal_similarity(user1, user2)
print(f'Similarity between {user1} and {user2}: {similarity}')
Java implementation:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PearsonCorrelation {
    public static void main(String[] args) {
        // user id -> (spot id -> rating); Map.of requires Java 9 or later
        Map<String, Map<String, Integer>> ratings = new HashMap<>();
        ratings.put("user1", Map.of("spot1", 5, "spot2", 3, "spot3", 2));
        ratings.put("user2", Map.of("spot1", 5, "spot2", 1, "spot3", 4, "spot4", 4, "spot5", 4));
        double similarity = calculatePearsonSimilarity(ratings, "user1", "user2");
        System.out.println("Similarity between user1 and user2: " + similarity);
    }

    public static double calculatePearsonSimilarity(Map<String, Map<String, Integer>> ratings, String user1, String user2) {
        // Copy the key set first: keySet() is a live view, so calling retainAll
        // on it directly would delete entries from user1's ratings map.
        Set<String> commonSpots = new HashSet<>(ratings.get(user1).keySet());
        commonSpots.retainAll(ratings.get(user2).keySet());
        if (commonSpots.isEmpty()) {
            return 0;
        }
        double sum1 = 0;
        double sum2 = 0;
        double sumSq1 = 0;
        double sumSq2 = 0;
        double pSum = 0;
        int n = commonSpots.size();
        // Accumulate the sums, sums of squares, and sum of products in one pass.
        for (String spot : commonSpots) {
            int score1 = ratings.get(user1).get(spot);
            int score2 = ratings.get(user2).get(spot);
            sum1 += score1;
            sum2 += score2;
            sumSq1 += Math.pow(score1, 2);
            sumSq2 += Math.pow(score2, 2);
            pSum += score1 * score2;
        }
        double num = pSum - (sum1 * sum2 / n);
        double den = Math.sqrt((sumSq1 - Math.pow(sum1, 2) / n) * (sumSq2 - Math.pow(sum2, 2) / n));
        if (den == 0) {
            return 0;
        }
        // Round to two decimals numerically; formatting and re-parsing a string
        // can fail in locales that use a decimal comma.
        return Math.round(num / den * 100.0) / 100.0;
    }
}
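The article stops at the similarity score, so the step from similarity to an actual recommendation is left open. One common approach, sketched below under stated assumptions, is to score each spot the target user has not rated by the similarity-weighted average of other users' ratings. The `recommend` function, the `min_similarity` threshold, and the extra `user3` and `spot6` entries are hypothetical and are not taken from the original.

```python
import math

# Hypothetical ratings in the same shape as the examples above:
# user id -> {spot id -> rating}. user3 and spot6 are invented for illustration.
ratings = {
    "user1": {"spot1": 5, "spot2": 3, "spot3": 2},
    "user2": {"spot1": 5, "spot2": 1, "spot3": 2, "spot4": 4, "spot5": 5},
    "user3": {"spot1": 1, "spot2": 2, "spot3": 5, "spot6": 5},
}


def pearson(ratings, a, b):
    """Pearson similarity of users a and b over the spots both have rated."""
    common = [spot for spot in ratings[a] if spot in ratings[b]]
    n = len(common)
    if n == 0:
        return 0.0
    xs = [ratings[a][spot] for spot in common]
    ys = [ratings[b][spot] for spot in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0


def recommend(ratings, target, min_similarity=0.0):
    """Rank spots the target has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other in ratings:
        if other == target:
            continue
        sim = pearson(ratings, target, other)
        if sim <= min_similarity:
            continue  # skip users whose tastes are unrelated or opposite
        for spot, score in ratings[other].items():
            if spot not in ratings[target]:
                weighted[spot] = weighted.get(spot, 0.0) + sim * score
                weights[spot] = weights.get(spot, 0.0) + sim
    ranked = [(spot, weighted[spot] / weights[spot]) for spot in weighted]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


print(recommend(ratings, "user1"))
```

With this data, user3 correlates negatively with user1 and is skipped, so only user2's ratings contribute: spot5 ranks ahead of spot4, and spot6 is never suggested. A production system would also need to handle users with very few common ratings, where r is unreliable.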