Causation and Correlation in Big Data
Let’s jump into it right away.
Correlation
Correlation is a relationship or association between two variables: a movement in one variable is associated with a movement in the other. For example, ice-cream sales go up as the weather turns hot.
A positive correlation means the variables move in the same direction (left plot); a negative correlation means they move in opposite directions (middle plot). The rightmost plot shows the case where there is no correlation between the variables.
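To make the three cases concrete, here is a minimal sketch in base R that simulates them. The data are made up for illustration (the variable names, slopes, and noise levels are my assumptions, not taken from any real dataset), so treat it as a toy rather than a reproduction of the plots.

```r
set.seed(123)  # make the simulated data reproducible
n <- 1000
x <- rnorm(n)

# Hypothetical variables, constructed to show each case
y_pos  <- 2 * x + rnorm(n)    # tends to move with x
y_neg  <- -2 * x + rnorm(n)   # tends to move against x
y_none <- rnorm(n)            # generated independently of x

# Pearson correlation coefficient for each pair
r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_none <- cor(x, y_none)

print(round(c(positive = r_pos, negative = r_neg, none = r_none), 2))
```

With these settings, the first two coefficients should come out strongly positive and strongly negative, and the third should sit close to 0.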
Causation
Causation means that a change in one variable produces a change in another, so one variable depends on the other. It is also called cause and effect. For example, as the weather gets hot, people experience more sunburns. In this case, the hot weather is the cause and sunburn is the effect.
Correlation vs Causation Difference
Let’s try another example with this visualization. Your computer running out of battery causes it to shut down. It also causes the video player to shut down. Now, the computer shutting down and the video player shutting down are correlated events, but the actual cause of both is the battery running out.
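The battery example can be sketched as a small simulation. Everything here is hypothetical: the 30% chance of a dead battery and the 5% chance of an unrelated shutdown are numbers I picked for illustration. The point is only to show that two events can be strongly correlated because they share a cause, and that the correlation mostly disappears once you look only at cases where that cause is absent.

```r
set.seed(42)
n <- 10000

# Hypothetical common cause: the battery runs out in about 30% of sessions
battery_dead <- rbinom(n, 1, 0.3)

# Each shutdown happens if the battery is dead, or (rarely) for an
# unrelated reason that is independent between the two
computer_off <- pmax(battery_dead, rbinom(n, 1, 0.05))
player_off   <- pmax(battery_dead, rbinom(n, 1, 0.05))

# Correlation across all sessions
overall_cor <- cor(computer_off, player_off)

# Correlation restricted to sessions where the battery was fine
battery_ok <- battery_dead == 0
cor_given_battery_ok <- cor(computer_off[battery_ok], player_off[battery_ok])

print(round(c(overall = overall_cor, battery_ok = cor_given_battery_ok), 2))
```

In this toy setup the overall correlation is high, while the correlation among battery-OK sessions is close to zero, because the only link between the two shutdowns was the battery.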
Why is this important in data science?
How many times have you seen studies that imply A causes B? For example, that going to the gym results in higher productivity and focus. Is this really causation?
As a data scientist, you should not let a correlation bias you, because it can lead to faulty feature engineering and incorrect conclusions.
Correlation does not imply causation.
If you were to build a machine learning model of the relationship between the gym and productivity, then instead of focusing on features that are merely correlated (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) to validate cause and effect.
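Here is a rough sketch of what that looks like in a regression, using simulated data. I am assuming a world where a hidden trait (called discipline here, a name I made up) drives both gym hours and productivity, and where the gym itself has no effect at all. The coefficients and noise levels are arbitrary. Real data will not be this clean, and you rarely get to observe the true cause directly.

```r
set.seed(1)
n <- 5000

# Hypothetical hidden cause of both gym attendance and productivity
discipline   <- rnorm(n)
gym_hours    <- 2 + discipline + rnorm(n)
productivity <- 50 + 5 * discipline + rnorm(n)  # gym_hours has no direct effect

# Naive model: uses only the correlated feature
naive <- lm(productivity ~ gym_hours)

# Adjusted model: also includes the actual cause
adjusted <- lm(productivity ~ gym_hours + discipline)

naive_gym    <- unname(coef(naive)["gym_hours"])
adjusted_gym <- unname(coef(adjusted)["gym_hours"])

print(round(c(naive = naive_gym, adjusted = adjusted_gym), 2))
```

The naive model gives gym hours a large positive coefficient, but once discipline is in the model the gym coefficient shrinks to roughly zero. That is the pattern to watch for when a feature is only correlated with the outcome.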
Correlation in R
Let’s say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.
library(ggcorrplot)
# load mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)
Try it yourself. Copy & paste the above code in R.
When you run the code, you should get a correlation plot annotated with the correlation values. A value close to +1 indicates a strong positive correlation, and a value close to -1 indicates a strong negative correlation. In the example above, disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
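If you only care about a couple of pairs, you can check the same numbers without the plot. This is a small sketch using only base R; cor() computes the Pearson coefficient by default, and cor.test() is one way to get a confidence interval and p-value for a single pair.

```r
data(mtcars)

# Correlation for individual pairs (Pearson by default)
r_disp_wt <- cor(mtcars$disp, mtcars$wt)
r_mpg_cyl <- cor(mtcars$mpg, mtcars$cyl)

round(r_disp_wt, 2)  # 0.89
round(r_mpg_cyl, 2)  # -0.85

# The same values, read from the full correlation matrix
corr <- cor(mtcars)
corr["disp", "wt"]
corr["mpg", "cyl"]

# Confidence interval and p-value for one pair
cor.test(mtcars$mpg, mtcars$cyl)
```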
Causal Impact Methods
Causation is harder to establish than correlation, but it is possible. One of the most common ways to determine causal impact is through experimentation and incremental studies.
Continue learning causal impact methods with this video. It covers causal impact methodologies, specifically digital experimentation (A/B testing) and randomization techniques with real-world examples.
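As a taste of why randomization helps, here is a minimal simulated A/B test in base R. All of the numbers are assumptions for illustration: a true treatment effect of 1.5, and a hidden trait (motivation, a hypothetical name) that also affects the outcome. Because users are assigned to groups at random, the hidden trait ends up balanced across groups, so a simple difference in means recovers the effect.

```r
set.seed(7)
n <- 2000

# Hypothetical hidden trait that influences the outcome
motivation <- rnorm(n)

# Random assignment: half the users get the treatment (1), half do not (0)
treated <- sample(rep(c(0, 1), each = n / 2))

# Simulated outcome with an assumed true treatment effect of 1.5
outcome <- 10 + 1.5 * treated + 2 * motivation + rnorm(n)

# Randomization check: treatment should be nearly uncorrelated with the trait
balance <- cor(treated, motivation)

# Estimated effect: difference in mean outcome between the two groups
effect <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

# Welch two-sample t-test for the difference
ab_test <- t.test(outcome ~ treated)

print(round(c(balance = balance, effect = effect, p_value = ab_test$p.value), 3))
```

The estimated effect should land near the assumed 1.5. Without random assignment (say, if motivated users opted into the treatment), the same difference in means would mix the treatment effect with the effect of motivation.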
👩🏻‍💻 Learn more about me at sundaskhalid.com
📝 Connect with me on LinkedIn, Twitter, Instagram, YouTube
Translated from: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0