Correlation vs. Causation in Data Science

Let’s jump into it right away.

Correlation

Correlation describes a relationship or association between two variables: a movement in one variable is associated with a movement in the other. For example, ice-cream sales go up as the weather turns hot.

A positive correlation means that the variables move in the same direction (left plot); a negative correlation means that they move in opposite directions (middle plot). The plot on the far right shows the case where there is no correlation between the variables.
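
To see the three cases as numbers rather than plots, here is a minimal R sketch on simulated data. The variable names, sample size, and noise levels are arbitrary assumptions, not taken from the original plots: `y_pos` is built to move with `x`, `y_neg` to move against it, and `y_none` is generated independently.

```r
# Simulated data (assumption): x is standard normal, noise level is arbitrary
set.seed(42)
n <- 1000
x <- rnorm(n)

y_pos  <- x + rnorm(n, sd = 0.5)   # moves in the same direction as x
y_neg  <- -x + rnorm(n, sd = 0.5)  # moves in the opposite direction
y_none <- rnorm(n)                 # generated independently of x

# Pearson correlation of x with each variable
r <- c(positive = cor(x, y_pos),
       negative = cor(x, y_neg),
       none     = cor(x, y_none))
print(round(r, 2))
```

With this setup you should see values near +0.9, near -0.9, and near 0 respectively.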

Causation

Causation means that a change in one variable produces a change in another, so one variable depends on the other. It is also called cause and effect. For example, as the weather gets hot, people get more sunburns: the hot weather (the cause) produces the sunburns (the effect).

Figure: correlation is not causation (image by Anthony Figueroa)

Correlation vs. Causation

Let's try another example, shown in the visualization below. When your laptop runs out of battery, the computer shuts down. The dead battery also causes the video player to shut down. The two shutdown events are correlated, but neither causes the other; the actual cause of both is the battery running out.

Figure: correlation vs. causation
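
The battery example is a case of a common cause (a confounder). A small R simulation makes the pattern concrete. It assumes a hypothetical continuous `drain` variable standing in for the battery, plus two made-up outcome scores that each depend on it but not on each other. The outcomes come out clearly correlated, but once the common cause is regressed out of both, the remaining (partial) correlation is close to zero.

```r
set.seed(1)
n <- 5000

# hypothetical common cause: how drained the battery is (continuous stand-in)
drain <- rnorm(n)

# two outcomes that each depend on the battery but not on each other
laptop_off <- drain + rnorm(n)
player_off <- drain + rnorm(n)

# raw correlation between the two outcomes
raw_r <- cor(laptop_off, player_off)

# partial correlation: correlate the residuals left after
# regressing each outcome on the common cause
partial_r <- cor(resid(lm(laptop_off ~ drain)),
                 resid(lm(player_off ~ drain)))

print(round(c(raw = raw_r, partial = partial_r), 2))
```

In this setup each outcome gets half of its variance from the battery, so the raw correlation lands around 0.5, while the partial correlation sits near 0.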

Why is this important in data science?

How many times have you seen studies implying that A causes B? For example, that going to the gym results in higher productivity and focus. Is this really causation?

As a data scientist, you should not let a correlation bias you into assuming causation, because that can lead to faulty feature engineering and incorrect conclusions.

Correlation does not imply causation.

If you were building a machine learning model of the relationship between going to the gym and productivity, then instead of focusing on features that are merely correlated with productivity (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) to validate cause and effect.
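
Here is a minimal sketch of why this matters for feature selection. It uses simulated data in which a hypothetical `routine` score drives both gym attendance and productivity, and, by construction, the gym itself has no effect. A linear regression on `gym` alone finds a strong coefficient. Adding the actual cause as a feature shrinks the `gym` coefficient toward zero. All variable names and effect sizes are assumptions made for illustration.

```r
set.seed(7)
n <- 2000

routine      <- rnorm(n)                      # hypothetical true cause
gym          <- 0.8 * routine + rnorm(n)      # correlated with routine only
productivity <- 2 * routine + rnorm(n)        # gym has no effect by construction

naive    <- lm(productivity ~ gym)            # correlated feature only
adjusted <- lm(productivity ~ gym + routine)  # includes the actual cause

coef_gym <- c(naive    = unname(coef(naive)["gym"]),
              adjusted = unname(coef(adjusted)["gym"]))
print(round(coef_gym, 2))
```

This is regression adjustment, which only removes confounding from variables you actually measure and include. It is a simplification for illustration, not a general method for proving causation.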

Correlation in R

Let's say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.

library(ggcorrplot)  # install.packages("ggcorrplot") if it is not installed
# read mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)

Try it yourself: copy and paste the code above into R.

Figure: output of the code snippet above

When you run the code, you should get a correlation plot annotated with the correlation coefficients. A value close to +1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value near 0 little or no linear correlation. In the example above, disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
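
To read individual values off the matrix, or to check them by hand, you can compute Pearson's r (the default method of `cor()`) directly. The sketch below uses base R only. It defines a small helper, `pearson_r` (a name introduced here for illustration), that applies the definition: covariance divided by the product of the standard deviations. It compares the result with `cor()`, and then uses `cor.test()` to add a p-value and 95% confidence interval for one pair.

```r
data(mtcars)

# Pearson's r from its definition: cov(x, y) / (sd(x) * sd(y))
pearson_r <- function(x, y) {
  cov(x, y) / (sd(x) * sd(y))
}

r_disp_wt <- pearson_r(mtcars$disp, mtcars$wt)
r_mpg_cyl <- pearson_r(mtcars$mpg, mtcars$cyl)

# the hand-computed value matches the built-in cor()
print(all.equal(r_disp_wt, cor(mtcars$disp, mtcars$wt)))  # TRUE

print(round(c(disp_wt = r_disp_wt, mpg_cyl = r_mpg_cyl), 2))  # 0.89 and -0.85

# significance test and 95% confidence interval for one pair
print(cor.test(mtcars$disp, mtcars$wt))
```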

Causal Impact Methods

Causation is harder to establish than correlation, but it is possible. One of the most common ways of determining causal impact is through experimentation and incrementality studies.
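
Experimentation works because random assignment breaks the link between the treatment and everything else that affects the outcome. Below is a minimal A/B-test sketch in base R with simulated data. It assumes a true treatment effect of 0.5 units and a hypothetical confounder, `motivation`, that also affects the outcome. When `treated` is assigned by coin flip, a simple difference in means recovers the effect. When motivated people select themselves into treatment, the same comparison is biased upward.

```r
set.seed(123)
n <- 4000
true_effect <- 0.5                             # assumed effect, for illustration
motivation  <- rnorm(n)                        # hypothetical confounder

# randomized experiment: treatment assigned by a fair coin flip
treated <- rbinom(n, size = 1, prob = 0.5)
outcome <- true_effect * treated + motivation + rnorm(n)

# randomization makes the groups comparable, so the difference in means
# is an unbiased estimate of the causal effect
estimate <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])
print(round(estimate, 2))

# Welch two-sample t-test comparing group 1 (treatment) with group 0 (control)
print(t.test(outcome ~ treated))

# contrast: self-selection, where more motivated people opt in
chose       <- as.integer(motivation + rnorm(n) > 0)
outcome_obs <- true_effect * chose + motivation + rnorm(n)
biased      <- mean(outcome_obs[chose == 1]) - mean(outcome_obs[chose == 0])
print(round(biased, 2))
```

In the randomized case the estimate lands close to the assumed 0.5. In the self-selected case the difference in means also absorbs the motivation gap between the groups and comes out well above 0.5.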

Figure: "What's the difference between Causality and Correlation?" (image by Analytics Vidhya)

Continue learning about causal impact methods with this video. It covers causal impact methodologies, in particular digital experimentation (A/B testing) and randomization techniques, with real-world examples.

Sundas YouTube Channel

👩🏻‍💻 Learn more about me at sundaskhalid.com
📝 Connect with me on LinkedIn, Twitter, Instagram, YouTube

Original article: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0