Correlation vs. Causation in Data Science

Let’s jump into it right away.

Correlation

Correlation means that one variable is related to, or associated with, another: a movement in one variable goes along with a movement in the other. For example, ice-cream sales go up as the weather turns hot.

A positive correlation means that the variables move in the same direction (left plot); a negative correlation means that they move in opposite directions (middle plot). The rightmost plot shows the case where there is no correlation between the variables.
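The three cases can be reproduced with a small simulation. This is a minimal sketch on synthetic data: the variable names, sample size, and noise levels are assumptions chosen for illustration and do not come from the plots.

```r
set.seed(42)  # fixed seed so the sketch is reproducible
n <- 200
x <- rnorm(n)

y_pos  <- 2 * x + rnorm(n, sd = 0.5)   # tends to rise when x rises
y_neg  <- -2 * x + rnorm(n, sd = 0.5)  # tends to fall when x rises
y_none <- rnorm(n)                     # generated independently of x

# Pearson correlation coefficient for each pair
r <- c(positive = cor(x, y_pos),
       negative = cor(x, y_neg),
       none     = cor(x, y_none))
print(round(r, 2))
```

The first coefficient should come out close to +1, the second close to -1, and the third close to 0.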

Causation

Causation means that a change in one variable causes a change in another, so one variable depends on the other. This is also called cause and effect. For example, as the weather gets hot, people get more sunburns. In this case, the hot weather is the cause and the sunburn is the effect.

[Image: correlation is not causation. Photo by Anthony Figueroa]

The Difference Between Correlation and Causation

Let’s try another example. Your computer running out of battery causes it to shut down. It also causes the video player to shut down. The computer and the video player shutting down are correlated events, but neither causes the other; the actual cause of both is the battery running out.
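The battery example can be sketched as a simulation in which one common cause drives two events. Everything here is hypothetical: the probabilities, the sample size, and the variable names are assumptions made up for illustration.

```r
set.seed(1)
n <- 5000

# Hypothetical sessions: 1 = the battery ran out, 0 = it did not
battery_dead <- rbinom(n, 1, 0.3)

# Each shutdown is likely when the battery is dead and unlikely otherwise.
# The two events are drawn separately, so neither one causes the other.
p_off <- ifelse(battery_dead == 1, 0.9, 0.1)
computer_off <- rbinom(n, 1, p_off)
player_off   <- rbinom(n, 1, p_off)

# Across all sessions, the two shutdown events are clearly correlated
r_all <- cor(computer_off, player_off)

# Once the battery state is held fixed, the correlation nearly disappears
dead <- battery_dead == 1
r_dead  <- cor(computer_off[dead],  player_off[dead])
r_alive <- cor(computer_off[!dead], player_off[!dead])

print(round(c(all = r_all, battery_dead = r_dead, battery_ok = r_alive), 2))
```

Comparing the overall correlation with the correlation inside each battery state is one simple way to check whether a third variable explains an association.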

[Image: correlation vs causation]

Why is this important in data science?

How many times have you seen studies that imply A causes B? For example, that going to the gym results in higher productivity and focus. Is this really causation?

As a data scientist, you should not let correlation bias you, because it can lead to faulty feature engineering and incorrect conclusions.

Correlation does not imply causation.

If you were to build a machine learning model of the relationship between going to the gym and productivity, then instead of focusing on features that are merely correlated (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) to validate cause and effect.
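To make the gym example concrete, here is a minimal sketch on synthetic data. It assumes, purely for illustration, that hard work drives both gym attendance and productivity, while gym attendance has no effect of its own. The variable names and coefficients are hypothetical, not results from any study.

```r
set.seed(7)
n <- 2000

# Assumed data-generating process (an illustration, not a finding):
# hard work influences both gym visits and productivity.
hard_work    <- rnorm(n)
gym_visits   <- 0.8 * hard_work + rnorm(n)
productivity <- 1.5 * hard_work + rnorm(n)

# Model using only the correlated feature
naive <- lm(productivity ~ gym_visits)

# Model that also includes the assumed real cause
adjusted <- lm(productivity ~ gym_visits + hard_work)

naive_coef    <- unname(coef(naive)["gym_visits"])
adjusted_coef <- unname(coef(adjusted)["gym_visits"])
print(round(c(naive = naive_coef, adjusted = adjusted_coef), 2))
```

In this setup the gym coefficient looks clearly positive in the naive model and shrinks toward zero once hard work is included. With real data the true causes are not known in advance, so a result like this only suggests where to look.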

Correlation in R

Let’s say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.

# install.packages("ggcorrplot")  # run once if the package is not installed
library(ggcorrplot)
# load mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)

Try it yourself: copy and paste the code above into R.

[Image: output from the above code snippet]

When you run the code, you should get a correlation plot annotated with the coefficient values. A value close to +1 indicates a strong positive correlation, and a value close to -1 indicates a strong negative correlation. In this example, disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
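The two values quoted above can also be read directly from the correlation matrix without the plot. This is a small sketch using only base R; the `cor.test()` call is an optional extra not used in the original snippet.

```r
data(mtcars)
corr <- cor(mtcars)

# Look up individual pairs by variable name
disp_wt <- round(corr["disp", "wt"], 2)   # 0.89
mpg_cyl <- round(corr["mpg", "cyl"], 2)   # -0.85
print(c(disp_wt = disp_wt, mpg_cyl = mpg_cyl))

# cor.test() adds a confidence interval and p-value for a single pair
print(cor.test(mtcars$disp, mtcars$wt))
```

A small p-value here says only that the association is unlikely to be zero. It says nothing about which variable, if either, causes the other.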

Causal Impact Methods

Causation is harder to establish than correlation, but it is possible. One of the most common ways to determine causal impact is through experimentation and incremental studies.

[Image: What’s the difference between Causality and Correlation? Photo by Analytics Vidhya]

Continue learning about causal impact methods with this video. It covers causal impact methodologies, specifically digital experimentation (A/B testing) and randomization techniques, with real-world examples.
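As a rough illustration of why randomization helps, the sketch below simulates a simple A/B test. It is not taken from the video: the metric, the group sizes, and the assumed true lift of 0.5 are hypothetical values chosen for the example.

```r
set.seed(123)
n <- 2000

# Randomly assign each user to control (A) or treatment (B).
# Random assignment means the groups differ only by chance before treatment.
group <- sample(c("A", "B"), n, replace = TRUE)

# Hypothetical outcome: the treatment adds 0.5 to the metric on average
true_lift <- 0.5
metric <- rnorm(n, mean = 10, sd = 2) + ifelse(group == "B", true_lift, 0)

# Welch two-sample t-test comparing the two groups
result <- t.test(metric ~ group)

# Estimated effect of the treatment: mean(B) - mean(A)
estimated_lift <- unname(result$estimate[2] - result$estimate[1])
print(round(estimated_lift, 2))

# Note: this confidence interval is for mean(A) - mean(B)
print(result$conf.int)
```

Because the assignment is random, a difference between the groups can be attributed to the treatment rather than to some hidden common cause. That attribution is what a plain correlation cannot provide.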

Sundas YouTube Channel

👩🏻‍💻 Learn more about me at sundaskhalid.com
📝 Connect with me on LinkedIn, Twitter, Instagram, YouTube

Translated from: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0
