Marginal, Joint and Conditional Probabilities Explained by a Data Scientist

Probability plays a very important role in data science, because data scientists regularly attempt to draw statistical inferences that can be used to predict or analyse data better.

Statistical inference is the process of using data analysis to deduce properties of an underlying distribution of probability (Source: Wikipedia), hence understanding random variables and their probability distributions is a required skill to work on many Data Science problems.

I am going to start this discussion with a scenario, which we will use throughout to learn about probability distributions.

Scenario

A survey of 500 strangers was carried out in London’s West End to determine people’s favorite sports. The options were Football and Rugby, with everything else grouped together as Other. The results of the survey are displayed in Figure 1.

Figure 1: The results of the survey

Figure 1 shows raw counts rather than a probability distribution. To obtain the probability distribution, we simply divide each number in Figure 1 by 500 (the number of observations); the result is shown in Figure 2.
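
As a minimal sketch of this normalisation step, the snippet below divides each count by the total. The figures themselves are images, so the counts here are an assumption: the Male row and the Female/Rugby cell are reconstructed from the probabilities quoted later in the article (0.24, 0.46, 0.25 and 0.05), while the Female/Football and Female/Other counts are hypothetical placeholders that only respect the Female total of 230.

```python
# Survey counts keyed by (gender, sport); 500 respondents in total.
# Cells marked "assumed" are placeholders, not values taken from the article.
counts = {
    ("Male", "Football"): 120,
    ("Male", "Rugby"): 100,
    ("Male", "Other"): 50,
    ("Female", "Football"): 75,   # assumed
    ("Female", "Rugby"): 25,
    ("Female", "Other"): 130,     # assumed
}

total = sum(counts.values())  # number of observations

# Dividing every count by the total turns counts into probabilities.
joint = {cell: n / total for cell, n in counts.items()}

print(total)                        # 500
print(joint[("Male", "Football")])  # 0.24
```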

Figure 2: Probability Distribution

Joint Probability

Joint probability is the probability of two events occurring together, written P(A and B) or P(A, B). For example, Figure 2 shows that the joint probability of someone being male and liking football is 0.24.

Figure 3: The Joint Probability Distribution.

Note: The cells highlighted in Figure 3 (the Joint Probability Distribution) must sum to 1, because every respondent falls into exactly one of the cells.

Joint probability is symmetrical, meaning that P(Male and Football) = P(Football and Male). We can also use it to find two other types of distribution: the marginal distribution and the conditional distribution.
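
The two properties above (the cells sum to 1, and the order of the events does not matter) can be checked with a short sketch. The table is an assumption: the figures are images, so the Female/Football and Female/Other values are hypothetical placeholders consistent with P(Female) = 0.46, and `p_and` is an illustrative helper name rather than anything from the article.

```python
import math

# Joint distribution keyed by (gender, sport).
# Cells marked "assumed" are placeholders, not values taken from the article.
joint = {
    ("Male", "Football"): 0.24,
    ("Male", "Rugby"): 0.20,
    ("Male", "Other"): 0.10,
    ("Female", "Football"): 0.15,  # assumed
    ("Female", "Rugby"): 0.05,
    ("Female", "Other"): 0.26,     # assumed
}

def p_and(x, y):
    """Return P(x and y); the order of the two events does not matter."""
    return joint.get((x, y), joint.get((y, x)))

# Every respondent sits in exactly one cell, so the cells sum to 1.
print(math.isclose(sum(joint.values()), 1.0))  # True

# Symmetry: P(Male and Football) == P(Football and Male).
print(p_and("Male", "Football"), p_and("Football", "Male"))  # 0.24 0.24
```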

Marginal Distribution

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables (Source: Wikipedia). If that was too much jargon, put simply: the marginal probability is the probability of an event irrespective of the outcome of another variable, written P(A) or P(B).

Figure 4: The Marginal Distribution

Note: Whether we ignore gender or sport, each marginal distribution must sum to 1.

A fun fact about marginal probabilities is that they all appear in the margins of the table. How cool is that? For example, P(Female) = 0.46 completely ignores which sport the person prefers, and P(Rugby) = 0.25 completely ignores gender.
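
Marginalising amounts to summing the joint table over the variable being ignored. The sketch below assumes a joint table in which the Female/Football and Female/Other cells are hypothetical placeholders (the figures are images); the two marginals quoted above do not depend on how those two cells are split.

```python
from collections import defaultdict

# Joint distribution keyed by (gender, sport).
# Cells marked "assumed" are placeholders, not values taken from the article.
joint = {
    ("Male", "Football"): 0.24,
    ("Male", "Rugby"): 0.20,
    ("Male", "Other"): 0.10,
    ("Female", "Football"): 0.15,  # assumed
    ("Female", "Rugby"): 0.05,
    ("Female", "Other"): 0.26,     # assumed
}

p_gender = defaultdict(float)  # marginal over sport: P(gender)
p_sport = defaultdict(float)   # marginal over gender: P(sport)

for (gender, sport), p in joint.items():
    p_gender[gender] += p  # sum across a row, ignoring the sport
    p_sport[sport] += p    # sum down a column, ignoring the gender

print(round(p_gender["Female"], 2))  # 0.46
print(round(p_sport["Rugby"], 2))    # 0.25
```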

Conditional Probability

Conditional probability is one of the most fundamental concepts in probability theory and, in my opinion, one of the trickier types of probability. It is the probability of one event occurring given that another event has occurred (by assumption, presumption, assertion or evidence).

Figure 5: Expression of the Conditional Probability
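
Figure 5 is an image in the original. As a stand-in, the standard definition of conditional probability can be written as below, where A and B are generic events and P(B) is assumed to be non-zero:

```latex
P(A \mid B) = \frac{P(A, B)}{P(B)}, \qquad P(B) > 0
```

With the article's numbers, this reads P(Rugby | Female) = P(Female, Rugby) / P(Female) = 0.05 / 0.46 ≈ 0.11.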

To make sense of this, let’s use Figure 2 again. If we want to calculate the probability that a person likes Rugby given that they are female, we take the joint probability that the person is female and likes Rugby, P(Female and Rugby), and divide it by the probability of the condition. In this case the condition is that the person is female, P(Female), which we can read from the margin as 0.46. Dividing the joint probability of 0.05 by 0.46 gives 0.11 (to 2 decimal places).

Let's write that out more neatly:

P(Female, Rugby) = 0.05

P(Female) = 0.46

P(Rugby | Female) = 0.05 / 0.46 = 0.11 (to 2 decimal places).

If we continued to fill in the probability of preferring each sport given that the respondent is female, we would have a Conditional Probability Distribution.
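
Filling in the whole conditional distribution means dividing every cell in the Female row by P(Female). In the sketch below only the Rugby value (0.05) and the row total (0.46) come from the article; the Football and Other values are hypothetical placeholders chosen to match that total, so only the Rugby result should be read as the article's.

```python
# Female row of the joint distribution, keyed by sport.
# Cells marked "assumed" are placeholders, not values taken from the article.
joint_female = {
    "Football": 0.15,  # assumed
    "Rugby": 0.05,
    "Other": 0.26,     # assumed
}

# The marginal P(Female) is the sum of the row.
p_female = sum(joint_female.values())

# P(sport | Female) = P(Female, sport) / P(Female) for every sport.
cond = {sport: p / p_female for sport, p in joint_female.items()}

print(round(p_female, 2))       # 0.46
print(round(cond["Rugby"], 2))  # 0.11
```

Because every cell is divided by the same row total, the conditional distribution sums to 1 whatever the individual values are.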

Wrap Up

This guide is a very simple introduction to joint, marginal and conditional probability. Being a Data Scientist and knowing about these distributions may still get you death stares from envious Statisticians, but at least this time it’s because they are just angry people rather than because you are wrong. I am joking!

Let’s continue the conversation on LinkedIn…

Translated from: https://towardsdatascience.com/marginal-joint-and-conditional-probabilities-explained-by-data-scientist-4225b28907a4
