Conditional Probability

If you’re currently in the job market or looking to switch careers, you’ve probably noticed the growing popularity of data science jobs. In 2019, LinkedIn ranked “data scientist” the No. 1 most promising job in the U.S. based on job openings, salary, and career advancement opportunities, and reported a 56% rise in job openings for data scientists over the previous year. Despite its popularity, however, data science can be a difficult field to enter, let alone to learn. I know from personal experience that the amount of statistics involved makes it very challenging. Probability, in particular, can be quite complicated, but it is fundamental to many machine learning models such as decision tree learning. So the purpose of this article is to provide a rudimentary understanding of conditional probability.

How To Calculate Probability

Simply put, when all outcomes are equally likely, the probability of an event is the number of ways the event can happen divided by the total number of possible outcomes. For example, imagine you have a deck of cards and you want to calculate the probability that you’ll randomly pull a king from the deck. How would you calculate that? Well, since there are 4 kings in a deck of cards, there are 4 possible ways you can draw a king from the deck; and since there are 52 cards in the deck, there are 52 possible outcomes. So 4 divided by 52 is about .077, or a 7.7% chance that your card will be a king.

Now say you want to figure out the probability of drawing another king. The answer depends on how you handle replacement. Sampling with replacement means that you place the first card back into the deck, making the two events independent (the probability of drawing a king is the same on each draw). Sampling without replacement means you’re not placing the first card back, which affects the probability of drawing the second king (only 3 kings remain, and the total number of outcomes is now 51). If event A is drawing the first king and event B is drawing the second king, then the probability of both A and B occurring is equal to the probability of event A multiplied by the probability of event B given that A has occurred.

Mathematical Notation
P(A and B) = P(A) x P(B|A) = 4/52 x 3/51 ≈ .0045, or about .45%
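
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch, assuming a standard 52-card deck with 4 kings; the variable names are illustrative. It uses exact fractions so that rounding does not hide the difference between sampling with and without replacement.

```python
from fractions import Fraction

KINGS = 4        # kings in a standard deck (assumption)
DECK_SIZE = 52   # cards in a standard deck (assumption)

# P(A): the first card drawn is a king.
p_first_king = Fraction(KINGS, DECK_SIZE)

# With replacement, the first card goes back into the deck,
# so the second draw has the same probability as the first.
p_two_kings_with = p_first_king * Fraction(KINGS, DECK_SIZE)

# Without replacement, one king and one card are gone: P(B|A) = 3/51.
p_second_given_first = Fraction(KINGS - 1, DECK_SIZE - 1)

# P(A and B) = P(A) x P(B|A)
p_two_kings_without = p_first_king * p_second_given_first

print(f"P(A)                       = {float(p_first_king):.4f}")
print(f"P(A and B), replacement    = {float(p_two_kings_with):.4f}")
print(f"P(A and B), no replacement = {float(p_two_kings_without):.4f}")
```

Putting the first card back makes two kings in a row slightly more likely, because the second draw still sees all 4 kings among 52 cards.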

Tree Diagram

Mathematics isn’t intuitive to everyone; it certainly wasn’t for me when I was just starting out in this field. Visualizations, however, can be a great tool when it comes to reinforcing complex topics. A tree diagram is one example that can help you break down a general problem into smaller components, which makes it perfect for probability problems that involve multiple events leading to a variety of outcomes. For example, take a look at the diagram I’ve created that helps answer the following question: If you have a bag of 23 marbles (5 green, 8 blue, and 10 red), what’s the probability that you’ll randomly pull out a blue marble and then a green marble, without putting the first one back? Let’s break it down.

  1. The probability of grabbing a blue marble is about 35%, because there are 8 ways you can get a blue marble and 23 total potential outcomes.

  2. Now, given that you pulled out a blue marble, the probability of grabbing a green marble from the bag is about 23%: 5 green marbles divided by 22 potential outcomes (notice how the total number of outcomes changes the second time, hence the change in probability).

  3. Finally, calculating the probability of both events happening involves multiplying the two probabilities (.35 x .23 ≈ 8%).
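
The three steps above follow a single branch of the tree. As a rough sketch, the whole tree for two draws without replacement can be built in Python; the function name `two_draw_tree` and the dictionary layout are illustrative choices, not part of the original diagram.

```python
from fractions import Fraction

# Marble counts from the example: 23 marbles in total.
bag = {"green": 5, "blue": 8, "red": 10}


def two_draw_tree(bag):
    """Return {(first, second): probability} for two draws without replacement."""
    total = sum(bag.values())
    tree = {}
    for first, n_first in bag.items():
        p_first = Fraction(n_first, total)       # first-level branch
        remaining = dict(bag)
        remaining[first] -= 1                    # the drawn marble stays out
        for second, n_second in remaining.items():
            p_second_given_first = Fraction(n_second, total - 1)
            # Multiply along the branch: P(first) x P(second | first)
            tree[(first, second)] = p_first * p_second_given_first
    return tree


tree = two_draw_tree(bag)

blue_then_green = tree[("blue", "green")]
either_order = blue_then_green + tree[("green", "blue")]

print(f"P(blue, then green)      = {float(blue_then_green):.4f}")
print(f"P(one blue and one green) = {float(either_order):.4f}")
print(f"Sum over all branches    = {sum(tree.values())}")
```

The branch for blue then green comes out to 8/23 x 5/22, which rounds to the 8% found above. If the order of the two marbles does not matter, the green-then-blue branch is added as well, which doubles the result. The branch probabilities sum to 1, a quick check that no outcome was left out of the tree.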

Conclusion

Hopefully this demonstration has given you a clearer mental picture of statistical probability. Even though conditional probability may seem elementary compared to the more advanced concepts in machine learning, having a solid understanding of the foundations on which data science is built is extremely important. So whenever you begin to learn something new, remember that no topic is too small and relearning is reinforcement.

Translated from: https://medium.com/swlh/conditional-probability-7f519a81655e
