The Law of Large Numbers and the Central Limit Theorem: Why Is the Central Limit Theorem Important to Data Scientists?

Data Science

The Central Limit Theorem is at the center of the statistical inference that every data scientist and data analyst does every day.

The Central Limit Theorem plays a significant part in statistical inference. It describes precisely how much an increase in sample size reduces sampling error, which tells us the precision, or margin of error, of estimates computed from samples, for example percentages.
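As a minimal sketch of that relationship, the standard error of a sample mean is the population standard deviation divided by the square root of the sample size. The function name `standard_error` and the value `sigma = 10.0` are illustrative assumptions, not taken from the article.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n i.i.d. observations with standard deviation sigma."""
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation, chosen only for illustration.
sigma = 10.0

# Quadrupling the sample size halves the standard error.
for n in (100, 400, 1600):
    print(f"n = {n:5d}  standard error = {standard_error(sigma, n):.3f}")
```

The practical consequence is that precision improves slowly: halving the margin of error requires four times as many observations.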

Statistical inference rests on the idea that results from a sample can be generalized to the population. How can we be sure that a relationship seen in a sample is not simply due to chance?

Significance tests are intended to offer an objective measure to inform decisions about the validity of that generalization. For instance, one may find a negative relationship between education and income in a sample. However, additional information is needed to show that the result is not just due to chance but is statistically significant.

The Central Limit Theorem (CLT) is a mainstay of statistics and probability. The theorem states that as the sample size grows, the distribution of the mean across multiple samples approaches a Gaussian distribution.

We can think of running a trial and getting an outcome, or observation. We can repeat the trial and get another independent observation. Collected together, many observations make up a sample.

If we calculate the mean of a sample, it will approximate the mean of the population distribution. Like any estimate, though, it will not be exact and will contain some error. If we draw many independent samples and compute their means, the distribution of those means will be approximately Gaussian, provided each sample is reasonably large.
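A small simulation can make this concrete. The sketch below assumes NumPy is available and uses an exponential population, a sample size of 50, and 10,000 repeated samples; all of these choices, like the helper `skewness`, are illustrative assumptions rather than anything specified in the article.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

sample_size = 50      # observations per sample (assumed)
num_samples = 10_000  # number of independent samples (assumed)

# Exponential population with scale 1: mean 1, standard deviation 1, strongly skewed.
samples = rng.exponential(scale=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)


def skewness(x: np.ndarray) -> float:
    """Sample skewness: third central moment over the cubed standard deviation."""
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)


print("mean of sample means:", sample_means.mean())
print("std of sample means :", sample_means.std())
print("theoretical std     :", 1.0 / np.sqrt(sample_size))
print("skewness, raw data  :", skewness(samples.ravel()))
print("skewness, means     :", skewness(sample_means))
```

The raw observations are heavily skewed, while the sample means cluster around the population mean with much less skew, which is the behavior the paragraph above describes.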

It is important that every trial that produces an observation be independent and carried out in the same way. This ensures that the sample is drawn from the same underlying population distribution. More formally, this requirement is referred to as independent and identically distributed, or i.i.d.

Beginners often confuse the central limit theorem with the law of large numbers (LLN). They are not the same: the LLN concerns the value that the mean of a single sample converges to as its size grows, whereas the CLT concerns the distribution of the means of many samples.

The LLN states that the sample mean of independent and identically distributed observations converges to a certain value; the CLT describes the distribution of the difference between the sample mean and that value.
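In symbols, the two statements can be written side by side. The notation is an assumption introduced here for illustration: $X_1, \dots, X_n$ are i.i.d. observations with mean $\mu$ and finite variance $\sigma^2$, and $\bar{X}_n$ is their sample mean.

```latex
% Law of large numbers: the sample mean converges to the population mean.
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mu
\qquad \text{as } n \to \infty .

% Central limit theorem: the rescaled difference converges in distribution
% to a Gaussian with mean 0 and variance \sigma^2.
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \;\xrightarrow{\;d\;}\; \mathcal{N}\bigl(0, \sigma^2\bigr)
\qquad \text{as } n \to \infty .
```

The first line says where the sample mean ends up; the second says how the remaining error is distributed on the way there.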

Since the central limit theorem gives us a specific distribution for our estimates, we can use it to ask questions about the probability of an estimate that we make. For example, suppose we are trying to predict how an election will turn out.

We take a survey and find that, in our sample, 30% of respondents would vote for candidate A over candidate B. Of course, we have only seen a small sample of the total population, so we would like to know whether our result can be said to hold for the whole population, and if it cannot, we would like to understand how large the error may be.

The central limit theorem tells us that if we ran the survey over and over again, the resulting estimates would be normally distributed around the true population value.
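Under that normal approximation, the size of the error in the survey example can be estimated. The sketch below keeps the 30% figure from the example but assumes a sample of 1,000 respondents and a 95% confidence level, neither of which is given in the article.

```python
import math
from statistics import NormalDist

p_hat = 0.30       # share of the sample favoring candidate A (from the example)
n = 1000           # number of respondents: a hypothetical value
confidence = 0.95  # confidence level: also an assumption

# Standard error of a sample proportion under the normal approximation.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Two-sided critical value of the standard normal distribution.
z = NormalDist().inv_cdf(0.5 + confidence / 2)

margin_of_error = z * std_error
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(f"margin of error: {margin_of_error:.4f}")
print(f"approximate interval: {interval[0]:.4f} to {interval[1]:.4f}")
```

With these assumed inputs the margin of error comes out a little under three percentage points; a smaller sample would widen it in proportion to one over the square root of the sample size.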

The CLT works from the center out. That means that if you are making assumptions close to the center, for example that around two-thirds of future sums will fall within one standard deviation of the mean, you can be safe even with small samples.

However, if you talk about the tails, for example assuming that a sum more than five standard deviations from the mean is almost impossible, you can be embarrassed, even with sizable samples.
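One way to see the difference between center and tails is to compare exact probabilities with the normal approximation for a sum whose distribution is known. The sketch below uses a sum of 1,000 Bernoulli trials with success probability 0.01, which is a binomial variable; the parameters and the helper `binom_pmf` are assumptions chosen for illustration.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability, computed in log space to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


n, p = 1000, 0.01  # a sizable sample of rare events (assumed values)
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Center: probability that the sum lies within one standard deviation of the mean.
lo, hi = math.ceil(mean - sd), math.floor(mean + sd)
exact_center = sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))
normal_center = math.erf(1 / math.sqrt(2))

# Tail: probability that the sum exceeds the mean by at least five standard deviations.
k_tail = math.ceil(mean + 5 * sd)
exact_tail = sum(binom_pmf(k, n, p) for k in range(k_tail, n + 1))
normal_tail = 0.5 * math.erfc(5 / math.sqrt(2))

print(f"within 1 sd: exact {exact_center:.3f} vs normal {normal_center:.3f}")
print(f"beyond 5 sd: exact {exact_tail:.2e} vs normal {normal_tail:.2e}")
```

In this setup the central probability is in the same neighborhood as the normal value, while the exact upper-tail probability is larger than the normal one by more than an order of magnitude, even though the sum has 1,000 terms.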

The CLT fails when a distribution has infinite variance. Such cases are rare but can matter in certain fields.
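The Cauchy distribution is a standard example with no finite variance. The following sketch, which assumes NumPy and uses arbitrary sample sizes and an illustrative helper `iqr_of_sample_means`, compares how the spread of sample means behaves for normal and Cauchy data as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
num_samples = 2000                   # number of repeated samples (assumed)


def iqr_of_sample_means(draw, sample_size: int) -> float:
    """Interquartile range of the means of num_samples samples drawn with `draw`."""
    means = draw((num_samples, sample_size)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return float(q75 - q25)


results = {}
for sample_size in (10, 1000):
    results[sample_size] = {
        "normal": iqr_of_sample_means(rng.standard_normal, sample_size),
        "cauchy": iqr_of_sample_means(rng.standard_cauchy, sample_size),
    }
    print(sample_size, results[sample_size])
```

For normal data the spread of the sample means shrinks as the sample grows, while for Cauchy data it stays roughly the same, because the mean of Cauchy observations is itself Cauchy distributed. The interquartile range is used instead of the standard deviation because the latter is not meaningful for Cauchy data.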

The CLT establishes the prominence of the Gaussian distribution as a natural limiting distribution. It justifies many assumptions used in statistics. Take, for example, the normality of the error term in linear regression: because the error is the sum of many independent random variables with finite variance, such as unobserved effects or undetectable errors, we can usually expect it to be approximately normally distributed.

Concretely, when you do not know the distribution of certain data, you can use the CLT to reason about the approximate normality of their sample means.

However, the drawback of the CLT is that it is often applied without checking its assumptions. This has long been the case in finance, where returns were assumed to be normal even though they have a fat-tailed distribution, which inherently carries more risk than the normal distribution.

The CLT does not apply when you are dealing with sums of dependent random variables, sums of random variables that are not identically distributed, or sums of random variables that violate both the independence condition and the identical-distribution condition.

There are additional central limit theorems that relax the independence or identical-distribution conditions. For example, the Lindeberg-Feller theorem still requires the random variables to be independent, but it relaxes the identical-distribution condition.
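For reference, the sufficiency direction of the Lindeberg-Feller theorem can be stated as follows. The notation is introduced here as an assumption: $X_1, X_2, \dots$ are independent with means $\mu_k$ and finite variances $\sigma_k^2$, and $s_n^2$ is the variance of the first $n$ terms combined.

```latex
% Total variance of the first n terms.
s_n^2 = \sum_{k=1}^{n} \sigma_k^2

% Lindeberg condition: for every \varepsilon > 0,
\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n}
  \mathbb{E}\!\left[ (X_k - \mu_k)^2 \,
  \mathbf{1}\{\, |X_k - \mu_k| > \varepsilon s_n \,\} \right] = 0 .

% Conclusion: the standardized sum is asymptotically standard normal.
\frac{1}{s_n} \sum_{k=1}^{n} (X_k - \mu_k)
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1) .
```

Informally, the condition requires that no single term contributes a non-negligible share of the total variance through its large deviations.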

In conclusion, the advantage of the CLT is that it is powerful: even if the data come from a variety of distributions, the theorem can still be used as long as their mean and variance are the same.

Translated from: https://medium.com/towards-artificial-intelligence/why-is-central-limit-theorem-important-to-data-scientist-49a40f4f0b4f
