The Law of Large Numbers and the Central Limit Theorem
Data Science
The Central Limit Theorem sits at the center of the statistical inference that data scientists and data analysts carry out every day.
The Central Limit Theorem plays a significant part in statistical inference. It describes precisely how much an increase in sample size reduces sampling error, which tells us the precision, or margin of error, of estimates computed from samples, such as percentages.
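As a rough sketch of that relationship, the standard error of a sample mean and of a sample proportion can be written as follows. The symbols are assumptions introduced here, not notation from the original: n is the sample size, σ the population standard deviation, and p the true population proportion.

```latex
\operatorname{SE}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}},
\qquad
\operatorname{SE}(\hat{p}) = \sqrt{\frac{p\,(1 - p)}{n}}
```

Both shrink in proportion to 1/√n, so quadrupling the sample size roughly halves the sampling error.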
Statistical inference rests on the idea that results from a sample can be generalized to the population. How can we be sure that a relationship seen in a sample is not simply due to chance?
Significance tests are intended to offer an objective measure to inform decisions about the validity of that generalization. For instance, one might find a negative relationship between education and income in a sample. However, additional information is needed to show that the result is not just due to chance, but is statistically significant.
The Central Limit Theorem (CLT) is a mainstay of statistics and probability. The theorem states that as the sample size grows, the distribution of the mean across multiple samples approaches a Gaussian distribution.
Think of running a trial and getting an outcome, or observation. We can repeat the trial and get another independent observation. Collected together, many observations form a sample.
If we calculate the mean of a sample, it will approximate the mean of the population distribution. Like any estimate, though, it will not be exact and will contain some error. If we draw many independent samples and compute their means, the distribution of those means will be approximately Gaussian.
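A minimal simulation sketch of this idea, using only the Python standard library. The choice of an exponential population (strongly skewed, with mean 1), the sample size of 50, and the 5,000 repetitions are all illustrative assumptions, not values from the original article.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an Exponential(1) population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Draw many independent samples and record the mean of each one.
means = [sample_mean(50) for _ in range(5000)]

center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a Gaussian, about 68% of values lie within 1 standard deviation
# of the center and about 95% within 2.
within_1sd = sum(abs(m - center) <= spread for m in means) / len(means)
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"std of sample means:  {spread:.3f}")
print(f"share within 1 sd:    {within_1sd:.3f}")
print(f"share within 2 sd:    {within_2sd:.3f}")
```

Although the individual draws are far from normal, the shares within one and two standard deviations should land close to the Gaussian values of roughly 0.68 and 0.95.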
It is important that every trial that produces an observation be independent and be carried out in the same way. This ensures that the sample is drawn from the same underlying population distribution. More formally, this requirement is referred to as independent and identically distributed, or iid.
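Under the iid requirement, the classical form of the theorem can be sketched as below. The notation is an assumption added here: X_1, ..., X_n are the observations, μ and σ² are the population mean and (finite, nonzero) variance, and the arrow denotes convergence in distribution.

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
\;\xrightarrow{\;d\;}\;
\mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```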
Beginners often confuse the central limit theorem with the law of large numbers (LLN). They are not the same. The key difference is that the LLN describes what happens to the mean of a single sample as that sample grows, whereas the CLT describes the shape of the distribution of means across many samples.
The LLN states that the sample mean of independent and identically distributed observations converges to a certain value in the limit. The CLT describes the distribution of the difference between the sample mean and that value.
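The contrast can be illustrated with a short simulation. This is a sketch under assumed settings: a Uniform(0, 1) population (mean 0.5, standard deviation √(1/12)), with sample sizes and repetition counts chosen only for illustration.

```python
import math
import random
import statistics

random.seed(1)
MU = 0.5                    # mean of Uniform(0, 1)
SIGMA = math.sqrt(1 / 12)   # standard deviation of Uniform(0, 1)

def sample_mean(n):
    """Mean of n independent Uniform(0, 1) draws."""
    return statistics.fmean(random.random() for _ in range(n))

# LLN: the mean of a single sample settles near MU as the sample grows.
lln_errors = {n: abs(sample_mean(n) - MU) for n in (10, 1_000, 100_000)}
for n, err in lln_errors.items():
    print(f"n={n:>7}: |sample mean - mu| = {err:.5f}")

def scaled_spread(n, reps=2000):
    """Std of sqrt(n) * (sample mean - MU), estimated over reps samples."""
    means = [sample_mean(n) for _ in range(reps)]
    return statistics.stdev(means) * math.sqrt(n)

# CLT: the difference between sample mean and MU, rescaled by sqrt(n),
# keeps a stable spread close to SIGMA instead of collapsing to zero.
clt_spreads = {n: scaled_spread(n) for n in (10, 100)}
for n, s in clt_spreads.items():
    print(f"n={n:>3}: scaled spread = {s:.4f} (sigma = {SIGMA:.4f})")
```

The first loop shows the convergence the LLN promises; the second shows the quantity whose distribution the CLT describes.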
Because the central limit theorem gives us a specific distribution for our estimates, we can use it to ask about the probability of an estimate that we make. For example, suppose we are trying to predict how an election will turn out.
We take a survey and find that in our sample, 30% of respondents would vote for candidate A over candidate B. Of course, we have only observed a small sample of the total population, so we would like to know whether our result can be said to hold for the whole population and, if it cannot, how large the error might be.
The central limit theorem tells us that if we ran the survey over and over again, the resulting estimates would be approximately normally distributed around the true population value.
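For the survey example, that normal approximation leads to a simple interval estimate. The sketch below assumes a sample of 1,000 respondents and a 95% confidence level (z = 1.96); the article gives neither figure, so both are hypothetical.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion.

    p_hat: observed sample proportion
    n:     number of respondents (hypothetical here)
    z:     normal quantile, 1.96 for roughly 95% coverage
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(0.30, 1000)
print(f"{low:.3f} to {high:.3f}")  # prints "0.272 to 0.328"
```

Under these assumptions, support for candidate A in the whole population would plausibly sit within about 2.8 percentage points of the observed 30%.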
The CLT works from the center out. This means that if you are making claims close to the center, for example, that around two-thirds of future totals will fall within one standard deviation of the mean, you can be confident even with small samples.
However, if you make claims about the tails, for example, that a total more than five standard deviations from the mean is almost impossible, you can be badly wrong, even with sizable samples.
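One way to see the difference between the center and the tails is to compare exact probabilities with the normal approximation for a concrete sum. The sketch below uses a Binomial(1000, 0.1) total, that is, the sum of 1,000 independent yes/no trials; this distribution is an illustrative assumption, not an example from the article.

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of k, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

n, p = 1000, 0.1
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Center: probability that the total is within 1 standard deviation of the mean.
center_exact = sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - mean) <= sd)
center_normal = math.erf(1 / math.sqrt(2))

# Tail: probability that the total is more than 5 standard deviations above the mean.
tail_exact = sum(binom_pmf(k, n, p) for k in range(n + 1) if k - mean > 5 * sd)
tail_normal = 0.5 * math.erfc(5 / math.sqrt(2))

print(f"within 1 sd:   exact={center_exact:.4f}  normal={center_normal:.4f}")
print(f"beyond +5 sd:  exact={tail_exact:.3e}  normal={tail_normal:.3e}")
print(f"tail ratio exact/normal: {tail_exact / tail_normal:.1f}")
```

Near the center the two numbers agree closely, while in the far tail the normal approximation understates the exact probability by a multiple, even with 1,000 terms in the sum.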
The CLT fails when a distribution has infinite variance. These cases are rare, yet they may be significant in certain fields.
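A standard example of such a distribution is the Cauchy distribution, which has no finite mean or variance. The sketch below is an illustration under assumed settings (sample sizes of 10 and 1,000, with 500 repetitions each): it compares how the spread of sample means behaves for normal draws and for Cauchy draws, using the interquartile range because the standard deviation is not meaningful for Cauchy data.

```python
import math
import random
import statistics

random.seed(2)

def gauss():
    """Draw from a standard normal distribution (finite variance)."""
    return random.gauss(0.0, 1.0)

def cauchy():
    """Draw from a standard Cauchy distribution (no finite variance)."""
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_means(draw, n, reps=500):
    """Interquartile range of reps sample means, each over n draws."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {}
for n in (10, 1000):
    results[n] = (iqr_of_means(gauss, n), iqr_of_means(cauchy, n))
    print(f"n={n:>4}: normal IQR={results[n][0]:.3f}  Cauchy IQR={results[n][1]:.3f}")
```

For the normal draws the spread of the sample means shrinks as n grows. For the Cauchy draws it does not: the mean of n standard Cauchy variables is itself standard Cauchy, so averaging more data gives no extra precision.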
The CLT explains the prominence of the Gaussian distribution as a natural limiting distribution. It justifies many assumptions used in statistics, for example, the normality of the error terms in linear regression: when an error is the sum of many independent random variables with finite variance, such as many small unobserved effects, we can usually expect it to be approximately normally distributed.
In practice, when you do not know the distribution of some data, you can use the CLT to justify treating sums or averages of those data as approximately normal.
The drawback of the CLT, however, is that it is frequently applied without checking its assumptions. This was the situation in finance for quite a while: returns were assumed to be normal, although they have a fat-tailed distribution, which carries more risk than the normal distribution.
The classical CLT does not apply when you are dealing with sums of dependent random variables, sums of random variables that are not identically distributed, or sums of random variables that violate both the independence condition and the identical-distribution condition.
There are additional central limit theorems that relax the independence or identical-distribution conditions. For example, the Lindeberg-Feller theorem still requires the random variables to be independent, but it relaxes the identical-distribution condition.
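For reference, the condition that replaces identical distribution in the Lindeberg-Feller theorem can be sketched as follows. The notation is an assumption added here: the X_i are independent with means μ_i and finite variances σ_i², and s_n² is the variance of their sum.

```latex
s_n^2 = \sum_{i=1}^{n} \sigma_i^2,
\qquad
\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n}
\mathbb{E}\!\left[ (X_i - \mu_i)^2 \,
\mathbf{1}\{ |X_i - \mu_i| > \varepsilon s_n \} \right] = 0
\quad \text{for every } \varepsilon > 0
```

When this holds, the standardized sum (1/s_n) Σ (X_i − μ_i) converges in distribution to a standard normal. Informally, the condition says that no single term contributes a non-negligible share of the total variance.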
In conclusion, the advantage of the CLT is that it is robust: even if the data come from a variety of distributions, the theorem can still be used as long as their mean and variance are the same.
Translated from: https://medium.com/towards-artificial-intelligence/why-is-central-limit-theorem-important-to-data-scientist-49a40f4f0b4f