Statistics: How Bayesian Can Complement Frequentist

For many years, academics have been using so-called frequentist statistics to evaluate whether experimental manipulations have significant effects.

Frequentist statistics is based on the concept of hypothesis testing, a mathematically based estimate of whether your results could have been obtained by chance. The lower the p-value, the more significant the result (in frequentist terms). By the same token, you can obtain non-significant results using the same approach. Most of these "negative" results are disregarded in research, although there is tremendous added value in also knowing which manipulations do not have an effect. But that's for another post ;)

Thing is, in such cases where no effect can be found, frequentist statistics are limited in their explanatory power, as I will argue in this post.

Below, I will be exploring one limitation of frequentist statistics, and proposing an alternative method to frequentist hypothesis testing: Bayesian statistics. I will not go into a direct comparison between the two approaches; there is quite some reading out there if you are interested. I will rather explore why the frequentist approach presents some shortcomings, and how the two approaches can be complementary in some situations (rather than seeing them as mutually exclusive, as is sometimes argued).

This is the first of two posts, where I will be focusing on the inability of frequentist statistics to disentangle the absence of evidence from the evidence of absence.

Absence of evidence vs evidence of absence

Background

In the frequentist world, statistical tests typically output some statistical measures (t, F, Z values… depending on your test), and the almighty p-value. I discuss the limitations of only using p-values in another post, which you can read to get familiar with some of the concepts behind its computation. Briefly, a significant p-value (i.e., one below an arbitrarily decided threshold, called the alpha level, typically set at 0.05) indicates that your manipulation most likely has an effect.

However, what if (and that happens a lot) your p-value is > 0.05? In the frequentist world, such p-values do not allow you to disentangle an absence of evidence from evidence of an absence of effect.

Let that sink in for a little bit, because it is the crucial point here. In other words, frequentist statistics are pretty effective at quantifying the presence of an effect, but are quite poor at quantifying evidence for the absence of an effect. See here for literature.

The demonstration below is taken from some work that was performed at the Netherlands Institute for Neuroscience, back when I was working in neuroscience research. A very nice paper was recently published on this topic, that I encourage you to read. The code below is inspired by the paper repository, written in R.

Simulated Data

Say we generate a random distribution with mean=0.5 and standard deviation=1.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
mean = 0.5; sd = 1; sample_size = 1000
exp_distibution = np.random.normal(loc=mean, scale=sd, size=sample_size)
plt.hist(exp_distibution)
Figure 1 | Histogram depicting a random draw from a normal distribution centered at 0.5

That would be our experimental distribution, and we want to know whether that distribution is significantly different from 0. We could run a one-sample t-test (which would be okay since the distribution seems very Gaussian, although you should theoretically verify that the assumptions of parametric testing are fulfilled; let's assume they are).

t, p = stats.ttest_1samp(a=exp_distibution, popmean=0)
print('t-value = ' + str(t))
print('p-value = ' + str(p))

Quite a nice p-value that would make every PhD student’s spine shiver with happiness ;) Note that with that kind of sample size, almost anything gets significant, but let’s move on with the demonstration.

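To make the point about sample size concrete, here is a small aside (my own illustration, not from the original post): with a huge sample, even a negligible true effect comes out highly significant. Both the mean of 0.01 and the sample of one million are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

np.random.seed(42)
# A negligible true effect (mean = 0.01), but a very large sample
tiny_effect = np.random.normal(loc=0.01, scale=1, size=1_000_000)
t, p = stats.ttest_1samp(a=tiny_effect, popmean=0)
print('t-value = ' + str(t))
print('p-value = ' + str(p))
```

With n = 1,000,000 the standard error of the mean is about 0.001, so a true mean of 0.01 sits roughly ten standard errors away from 0: the test is highly significant even though the effect is practically negligible.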

Now let's try a distribution centered at 0, which should not be significantly different from 0:

mean = 0; sd = 1; sample_size = 1000
exp_distibution = np.random.normal(loc=mean, scale=sd, size=sample_size)
plt.hist(exp_distibution)
t, p = stats.ttest_1samp(a=exp_distibution, popmean=0)
print('t-value = ' + str(t))
print('p-value = ' + str(p))

Here, we have, as expected, a distribution that does not significantly differ from 0. And here is where things get a bit tricky: in some situations, frequentist statistics cannot really tell whether a p-value > 0.05 reflects an absence of evidence or evidence of absence of an effect, although that is a crucial point that would allow you to completely rule out an experimental manipulation from having an effect.

Let's take a hypothetical situation:

You want to know whether a manipulation has an effect. It might be a novel marketing approach in your communication, an interference with biological activity, or a "picture vs no picture" test in an email you are sending. You of course have a control group to compare your experimental group to.

When collecting your data, you could see different patterns:

  • (i) the two groups differ.
  • (ii) the two groups behave similarly.
  • (iii) you do not have enough observations to conclude (sample size too small).

While option (i) is evidence against the null hypothesis H0 (i.e., you have evidence that your manipulation had an effect), situations (ii) (evidence for H0, i.e., evidence of absence) and (iii) (no evidence, i.e., absence of evidence) cannot be disentangled using frequentist statistics. But maybe the Bayesian approach can add something to this story...

How p-values are affected by effect and sample sizes

The first thing is to illustrate the situations where frequentist statistics have shortcomings.

Approach background

What I will be doing is plotting how frequentist p-values behave when changing both effect size (i.e., the difference between your control, here with a mean=0, and your experimental distributions) and sample size (number of observations or data points).

Let’s first write a function that would compute these p-values:

def run_t_test(m, n, iterations):
    """
    Runs a one-sample t-test for a given effect size (m) and sample size (n),
    repeats it `iterations` times, and stores the p-values
    """
    my_p = np.zeros(shape=[1, iterations])
    for i in range(0, iterations):
        x = np.random.normal(loc=m, scale=1, size=n)
        # Traditional (two-sided) one-sample t-test
        t, p = stats.ttest_1samp(a=x, popmean=0)
        my_p[0, i] = p
    return my_p

We can then define the parameters of the space we want to test, with different sample and effect sizes:

# Defines parameters to be tested
sample_sizes = [5,8,10,15,20,40,80,100,200]
effect_sizes = [0, 0.5, 1, 2]
nSimulations = 1000

We can finally run the function and visualize:

# Run the function to store all p-values in the array "my_pvalues"
my_pvalues = np.zeros((len(effect_sizes), len(sample_sizes), nSimulations))
for mi in range(0, len(effect_sizes)):
    for i in range(0, len(sample_sizes)):
        my_pvalues[mi, i, :] = run_t_test(m=effect_sizes[mi],
                                          n=sample_sizes[i],
                                          iterations=nSimulations)

I will quickly visualize the data to make sure that the p-values seem correct. The output would be:

p-values for sample size = 5
Effect sizes:
0 0.5 1.0 2
0 0.243322 0.062245 0.343170 0.344045
1 0.155613 0.482785 0.875222 0.152519
p-values for sample size = 15
Effect sizes:
0 0.5 1.0 2
0 0.004052 0.010241 0.000067 1.003960e-08
1 0.001690 0.000086 0.000064 2.712946e-07
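
The code that printed the table above is not shown in the post; below is a minimal standalone sketch of how it could be produced (my reconstruction: simulation counts are shrunk so it runs quickly, and the exact values will differ from those above since they depend on the random draws).

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)
sample_sizes = [5, 15]
effect_sizes = [0, 0.5, 1.0, 2]
nSimulations = 2

# p-values indexed by [effect size, sample size, simulation]
my_pvalues = np.zeros((len(effect_sizes), len(sample_sizes), nSimulations))
for mi, m in enumerate(effect_sizes):
    for i, n in enumerate(sample_sizes):
        for s in range(nSimulations):
            x = np.random.normal(loc=m, scale=1, size=n)
            my_pvalues[mi, i, s] = stats.ttest_1samp(a=x, popmean=0)[1]

# Print one small table of p-values per sample size
for i, n in enumerate(sample_sizes):
    print('p-values for sample size = ' + str(n))
    print('Effect sizes:')
    print(pd.DataFrame(my_pvalues[:, i, :].T, columns=effect_sizes))
```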

I would make two main observations here:

  1. When you have a high enough sample size (lower section), the p-values behave as expected and decrease with increasing effect sizes (since you have more statistical power to detect the effect).
  2. However, we also see that the p-values are not significant for small sample sizes, even when the effect sizes are quite large (upper section). That is quite striking: the effect sizes are the same; only the number of data points differs.

Let’s visualize that.

Visualization

For each sample size (5, 8, 10, 15, 20, 40, 80, 100, 200), we will count the number of p-values falling in significance level bins.

Let’s first compare two distributions of equal mean, that is, we have an effect size = 0.

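The code behind the bar plots below is not included in the post; the following is one possible sketch (my reconstruction). The bin edges 0.001, 0.01 and 0.05 are an assumption about what the "significance" bins might have been.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
sample_sizes = [5, 8, 10, 15, 20, 40, 80, 100, 200]
nSimulations = 1000
effect_size = 0  # the case shown in Figure 2

bins = [0, 0.001, 0.01, 0.05, 1]  # assumed "significance" bin edges
counts = np.zeros((len(sample_sizes), len(bins) - 1))
for i, n in enumerate(sample_sizes):
    pvals = [stats.ttest_1samp(np.random.normal(effect_size, 1, n), 0)[1]
             for _ in range(nSimulations)]
    counts[i], _ = np.histogram(pvals, bins=bins)

# One group of bars per sample size, one bar per significance bin
x = np.arange(len(sample_sizes))
width = 0.2
labels = ['p < 0.001', 'p < 0.01', 'p < 0.05', 'n.s.']
for b in range(len(bins) - 1):
    plt.bar(x + b * width, counts[:, b], width, label=labels[b])
plt.xticks(x + 1.5 * width, sample_sizes)
plt.xlabel('Sample size')
plt.ylabel('Number of p-values')
plt.legend()
```

Changing effect_size to 0.5 or 2 reproduces the idea behind the subsequent figures as well.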

Figure 2 | Number of p-values located in each "significance" bin for effect size = 0

As we can see from the plot above, most of the p-values computed by the t-test are not significant for an experimental distribution of mean 0. That makes sense, since these two distributions are not different in their means.

We can, however, see that in some cases we do obtain significant p-values, which can happen when particular data points are drawn from the overall population. These are typically false positives, and the reason why it is important to repeat experiments and replicate results ;)

Let’s see what happens if we use a distribution whose mean differs by 0.5 compared to the control:

Figure 3 | Number of p-values per "significance" bin for effect size = 0.5

Now, we clearly see that increasing the sample size dramatically increases the ability to detect the effect, although many p-values remain non-significant for low sample sizes.

Below, as expected, you see that for highly different distributions (effect size = 2), the number of significant p-values increases:

Figure 4 | Number of p-values per "significance" bin for effect size = 2

OK, so that was it for an illustrative example of how p-values are affected by sample and effect sizes.

Now, the problem is that when you have a non-significant p-value, you cannot always be sure whether you might have missed the effect (say, because you had a low sample size due to limited observations or budget) or whether your data really suggest the absence of an effect. As a matter of fact, most scientific research has a problem of statistical power, because observations are limited (due to experimental constraints, budget, time, publishing time pressure, etc.).

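As a side note (my addition; this uses the classical frequentist toolbox rather than the Bayesian one discussed next), a prospective power analysis can at least tell you how many observations you would need to reliably detect a given effect. A sketch with statsmodels, assuming a one-sample t-test, an effect size of d = 0.5, alpha = 0.05, and 80% power:

```python
from statsmodels.stats.power import TTestPower

# Solve for the sample size giving 80% power to detect d = 0.5
n_needed = TTestPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print('Sample size needed: ' + str(round(n_needed)))
```

This comes out in the low-to-mid thirties; with fewer observations than that, a non-significant result is hard to interpret.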

Since the reality of research data is often a low sample size, you might still want to draw meaningful conclusions from non-significant results obtained with few observations.

Here, Bayesian statistics could help you make one more step with your data ;)

Stay tuned for the following post where I explore the Titanic and Boston data sets to demonstrate how Bayesian statistics can be useful in such cases!

You can find this notebook in the following repo: https://github.com/juls-dotcom/bayes

Original article: https://medium.com/@julien.her/statistics-how-bayesian-can-complement-frequentist-9ff171bb6396
