Bayesian Naive Bayes

Introduction

Bayesian analysis offers the possibility to get more insights from your data than a purely frequentist approach. In this post, I will walk you through a real-life example of how a Bayesian analysis can be performed. I will demonstrate what can go wrong when choosing the wrong prior, and we will see how to summarize our results. To follow this post, you should be familiar with the foundations of Bayesian statistics and with Bayes' theorem.

Scenario

As an example analysis, we will discuss a real-life problem from a physics lab. Don't worry, no physics knowledge is required. We want to determine the efficiency of a particle detector. A particle detector is a sensor that may produce a measurable signal when certain particles traverse it. The efficiency of the detector we want to evaluate is the probability that the detector actually registers a traversing particle. To measure this, we place the detector under test between two other sensors in a sandwich-like structure. If both the top and the bottom sensors measure a signal, we know that a particle must also have traversed the detector in the middle. A picture of the experimental setup is shown below.

For the measurement, we count the number N of particles traversing the setup in a certain time (as reported by the top and bottom sensors), as well as the number r of signals measured in our detector. For this example, we assume N=100 and r=98.

Frequentist Result

In a frequentist approach, we could use our measured data to conclude that the efficiency of the detector is e = r/N = 98%. This gives us only a point estimate. If we want to answer more complicated questions, for example "What is the probability that the efficiency of the detector is above 99%?", we need a more complex analysis.

The Bayesian Analysis

The goal of the Bayesian approach is to derive the full posterior probability distribution p(e|D) of the detector efficiency e given our data D. To do so, we need Bayes' theorem:

$$p(e \mid D) = \frac{p(D \mid e)\, p(e)}{p(D)}$$

We will go over the different terms in the following.

Probability Model / Likelihood: p(D|e)

As always in a Bayesian analysis, we need to select a model that describes the process we want to analyse, called the likelihood. For our problem, we can interpret the efficiency as the probability of a success in a single trial, and r as the number of successes out of N trials. This class of problems, similar to determining the probability of a coin showing heads, can be modeled by the binomial distribution:

$$p(D \mid e) = \binom{N}{r}\, e^{r} (1-e)^{N-r}$$
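
The likelihood translates directly into a small Python function. This is a minimal sketch using only the standard library; the name `binomial_likelihood` is our own and not taken from the original notebook.

```python
import math


def binomial_likelihood(r: int, n: int, e: float) -> float:
    """p(D|e): probability of r detector signals for n traversing particles,
    assuming each particle is detected independently with efficiency e."""
    return math.comb(n, r) * e**r * (1.0 - e) ** (n - r)


N, r = 100, 98
for e in (0.90, 0.95, 0.98, 0.99):
    print(f"p(D | e = {e:.2f}) = {binomial_likelihood(r, N, e):.4f}")
```

For a fixed e, summing this function over all r = 0, ..., N gives 1. As a function of e, however, it is not normalized, which is why the marginal likelihood p(D) is needed.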

Prior: p(e)

Next, we need to define a prior. Here, we start with the simplest choice, a flat prior: p(e) = 1 for 0 ≤ e ≤ 1. We will discuss the influence of a different choice of prior later.
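
Numerically, a convenient way to work with the flat prior is to evaluate it on a fine grid of efficiency values. The sketch below is our own illustration (grid size and names are arbitrary choices): it uses the midpoints of equally sized cells, which avoids the end points e = 0 and e = 1, and checks that the prior assigns each interval a probability equal to its width.

```python
# Flat prior p(e) = 1 on [0, 1], evaluated at the midpoints of n_grid equal cells.
n_grid = 10_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]
flat_prior = [1.0] * n_grid


def prior_mass(prior, lo, hi):
    """Prior probability that lo < e < hi (midpoint-rule integral)."""
    return sum(p for e, p in zip(grid, prior) if lo < e < hi) * de


print(prior_mass(flat_prior, 0.0, 1.0))    # total probability, should be 1
print(prior_mass(flat_prior, 0.97, 0.99))  # approximately the interval width, 0.02
```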

Marginal Likelihood: p(D)

The marginal likelihood is the denominator in Bayes' theorem. Luckily, it is just a normalization constant that does not depend on the efficiency. We can therefore determine it numerically by finding the constant that normalizes the posterior to 1.
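
The normalization can be computed numerically on a grid. For the flat prior it is also known analytically: integrating the binomial likelihood over e gives p(D) = C(N, r) · B(r+1, N-r+1) = 1/(N+1), which makes a handy cross-check. A minimal sketch, assuming a midpoint-rule grid of our own choosing:

```python
import math

N, r = 100, 98
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]  # cell midpoints in (0, 1)


def binomial_likelihood(r, n, e):
    return math.comb(n, r) * e**r * (1.0 - e) ** (n - r)


flat_prior = 1.0
unnormalized = [binomial_likelihood(r, N, e) * flat_prior for e in grid]

# p(D) = integral of p(D|e) p(e) de, approximated with the midpoint rule
evidence = sum(unnormalized) * de
posterior = [u / evidence for u in unnormalized]

print(f"p(D) numerically:           {evidence:.6f}")
print(f"p(D) analytically, 1/(N+1): {1 / (N + 1):.6f}")
print(f"posterior integrates to:    {sum(posterior) * de:.6f}")
```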

Results

Now we can calculate the posterior following Bayes' theorem.
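
With the binomial likelihood and the flat prior, the posterior also has a closed form, which is useful for checking numerical results. The derivation below is our addition, not part of the original analysis; it uses p(D) = 1/(N+1), the value of the marginal likelihood for a flat prior:

$$
p(e \mid D) = \frac{\binom{N}{r}\, e^{r} (1-e)^{N-r} \cdot 1}{p(D)} = (N+1)\binom{N}{r}\, e^{r} (1-e)^{N-r}, \qquad 0 \le e \le 1 .
$$

This is a beta distribution, Beta(r+1, N-r+1), here Beta(99, 3). Its maximum follows from

$$
\frac{d}{de}\Big[r \ln e + (N-r)\ln(1-e)\Big] = \frac{r}{e} - \frac{N-r}{1-e} = 0
\quad\Longrightarrow\quad
\hat{e}_{\text{mode}} = \frac{r}{N} = 0.98 .
$$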

You can see that the most probable value is e=98%, which is the same as the intuitive frequentist result. But we obtained much more information here, because we have the full posterior probability distribution. For example, we can see that the distribution is asymmetric: an efficiency below 97% is more probable than an efficiency above 99%, and we can assign exact numbers to both probabilities. Where does this extra information come from? From the additional assumptions we put into the analysis: we assumed that the behaviour of the detector follows a binomial distribution, and we assumed a flat prior distribution.
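
Both probabilities can be read off the normalized posterior by integrating over the corresponding ranges. A minimal grid-based sketch (grid size and variable names are our own assumptions); since the flat-prior posterior is a Beta(99, 3) distribution, the numbers can also be cross-checked against any beta CDF implementation.

```python
N, r = 100, 98
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]

# Flat prior: the posterior is proportional to e^r (1 - e)^(N - r).
unnormalized = [e**r * (1.0 - e) ** (N - r) for e in grid]
z = sum(unnormalized) * de
posterior = [u / z for u in unnormalized]


def probability(condition):
    """Posterior probability of the set of efficiencies satisfying `condition`."""
    return sum(p for e, p in zip(grid, posterior) if condition(e)) * de


mode = grid[max(range(n_grid), key=posterior.__getitem__)]
p_below_97 = probability(lambda e: e < 0.97)
p_above_99 = probability(lambda e: e > 0.99)

print(f"most probable efficiency: {mode:.3f}")
print(f"P(e < 0.97 | D) = {p_below_97:.3f}")
print(f"P(e > 0.99 | D) = {p_above_99:.3f}")
```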

Influence of the Prior

The prior plays an important role in a Bayesian analysis. In the following, we will see what happens if we change it. Let's say we find a statement in the datasheet of the detector that the efficiency can be assumed to be Gaussian distributed around 98% with a standard deviation of s=1%. In an older version of the datasheet, however, we find that the efficiency should be Gaussian distributed around 92% with the same standard deviation of s=1%. We incorporate this information into the posterior by changing the prior accordingly. The results for both cases are shown below.
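
A Gaussian prior can be plugged into the same grid calculation. The sketch below is our own illustration: it truncates the Gaussian to 0 < e < 1 and renormalizes it (an assumption, since an efficiency cannot exceed 100%), then compares the posterior modes for the two datasheet versions.

```python
import math

N, r = 100, 98
s = 0.01  # standard deviation given in both datasheet versions
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]


def gaussian_prior(m, s):
    """Gaussian density truncated to 0 < e < 1 and renormalized on the grid."""
    values = [math.exp(-0.5 * ((e - m) / s) ** 2) for e in grid]
    z = sum(values) * de
    return [v / z for v in values]


def posterior(prior):
    """Bayes' theorem on the grid with the binomial likelihood (constant dropped)."""
    unnormalized = [p * e**r * (1.0 - e) ** (N - r) for p, e in zip(prior, grid)]
    z = sum(unnormalized) * de
    return [u / z for u in unnormalized]


def mode(density):
    return grid[max(range(n_grid), key=density.__getitem__)]


modes = {m: mode(posterior(gaussian_prior(m, s))) for m in (0.98, 0.92)}
for m, e_hat in modes.items():
    print(f"prior mean {m:.2f}: most probable efficiency {e_hat:.3f}")
```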

Here, the posteriors are shown in the top panel and the corresponding priors in the panel below. The black curve shows the previous result with the flat prior. When changing the prior to a Gaussian with mean m=98% (green), the posterior again peaks at 98%, and our estimate is more certain than with the flat prior. The prior supports our data: while an efficiency below 95% still had a reasonable probability with the flat prior, it is now nearly excluded. Taking the prior from the old datasheet, which peaks at an efficiency of 92% (red), we can see that the posterior differs significantly from the other two. The most probable value is now around 93%, which completely changes our result. How can this be? By choosing a wrong prior, we have combined a prior and data that are not consistent with each other. This example shows that choosing a wrong prior may have catastrophic consequences. It is important to always evaluate the consistency between the prior, the probability model and the posterior.
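
One way to make such a consistency check concrete, which the original text does not prescribe, is a prior predictive check: before looking at the data, how probable does each prior make a result at least as high as the observed r = 98? The sketch below (helper names, grid size and the tail-probability criterion are our own choices) averages P(r ≥ 98 | e) over each truncated Gaussian prior. A very small value signals that prior and data are in tension.

```python
import math

N, r_obs = 100, 98
n_grid = 4_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]


def truncated_gaussian(m, s=0.01):
    """Gaussian prior restricted to 0 < e < 1 and renormalized on the grid."""
    values = [math.exp(-0.5 * ((e - m) / s) ** 2) for e in grid]
    z = sum(values) * de
    return [v / z for v in values]


def tail_given_e(e):
    """P(r >= r_obs | e) for r ~ Binomial(N, e)."""
    return sum(math.comb(N, k) * e**k * (1.0 - e) ** (N - k) for k in range(r_obs, N + 1))


def prior_predictive_tail(prior):
    """Probability of observing at least r_obs signals, averaged over the prior."""
    return sum(tail_given_e(e) * p for e, p in zip(grid, prior)) * de


tail = {m: prior_predictive_tail(truncated_gaussian(m)) for m in (0.98, 0.92)}
for m, t in tail.items():
    print(f"prior mean {m:.0%}: P(r >= {r_obs}) before seeing the data = {t:.3f}")
```

With these settings, the old-datasheet prior gives a much smaller tail probability than the current one. What counts as "too small" remains a judgment call.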

Incorporating Additional Measurements

Another use case for a prior is to incorporate an additional measurement. Imagine a colleague measured the same detector and obtained N1=300 and r1=280. How can we make correct use of this data? We can use the result of their measurement as the prior for our analysis. The results are shown below.

You can see the posterior distributions of our measurement (black) and of the colleague's measurement (blue), both obtained with flat priors. If we use our colleague's result as the prior for our analysis, we arrive at the green curve. Its most probable value lies between those of the other two curves, but closer to the blue curve, because our colleague's measurement contains more data. The green distribution is also slightly narrower than the other two. Side note: the resulting posterior again has the form e^r (1-e)^(N-r), the same shape in e as a binomial likelihood; as a probability distribution over e, this is a beta distribution. Moreover, we arrive at the same posterior as if we redid the analysis with a single measurement of N = N1 + N2 = 400 and r = r1 + r2 = 378, where N2=100 and r2=98 denote our own measurement. As you would expect, the result is also independent of the order in which the two measurements were performed. This can easily be verified analytically.
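
The equivalence can also be checked numerically. In this sketch (our own; helper names and grid size are assumptions), `update` applies Bayes' theorem once on a grid, and the two sequential orders are compared with a single analysis of the pooled data.

```python
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]


def normalize(density):
    z = sum(density) * de
    return [d / z for d in density]


def update(prior, r, n):
    """Bayes' theorem on the grid: prior times binomial likelihood, renormalized.
    The binomial coefficient is dropped because the normalization removes it."""
    return normalize([p * e**r * (1.0 - e) ** (n - r) for p, e in zip(prior, grid)])


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


flat = [1.0] * n_grid
N1, r1 = 300, 280  # colleague's measurement
N2, r2 = 100, 98   # our measurement

colleague_then_ours = update(update(flat, r1, N1), r2, N2)
ours_then_colleague = update(update(flat, r2, N2), r1, N1)
combined = update(flat, r1 + r2, N1 + N2)

print(max_abs_diff(colleague_then_ours, combined))  # ~0 up to rounding
print(max_abs_diff(ours_then_colleague, combined))  # ~0 up to rounding
mode = grid[max(range(n_grid), key=combined.__getitem__)]
print(f"most probable efficiency: {mode:.4f}")  # close to 378 / 400 = 0.945
```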

How to present your results

After calculating the posterior, we now want to present our results. Ideally, you want to show the full posterior distribution, as it contains the full information. However, this is not always possible, and you may want to summarize it with a few values. Often you want to give a point estimate together with an interval that summarizes the width of the distribution. There are different ways to do this. Popular choices include:

Expectation value & standard deviation
Median & central interval
Mode & smallest interval

Additionally, we need to decide how much probability the interval should contain (commonly 68% or 90%).

For a normal distribution, all three choices of point estimate and interval give identical results. However, for our skewed posterior this is not the case.
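
The first two summaries are straightforward to compute from a gridded posterior. A minimal sketch for the flat-prior result (grid size and names are our own choices); for a left-skewed posterior like this one, we expect mean < median < mode.

```python
import math

N, r = 100, 98
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]
weights = [e**r * (1.0 - e) ** (N - r) for e in grid]
total = sum(weights)
prob = [w / total for w in weights]  # posterior probability of each grid cell (flat prior)

# 1) Expectation value and standard deviation
mean = sum(e * p for e, p in zip(grid, prob))
sd = math.sqrt(sum((e - mean) ** 2 * p for e, p in zip(grid, prob)))


def quantile(q):
    """Smallest grid value whose cumulative posterior probability reaches q."""
    cumulative = 0.0
    for e, p in zip(grid, prob):
        cumulative += p
        if cumulative >= q:
            return e
    return grid[-1]


# 2) Median and central interval with 68% probability (16% cut off on each side)
median = quantile(0.5)
central_68 = (quantile(0.16), quantile(0.84))

print(f"mean +- sd: {mean:.4f} +- {sd:.4f}")
print(f"median: {median:.4f}, central 68% interval: {central_68[0]:.4f} to {central_68[1]:.4f}")
```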

You can see that all three choices lead to different results. None of them is right or wrong; it is just important to report exactly which point estimate you used and how you constructed your interval. Here, we could say for example that the most probable value (mode) of our posterior is 0.98, with a credible interval from 0.962 to 0.991 (the smallest interval containing 68% of the probability).
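
The smallest interval is not defined by fixed quantiles, so it needs a slightly different construction. A simple grid-based sketch (our own, not necessarily how the original notebook does it): collect grid cells in order of decreasing posterior density until 68% of the probability is reached.

```python
N, r = 100, 98
n_grid = 20_000
de = 1.0 / n_grid
grid = [(i + 0.5) * de for i in range(n_grid)]
weights = [e**r * (1.0 - e) ** (N - r) for e in grid]
total = sum(weights)
prob = [w / total for w in weights]  # posterior probability per grid cell (flat prior)


def smallest_interval(prob, mass=0.68):
    """Collect cells from the highest density downwards until `mass` is reached.
    For a unimodal posterior the selected cells form one contiguous interval.
    Returns (lower end, upper end, mode)."""
    order = sorted(range(len(prob)), key=prob.__getitem__, reverse=True)
    accumulated = 0.0
    selected = []
    for i in order:
        selected.append(i)
        accumulated += prob[i]
        if accumulated >= mass:
            break
    return grid[min(selected)], grid[max(selected)], grid[order[0]]


lo, hi, mode = smallest_interval(prob)
print(f"mode {mode:.3f}, smallest 68% interval {lo:.3f} to {hi:.3f}")
```

Up to the grid resolution, this should agree closely with the values quoted above.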

Conclusions

We performed a full Bayesian analysis, from setting up a probability model and choosing appropriate priors all the way to summarizing the posterior with a point estimate and a corresponding interval. The advantage of the Bayesian approach is that it gives us access to the full posterior probability distribution. This allowed us to elegantly incorporate prior knowledge, for example the manufacturer's information or a previous measurement. Furthermore, we saw that choosing a wrong prior can have a significant influence on our results. A careful choice of the prior, together with an evaluation of its consistency with the probability model and the posterior, is therefore of high importance in any Bayesian analysis.

A Python notebook producing the numbers and figures can be found here.