pymc3 贝叶斯线性回归

内部AI (Inside AI)

If you’ve steered clear of Bayesian regression because of its complexity, this article shows how to apply simple MCMC Bayesian Inference to linear data with outliers in Python, using linear regression and Gaussian random walk priors, testing assumptions on observation errors from Normal vs Student-T prior distributions and comparing against ordinary least squares.

如果您由于复杂性而避开了贝叶斯回归，那么本文将介绍如何使用线性回归和高斯随机游走先验，将简单的MCMC贝叶斯推断应用于带有异常值的Python线性数据，测试来自Normal或Student的观察误差假设-T先验分布，并与普通最小二乘法进行比较。

Ref: Jupyter Notebook and GitHub Repository

参考： Jupyter Notebook 和 GitHub存储库

The beauty of Bayesian methods is their ability to generate probability estimates of the quality of derived models based on a priori expectations of the behaviour of the model parameters. Rather than a single point-solution, we obtain a distribution of solutions each with an assigned probability. A priori expectations of the data behaviour are formulated into a set of Priors, probability distributions over the model parameters.

的T贝叶斯方法，他的美是他们产生衍生车型的基础上，模型参数的行为的先验预期质量的概率估计能力。而不是单点解，我们获得了具有指定概率的解的分布。数据行为的先验期望被公式化为一组Priors ，即模型参数上的概率分布。

The following scenario is based on Thomas Wiecki’s excellent article “GLM: Robust Linear Regression”¹ and is an exploration of the Python library PyMC3² as a means to model data using Markov chain Monte Carlo (MCMC) methods in Python, with some discussion of the predictive power of the derived Bayesian models. Different priors are considered, including Normal, Student-T, Gamma and Gaussian Random Walk, with a view to model the sample data both using generalized linear models (GLM) and by taking a non-parametric approach, estimating observed data directly from the imputed priors.

以下场景基于Thomas Wiecki的出色文章“ GLM：稳健的线性回归”¹，并且是对Python库PyMC3²的探索，该方法是使用Python中的马尔可夫链蒙特卡洛(MCMC)方法对数据进行建模的一种方法，并进行了一些讨论。贝叶斯模型的预测能力。考虑了不同的先验，包括法线，Student-T，伽马和高斯随机游走，以期使用广义线性模型(GLM)并通过采用非参数方法对样本数据建模，从而直接从推算中估算观察到的数据先验。

样本数据 (Sample Data)

Our sample data is generated as a simple linear model with given intercept and slope, with an additional random Gaussian noise term added to each observation. To make the task interesting, a small number of outliers are added to the distribution which have no algebraic relationship to the true linear model. Our objective is to model the observed linear model as closely as possible with the greatest possible robustness to disruption by the outliers and the observation error terms.

我们的样本数据是作为具有给定截距和斜率的简单线性模型生成的，每个观察值都添加了一个额外的随机高斯噪声项。为了使任务有趣，向分布中添加了少量离群值，这些离群值与真实线性模型没有代数关系。我们的目标是尽可能密切地建模观察到的线性模型，以最大程度地增强对异常值和观察误差项的干扰。

普通与学生T先验 (Normal vs Student-T priors)

We compare models based on Normal versus Student-T observation error priors. Wiecki shows that a Student-T prior is better at handling outliers than the Normal as the distribution has longer tails, conveying the notion that an outlier is not to be considered wholly unexpected.

我们比较基于正常与学生-T观察误差先验的模型。 Wiecki表示，由于分布的尾部较长，因此Student-T先验值在处理异常值方面要比正态值更好，这表达了一个观点，即异常值不应被视为完全出乎意料。

真线性模型和普通最小二乘(OLS)回归 (True linear model and Ordinary Least Squares (OLS) regression)

def add_outliers(y, outliers_x, outliers_y):y_observed = yfor i in range(len(outliers_x)):y_observed[outliers_x[i]] = outliers_y[i]y_predicted = np.append(y_observed, [0] * observables)y_predicted = np.ma.MaskedArray(y_predicted,np.arange(y_predicted.size)>=observables)return y_observed, y_predictedx_observed = np.linspace(0, 1.5, observables)
x_delta = np.linspace(1.5, 3.0, observables)
x_predicted = np.append(x_observed, x_delta)predicted_line = true_intercept + true_slope * x_predictedy = true_intercept + true_slope * x_observed + np.random.default_rng(seed=123).normal(scale=0.5, size=observables) # Add noisey_observed, y_predicted = add_outliers(y, outliers_x, outliers_y)

def plot_sample_data(ax, y_min, y_max):ax.set_xlim(-0.04, 3.04)ax.set_ylim(y_min, y_max)plt.plot(x_predicted, y_predicted, '+', label='Sampled data', c=pal_2[5])def plot_true_regression_line(title, loc):plt.plot(x_predicted, predicted_line, lw=2., label = 'True regression', c='orange')plt.axvline(x=x_predicted[observables],linestyle='--', c='b',alpha=0.25)plt.title(title)plt.legend(loc=loc)data = dict(x=x_observed, y=y_observed)
fig = plt.figure(figsize=(fig_size_x, fig_size_y))
ax1 = fig.add_subplot(121)
plot_sample_data(ax1, y_min_compact, y_max_compact)
seaborn.regplot(x="x", y="y", data=data, marker="+", color="b", scatter=False, label="RMSE regression")
plot_true_regression_line('Simulated sample data - OLS Regression', 'best')ax2 = fig.add_subplot(122)
plot_sample_data(ax2, y_min_compact, y_max_compact)
seaborn.regplot(x="x", y="y", data=data, marker="+", color="g", robust=True, scatter=False, label="Robust RMSE regression")
plot_true_regression_line('Simulated sample data - Robust OLS Regression', 'best')

We start with a simulated dataset of observations of the form y=ax+b+epsilon, with the noise term epsilon drawn from a normal distribution. Here we 1) create observation values in the range x from 0 to 1.5, based on the ‘true’ linear model, 2) add test outliers and 3) extend the x axis from 1.5 to 3.0 so we can test the predictive power of our models into this region.

我们从y = ax + b + epsilon形式的观测数据的模拟数据集开始，噪声项epsilon从正态分布中得出。在这里，我们1)基于“真实”线性模型创建介于0到1.5范围内的x的观测值，2)添加测试异常值，3)将x轴从1.5扩展到3.0以测试我们的预测能力建模到该区域。

In the predictive region we have used numpy.MaskedArray() function in order to indicate to PyMC3 that we have unknown observational data in this region¹⁵.

在预测区域中，我们使用了numpy.MaskedArray()函数来向PyMC3指示在该区域中我们未知的观测数据。

It is useful to visualize the simulated data, along with the true regression line and ordinary least squares regression.

可视化模拟数据以及真实的回归线和普通的最小二乘回归非常有用。

The Seaborn library allows us to draw this plot directly⁴, and we can also set a ‘robust’ flag which calculates the regression line while de-weighting outliers.

Seaborn库允许我们直接绘制此图⁴，我们还可以设置一个“ robust ”标志，该标志在对异常值进行加权时计算回归线。

Plot of Sample Points, true regression line and Ordinary Least Squares regression (OLS) and Robust OLS.

PyMC3的贝叶斯回归 (Bayesian Regression with PyMC3)

Following the example of Wiecki, we can create linear regression models (GLM) in PyMC3, generating the linear model from y(x)= ‘y ~ x’.

˚Following Wiecki的例子中，我们可以创建线性回归模型(GLM)在PyMC3，生成从y中(x)的线性模型= 'Y〜X'。

To test the notion of robust regression, we create two models, one based on a Normal prior of observational errors and a second based on the Student-T distribution, which we expect to be less influenced by outliers.

为了测试鲁棒回归的概念，我们创建了两个模型，一个模型基于观测误差的正态先验，另一个模型基于Student-T分布，我们希望它们不受异常值的影响较小。

data = dict(x=x_observed, y=y_observed)
with pm.Model() as n_model:pm.glm.GLM.from_formula('y ~ x', data)trace_normal_glm = pm.sample(samples_glm, tune=tune_glm, random_seed=123, cores=2, chains=2)with pm.Model() as t_model:pm.glm.GLM.from_formula('y ~ x', data, family = pm.glm.families.StudentT())trace_studentT_glm = pm.sample(samples_glm, tune=tune_glm, random_seed=123, cores=2, chains=2)

fig = plt.figure(figsize=(fig_size_x, fig_size_y))
ax1 = fig.add_subplot(121)
plot_sample_data(ax1, y_min_compact, y_max_compact)
pm.plot_posterior_predictive_glm(trace_normal_glm, eval=x_predicted, samples=50, label='Posterior predictive regression', c=pal_1[4])
plot_true_regression_line('Regression Fit assuming Normal error prior', 'best')ax2 = fig.add_subplot(122)
plot_sample_data(ax2, y_min_compact, y_max_compact)
pm.plot_posterior_predictive_glm(trace_studentT_glm, eval=x_predicted, samples=50, label='Posterior predictive regression', c=pal_1[4])
plot_true_regression_line('Regression Fit assuming Student-T error prior', 'best')

The pymc3.sample() method allows us to sample conditioned priors. In the case of the Normal model, the default priors will be for intercept, slope and standard deviation in epsilon. In the case of the Student-T model priors will be for intercept, slope and lam⁷.

pymc3.sample()方法允许我们采样条件先验。在普通模型的情况下，默认优先级将用于ε，ε的截距，斜率和标准偏差。对于Student-T模型，先验将用于截距，斜率和lam。

Running these two models creates trace objects containing a family of posteriors sampled for each model over all chains, conditioned on the observations, from which we we can plot individual glm regression lines.

运行这两个模型将创建一个跟踪对象，该跟踪对象包含所有链上每个模型采样的后验对象，并以观测为条件，从中我们可以绘制单个glm回归线。

Plot of Sample Points, true regression line and Bayesian Linear Model, with Normal and Student-T priors

正态误差与Student-T误差分布的比较 (Comparison of Normal vs Student-T error distributions)

A summary of the parameter distributions can be obtained from pymc3.summary(trace) and we see here that the intercept from the Student-T prior is 1.155 with slope 2.879. Hence much closer to the expected values of 1, 3 than those from the Normal prior, which gives intercept 2.620 with slope 1.608.

可以从pymc3.summary(trace)获得参数分布的摘要，我们在这里看到来自Student-T先验的截距为1.155 ，斜率为2.879。 因此，与正常先验的期望值相比，更接近于1、3的期望值，即截距2.620的斜率为1.608 。

完善Student-T错误分布 (Refinement of Student-T error distribution)

It is clear from the plots above that we have improved the regression by using a Student-T prior, but we we hope to do better by refining the Student-T degrees of freedom parameter.

这是从上面我们已经改善了现有使用Student-T的回归情节清楚，但我们希望大家通过细化学生-T自由度参数做的更好。

def create_normal_glm_model(samples, tune, x, y):with pm.Model() as model_normal:a_0 = pm.Normal('a_0', mu=1, sigma=10)b_0 = pm.Normal('b_0', mu=1, sigma=10)mu_0 = a_0 * x + b_0sigma_0 = pm.HalfCauchy('sigma_0', beta=10)y_0 = pm.Normal('y_0', mu=mu_0, sigma=sigma_0, observed=y)trace_normal = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace_normal, pm.sample_posterior_predictive(trace_normal, random_seed=123)['y_0']def create_studentT_glm_model(samples, tune, x, y):with pm.Model() as model_studentT:a_0 = pm.Normal('a_0', mu=1, sigma=10)b_0 = pm.Normal('b_0', mu=1, sigma=10)mu_0 = a_0 * x + b_0nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)trace_studentT = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace_studentT, pm.sample_posterior_predictive(trace_studentT, random_seed=123)['y_0']trace_normal_glm_explicit, y_posterior_normal_glm = create_normal_glm_model(samples_glm, tune_glm, x_observed, y_observed)
trace_studentT_glm_explicit, y_posterior_studentT_glm = create_studentT_glm_model(samples_glm, tune_glm, x_observed, y_observed)

Models: The distribution for the Normal model’s standard deviation prior is a HalfCauchy, a positive domain Student-T with one degree of freedom¹⁴, which mirrors our previous model.

模型：常规模型的标准偏差之前的分布是HalfCauchy，这是一个具有一个自由度的正Student-T域，与我们的先前模型类似。

We also need to construct a new Student-T model where as well as parameterizing the glm we also create a prior on nu, the degrees of freedom parameter.

我们还需要构造一个新的Student-T模型，在参数化glm的同时，还要在nu (自由度参数)上创建先验。

nu is modelled as a Gamma function, as recommended by Andrew Gelman⁸ nu ~ gamma(alpha=2, beta=0.1), and retaining the lam value of 8.368 from the run above.

nu被建模为Gamma函数，正如AndrewGelman⁸nu〜gamma(alpha = 2，beta = 0.1)所建议的那样，并且从上述运行中保留了lam值为8.368。

Results: From summary(trace) we can see that the intercept from the Student-T error distribution is improved to 1.144, with slope 2.906 against the previous result of 1.155, with slope 2.879.

结果：从摘要(痕量)，我们可以看到，从学生-T误差分布截距提高到1.144，用斜率2.906针对1.155先前的结果，具有斜率2.879。

def calculate_regression_line(trace):a0 = trace['a_0']b0 = trace['b_0']y_pred = np.array([])for x_i in x_predicted:y_quartiles = np.percentile(a0 * x_i + b0, quartiles)y_pred = np.append(y_pred, y_quartiles)return np.transpose(y_pred.reshape(len(x_predicted), no_quartiles))

def plot_intervals(ax, y_pred, title, pal, loc, y_min, y_max):plt.fill_between(x_predicted,y_pred[0,:],y_pred[4,:],alpha=0.5, color=pal[1])plt.fill_between(x_predicted,y_pred[1,:],y_pred[3,:],alpha=0.5, color=pal[2])plot_intervals_line(ax, y_pred, title, pal, loc, 1, y_min, y_max)def plot_intervals_line(ax, y_pred, title, pal, loc, lw, y_min, y_max):plot_sample_data(ax, y_min, y_max)plt.plot(x_predicted, y_pred[0,:], label='CI 95%', lw=lw, c=pal[1])plt.plot(x_predicted, y_pred[4,:], lw=lw, c=pal[1])plt.plot(x_predicted, y_pred[1,:], label='CI 50%', lw=lw, c=pal[2])plt.plot(x_predicted, y_pred[3,:], lw=lw, c=pal[2])plot_true_regression_line(title, loc)def plot_regression(y_pred_normal, y_pred_student, loc, y_min, y_max):fig = plt.figure(figsize=(fig_size_x, fig_size_y))ax1 = fig.add_subplot(121)plot_intervals(ax1, y_pred_normal, 'GLM Regression Fit - Normal error prior', pal_1, loc, y_min, y_max)plt.plot(x_predicted, y_pred_normal[2,:], c=pal_1[5],label='Posterior regression')ax2 = fig.add_subplot(122)plot_intervals(ax2, y_pred_student, 'GLM Regression Fit - Student-T error prior', pal_1, loc, y_min, y_max)plt.plot(x_predicted, y_pred_student[2,:], c=pal_1[5], label='Posterior regression')y_pred_normal_glm = calculate_regression_line(trace_normal_glm_explicit)
y_pred_studentT_glm = calculate_regression_line(trace_studentT_glm_explicit)plot_regression(y_pred_normal_glm, y_pred_studentT_glm, 4, y_min_compact, y_max_compact)

Visualizing credibility intervals: Having fitted the linear models and obtained intercept and slope for every sample we can calculate expected y(x) at every point on the x axis and then use numpy.percentile() to aggregate these samples into credibility intervals (CI)⁹ on x.

可视化可信区间：拟合线性模型并获得每个样本的截距和斜率，我们可以计算x轴上每个点的预期y(x) ，然后使用numpy.percentile()将这些样本汇总为可信区间(CI) ⁹在x上。

We can visualize the mean, 50% and 95% credibility intervals of the sampled conditioned priors as all values for all Markov chains are held in the trace object. It is clear below how close the credibility intervals lie to the true regression for the Student-T model.

由于所有马尔可夫链的所有值都保存在跟踪对象中，因此我们可以可视化采样条件条件先验的均值，50％和95％可信区间。从下面可以清楚地看出，可信区间与Student-T模型的真实回归有多接近。

Later in this article we consider a non-parametric model which follows the linear trend line while modelling fluctuations in observations, which can convey local information which we might wish to preserve.

在本文的稍后部分，我们将考虑一个遵循线性趋势线的非参数模型，同时对观测值的波动建模，可以传达我们可能希望保留的局部信息。

Plot of Credibility Intervals for Bayesian Linear Model, with Normal and Student-T priors

后验预测力 (Posterior predictive power)

Bayesian inference works by seeking modifications to the parameterized prior probability distributions in order to maximise a likelihood function of the observed data over the prior parameters.

乙 ayesian推理作品通过寻求为了最大化比现有参数观测数据的似然函数的修改的参数先验概率分布。

So what happens to the expected posterior in regions where we have missing sample data? We should find that the posterior predictions follow the regression line, taking into account fluctuations due to the error term.

那么，在缺少样本数据的区域中，预期后验会如何处理？考虑到误差项引起的波动，我们应该发现后验预测遵循回归线。

建模未观察到的数据 (Modelling unobserved data)

We have created a y_predicted array of observations which in the region 1.5 to 3 is masked, indicating that these are unobserved samples. PyMC3 adds these missing observations as new priors, y_missing, and computes a posterior distribution over the full sample space.

我们创建了一个y_predicted观测值数组，该数组在1.5到3的范围内被遮罩，表明这些是未观察到的样本。 PyMC3将这些缺失的观测值添加为新的先验y_missing ，并计算整个样本空间的后验分布。

trace_normal_glm_pred, y_posterior_normal_glm_pred = create_normal_glm_model(samples_glm, tune_glm, x_predicted, y_predicted)
trace_studentT_glm_pred, y_posterior_studentT_glm_pred = create_studentT_glm_model(samples_glm, tune_glm, x_predicted, y_predicted)pm.summary(trace_studentT_glm_pred)

Re-running the above models with this extended sample space yields statistics which now include the imputed, observations.

在扩展的样本空间下重新运行上述模型将产生统计信息，其中包括估算的观测值。

def plot_actuals(y_posterior_normal, y_posterior_student, loc, y_min, y_max):y_pred_normal = np.percentile(y_posterior_normal, quartiles, axis=0)y_pred_studentT = np.percentile(y_posterior_student, quartiles, axis=0)fig = plt.figure(figsize=(fig_size_x, fig_size_y))ax1 = fig.add_subplot(121)plt.plot(x_predicted, y_pred_normal[2,:],alpha=0.75, lw=3, color=pal_2[5],label='Posterior mean')plot_intervals(ax1, y_pred_normal, 'Predictive Samples - Normal error prior', pal_2, loc, y_min, y_max)ax2 = fig.add_subplot(122)plt.plot(x_predicted, y_pred_studentT[2,:],alpha=0.75, lw=3, color=pal_2[5],label='Posterior mean')plot_intervals(ax2, y_pred_studentT, 'Predictive Samples - Student-T error prior', pal_2, loc, y_min, y_max)return figfig = plot_actuals(y_posterior_normal_glm_pred, y_posterior_studentT_glm_pred, 4, y_min_compact, y_max_compact)
fig.suptitle('Predicting from Linear Model', fontsize=16)

When we run this model we see in the summary not only the priors for a0, b0, nu0 but also for all the missing y, as y_0_missing.

当我们运行该模型时，我们在摘要中不仅看到a0，b0，nu0的先验，而且还看到所有缺失的y，如y_0_missing。

Having inferred the full range of predicted sample observations we can plot these posterior means, along with credibility intervals.

推断出预测样本观察的全部范围后，我们可以绘制这些后验均值以及可信区间。

We see below that the inferred observations have been well fitted to the expected linear model.

我们在下面看到，推断的观测值已经很好地拟合了预期的线性模型。

Plot of Predicitive samples for Bayesian Linear Model, with Normal and Student-T priors

非参数推理模型 (Non-parametric Inference Models)

Having verified the linear model, it would be nice to see if we can model these sample observations with a non-parametric model, where we dispense with finding intercept and slope parameters and instead try to model the observed data directly with priors on every datapoint across the sample space¹¹.

ħAVING验证线性模型，这将是很好看，如果我们可以用一个非参数模型，其中我们用查找截距和斜率参数分配这些样本观测模型，而是尝试将观察到的数据直接与每一个数据点先验模型整个样品空间¹¹。

非平稳数据和先验随机走动¹² (Non-Stationary data and Random Walk prior¹²)

The simple assumption with a non-parametric model is that the data is stationary, so the observed samples at each datapoint do not trend in a systematic way.

使用非参数模型的简单假设是数据是固定的，因此在每个数据点观察到的样本不会以系统的方式趋向。

def create_normal_stationary_inference_model(samples, tune, y):with pm.Model() as model_normal:sigma_0 = pm.HalfCauchy('sigma_0', beta=10)sigma_1 = pm.HalfCauchy('sigma_1', beta=10)mu_0 = pm.Normal('mu_0', mu=0, sd=sigma_1, shape=y.size)y_0 = pm.Normal('y_0', mu=mu_0, sd=sigma_0, observed=y)trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']def create_studentT_stationary_inference_model(samples, tune, y):with pm.Model() as model_studentT:nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)sigma_1 = pm.HalfCauchy('sigma_1', beta=10)mu_0 = pm.Normal('mu_0', mu=0, sd=sigma_1, shape=y.size)y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)trace =  pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']

But real-world data does very often trend, or is at least constrained by requirements of smoothness to adjacent data points and then the choice of priors is crucial¹³.

但是，现实世界中的数据确实经常会趋势化，或者至少受到对相邻数据点的平滑度要求的限制，然后选择先验至关重要。

In our scenario the data is non-stationary, since it has a clear trend over x. If we use a simple gaussian prior for our observations, then even after conditioning with the observed data, as soon as we enter the unobserved region, the predicted observations rapidly return to their prior mean.

在我们的场景中，数据是不稳定的，因为在x上有明显的趋势。如果我们使用简单的高斯先验进行观察，那么即使在对观察到的数据进行条件调整之后，只要我们进入未观察到的区域，预测的观察就会Swift返回到其先前的均值。

Plot of Predicitive samples for non-parametric model with stationary prior on y, for Normal and Student-T priors

高斯随机游走先验 (Gaussian Random Walk prior)

Gaussian random walk prior gives us the ability to model non-stationary data because it estimates an observable based on the previous data point state plus a random walk step. The PyMC3 implementation also allows us to model a drift parameter which adds a fixed scalar to each random walk step.

摹 aussian随机游走之前让我们来模拟非平稳数据的能力，因为它的估计基于先前的数据点的状态加上一个随机游走一步可观察到的。 PyMC3的实现还允许我们对漂移参数建模，该参数为每个随机步阶添加了一个固定的标量。

def create_normal_grw_inference_model(samples, tune, y, drift):with pm.Model() as model_normal:sigma_0 = pm.HalfCauchy('sigma_0', beta=10)sigma_1 = pm.HalfCauchy('sigma_1', beta=10)if drift:mu_drift = pm.Normal('mu_drift', mu=1, sigma=2)mu_0 = pm.GaussianRandomWalk('mu_0', mu=mu_drift, sd=sigma_1, shape=y.size)else:mu_0 = pm.GaussianRandomWalk('mu_0', mu=0, sd=sigma_1, shape=y.size)y_0 = pm.Normal('y_0', mu=mu_0, sd=sigma_0, observed=y)trace = pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']def create_studentT_grw_inference_model(samples, tune, y, drift):with pm.Model() as model_studentT:nu_0 = pm.Gamma('nu_0', alpha=2, beta=0.1)sigma_1 = pm.HalfCauchy('sigma_1', beta=10)if drift:mu_drift = pm.Normal('mu_drift', mu=1, sigma=2)mu_0 = pm.GaussianRandomWalk('mu_0', mu=mu_drift, sd=sigma_1, shape=y.size)else:mu_0 = pm.GaussianRandomWalk('mu_0', mu=0, sd=sigma_1, shape=y.size)y_0 = pm.StudentT('y_0', mu=mu_0, lam=8.368, nu=nu_0, observed=y)trace =  pm.sample(samples, tune=tune, random_seed=123, cores=2, chains=2)return trace, pm.sample_posterior_predictive(trace, random_seed=123)['y_0']

In these two inference models we have used a single scalar drift parameter which in this scenario is equivalent to the slope on the data in the direction of the drift.

在这两个推理模型中，我们使用了单个标量漂移参数，在这种情况下，该参数等于数据在漂移方向上的斜率。

It is important that data, including outliers, is correctly ordered with equal sample step size, as the prior will assume this behaviour in modelling the data. For this reason outliers are inserted into x array at index points, not appended as (x, y).

重要的是，必须以相等的样本步长大小正确排序数据(包括异常值)，因为先验者将在建模数据时采用这种行为。因此，离群值在索引点处插入 x数组中，而不附加为(x，y)。

Below we run the two models, first with the drift parameter set to zero, leading to a static line at the last observation point, and second with the drift inferred from the data.

下面我们运行两个模型，第一个模型的漂移参数设置为零，在最后一个观察点指向一条静态线，第二个模型从数据推断出的漂移。

As can be seen, this final model gives a very nice fit to the linear trend, particularly when using the Student-T prior which is less influenced by the outliers.

可以看出，该最终模型非常适合线性趋势，尤其是在使用Student-T优先级时，该值不受异常值的影响较小。

Plot of posterior means for non-parametric model with random walk without drift prior on y, for Normal and Student-T priors

Plot of posterior means for non-parametric model with random walk with drift prior on y, for Normal and Student-T priors

绘制高斯随机游走的后验样本 (Plot posterior samples from Gaussian random walk)

def plot_samples(y_posterior, samples):show_label = Truefor rand_loc in np.random.randint(0, len(y_posterior), samples):rand_sample = y_posterior[rand_loc]if show_label:plt.plot(x_predicted, rand_sample, alpha=0.05, c=pal_1[5], label='Posterior samples')show_label = Falseelse:plt.plot(x_predicted, rand_sample, alpha=0.05, c=pal_1[5])def plot_posterior_predictive(y_posterior_normal, y_posterior_student, loc, samples, y_min, y_max):y_pred_normal = np.percentile(y_posterior_normal, quartiles, axis=0)y_pred_studentT = np.percentile(y_posterior_student, quartiles, axis=0)fig = plt.figure(figsize=(fig_size_x, fig_size_y))ax1 = fig.add_subplot(121)plot_samples(y_posterior_normal, samples)plt.plot(x_predicted, y_pred_normal[2,:], alpha=0.75, lw=3, color=pal_2[5], label='Posterior mean')plot_intervals_line(ax1, y_pred_normal, 'Random Walk Prior: Predictive Samples - Normal error prior', pal_2, loc, 2, y_min, y_max)ax2 = fig.add_subplot(122)plot_samples(y_posterior_student, samples)plt.plot(x_predicted, y_pred_studentT[2,:], alpha=0.75, lw=3, color=pal_2[5], label='Posterior mean')plot_intervals_line(ax2, y_pred_studentT, 'Random Walk Prior: Predictive Samples - Student-T error prior', pal_2, loc, 2, y_min, y_max)plot_posterior_predictive(y_posterior_normal, y_posterior_studentT, 4, 250, y_min_spread, y_max_compact)pm.traceplot(trace_studentT)

Also, we now have the benefit that we can see the effect of the outliers in local perturbations to linearity.

而且，我们现在有一个好处，就是可以看到离群值在局部扰动对线性的影响。

Below we visualize some of the actual random walk samples.

下面我们可视化一些实际的随机游动样本。

We should not expect any particular path to be close to the actual predicted path of our sample data. In this sense it is a truly stochastic method as our result is derived from the statistical mean of the sampled paths, not from any particular sample’s path.

我们不应期望任何特定路径接近我们样本数据的实际预测路径。从这个意义上说，这是一种真正的随机方法，因为我们的结果是从采样路径的统计平均值中得出的，而不是从任何特定样本的路径中得出的。

Plot of Predictive samples for non-parametric model with random walk with drift prior on y, for Normal and Student-T priors

离群值对随机游走预测后验的影响 (Impact of Outliers on Random Walk predictive posteriors)

# Remove the outliers
y = true_intercept + true_slope * x_observed + np.random.default_rng(seed=123).normal(scale=0.5, size=observables) # Add noisey_observed, y_predicted = add_outliers(y, [], [])trace_normal, y_posterior_normal_linear = create_normal_grw_inference_model(samples_predictive, tune_predictive, y_predicted, True) # With Drift
trace_studentT, y_posterior_studentT_linear = create_studentT_grw_inference_model(samples_predictive, tune_predictive, y_predicted, True) # With Drift

Having seen the effect that the outliers have on the posterior mean, we surmise that the noisiness of the predictive samples might be due to perturbation of the random walk caused by the outliers.

看到离群值对后均值的影响后，我们推测预测样本的噪声可能是由于离群值引起的随机游走扰动。

To test this, we have here removed the outliers and rerun the same two predictive models.

为了测试这一点，我们在这里删除了异常值，然后重新运行相同的两个预测模型。

Plot of Predictive posterior means for no-outlier data, with random walk with drift prior on y, for Normal and Student-T priors

Plot of Predictive samples for no-outlier data, with random walk with drift prior on y, for Normal and Student-T priors

As can be seen, the predictive samples are much cleaner, with very tight credibility intervals and much smaller divergence in the random walk samples. This gives us confidence that with a correct model the Gaussian random walk is a well conditioned prior for non-stationary model fitting.

可以看出，预测样本更干净，可信区间非常紧密，随机游走样本的差异小得多。这使我们相信，使用正确的模型，高斯随机游走是非平稳模型拟合之前的良好条件。

结论 (Conclusion)

The intention of this article was to show the simplicity and power of Bayesian learning with PyMC3 and hopefully we have:

本文的目的是展示使用PyMC3进行贝叶斯学习的简单性和功能，希望我们有：

shown that the Student-T error prior performs better that the Normal prior in robust linear regression;
表明在稳健线性回归中，Student-T误差先验比正常先验表现更好；
demonstrated that the refined degrees of freedom on the Student-T error prior improves the fit to the regression line;
证明了对Student-T错误的改进后的自由度提高了对回归线的拟合度；
shown we can quickly and easily draw credibility intervals on the posterior samples for both the linear regression and the non-parametric models;
表明我们可以在线性样本和非参数模型上快速轻松地在后验样本上绘制可信区间；
demonstrated that the non-parametric model using Gaussian random walk with drift can predict the linear model in unobserved regions and at the same time allow model local fluctuations in the observed data which we might want to preserve;
证明了使用带有漂移的高斯随机游走的非参数模型可以预测未观测区域的线性模型，同时允许我们可能希望保留的观测数据中的模型局部波动；
investigated the Gaussian random walk prior and shown that its chaotic behaviour allows it to explore a wide sample space while not preventing it generating smooth predictive posterior means.
对高斯随机游走进行了先验研究，结果表明其混沌行为使其能够探索广阔的样本空间，同时又不会阻止其生成平滑的预测后验均值。

Thanks for reading, please feel free to add comments or corrections. Full source code can be found in the Jupyter Notebook and GitHub Repository.

感谢您的阅读，请随时添加评论或更正。完整的源代码可以在Jupyter Notebook和GitHub Repository中找到。

翻译自: https://towardsdatascience.com/the-power-of-bayesian-inference-estimated-using-pymc3-6357e3af7f1f

pymc3 贝叶斯线性回归

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：http://www.mzph.cn/news/391028.shtml

如若内容造成侵权/违法违规/事实不符，请联系多彩编程网进行投诉反馈email:809451989@qq.com，一经查实，立即删除！

Hadoop Streaming详解

一： Hadoop Streaming详解 1、Streaming的作用 Hadoop Streaming框架，最大的好处是，让任何语言编写的map, reduce程序能够在hadoop集群上运行；map/reduce程序只要遵循从标准输入stdin读，写出到标准输出stdout即可其次…

mongodb分布式集群搭建手记

一、架构简介目标单机搭建mongodb分布式集群(副本集分片集群)，演示mongodb分布式集群的安装部署、简单操作。说明在同一个vm启动由两个分片组成的分布式集群，每个分片都是一个PSS(Primary-Secondary-Secondary)模式的数据副本集； Confi…

归约归约冲突_JavaScript映射，归约和过滤-带有代码示例的JS数组函数

归约归约冲突Map, reduce, and filter are all array methods in JavaScript. Each one will iterate over an array and perform a transformation or computation. Each will return a new array based on the result of the function. In this article, you will learn why …

为什么Java里面的静态方法不能是抽象的

为什么Java里面的静态方法不能是抽象的？ 问题是为什么Java里面不能定义一个抽象的静态方法？例如： abstract class foo {abstract void bar( ); // <-- this is okabstract static void bar2(); //<-- this isnt why? }回答一因为抽…

python16_day37【爬虫2】

一、异步非阻塞 1.自定义异步非阻塞 1 import socket2 import select3 4 class Request(object):5 def __init__(self,sock,func,url):6 self.sock sock7 self.func func8 self.url url9 10 def fileno(self): 11 return self.soc…

朴素贝叶斯实现分类_关于朴素贝叶斯分类及其实现的简短教程

朴素贝叶斯实现分类Naive Bayes classification is one of the most simple and popular algorithms in data mining or machine learning (Listed in the top 10 popular algorithms by CRC Press Reference [1]). The basic idea of the Naive Bayes classification is very …

python:改良廖雪峰的使用元类自定义ORM

概要本文仅仅是对廖雪峰老师的使用元类自定义ORM进行改进，并不是要创建一个ORM框架编写fieldclass Field(object):def __init__(self, column_type,max_length,**kwargs):1，删除了参数name，field参数全部为定义字段类型相关参数，…

2019年度年中回顾总结_我的2019年回顾和我的2020年目标（包括数量和收入）

2019年度年中回顾总结In this post were going to take a look at how 2019 was for me (mostly professionally) and were also going to set some goals for 2020! 🤩 在这篇文章中，我们将了解2019年对我来说(主要是职业)如何，我们还将为20…

在Java里重写equals和hashCode要注意什么问题

问题：在Java里重写equals和hashCode要注意什么问题重写equals和hashCode有哪些问题或者陷阱需要注意？ 回答一理论（对于语言律师或比较倾向于数学的人）： equals() (javadoc) 必须定义为一个相等关系（它…

vray阴天室内_阴天有话：第1部分

vray阴天室内When working with text data and NLP projects, word-frequency is often a useful feature to identify and look into. However, creating good visuals is often difficult because you don’t have a lot of options outside of bar charts. Lets face it; ba…