Gaussian Process Classification and Gaussian Process Regression: Getting Started with Gaussian Process Regression Modeling


A Gaussian process (GP) is a useful technique that enables a non-parametric Bayesian approach to modeling. It has wide applicability in areas such as regression, classification, and optimization. The goal of this article is to introduce the theoretical aspects of GPs and provide a simple example of their use in a regression problem.

Multivariate Gaussian distribution

We first need a refresher on the multivariate Gaussian distribution, which is what a GP is based on. A multivariate Gaussian distribution is fully defined by its mean vector and covariance matrix.

There are two important properties of Gaussian distributions that make later GP calculations possible: marginalization and conditioning.


Marginalization

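Suppose two random variables X and Y are jointly Gaussian, with mean vectors μ_X and μ_Y and covariance blocks Σ_XX, Σ_XY, Σ_YX, and Σ_YY. Their joint distribution can be written as

$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix} \right)$$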

We can retrieve a subset of the multivariate distribution via marginalization. For example, we can marginalize out the random variable Y, which leaves X with its own mean vector μ_X and covariance block Σ_XX:

$$X \sim \mathcal{N}(\mu_X, \Sigma_{XX})$$

Note that the marginalized distribution is also a Gaussian distribution.


Conditioning

Another important operation is conditioning, which describes the probability distribution of one random variable given an observed value of another. As we will show later, this operation enables Bayesian inference: it is how we derive predictions given the observed data.

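With conditioning, you can derive, for example, the distribution of X given Y, where μ and Σ denote the blocks of the joint mean vector and covariance matrix:

$$X \mid Y \sim \mathcal{N}\left( \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y),\; \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \right)$$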

Like the marginal distribution, the conditional distribution is also Gaussian. This allows the results to be expressed in closed form, which keeps the calculations tractable.
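To make marginalization and conditioning concrete, here is a minimal NumPy sketch for a two-dimensional Gaussian. The helper name condition_gaussian and the example mean and covariance values are illustrative assumptions, not something from the original article; the formulas are the standard block expressions for a joint Gaussian.

```python
import numpy as np

def condition_gaussian(mu, cov, idx_x, idx_y, y_obs):
    """Return the mean and covariance of X | Y = y_obs for a joint Gaussian.

    idx_x and idx_y are lists of indices selecting the X and Y components.
    """
    mu_x, mu_y = mu[idx_x], mu[idx_y]
    cov_xx = cov[np.ix_(idx_x, idx_x)]
    cov_xy = cov[np.ix_(idx_x, idx_y)]
    cov_yy = cov[np.ix_(idx_y, idx_y)]
    # gain = cov_xy @ inv(cov_yy), computed with a solve instead of an explicit inverse
    gain = np.linalg.solve(cov_yy, cov_xy.T).T
    cond_mean = mu_x + gain @ (y_obs - mu_y)
    cond_cov = cov_xx - gain @ cov_xy.T
    return cond_mean, cond_cov

# Example joint distribution over (X, Y); the numbers are arbitrary.
mu = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])

# Marginal of X: read off the matching mean entry and covariance block.
marg_mean = mu[[0]]
marg_cov = cov[np.ix_([0], [0])]

# Conditional of X given the observation Y = 2.0.
cond_mean, cond_cov = condition_gaussian(mu, cov, [0], [1], np.array([2.0]))
print(marg_mean, marg_cov, cond_mean, cond_cov)
```

With these example numbers, observing Y = 2.0 shifts the mean of X from 0 to 0.4 and shrinks its variance from 1 to 0.68.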

Gaussian process

We can draw parallels between a multivariate Gaussian distribution and a Gaussian process. A Gaussian process (GP) is fully defined by its mean function and covariance function (also known as the kernel):

$$f(x) \sim \mathcal{GP}\left( m(x),\, k(x, x') \right)$$

A GP can be thought of as an infinite-dimensional multivariate Gaussian. This is what we mean by calling a GP non-parametric: it effectively has an infinite number of parameters. The mean function, m(x), describes the mean at any given data point x, and the kernel, k(x, x’), describes the relationship between any two data points x and x’.

As such, a GP describes a distribution over possible functions. When you sample from a GP, you get a single function. In contrast, when you sample from a Gaussian distribution, you get a single data point.
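The difference between sampling a function and sampling a point can be sketched in a few lines of NumPy. This is an illustrative sketch only: the squared-exponential (RBF) kernel, the helper name rbf_kernel, the grid of 50 inputs, and the jitter value are all assumptions chosen for the example. In practice we can only evaluate a sampled function on a finite grid, which reduces to a draw from a multivariate Gaussian.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') for 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)

# A finite grid standing in for the (infinite) input space.
x = np.linspace(-5.0, 5.0, 50)
K = rbf_kernel(x, x)
K += 1e-8 * np.eye(len(x))  # small jitter for numerical stability

# Each row is one function drawn from the zero-mean GP prior, evaluated on the grid.
prior_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)

# By contrast, one draw from a univariate Gaussian is a single number.
single_point = rng.normal(loc=0.0, scale=1.0)

print(prior_samples.shape)
```

Plotting each row of prior_samples against x would show three smooth curves; changing length_scale changes how quickly they wiggle.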

Gaussian process regression

We can now bring the concepts of marginalization, conditioning, and GPs together and apply them to regression. In a traditional regression model, we infer a single function, Y = f(X). In Gaussian process regression (GPR), we place a Gaussian process over f(X). When we don’t have any training data and only define the kernel, we are effectively defining a prior distribution over f(X). We will use the notation f for f(X) below. Usually we assume a mean of zero, so altogether this means

$$f(X) \sim \mathcal{GP}(0, K)$$

The chosen kernel K (e.g. periodic, linear, or radial basis function) determines the general shape of the functions. In the same way, when you choose between a first-order and a second-order equation, you expect different function shapes: a straight line versus a parabola.

When we have observed data (e.g. training data, X) and data points where we want to make estimates (e.g. test data, X*), we again place a Gaussian prior over f (for f(X)) and f* (for f(X*)). Writing K = K(X, X), K_* = K(X, X*), and K_** = K(X*, X*), this yields the joint distribution

$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} K & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

The objective is to find f* for some set of x values (X*), given the observed data (X and its corresponding f). This is effectively conditioning: we are asking for the posterior distribution of the function values, p(f*|f, X, X*). This is also how we make predictions: we calculate the posterior conditioned on the observed data and the test data points.
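For reference, applying the standard Gaussian conditioning formula to the joint distribution of f and f* gives this posterior in closed form. The shorthand K = K(X, X), K_* = K(X, X*), and K_** = K(X*, X*) is notation assumed here, along with a zero prior mean:

$$f_* \mid f, X, X_* \sim \mathcal{N}\left( K_*^\top K^{-1} f,\; K_{**} - K_*^\top K^{-1} K_* \right)$$

The posterior mean is a weighted combination of the observed function values, and the posterior covariance is the prior covariance K_** reduced by what the observations explain.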

Adding noise

The functions described above are noiseless, meaning we have perfect confidence in our observed data points. In the real world, this is rarely the case, and we expect some noise in our observations. In a traditional regression model, this can be expressed as

$$y = f(X) + \varepsilon$$

where ε ~ N(0, σ²I). Here ε is the noise term, which follows a Gaussian distribution. In GPR, we place the Gaussian prior on f(X) just as before, so f(X) ~ GP(0, K) and y(X) ~ GP(0, K + σ²I). With the observed data, the joint distribution is very similar to before, except that the noise term is now added to the covariance of the observed data. With K = K(X, X), K_* = K(X, X*), and K_** = K(X*, X*):

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} K + \sigma^2 I & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

Likewise, we can perform inference by calculating the posterior of f* conditioned on y, X, and X*.
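The noisy posterior can be computed directly with NumPy. The sketch below is one common way to do it, using a Cholesky factorization of K + σ²I rather than an explicit inverse. The function names rbf_kernel and gp_posterior, the RBF kernel, the toy sine data, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs (unit prior variance)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.1):
    """Posterior mean and covariance of f* given noisy observations y."""
    K = rbf_kernel(x_train, x_train)
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factor of K + sigma^2 I
    L = np.linalg.cholesky(K + noise_var * np.eye(len(x_train)))
    # alpha = (K + sigma^2 I)^{-1} y, obtained with two solves against L
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha          # K_*^T (K + sigma^2 I)^{-1} y
    cov = K_ss - v.T @ v          # K_** - K_*^T (K + sigma^2 I)^{-1} K_*
    return mean, cov

# Toy data: noisy samples of a sine curve.
rng = np.random.default_rng(1)
x_train = np.linspace(-3.0, 3.0, 8)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-3.0, 3.0, 25)

post_mean, post_cov = gp_posterior(x_train, y_train, x_test, noise_var=0.01)
# Pointwise standard deviation; clip guards against tiny negative round-off.
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))
```

Setting noise_var close to zero approaches the noiseless case, where the posterior mean passes through the training points.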

GPR using scikit-learn

Multiple packages are available for Gaussian process modeling (some are more general Bayesian modeling packages): GPy, GPflow, GPyTorch, PyStan, PyMC3, TensorFlow Probability, and scikit-learn. For simplicity, we illustrate an example here using the scikit-learn package on a sample dataset.

We will use the example Boston dataset from scikit-learn. First we load the data and do a simple 80/20 split into training and test sets.
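The loading and splitting step might look like the sketch below. Note one substitution: load_boston was removed in scikit-learn 1.2, so this sketch uses the bundled load_diabetes regression dataset as a stand-in (an assumption, not the dataset used in the article). The random_state value is also an arbitrary choice for reproducibility.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): load_boston is unavailable in scikit-learn >= 1.2.
X, y = load_diabetes(return_X_y=True)

# Simple 80/20 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```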

We will use the GaussianProcessRegressor class and define a kernel. Here we try a radial basis function kernel with noise and an offset. The kernel hyperparameters are initial suggested values; they are optimized during fitting.
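A minimal sketch of such a model follows. The kernel composition (a constant offset, a scaled RBF term, and a WhiteKernel noise term) matches the description above, but the starting values, the normalize_y=True setting, and the stand-in load_diabetes data are assumptions for illustration rather than the article's exact code.

```python
from sklearn.datasets import load_diabetes
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.model_selection import train_test_split

# Stand-in data (assumption), split 80/20 as described in the text.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Offset + scaled RBF + observation noise. The numbers are starting guesses only;
# fit() tunes them by maximizing the log marginal likelihood.
kernel = (
    ConstantKernel(1.0)
    + ConstantKernel(1.0) * RBF(length_scale=1.0)
    + WhiteKernel(noise_level=1.0)
)

model = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
model.fit(X_train, y_train)

print(model.kernel_)                    # kernel with optimized hyperparameters
r2_test = model.score(X_test, y_test)   # R^2 on the held-out test set
```

The optimizer may emit convergence warnings if a hyperparameter ends up at its bound; widening the bounds or rescaling the inputs are typical remedies.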

You can view the fitted kernel, including its optimized hyperparameters, with model.kernel_. We can now also plot the predicted values against the actual values.

Note that you can get similar performance with other machine learning models, such as a random forest regressor. However, the key benefit of GPR is that, for each test data point, the predicted value naturally comes with a confidence interval. So not only do you know your model's performance, you also know the uncertainty associated with each prediction.
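In scikit-learn, the per-prediction uncertainty is available by passing return_std=True to predict. The sketch below builds an approximate 95% interval from the predictive standard deviation; the dataset, kernel starting values, and the 1.96 multiplier (which assumes a Gaussian predictive distribution) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.model_selection import train_test_split

# Stand-in data and model (assumptions), set up as in a typical GPR workflow.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
model = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
model.fit(X_train, y_train)

# Predictive mean and standard deviation for every test point.
y_pred, y_std = model.predict(X_test, return_std=True)

# Approximate 95% interval around each prediction.
lower = y_pred - 1.96 * y_std
upper = y_pred + 1.96 * y_std

# Fraction of test targets that fall inside their interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.2f}")
```

Comparing the empirical coverage with the nominal 95% is a quick check on whether the predicted uncertainties are roughly calibrated.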

This is a high-level overview of GP and GPR. We won’t go into the details of kernels here, but by adopting different kernels you can incorporate your prior assumptions about the data into your model. With this simple scikit-learn example, we hope to give you a sense of how GPR is useful, so that you can quickly start to incorporate some form of Bayesian modeling into your machine learning toolbox!


Translated from: https://towardsdatascience.com/getting-started-with-gaussian-process-regression-modeling-47e7982b534d

