Multi-Logistic Regression with Probabilistic Programming

There is an interesting dichotomy in the world of data science between machine learning practitioners (increasingly synonymous with deep learning practitioners) and classical statisticians (both Frequentists and Bayesians). There is generally little overlap between the techniques used in these two camps. However, some interesting tools and libraries are trying to bridge the gap, especially by using Bayesian inference techniques to estimate the uncertainty of deep learning models. See this post and this paper to learn more about the historical and recent trends in this exciting new area. The biggest benefit of adopting Bayesian thinking is that it forces us to explicitly lay out all the assumptions that go into the model. It is hard to perform Bayesian inference without being fully aware of all the modeling choices along the way. The biggest downside to Bayesian inference is the time needed to run even moderately sized models.

There are several probabilistic programming languages/frameworks that are becoming more popular thanks to recent advances in computing hardware. The most common and mature language is Stan, which has APIs for other common programming languages like Python (PyStan) and R (RStan). There are also some newer players in the field like PyMC3 (Theano), Pyro (PyTorch), and Turing (Julia). Of these, Turing, written in Julia, seems like a particularly interesting option. It brings with it all the advantages of Julia, and combining it with Flux can theoretically make it “easy” to estimate the uncertainties of any deep learning model.

There are some amazing books to get you up and running with Bayesian data analysis, and the bible in the field is definitely the book by the great Andrew Gelman. He also writes short articles/opinions on his blog, which is worth following. I personally think the book “Statistical Rethinking” by Richard McElreath is the best introduction to the field for any newcomer. He walks you from the garden of forking paths all the way to multi-level models. He even has his entertaining and engaging lectures up on YouTube! No reason not to get your daily dose of Bayesian 😄

In this blog post, I just wanted to get my feet wet with Julia and Turing. I will use both PyStan and Turing to build multi-category logistic models that predict the species of penguins from features like bill length, island, sex, etc. This is similar to the Iris dataset that is used so commonly in data science tutorials. For more details on the Palmer penguin dataset, see here.

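As a rough sketch of the preprocessing (my own, hypothetical code, which may differ from what the project repo does), the data can be loaded and one-hot encoded with seaborn and pandas:

import seaborn as sns
import pandas as pd

# The Palmer penguins data ships with seaborn >= 0.11
penguins = sns.load_dataset("penguins").dropna()

# One-hot encode the categorical columns; together with the four numeric
# columns this yields the 9 features discussed in the PyStan section below
X = pd.get_dummies(
    penguins[["bill_length_mm", "bill_depth_mm", "flipper_length_mm",
              "body_mass_g", "island", "sex"]],
    columns=["island", "sex"],
)
y = penguins["species"].astype("category").cat.codes + 1  # Stan uses 1-based labels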

PyStan

First, let's use PyStan to build a multi-logit model. The code for the Stan model looks like this:

data {
  int N;                // the number of training observations
  int N2;               // the number of test observations
  int D;                // the number of features
  int K;                // the number of classes
  int y[N];             // the response
  matrix[N, D] x;       // the model matrix
  matrix[N2, D] x_new;  // the matrix for the predicted values
}
parameters {
  matrix[D, K] beta;    // the regression parameters
}
model {
  matrix[N, K] x_beta = x * beta;
  to_vector(beta) ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta[n]');
}

This is essentially the same as the example in Stan's documentation. We use a standard normal prior on all parameters. In the case of our penguin dataset, we have a total of 9 features: four of them are continuous (bill length, bill depth, flipper length, and body mass), and five are one-hot encoded features for the island and sex categorical variables. Therefore, the number of parameters to estimate is 9 per category. Since we have 3 categories, that makes a total of 27 parameters to estimate. For each category, the linear combination of the coefficients with the feature values is computed:

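Written out (my notation; the original post showed this as an image), the score for observation i and category k is

\eta_{ik} = x_i \cdot \beta_{\cdot k} = \sum_{d=1}^{D} x_{id}\,\beta_{dk}, \qquad k = 1, \dots, K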

The final category for each data point is computed using softmax:

p(y_i = k \mid x_i) = \mathrm{softmax}(\eta_i)_k = \frac{\exp(\eta_{ik})}{\sum_{j=1}^{K} \exp(\eta_{ij})}

We could also have fixed the parameters of one category to all zeros and only estimated the remaining 9 × 2 = 18 parameters. This is the same idea as in binary classification models, where only one set of coefficients is present:

p(y_i = 1 \mid x_i) = \frac{\exp(x_i \cdot \beta)}{1 + \exp(x_i \cdot \beta)}

I will show what that looks like when we get to the Julia code using the Turing library.

Now that we have the model ready, let's go ahead and run sampling to get the posteriors for all the parameters; a sketch of the corresponding PyStan call follows the settings below.

These are the parameters for sampling:

Algorithm: No-U-Turn Sampler (NUTS)
Warmup: 500 iterations
Samples: 500 iterations
Chains: 4
Max Tree Depth: 10
Time elapsed per chain: ~140 seconds

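A minimal sketch of what the corresponding PyStan (2.x) call could look like; multi_logit_code and stan_data are placeholder names, not the exact code from the repo:

import pystan

# Compile the Stan model shown above (multi_logit_code holds the model string)
model = pystan.StanModel(model_code=multi_logit_code)

# stan_data maps N, N2, D, K, y, x and x_new to the training/test arrays
fit = model.sampling(data=stan_data,
                     iter=1000,                      # 500 warmup + 500 kept draws
                     warmup=500,
                     chains=4,
                     control={"max_treedepth": 10},
                     seed=42)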

[Figure: Posterior distributions for some parameters and their trace plots over 500 iterations. The samples are too unstable to be reliable.]

The chains show poor mixing and stability. The recommendation from Stan is to increase the max tree depth for the NUTS sampler to get better stability within and across chains.

[Figure: Summary of samples for some parameters. Rhat is definitely too high for the samples to be useful.]

The poor stability of the chains is also reflected in the number of effective samples (n_eff), which is quite low for some parameters. The Rhat is significantly above the recommended value of 1.05 for most parameters.

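For reference, these diagnostics can be read straight off the PyStan fit object (assuming the fit returned by the sampling sketch above):

# The printed summary includes n_eff and Rhat for every parameter
print(fit)

# The same information is available programmatically
summary = fit.summary()
print(summary["summary_colnames"])  # contains "n_eff" and "Rhat"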

In practice though, this is often not a deal-breaker, and the samples are still usable, as shown below for predicting the training and test set classes:

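As a rough sketch (not the repo's code) of how these predictions can be made: average the per-draw softmax probabilities over the posterior, where beta_draws is assumed to be fit.extract()["beta"] with shape (draws, D, K) and x is the (N, D) feature matrix:

import numpy as np

def predict_classes(beta_draws, x):
    # Per-draw linear predictors, shape (draws, N, K)
    logits = np.einsum("nd,sdk->snk", x, beta_draws)
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)      # softmax per draw
    # Average over draws and pick the most likely class (1-based, like Stan)
    return probs.mean(axis=0).argmax(axis=1) + 1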

[Figure: Training set predictions]
[Figure: Test set predictions]

Now, let's increase the maximum tree depth for the NUTS sampler from 10 to 12 (the adjusted call is sketched below). This increases the time taken for each chain to converge:

Max Tree Depth: 12
Time elapsed per chain: ~570 seconds

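In the hypothetical PyStan call sketched earlier, this is just a change to the control dictionary (same placeholder names as before):

fit = model.sampling(data=stan_data, iter=1000, warmup=500, chains=4,
                     control={"max_treedepth": 12}, seed=42)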

[Figure: Posterior distributions for some parameters and their trace plots over 500 iterations]

The chains show much better mixing and stability, and we could go even higher with the max tree depth for the NUTS sampler to get better stability within and across chains.

[Figure: Summary of samples for some parameters. Rhat is on the higher end]

As we can see, the number of effective samples (n_eff) has increased considerably for some parameters, and the Rhat is approaching the recommended value of 1.05 for some parameters. As expected, these samples provide good classification predictions:

[Figure: Training set predictions]
[Figure: Test set predictions]

Increasing the max tree depth further to 15 significantly improves the chain stability (data not shown) but also increases the computational time ~25 fold.

The code for running the above models is here. For the full project, which includes the setup for AWS, SageMaker, and XGBoost models, refer to my earlier blog post and GitHub repo.

Julia

Now, I will show you the equivalent model using Julia and Turing. The code can be found here in the main project repo. The model is defined like so:

# Assumes `using Turing` and a `softmax` implementation (e.g. from StatsFuns.jl
# or NNlib.jl) are loaded elsewhere in the repo.
@model logistic_regression(x, y, n, σ) = begin
    # Priors: one coefficient per feature per species, all Normal(0, σ)
    intercept_Adelie ~ Normal(0, σ)
    intercept_Gentoo ~ Normal(0, σ)
    intercept_Chinstrap ~ Normal(0, σ)

    bill_length_mm_Adelie ~ Normal(0, σ)
    bill_length_mm_Gentoo ~ Normal(0, σ)
    bill_length_mm_Chinstrap ~ Normal(0, σ)

    bill_depth_mm_Adelie ~ Normal(0, σ)
    bill_depth_mm_Gentoo ~ Normal(0, σ)
    bill_depth_mm_Chinstrap ~ Normal(0, σ)

    flipper_length_mm_Adelie ~ Normal(0, σ)
    flipper_length_mm_Gentoo ~ Normal(0, σ)
    flipper_length_mm_Chinstrap ~ Normal(0, σ)

    body_mass_g_Adelie ~ Normal(0, σ)
    body_mass_g_Gentoo ~ Normal(0, σ)
    body_mass_g_Chinstrap ~ Normal(0, σ)

    island_Biscoe_Adelie ~ Normal(0, σ)
    island_Biscoe_Gentoo ~ Normal(0, σ)
    island_Biscoe_Chinstrap ~ Normal(0, σ)
    island_Dream_Adelie ~ Normal(0, σ)
    island_Dream_Gentoo ~ Normal(0, σ)
    island_Dream_Chinstrap ~ Normal(0, σ)
    island_Torgersen_Adelie ~ Normal(0, σ)
    island_Torgersen_Gentoo ~ Normal(0, σ)
    island_Torgersen_Chinstrap ~ Normal(0, σ)

    sex_female_Adelie ~ Normal(0, σ)
    sex_female_Gentoo ~ Normal(0, σ)
    sex_female_Chinstrap ~ Normal(0, σ)
    sex_male_Adelie ~ Normal(0, σ)
    sex_male_Gentoo ~ Normal(0, σ)
    sex_male_Chinstrap ~ Normal(0, σ)

    # Likelihood: softmax over the three per-species scores, then a one-hot
    # Multinomial observation for each row of the data
    for i = 1:n
        v = softmax([intercept_Adelie +
                     bill_length_mm_Adelie * x[i, 1] +
                     bill_depth_mm_Adelie * x[i, 2] +
                     flipper_length_mm_Adelie * x[i, 3] +
                     body_mass_g_Adelie * x[i, 4] +
                     island_Biscoe_Adelie * x[i, 5] +
                     island_Dream_Adelie * x[i, 6] +
                     island_Torgersen_Adelie * x[i, 7] +
                     sex_female_Adelie * x[i, 8] +
                     sex_male_Adelie * x[i, 9],
                     intercept_Gentoo +
                     bill_length_mm_Gentoo * x[i, 1] +
                     bill_depth_mm_Gentoo * x[i, 2] +
                     flipper_length_mm_Gentoo * x[i, 3] +
                     body_mass_g_Gentoo * x[i, 4] +
                     island_Biscoe_Gentoo * x[i, 5] +
                     island_Dream_Gentoo * x[i, 6] +
                     island_Torgersen_Gentoo * x[i, 7] +
                     sex_female_Gentoo * x[i, 8] +
                     sex_male_Gentoo * x[i, 9],
                     intercept_Chinstrap +
                     bill_length_mm_Chinstrap * x[i, 1] +
                     bill_depth_mm_Chinstrap * x[i, 2] +
                     flipper_length_mm_Chinstrap * x[i, 3] +
                     body_mass_g_Chinstrap * x[i, 4] +
                     island_Biscoe_Chinstrap * x[i, 5] +
                     island_Dream_Chinstrap * x[i, 6] +
                     island_Torgersen_Chinstrap * x[i, 7] +
                     sex_female_Chinstrap * x[i, 8] +
                     sex_male_Chinstrap * x[i, 9]])
        y[i, :] ~ Multinomial(1, v)
    end
end;

I used the default HMC sampler as recommended in the Turing tutorial. One thing that I noticed is the much better stability of the chains when using the HMC sampler from Turing:

[Figure: Posterior distributions for some parameters and their trace plots over 1000 iterations]

And the summary of the samples:

[Figure: Summary of samples. The r_hat values look better]

Overall, the HMC samples from Turing seem to do a lot better than the NUTS samples from PyStan. Of course, this is not an apples-to-apples comparison, but these are interesting results. In addition, the HMC sampler was also much faster than the max_tree_depth=12 PyStan run shown above. This is something to dig into more.

The predictions from Turing are perfect on both the training and test sets, as expected, since this is an easy prediction problem.

In conclusion, I like Julia and Turing so far! Another great (and fast) tool for Probabilistic Programming!

Some good things:

  1. Turing is fast! (at least in this example, with default samplers)
  2. 1-based indexing in Julia and Turing, which lines up with Stan's 1-based indexing (Python's 0-based indexing makes that coordination harder)
  3. Symbolic math ability with Turing and Julia

Some disadvantages compared to PyStan:

  1. Not enough libraries to make pre-processing easy
  2. Stan has a more parsimonious model declaration syntax than Turing (probably just my ignorance of Turing)
  3. No straightforward way to combine with Python (PyJulia is an option worth exploring)

*****************************************************************

[Image: https://www.azquotes.com/quote/655174]

Translated from: https://medium.com/swlh/multi-logistic-regression-with-probabilistic-programming-db9a24467c0d
