詹森不等式_注意詹森差距

詹森不等式

背景 (Background)

In Kaggle’s M5 Forecasting — Accuracy competition, the square root transformation ruined many of my team’s forecasts and led to a selective patching effort in the eleventh hour. Although it turned out well, we were reminded that “reconstitution bias” can plague predictions on the original scale, even with common transformations such as the square root.

在Kaggle的“ M5预测-准确性”竞赛中 ,平方根转换破坏了我团队的许多预测,并在第11小时进行了选择性修补工作。 尽管结果很好 ,但我们仍被提醒,“重构偏见”会困扰原始规模的预测,即使采用平方根之类的常见转换也是如此。

平方根变换 (The square root transformation)

For Poisson data, the rationale of the square root is that it is a variance-stabilizing transformation; in theory, the square root of the values are distributed approximately normal with constant variance and a mean that is the square root of the original mean. It is an approximation, and as Wikipedia puts it, one in which the “convergence to normality (as [the original mean] increases) is far faster than the untransformed variable.

对于Poisson数据 ,平方根的基本原理是它是方差稳定的变换; 从理论上讲,值的平方根近似分布,且具有恒定方差,且均值是原始均值的平方根。 正如Wikipedia所说 ,这是一种近似,其中“ 一化的收敛性(随着(原始均值)的增加)比未转换的变量快得多。

Imagine you decide to take square roots in a count data scenario, feeling good reassured that the convergence to normality is “fast.” You then model the mean of square-root transformed data and then get predictions on the square root scale. At some point, especially in a forecasting scenario, you’ll have to get back to the original scale. That probably entails squaring the model-estimated means. The M5 competition served as a reminder that this approach can and will break down.

想象一下,您决定在计数数据方案中求平方根,并确信向正态的收敛是“快速的”。 然后,您可以对平方根转换后的数据的均值建模,然后获得平方根尺度的预测。 在某些时候,尤其是在预测情况下,您必须回到原始比例。 这可能需要对模型估计的均方进行平方。 M5竞赛提醒我们,这种方法可能并且将会失败。

詹森差距 (The Jensen Gap)

Jensen’s Inequality states that for convex functions, the function evaluated at the expectation is less than or equal to the expectation of the function, i.e., g(E[Y]) ≤ E[g(Y)]. The inequality is flipped for concave functions.

Jensen不等式指出,对于凸函数,按期望评估的函数小于或等于该函数的期望,即g(E [Y])≤E [g(Y)]。 对于凹函数,不等式被翻转。

Similarly, the Jensen Gap is defined as the difference E[g(Y)]-g(E[Y]), which is positive for convex functions g. (As an aside, notice that when g(x) is the square function, the Jensen Gap is the Variance of Y, which had better be non-negative!)

类似地, 詹森差距定义为差E [ g ( Y )]- g (E [ Y ]),对于凸函数g为正 (顺便说一句, 请注意,当g ( x )是平方函数时,Jensen Gap Y的方差,最好是非负的!)

When considering g(x) as the square function and the square root of Y as the random variable, the Jensen Gap becomes E[Y]-E[sqrt(Y)]². Since that quantity is positive, our reconstituted mean will be biased downward. To learn more about the magnitude of the gap, we turn to the Taylor expansion.

当将g ( x )作为平方函数并将Y的 平方根作为随机变量时,Jensen Gap变为E [ Y ] -E [sqrt( Y )]²。 由于该数量为正,因此我们重构的均值将向下偏向。 要了解有关差距大小的更多信息,我们转向泰勒展开。

泰勒展开至近似偏差 (Taylor expansion to approximate bias)

To the Mathematics StackExchange prompt “Expected Value of Square Root of Poisson Random Variable,” contributor Hernan Gonzalez explains the Taylor expansion of a random variable about its mean, as shown in the screenshot below.

在数学StackExchange提示“ 泊松随机变量平方根的期望值 ”中, 贡献者Hernan Gonzalez解释了随机变量的泰勒展开式及其均值,如下面的屏幕快照所示。

Image for post

Note that the expansion needs at least a few central moments of the original distribution. For the Poisson, the first three are just the mean parameter.

请注意,展开至少需要原始分布的几个中心时刻。 对于泊松而言,前三个只是均值参数。

Ignoring that the mean estimator is also a random variable, we can run the expectation above through the inverse transformation, i.e., square it, to get an idea of the bias on the original scale for any Poisson mean value (the algebra isn’t here but it’s computed in line 34 of the demonstration code.) Similarly, with properties of the square root of the random variable, it’s straightforward to analyze g(x) = x ^2 in the same way. That opens up the possibility of bias correction, an interesting proposition, albeit one with assumptions and complexities of its own.

忽略均值估计器也是一个随机变量,我们可以通过逆变换在上面运行期望值,即对它求平方,以了解任何泊松均值在原始比例上的偏差(代数不在此处但是,它是在演示代码的第34行中计算出来的 。)类似地,由于具有随机变量的平方根的属性,因此以相同的方式分析g (x)= x ^ 2很简单。 这开辟了偏差校正的可能性,这是一个有趣的主张,尽管它有其自身的假设和复杂性。

近似分解 (Approximation breakdown)

Near the end of his answer, Gonzalez mentions that the approximation “is only useful if” the mean of the original Poisson is quite a bit bigger than 1, clarifying in the comments that this is needed so that “the terms of the sum decrease quickly.” That follows from the mean being raised to negative powers after the original term.

冈萨雷斯在回答接近尾声时提到,“ 仅当 ”原始泊松的均值比1大很多时,近似值“ 才有用 ”,并在注释中阐明了这一点是必要的,以便“ 总和的项Swift减少”。 。 ”这是因为原任期之后,均值被提升为负数。

In the M5 competition, mean sales for many items were substantially below one, and thus using the square root transformation was a recipe for poor performance. To get an idea of how this plays out in an actual sample, the next section will investigate this phenomenon via simulation.

在M5竞赛中,许多商品的平均销售额都大大低于1,因此使用平方根变换是降低性能的良方。 为了了解这种情况在实际样本中如何发挥作用,下一部分将通过仿真研究这种现象。

示范 (Demonstration)

In this section, we use the loess smoother to create models on both the original scale and the square root scale, and square the mean estimates of the latter. For simulated Poisson data with both a mean of 20 and a mean of 0.2, we plot the two sets of predictions and examine the bias. The code is under 50 lines and is available in Nousot’s Public Github repository.

在本节中,我们将使用黄土平滑器在原始比例和平方根比例上创建模型,并对后者的均值进行平方。 对于均值为20和均值为0.2的模拟Poisson数据,我们绘制了两组预测并检查了偏差。 该代码少于50行,可在Nousot的Public Github存储库中找到 。

当平均值是20 (When the mean is 20)

Image for post

For the case where the mean of the Poisson random variable is 20, the retransformation bias is negative (as Jensen’s Inequality said it would be), but also relatively small. In the code, the first two terms of the Taylor expansion are computed and compared to the empirical bias on the square root scale. At -0.027 and -0.023, respectively, they are relatively close.

对于泊松随机变量的平均值为20的情况,重变换偏差为负(就像詹森的不等式所说的那样),但也相对较小。 在代码中,计算出泰勒展开的前两个项,并将其与平方根尺度上的经验偏差进行比较。 它们分别为-0.027和-0.023,相对接近。

当平均值为0.20时 (When the mean is 0.20)

Image for post

For the case where the mean of the Poisson random variable is 0.20, the picture is much different. While Jensen’s Inequality always holds, the Jensen Gap is now large in a relative sense. Furthermore, the Taylor approximation has completely broken down, with the first two bias terms summing to 0.419 while the empirical bias is -.251 (still on the square root scale).

对于泊松随机变量的平均值为0.20的情况,图片有很大不同。 尽管詹森的不平等现象始终存在,但詹森差距现在相对来说还是很大的。 此外,泰勒近似已完全分解,前两个偏差项的总和为0.419,而经验偏差为-.251(仍在平方根刻度上)。

讨论区 (Discussion)

David Warton’s 2018 paper “Why You Cannot Transform Your Way Out of Trouble for Small Counts” demonstrates the hopelessness of getting to the standard assumptions for small-mean count data. For the sparse time series in M5, there was nothing to gain and a lot to lose by taking the square root. At the very least, we should have treated those series differently. (Regarding our use of the Kalman Filter, Otto Seiskari’s advice to tune via cross-validation when the model is misspecified is especially compelling).

戴维·沃顿(David Warton)在2018年发表的论文“ 为什么小数位数无法摆脱麻烦 ”,这说明了达到小数位数数据的标准假设的绝望。 对于M5中稀疏的时间序列,通过求平方根没有任何收益,也有很多损失。 至少,我们应该对这些系列进行不同的处理。 (关于我们对卡尔曼滤波器的使用,当模型指定不正确时, Otto Seiskari的建议通过交叉验证进行调谐特别引人注目)。

Warton’s paper has some harsh words for users of transformations in general. I still believe that if a transformation brings you closer to the standard assumptions, where your code runs faster and you enjoy nicer properties, then it’s worth considering. But there needs to be an honest exploration of properties of the transformation in the context of the data, and this does not come for free.

一般而言,沃顿的论文对转​​换的使用者来说有些苛刻的话。 我仍然相信,如果 转换使您更接近标准假设,即代码运行速度更快并且享受更好的属性,因此值得考虑。 但是需要在数据的上下文中诚实地探索转换的属性,而这并不是免费的。

Typically transformations (and their inverses) are either convex or concave, and thus Jensen’s Inequality will guarantee bias in the form of a Jensen Gap. If you’re wondering why you’ve never heard of it, it’s because it’s often written off as approximation error. According to Gao et al (2018),

通常,变换(及其逆变换)是凸的或凹的,因此Jensen的不等式将保证以Jensen Gap的形式出现偏差。 如果您想知道为什么从未听说过它,那是因为它经常被记为近似误差。 根据Gao等人(2018) ,

“Computing a hard-to-compute [expectation of a function] appears in theoretical estimates in a variety of scenarios from statistical mechanics to machine learning theory. A common approach to tackle this problem is to … show that the error, i.e., the Jensen gap, would be small enough for the application.”

从统计力学到机器学习理论,在各种情况下的理论估计中都出现了计算难以计算的[函数期望]。 解决此问题的常用方法是……表明误差(即詹森间隙)对于应用程序而言足够小。”

When using transformations, the work to understand the properties of inverse-transformation (in the context of the data) is worth it. It’s dangerous out there. Watch your step, and mind the Jensen Gap!

使用转换时,了解逆转换属性(在数据上下文中)的工作是值得的。 那里很危险。 注意您的脚步,并注意詹森差距!

翻译自: https://towardsdatascience.com/mind-the-jensen-gap-c54e0eb9e1b7

詹森不等式

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/388668.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

数据分析师 需求分析师_是什么让分析师出色?

数据分析师 需求分析师重点 (Top highlight)Before we dissect the nature of analytical excellence, let’s start with a quick summary of three common misconceptions about analytics from Part 1:在剖析卓越分析的本质之前,让我们从第1部分中对分析的三种常…

JQuery发起ajax请求,并在页面动态的添加元素

页面html代码&#xff1a; <li><div class"coll-tit"><span class"coll-icon"><iclass"sysfont coll-default"></i>全域旅游目的地</span></div><div class"coll-panel"><div c…

MAYA插件入门

我们知道&#xff0c; MAYA 是一个基于结点的插件式软件架构&#xff0c;这种开放式的软件架构是非常优秀的&#xff0c;它可以让用户非常方便地在其基础上开发一些自已想要的插件&#xff0c;从而实现一些特殊的功能或效果。 在MAYA上开发自已的插件&#xff0c;你有3种选择&a…

(原創) 如何使用C++/CLI读/写jpg檔? (.NET) (C++/CLI) (GDI+) (C/C++) (Image Processing)

Abstract因为Computer Vision的作业&#xff0c;之前都是用C# GDI写&#xff0c;但这次的作业要做Grayscale Dilation&#xff0c;想用STL的Generic Algorithm写&#xff0c;但C Standard Library并无法读取jpg档&#xff0c;用其它Library又比较麻烦&#xff0c;所以又回头想…

猫眼电影评论_电影的人群意见和评论家的意见一样好吗?

猫眼电影评论Ryan Bellgardt’s 2018 movie, The Jurassic Games, tells the story of ten death row inmates who must compete for survival in a virtual reality game where they not only fight each other but must also fight dinosaurs which can kill them both in th…

c#对文件的读写

最近需要对一个文件进行数量的分割&#xff0c;因为数据量庞大&#xff0c;所以就想到了通过写程序来处理。将代码贴出来以备以后使用。 //读取文件的内容 放置于StringBuilder 中 StreamReader sr new StreamReader(path, Encoding.Default); String line; StringBuilder sb …

ai前沿公司_美术是AI的下一个前沿吗?

ai前沿公司In 1950, Alan Turing developed the Turing Test as a test of a machine’s ability to display human-like intelligent behavior. In his prolific paper, he posed the following questions:1950年&#xff0c;阿兰图灵开发的图灵测试作为一台机器的显示类似人类…

关于WKWebView高度的问题的解决

关于WKWebView高度的问题的解决 IOS端嵌入网页的方式有两种UIWebView和WKWebView。其中WKWebView的性能要高些;WKWebView的使用也相对简单 WKWebView在加载完成后&#xff0c;在相应的代理里面获取其内容高度&#xff0c;大多数网上的方法在获取高度是会出现一定的问题&#xf…

测试nignx php请求并发数,nginx 优化(突破十万并发)

一般来说nginx 配置文件中对优化比较有作用的为以下几项&#xff1a;worker_processes 8;nginx 进程数&#xff0c;建议按照cpu 数目来指定&#xff0c;一般为它的倍数。worker_cpu_affinity 00000001 00000010 00000100 00001000 00010000 00100000 01000000 10000000;为每个进…

mardown 标题带数字_标题中带有数字的故事更成功吗?

mardown 标题带数字统计 (Statistics) I have read a few stories on Medium about writing advice, and there were some of them which, along with other tips, suggested that putting numbers in your story’s title will increase the number of views, as people tend …

使用Pandas 1.1.0进行稳健的2个DataFrames验证

Pandas is one of the most used Python library for both data scientist and data engineers. Today, I want to share some Python tips to help us do qualification checks between 2 Dataframes.Pandas是数据科学家和数据工程师最常用的Python库之一。 今天&#xff0c;我…

置信区间的置信区间_什么是置信区间,为什么人们使用它们?

置信区间的置信区间I’m going to try something a little different today, in which I combine two (completely unrelated) topics I love talking about, and hopefully create something that is interesting and educational.今天&#xff0c;我将尝试一些与众不同的东西…

php中wlog是什么意思,d-log模式是什么意思

D-Log是一种高动态范围的视频素材记录格式&#xff0c;总而言之这个色彩模式为后期调色提供了更大的空间。在相机和摄影机拍摄时&#xff0c;一颗高性能的传感器通常支持11档以上的动态范围&#xff0c;而在8bit的照片或视频上&#xff0c;以符合人眼感知的Gamma进行机内处理和…

PowerShell入门(三):如何快速地掌握PowerShell?

如何快速地掌握PowerShell呢&#xff1f;总的来说&#xff0c;就是要尽可能多的使用它&#xff0c;就像那句谚语说的&#xff1a;Practice makes perfect。当然这里还有一些原则和方法让我们可以遵循。 有效利用交互式环境 一般来说&#xff0c;PowerShell有两个主要的运行环境…

pca 主成分分析_通过主成分分析(PCA)了解您的数据并发现潜在模式

pca 主成分分析Save time, resources and stay healthy with data exploration that goes beyond means, distributions and correlations: Leverage PCA to see through the surface of variables. It saves time and resources, because it uncovers data issues before an h…

UML-- plantUML安装

plantUML安装 因为基于intellid idea,所以第一步自行安装.setting->plugins 搜索plantUML安装完成后&#xff0c;重启idea 会有如下显示安装Graphviz 下载地址 https://graphviz.gitlab.io/_pages/Download/Download_windows.html配置Graphviz环境变量&#xff1a; dot -ver…

rstudio 关联r_使用关联规则提出建议(R编程)

rstudio 关联r背景 (Background) Retailers typically have a wealth of customer transaction data which consists of the type of items purchased by a customer, their value and the date they were purchased. Unless the retailer has a loyalty rewards system, they …

jquery数据折叠_通过位折叠缩小大数据

jquery数据折叠Sometimes your dataset is just too large, and you need a way to shrink it down to a reasonable size. I am suffering through this right now as I work on different machine learning techniques for checkers. I could work for over 18 years and buy…

新鬼影病毒

今天和明天是最后两天宿舍有空调的日子啦,暑假宿舍没空调啊,悲催T__T 好吧,今天是最精华的部分啦对于鬼影3的分析,剩下的都是浮云啦,alg.exe不准备分析了,能用OD调试的货.分析起来只是时间问题.但是MBR和之后的保护模式的代码就不一样啦同学们,纯静态分析,伤不起啊,各种硬编码,…

Silverlight:Downloader的使用(event篇)

(1)Downloader的使用首先我们看什么是Downloader,就是一个为描述Silverlight plug-in下载功能的集合.Downloader能异步的通过HTTP GET Request下载内容.他是一个能帮助Silverlight下载内容的一个对象,这些下载内容包括(XMAL content,JavaScript content,ZIP packages,Media,ima…