How to differentiate the Gradient Descent objective function in 3 simple steps

Nowadays we can learn about domains that were once reserved for academic communities. From Artificial Intelligence to Quantum Physics, we can browse an enormous amount of information available on the Internet and benefit from it.

However, the availability of information has some drawbacks. We need to be aware of the huge number of unverified sources full of factual errors (that’s a topic for a whole different discussion). What’s more, we can get used to finding answers easily by googling them. As a result, we often take those answers for granted and use them without really understanding them.

The process of discovering things on our own is an important part of learning. Let’s take part in such an experiment and calculate the derivatives behind the Gradient Descent algorithm for Linear Regression.

A little bit of introduction

Linear Regression is a statistical method that can be used to model the relationship between variables [1, 2]. It’s described by a line equation:

Line equation (image by Author).
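
The equation itself is only available as an image. Based on how Θ₀ and Θ₁ are used later in the derivation (Θ₀ stands alone and Θ₁ multiplies x), it presumably reads as follows; the name f for the line is taken from the objective function discussed next, and the notation in the image may differ.

```latex
f(x) = \Theta_0 + \Theta_1 x
```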

We have two parameters, Θ₀ and Θ₁, and a variable x. Given data points, we can find the optimal parameters to fit the line to our data set.

Fitting a line to a data set (image by Author).

OK, now for Gradient Descent [2, 3]. It is an iterative algorithm that is widely used in Machine Learning (in many different flavors). We can use it to automatically find the optimal parameters of our line.

To do this, we need to minimize an objective function defined by this formula:

Linear regression objective function (image by Author).
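
Reconstructed from the description in the next paragraph, and assuming a plain sum over n data points with no averaging factor (n is a symbol introduced here; the image may differ), the objective function can be written as:

```latex
Q(\Theta) = \sum_{j=1}^{n} \left( f(x^j) - y^j \right)^2
          = \sum_{j=1}^{n} \left( \Theta_0 + \Theta_1 x^j - y^j \right)^2
```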

In this function, we iterate over each point (xʲ, yʲ) in our data set. For each point, we calculate the value of the function f for xʲ and the current parameters (Θ₀, Θ₁). We take the result and subtract yʲ. Finally, we square the difference and add it to the sum.
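
The same loop in code: a minimal Python sketch, assuming the data set is given as two equal-length lists `xs` and `ys` (hypothetical names) and the line f(x) = Θ₀ + Θ₁x.

```python
def objective(theta0, theta1, xs, ys):
    """Q(theta): sum of squared errors of the line f(x) = theta0 + theta1 * x."""
    total = 0.0
    for x_j, y_j in zip(xs, ys):
        prediction = theta0 + theta1 * x_j  # f(x^j) for the current parameters
        total += (prediction - y_j) ** 2    # subtract y^j, square, add to the sum
    return total


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(objective(0.0, 2.0, xs, ys))  # 0.0: the line y = 2x passes through every point
print(objective(0.0, 1.0, xs, ys))  # 14.0: residuals -1, -2, -3 give 1 + 4 + 9
```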

Then, in the Gradient Descent formula (which updates Θ₀ and Θ₁ in each iteration), we find these mysterious derivatives on the right-hand side of the equations:

Gradient descent formula (image by Author).
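
Assuming the standard form of the update, with the learning rate written here as α (the image may use a different symbol), both parameters are updated simultaneously from their current values in each iteration:

```latex
\Theta_0 := \Theta_0 - \alpha \, \frac{\partial Q(\Theta)}{\partial \Theta_0},
\qquad
\Theta_1 := \Theta_1 - \alpha \, \frac{\partial Q(\Theta)}{\partial \Theta_1}
```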

These are derivatives of the objective function Q(Θ). There are two parameters, so we need to calculate two derivatives, one for each Θ. Let’s move on and calculate them in 3 simple steps.

Step 1. Chain Rule

Our objective function is a composite function. We can think of it as having an “outer” function and an “inner” function [1]. To calculate the derivative of a composite function, we follow the chain rule:

Chain rule formula (image by Author).
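
For reference, the chain rule in its standard form, with h as the outer function and g as the inner function (letters chosen here to avoid a clash with the line f):

```latex
\frac{d}{dx} \, h\big(g(x)\big) = h'\big(g(x)\big) \cdot g'(x)
```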

In our case, the “outer function” raises everything inside the brackets (the “inner function”) to the second power. According to the rule, we multiply the derivative of the “outer function” by the derivative of the “inner function”. It looks like this:

Applying the chain rule to the objective function (image by Author).
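
Written out, assuming n data points (a symbol introduced here) and the line f(xʲ) = Θ₀ + Θ₁xʲ, and using the fact that the derivative of a sum is the sum of the derivatives, the chain rule gives the following for either parameter Θᵢ (i is 0 or 1; u is a placeholder for the inner function):

```latex
\frac{\partial Q(\Theta)}{\partial \Theta_i}
  = \sum_{j=1}^{n} \frac{\partial}{\partial \Theta_i} \left( \Theta_0 + \Theta_1 x^j - y^j \right)^2
  = \sum_{j=1}^{n} \left[ \frac{d}{du} u^2 \right]_{u = \Theta_0 + \Theta_1 x^j - y^j}
    \cdot \frac{\partial}{\partial \Theta_i} \left( \Theta_0 + \Theta_1 x^j - y^j \right)
```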

Step 2. Power Rule

The next step is calculating the derivative of a power function [1]. Let’s recall the power rule formula:

Power rule formula (image by Author).

Our “outer function” is simply an expression raised to the second power. So we put 2 in front of the whole expression and leave the rest as it is (2 - 1 = 1, and an expression raised to the first power is simply that expression).

After the second step we have:

Applying the power rule to the objective function (image by Author).
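
In symbols, with k as the exponent, n as the number of data points (both symbols introduced here), f(xʲ) = Θ₀ + Θ₁xʲ, and Θᵢ standing for either parameter, the power rule and its application to the outer function u² give:

```latex
\begin{aligned}
\frac{d}{du} u^k &= k \, u^{k-1}
  \quad \Rightarrow \quad \frac{d}{du} u^2 = 2u, \\
\frac{\partial Q(\Theta)}{\partial \Theta_i}
  &= \sum_{j=1}^{n} 2 \left( \Theta_0 + \Theta_1 x^j - y^j \right)
     \cdot \frac{\partial}{\partial \Theta_i} \left( \Theta_0 + \Theta_1 x^j - y^j \right)
\end{aligned}
```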

We still need to calculate the derivative of the “inner function” (the right-hand side of the formula). Let’s move to the third step.

Step 3. The derivative of a constant

The last rule is the simplest one. It is used to determine the derivative of a constant:

The derivative of a constant (image by Author).

Since a constant does not change, its derivative is equal to zero [1]. For example, if f(x) = 4, then f’(x) = 0.

With all three rules in mind, let’s break the “inner function” down:

Inner function derivative (image by Author).
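
Term by term, with Θᵢ standing for either parameter (i is 0 or 1), the derivative of the inner function splits into three parts, because the derivative of a sum is the sum of the derivatives; the last part is always zero since yʲ is a constant taken from the data set:

```latex
\frac{\partial}{\partial \Theta_i} \left( \Theta_0 + \Theta_1 x^j - y^j \right)
  = \frac{\partial \Theta_0}{\partial \Theta_i}
  + \frac{\partial \left( \Theta_1 x^j \right)}{\partial \Theta_i}
  - \underbrace{\frac{\partial y^j}{\partial \Theta_i}}_{=\,0}
```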

The tricky part of our Gradient Descent objective function is that x is not a variable. Both x and y are constants that come from the data set points. Since we are looking for the optimal parameters of our line, Θ₀ and Θ₁ are the variables. That’s why we calculate two derivatives, one with respect to Θ₀ and one with respect to Θ₁.

Let’s start by calculating the derivative with respect to Θ₀. This means that Θ₁ is treated as a constant.

Inner function derivative with respect to Θ₀ (image by Author).

You can see that the constant parts were set to zero. What happened to Θ₀? Since it is a variable raised to the first power (a¹ = a), we applied the power rule, which left 1 · Θ₀ raised to the power of zero. When we raise a number to the power of zero, the result is 1 (a⁰ = 1). And that’s it! The derivative of the inner function with respect to Θ₀ is equal to 1.

Finally, we have the whole derivative of the objective function with respect to Θ₀:

Objective function derivative with respect to Θ₀ (image by Author).
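
Putting the pieces together in symbols, assuming n data points (a symbol introduced here) and the line f(xʲ) = Θ₀ + Θ₁xʲ: the inner derivative with respect to Θ₀ is 1, so the factor it contributes through the chain rule disappears:

```latex
\begin{aligned}
\frac{\partial}{\partial \Theta_0} \left( \Theta_0 + \Theta_1 x^j - y^j \right)
  &= 1 \cdot \Theta_0^{0} + 0 - 0 = 1, \\
\frac{\partial Q(\Theta)}{\partial \Theta_0}
  &= \sum_{j=1}^{n} 2 \left( \Theta_0 + \Theta_1 x^j - y^j \right)
\end{aligned}
```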

Now it’s time to calculate the derivative with respect to Θ₁. This means that we treat Θ₀ as a constant.

Inner function derivative with respect to Θ₁.

By analogy with the previous example, Θ₁ is treated as a variable raised to the first power, and the power rule reduces it to 1. However, Θ₁ is multiplied by x, so we end up with a derivative equal to x.
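
The same steps in symbols, with Θ₀ now treated as a constant, n data points (a symbol introduced here) and the line f(xʲ) = Θ₀ + Θ₁xʲ; substituting the inner derivative xʲ back into the expression from step 2 gives the full derivative:

```latex
\begin{aligned}
\frac{\partial}{\partial \Theta_1} \left( \Theta_0 + \Theta_1 x^j - y^j \right)
  &= 0 + 1 \cdot \Theta_1^{0} \, x^j - 0 = x^j, \\
\frac{\partial Q(\Theta)}{\partial \Theta_1}
  &= \sum_{j=1}^{n} 2 \left( \Theta_0 + \Theta_1 x^j - y^j \right) x^j
\end{aligned}
```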

The final form of the derivative with respect to Θ₁ looks like this:

Objective function derivative with respect to Θ₁ (image by Author).
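
A cheap way to double-check hand-derived derivatives is to compare them with a numerical approximation. The sketch below (plain Python, with hypothetical helper names and toy data) implements both derivatives, assuming Q is a plain sum of squared errors, and compares them with a central finite difference. Because Q is quadratic in each parameter, the two should agree up to floating-point rounding.

```python
def objective(theta0, theta1, xs, ys):
    """Q(theta): sum of squared errors of the line theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))


def grad_theta0(theta0, theta1, xs, ys):
    """Hand-derived dQ/dTheta0: sum of 2 * (theta0 + theta1 * x - y)."""
    return sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys))


def grad_theta1(theta0, theta1, xs, ys):
    """Hand-derived dQ/dTheta1: sum of 2 * (theta0 + theta1 * x - y) * x."""
    return sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))


def central_difference(fn, t, h=1e-6):
    """Numerical derivative of a one-argument function fn at t."""
    return (fn(t + h) - fn(t - h)) / (2 * h)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
t0, t1 = 0.5, 1.0  # arbitrary point at which to compare

# Vary one parameter at a time while the other is held constant.
numeric_0 = central_difference(lambda t: objective(t, t1, xs, ys), t0)
numeric_1 = central_difference(lambda t: objective(t0, t, xs, ys), t1)
print(grad_theta0(t0, t1, xs, ys), numeric_0)  # -9.0 and approximately -9.0
print(grad_theta1(t0, t1, xs, ys), numeric_1)  # -22.0 and approximately -22.0
```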

Complete Gradient Descent recipe

We calculated the derivatives needed by the Gradient Descent algorithm! Let’s put them where they belong:

Gradient descent formula including the objective function’s derivatives (image by Author).
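
The complete recipe as a minimal Python sketch, assuming Q is a plain sum of squared errors over the data points. The learning rate `alpha`, the number of steps, and the zero starting point are hypothetical choices for this toy data set, not values from the article. Because Q is a sum rather than an average, the gradients grow with the number of data points, so a usable learning rate shrinks as the data set grows.

```python
def gradient_descent(xs, ys, alpha=0.01, steps=5000):
    """Fit f(x) = theta0 + theta1 * x by minimizing the sum of squared errors Q."""
    theta0, theta1 = 0.0, 0.0  # hypothetical starting point
    for _ in range(steps):
        # Residuals f(x^j) - y^j for the current parameters.
        residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # dQ/dTheta0 = sum of 2 * residual; dQ/dTheta1 = sum of 2 * residual * x.
        d_theta0 = sum(2 * r for r in residuals)
        d_theta1 = sum(2 * r * x for r, x in zip(residuals, xs))
        # Simultaneous update: both derivatives were computed from the old values.
        theta0 -= alpha * d_theta0
        theta1 -= alpha * d_theta1
    return theta0, theta1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # converges towards 1 and 2
```

Both derivatives are computed from the same list of residuals before either parameter changes, which keeps the update simultaneous as in the formula.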

By doing this exercise, we gain a deeper understanding of where the formula comes from. We don’t treat it as a magic incantation found in an old book; instead, we actively analyze it. We break the method down into smaller pieces and realize that we can complete the calculations ourselves and put it all together.

From time to time, grab a pen and paper and solve a problem. Pick an equation or method you already use successfully and try to gain this deeper insight by decomposing it. It will give you a lot of satisfaction and spark your creativity.

Bibliography:

  1. K.A. Stroud, Dexter J. Booth, Engineering Mathematics, ISBN: 978-0831133276.

  2. Joel Grus, Data Science from Scratch, 2nd Edition, ISBN: 978-1492041139.

  3. Josh Patterson, Adam Gibson, Deep Learning, ISBN: 978-1491914250.

Source: https://towardsdatascience.com/how-to-differentiate-gradient-descent-objective-function-in-3-simple-steps-b9d58567d387
