【Deep Learning】Dive into Deep Learning (PyTorch Edition), Li Mu — 2.4.3 Gradient 【Formula Derivations】

2.4.3. Gradient

We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the function's *gradient* vector. Specifically, let the function $f:\mathbb{R}^{n}\to\mathbb{R}$ take as input an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ and produce a scalar output. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\ \frac{\partial f(\vec x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

where $\nabla_{\vec x} f(\vec x)$ is usually abbreviated to $\nabla f(\vec x)$ when there is no ambiguity.
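In PyTorch (the framework this edition of the book uses), autograd computes exactly this vector. A minimal sketch, assuming PyTorch is installed; the function $f(\vec x)=x_1^2+2x_2^2+3x_3^2$ and the input values are arbitrary choices for illustration:

```python
import torch

# f(x) = x1^2 + 2*x2^2 + 3*x3^2, so by hand ∇f(x) = [2*x1, 4*x2, 6*x3]
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
f = (torch.tensor([1.0, 2.0, 3.0]) * x ** 2).sum()  # scalar output

f.backward()                 # populates x.grad with ∇_x f(x)
print(x.grad)                # tensor([ 2.,  8., 18.])
print(torch.allclose(x.grad, torch.tensor([2.0, 8.0, 18.0])))  # True
```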


Suppose $\vec x$ is an $n$-dimensional vector. The following rules are frequently used when differentiating multivariate functions:

Rule 1. For all $A \in \mathbb{R}^{m\times n}$, $\nabla_{\vec x}\, A\vec x = A^\top$.

Proof: Let
$$A_{(m,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$
so that
$$A\vec x_{(m,1)} = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix}.$$
Stacking the partial derivatives with respect to $x_i$ in row $i$, the $(i,j)$ entry of the gradient is $\frac{\partial (A\vec x)_j}{\partial x_i} = a_{j,i}$, hence
$$\nabla_{\vec x}A\vec x = \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\ \frac{\partial A\vec x}{\partial x_2}\\ \vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix} = A^\top.$$
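The rule is easy to check numerically. A sketch, assuming PyTorch; `A`, `x`, and the shapes are arbitrary. Autograd's Jacobian of $\vec x \mapsto A\vec x$ is $A$ itself, and the gradient in this text's layout convention (partials with respect to $x_i$ in row $i$) is its transpose:

```python
import torch
from torch.autograd.functional import jacobian

m, n = 3, 4
A = torch.randn(m, n)
x = torch.randn(n)

J = jacobian(lambda v: A @ v, x)    # shape (m, n): Jacobian of A @ x, equals A
grad = J.T                          # shape (n, m): ∇_x (A x) in this convention
print(torch.allclose(grad, A.T))    # True
```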

Rule 2. For all $A \in \mathbb{R}^{n\times m}$, $\nabla_{\vec x}\, \vec x^\top A = A$.

Proof: Let
$$A_{(n,m)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$
so that
$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n & \cdots & a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n \end{bmatrix},$$
a row vector whose $j$-th component is $\sum_{i=1}^{n} a_{i,j}x_i$. Stacking the partial derivatives with respect to $x_i$ in row $i$, the $(i,j)$ entry of the gradient is $\frac{\partial (\vec x^\top A)_j}{\partial x_i} = a_{i,j}$, hence
$$\nabla_{\vec x}\vec x^\top A = \begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\ \frac{\partial \vec x^\top A}{\partial x_2}\\ \vdots\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{1,2}&\cdots&a_{1,m}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix} = A.$$
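A numerical check in the same spirit, again with arbitrary shapes and a random `A`: the Jacobian of $\vec x \mapsto \vec x^\top A$ is $A^\top$, so its transpose recovers $A$:

```python
import torch
from torch.autograd.functional import jacobian

n, m = 4, 3
A = torch.randn(n, m)
x = torch.randn(n)

J = jacobian(lambda v: v @ A, x)    # shape (m, n): Jacobian of x^T A, equals A^T
grad = J.T                          # shape (n, m): ∇_x (x^T A) in this convention
print(torch.allclose(grad, A))      # True
```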

Rule 3. For all $A \in \mathbb{R}^{n\times n}$, $\nabla_{\vec x}\, \vec x^\top A \vec x = (A+A^\top)\vec x$.

Proof: Let
$$A_{(n,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix}.$$
As in rule 2, the $j$-th component of $\vec x^\top A$ is $\sum_{i=1}^{n} a_{i,j}x_i$, so multiplying by $\vec x$ on the right gives the scalar
$$\vec x^\top A \vec x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_ix_j.$$
Each $x_k$ appears in this double sum both through the terms with $j=k$ (contributing $\sum_i a_{i,k}x_i$) and the terms with $i=k$ (contributing $\sum_i a_{k,i}x_i$), so
$$\frac{\partial}{\partial x_k}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_ix_j = \sum_{i=1}^{n}(a_{i,k}+a_{k,i})x_i,$$
and therefore
$$\nabla_{\vec x}\vec x^\top A \vec x = \begin{bmatrix} \sum\limits_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum\limits_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ \vdots\\ \sum\limits_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \end{bmatrix} = \begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & \cdots&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots&a_{2,n}+a_{n,2} \\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots&2a_{n,n} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = (A+A^\top)\vec x.$$
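Because $\vec x^\top A\vec x$ is a scalar, this rule can be checked directly with `backward()`. A sketch with a random, deliberately non-symmetric `A` (for a symmetric $A$ the gradient reduces to $2A\vec x$):

```python
import torch

n = 4
A = torch.randn(n, n)                  # generally A != A^T
x = torch.randn(n, requires_grad=True)

f = x @ A @ x                          # the quadratic form x^T A x, a scalar
f.backward()
print(torch.allclose(x.grad, (A + A.T) @ x))  # True
```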

Rule 4. $\nabla_{\vec x} \Vert \vec x \Vert ^2=\nabla_{\vec x}\,\vec x^\top\vec x = 2\vec x$.

Proof:
$$\nabla_{\vec x}\Vert \vec x \Vert ^2 = \nabla_{\vec x}\left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^2 = \nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right) = \nabla_{\vec x}\,\vec x^\top \vec x = \begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix} = 2\vec x.$$
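This is also rule 3 with $A = I$, since $\vec x^\top I\vec x = \Vert\vec x\Vert^2$ and $(I+I^\top)\vec x = 2\vec x$. A quick autograd check with an arbitrary input vector:

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0, 3.0], requires_grad=True)
f = torch.dot(x, x)     # ||x||^2 = x^T x
f.backward()
print(x.grad)                           # tensor([0., 2., 4., 6.])
print(torch.allclose(x.grad, 2 * x))    # True
```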

Similarly, for any matrix $X$ we have $\nabla_X \Vert X \Vert_F^2=2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.

Rule 5. For any matrix $X$, $\nabla_X \Vert X \Vert_F^2=2X$.

Proof: Let $X$ be an $m\times n$ matrix,
$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix}.$$
Then
$$\Vert X \Vert_F^2 = \left(\sqrt{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2}\right)^2 = \sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2,$$
and differentiating entrywise gives
$$\nabla_X \Vert X \Vert_F^2 = \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix} = 2X.$$
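The same check for the matrix case, with an arbitrary shape: $\Vert X\Vert_F^2$ is just the sum of squared entries, so `backward()` should return $2X$:

```python
import torch

X = torch.randn(3, 4, requires_grad=True)
f = (X ** 2).sum()      # ||X||_F^2: sum of the squares of all entries
f.backward()
print(torch.allclose(X.grad, 2 * X))   # True
```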

I didn't fully understand these formulas on first reading, so I derived them myself to deepen my understanding. The above is that derivation process; questions and discussion are welcome.
