Applied Mathematics for Computer Science -- Assignment 3

  • Assignment 3
    • Calculation Problems
    • Programming Problems
    • 1 Machine Learning with Dimensionality Reduction
    • 2 Summary of Deep Learning Training Methods

Assignment 3

Calculation Problems

  1. (15 points) For the given matrix $A$ (of size 4×2), compute the SVD (singular value decomposition) of $A$: find $U$, $\Sigma$, $V^T$ such that $A = U\Sigma V^T$, where $U^TU = I$ and $V^TV = I$. Show the solution process.

$$A= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 1 \\ -1 & 0 \end{bmatrix}$$

Solution:

First compute $A^TA$:
$$A^TA=\begin{bmatrix} 6 & 2\\ 2 & 2 \end{bmatrix}$$
Its eigenvalues are $\lambda_1 = 2(2+\sqrt{2})$ and $\lambda_2 = 2(2-\sqrt{2})$, and the corresponding unit eigenvectors, as columns, form (note that $1+(1\pm\sqrt{2})^2 = 2(2\pm\sqrt{2})$, which simplifies the matrices below):

$$V = \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{1+(1+\sqrt{2})^2}} & \frac{1-\sqrt{2}}{\sqrt{1+(1-\sqrt{2})^2}}\\ \frac{1}{\sqrt{1+(1+\sqrt{2})^2}} & \frac{1}{\sqrt{1+(1-\sqrt{2})^2}} \end{bmatrix}$$

The singular values are $\sigma_1 = \sqrt{2(2+\sqrt{2})}$ and $\sigma_2 = \sqrt{2(2-\sqrt{2})}$, so:
$$\Sigma = \begin{bmatrix} \sqrt{2(2+\sqrt{2})} & 0\\ 0 & \sqrt{2(2-\sqrt{2})}\\ 0 & 0\\ 0 & 0 \end{bmatrix}$$
Let $U = (u_1,u_2,u_3,u_4)$. From $AV=U\Sigma$:
$$\begin{aligned} U\Sigma &= (u_1,u_2,u_3,u_4) \begin{bmatrix} \sqrt{2(2+\sqrt{2})} & 0\\ 0 & \sqrt{2(2-\sqrt{2})}\\ 0 & 0\\ 0 & 0 \end{bmatrix}\\ &= \left(\sqrt{2(2+\sqrt{2})}\,u_1,\ \sqrt{2(2-\sqrt{2})}\,u_2\right) \end{aligned}$$

$$AV = \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & \frac{1-\sqrt{2}}{\sqrt{2(2-\sqrt{2})}}\\ \frac{1}{\sqrt{2(2+\sqrt{2})}} & \frac{1}{\sqrt{2(2-\sqrt{2})}}\\ \frac{3+2\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & \frac{3-2\sqrt{2}}{\sqrt{2(2-\sqrt{2})}}\\ -\frac{1+\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & -\frac{1-\sqrt{2}}{\sqrt{2(2-\sqrt{2})}} \end{bmatrix}$$

Therefore:
$$u_1=\frac{1}{2(2+\sqrt{2})} \begin{bmatrix} 1+\sqrt{2}\\ 1\\ 3+2\sqrt{2}\\ -1-\sqrt{2} \end{bmatrix}, \quad u_2=\frac{1}{2(2-\sqrt{2})} \begin{bmatrix} 1-\sqrt{2}\\ 1\\ 3-2\sqrt{2}\\ -1+\sqrt{2} \end{bmatrix}$$
Extending $\{u_1,u_2\}$ to an orthonormal basis of $\mathbb{R}^4$ gives:
$$U = \begin{bmatrix} \frac{1+\sqrt{2}}{2(2+\sqrt{2})} & \frac{1-\sqrt{2}}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & -\frac{1}{2}\\ \frac{1}{2(2+\sqrt{2})} & \frac{1}{2(2-\sqrt{2})} & 0 & -\frac{1}{2}\\ \frac{3+2\sqrt{2}}{2(2+\sqrt{2})} & \frac{3-2\sqrt{2}}{2(2-\sqrt{2})} & 0 & \frac{1}{2}\\ \frac{-1-\sqrt{2}}{2(2+\sqrt{2})} & \frac{\sqrt{2}-1}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & \frac{1}{2} \end{bmatrix}$$
Finally,
$$A=U\Sigma V^T=\begin{bmatrix} \frac{1+\sqrt{2}}{2(2+\sqrt{2})} & \frac{1-\sqrt{2}}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & -\frac{1}{2}\\ \frac{1}{2(2+\sqrt{2})} & \frac{1}{2(2-\sqrt{2})} & 0 & -\frac{1}{2}\\ \frac{3+2\sqrt{2}}{2(2+\sqrt{2})} & \frac{3-2\sqrt{2}}{2(2-\sqrt{2})} & 0 & \frac{1}{2}\\ \frac{-1-\sqrt{2}}{2(2+\sqrt{2})} & \frac{\sqrt{2}-1}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} \sqrt{2(2+\sqrt{2})} & 0\\ 0 & \sqrt{2(2-\sqrt{2})}\\ 0 & 0\\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & \frac{1}{\sqrt{2(2+\sqrt{2})}}\\ \frac{1-\sqrt{2}}{\sqrt{2(2-\sqrt{2})}} & \frac{1}{\sqrt{2(2-\sqrt{2})}} \end{bmatrix}$$
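
As a quick numerical check (a minimal sketch, not part of the required hand computation), numpy's SVD reproduces the singular values above; numpy may flip the signs of corresponding columns of $U$ and $V$, which is still a valid decomposition:

```python
import numpy as np

A = np.array([[1, 0],
              [0, 1],
              [2, 1],
              [-1, 0]], dtype=float)

# numpy returns the singular values in descending order and Vt = V^T
U, S, Vt = np.linalg.svd(A, full_matrices=True)

print(S)  # [2.6131... 1.0824...]
print(np.sqrt(2 * (2 + np.sqrt(2))), np.sqrt(2 * (2 - np.sqrt(2))))  # same values

# Rebuild A from the factors to confirm A = U Sigma V^T
Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(S)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```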

  2. (15 points) Given the following data (5 samples, 4 dimensions $(A,B,C,D)$), use PCA to reduce the data to 2 dimensions. Show the solution process.
| A | B | C | D |
|---|---|---|---|
| 1 | 5 | 3 | 1 |
| 4 | -4 | 6 | 6 |
| 1 | 4 | 3 | 2 |
| 4 | 4 | 2 | 2 |
| 5 | 5 | 2 | 4 |

Solution:

Arrange the samples as columns (rows are the attributes $A,B,C,D$):
$$X = \begin{bmatrix} 1 & 4 & 1 & 4 & 5\\ 5 & -4 & 4 & 4 & 5\\ 3 & 6 & 3 & 2 & 2\\ 1 & 6 & 2 & 2 & 4 \end{bmatrix}$$
After zero-centering each row, $X$ becomes:
$$\begin{bmatrix} -2 & 1 & -2 & 1 & 2 \\ 2.2 & -6.8 & 1.2 & 1.2 & 2.2\\ -0.2 & 2.8 & -0.2 & -1.2 & -1.2\\ -2 & 3 & -1 & -1 & 1 \end{bmatrix}$$
The covariance matrix $C=\frac{1}{5}XX^T$ is:
$$\begin{bmatrix} 2.8 & -1.6 & 0 & 2 \\ -1.6 & 11.76 & -4.76 & -5 \\ 0 & -4.76 & 2.16 & 1.8 \\ 2 & -5 & 1.8 & 3.2 \end{bmatrix}$$
Its eigenvalues, in descending order, are:
$$\begin{bmatrix} 16.27660036 & 3.30704892 & 0.30518298 & 0.03116774 \end{bmatrix}$$
with the corresponding eigenvectors arranged as rows:

$$\begin{bmatrix} 0.1582177 & -0.84232941 & 0.33404265 & 0.3922548 \\ -0.84205227 & -0.21992632 & 0.30154762 & -0.3894219 \\ -0.37915865 & 0.40046886 & 0.25782506 & 0.79334081\\ 0.34950516 & 0.28589904 & 0.85499168 & -0.25514134 \end{bmatrix}$$

Taking the first two rows gives the projection matrix $P$:
$$P=\begin{bmatrix} 0.1582177 & -0.84232941 & 0.33404265 & 0.3922548\\ -0.84205227 & -0.21992632 & 0.30154762 & -0.3894219 \end{bmatrix}$$
Then $Y=PX$ (with $X$ the zero-centered data) is the data reduced to 2 dimensions (rows are the new attributes, columns are the samples):
$$\begin{bmatrix} -3.02087823 & 7.99814153 & -1.78629402 & -1.64568358 & -1.5452857\\ 1.91880091 & 0.32951437 & 1.74930533 & -1.0783991 & -2.9192215 \end{bmatrix}$$
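
The eigendecomposition and projection can be reproduced with a short numpy sketch (individual eigenvectors may come back with flipped signs, which is equally valid):

```python
import numpy as np

# Rows are the attributes A, B, C, D; columns are the five samples
X = np.array([[1, 4, 1, 4, 5],
              [5, -4, 4, 4, 5],
              [3, 6, 3, 2, 2],
              [1, 6, 2, 2, 4]], dtype=float)

Xc = X - X.mean(axis=1, keepdims=True)   # zero-center each attribute
C = Xc @ Xc.T / X.shape[1]               # covariance matrix (1/5) X X^T

eigvals, eigvecs = np.linalg.eigh(C)     # eigh: ascending eigenvalues for symmetric C
order = np.argsort(eigvals)[::-1]        # re-sort in descending order
P = eigvecs[:, order[:2]].T              # top-2 eigenvectors as rows
Y = P @ Xc                               # 2 x 5 reduced data
print(eigvals[order])
print(Y)
```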

  3. (15 points) Suppose there are six samples in a two-dimensional space:

    Positive samples: (1,1), (2,1), (2,3)

    Negative samples: (-1,-1), (0,-3), (-2,-4)

    Use a support vector machine to find the separating hyperplane; give the support vector model and the solution process.

Solution:

From the data, $y_1=y_2=y_3=1$ and $y_4=y_5=y_6=-1$.

Let $w = (w_1,w_2)^T$. The problem can then be written as the constrained optimization problem:
$$\begin{aligned} & \underset{w, b}{\text{minimize}} && \frac{1}{2}(w_1^2 + w_2^2) \\ & \text{subject to} && \begin{cases} w_1 + w_2 + b \geq 1 \\ 2w_1 + w_2 + b \geq 1 \\ 2w_1 + 3w_2 + b \geq 1 \\ -(-w_1 - w_2 + b) \geq 1 \\ -(-3w_2 + b) \geq 1 \\ -(-2w_1 - 4w_2 + b) \geq 1 \end{cases} \end{aligned}$$
Constraints 1 and 4 (for $x_1=(1,1)^T$ and $x_4=(-1,-1)^T$) give $w_1+w_2+b\geq 1$ and $w_1+w_2-b\geq 1$; adding them yields $w_1+w_2\geq 1$, and minimizing $\frac{1}{2}(w_1^2+w_2^2)$ under this condition forces $w_1=w_2=\frac{1}{2}$ and $b=0$, which also satisfies the remaining constraints strictly. Hence:
$$\begin{cases} w_1 = \frac{1}{2} \\ w_2 = \frac{1}{2} \\ b=0 \end{cases}$$
The maximum-margin separating hyperplane is:
$$\frac{1}{2}x^{(1)}+\frac{1}{2}x^{(2)}=0$$
where $x_1=(1,1)^T$ and $x_4=(-1,-1)^T$ are the support vectors.
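
As a sanity check (an illustrative sketch, not part of the required derivation), scikit-learn's SVC with a linear kernel and a large C, which approximates the hard-margin problem, recovers the same hyperplane and support vectors:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [2, 3],
              [-1, -1], [0, -3], [-2, -4]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.coef_)             # ~[[0.5, 0.5]]
print(clf.intercept_)        # ~[0.]
print(clf.support_vectors_)  # (-1,-1) and (1,1)
```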

  4. (15 points) Given a 4×3×2 neural network with weight matrices:

$$W_1= \begin{bmatrix} 0.10 & 0.40 & 0.35 \\ 0.15 & 0.20 & 0.25 \\ 0.05 & 0.35 & 0.40 \\ 0.20 & 0.25 & 0.15 \end{bmatrix}$$

$$b_1=\begin{bmatrix}0.15 & 0.10 & 0.25 \end{bmatrix}$$

$$W_2= \begin{bmatrix} 0.20 & 0.40 \\ 0.35 & 0.15 \\ 0.30 & 0.50 \end{bmatrix}$$

$$b_2 = \begin{bmatrix} 0.30 & 0.20 \end{bmatrix}$$

The activation function is the sigmoid. Given a training sample $x=[0.80,0.55,0.20,0.10]$ with label $y=[1,0]$, a learning rate of 0.01, and the binary cross-entropy loss, compute the result of one update of the parameters $w_{10}$ and $w_{13}$.

[Figure: diagram of the 4×3×2 network]

Solution:

(1) Forward pass

a. Input layer to hidden layer

With weight matrix $W_1$ and bias $b_1$, the pre-activation of the hidden layer is:
$$a_1 = W_1^Tx+b_1=\begin{bmatrix} 0.10 & 0.15 & 0.05 & 0.20 \\ 0.40 & 0.20 & 0.35 & 0.25 \\ 0.35 & 0.25 & 0.40 & 0.15 \end{bmatrix}\begin{bmatrix}0.80 \\ 0.55 \\ 0.20 \\ 0.10 \end{bmatrix} + \begin{bmatrix}0.15 \\ 0.10 \\ 0.25 \end{bmatrix} = \begin{bmatrix}0.3425 \\ 0.625 \\ 0.7625 \end{bmatrix}$$
After the sigmoid activation $\sigma(a)=\frac{1}{1+e^{-a}}$:
$$h_1=\sigma(a_1)=\begin{bmatrix} \sigma(0.3425) \\ \sigma(0.625) \\ \sigma(0.7625) \end{bmatrix} \approx \begin{bmatrix} 0.5848 \\ 0.6514 \\ 0.6819 \end{bmatrix}$$
b. Hidden layer to output layer

Similarly, the pre-activation of the output layer is:
$$a_2 = W_2^Th_1+b_2=\begin{bmatrix} 0.20 & 0.35 & 0.30 \\ 0.40 & 0.15 & 0.50 \end{bmatrix}\begin{bmatrix}0.5848 \\ 0.6514 \\ 0.6819 \end{bmatrix} + \begin{bmatrix}0.30 \\ 0.20 \end{bmatrix} = \begin{bmatrix} 0.8495 \\ 0.8726 \end{bmatrix}$$
After the sigmoid activation:
$$h_2=\sigma(a_2)=\begin{bmatrix} \sigma(0.8495) \\ \sigma(0.8726) \end{bmatrix} \approx \begin{bmatrix} 0.7005 \\ 0.7053 \end{bmatrix}$$
(2) Loss

With the binary cross-entropy loss,
$$\begin{aligned} L(y,h_2) &=-\frac{1}{N}\sum_{i=1}^N\left[y_i\log(h_{2i})+(1-y_i)\log(1-h_{2i})\right] \\ &= -\frac{1}{2}\left[y_1\log(h_{21})+(1-y_1)\log(1-h_{21})+y_2\log(h_{22})+(1-y_2)\log(1-h_{22})\right] \\ &= -\frac{1}{2}\left[1\cdot\log(0.7005)+0\cdot\log(1-0.7005)+0\cdot\log(0.7053)+1\cdot\log(1-0.7053)\right] \\ &\approx 0.7889 \end{aligned}$$
where $y_i$ is the true value of the $i$-th output and $h_{2i}$ is the model's prediction for it.

(3) Backward pass

(In the gradients below the $\frac{1}{N}$ averaging factor in the loss is dropped; keeping it would only scale every gradient, and hence every update, by $\frac{1}{2}$.)

For $w_{13}$:
$$\frac{\partial L}{\partial w_{13}} = \frac{\partial L}{\partial h_{21}} \cdot \frac{\partial h_{21}}{\partial a_{21}} \cdot \frac{\partial a_{21}}{\partial w_{13}}$$
Since $y_1=1$:
$$\frac{\partial L}{\partial h_{21}}=-\left(\frac{y_1}{h_{21}}-\frac{1-y_1}{1-h_{21}}\right)=-\frac{1}{h_{21}}\approx -1.4276$$
Since $h_{21} = \sigma(a_{21})=\frac{1}{1+e^{-a_{21}}}$:
$$\frac{\partial h_{21}}{\partial a_{21}}= h_{21}(1-h_{21})=0.7005 \times 0.2995 \approx 0.2098$$

Moreover, since
$$a_{21}= \begin{pmatrix} w_{13} \\ w_{15} \\ w_{17} \end{pmatrix}^T h_1 + b_{21},$$
we have
$$\frac{\partial a_{21}}{\partial w_{13}} = h_{11}=0.5848$$
The updated value of $w_{13}$ is:
$$w_{13} = w_{13}-\eta \cdot \frac{\partial L}{\partial w_{13}} =0.2-0.01 \times (-1.4276)\times 0.2098 \times 0.5848 \approx 0.2018$$
Similarly, $w_{14}\approx 0.3959$.

w 10 w_{10} w10
∂ L ∂ w 10 = ∂ L ∂ h 11 ⋅ ∂ h 11 ∂ a 11 ⋅ ∂ a 11 ∂ w 10 = ∂ L ∂ h 11 ⋅ [ h 11 ⋅ ( 1 − h 11 ) ] ⋅ x 4 = ∂ L ∂ h 11 ⋅ 0.2428 ⋅ 0.1 \frac{\partial L}{\partial w_{10}} = \frac{\partial L}{\partial h_{11}} \cdot \frac{\partial h_{11}}{\partial a_{11}} \cdot \frac{\partial a_{11}}{\partial w_{10}} = \frac{\partial L}{\partial h_{11}} \cdot [h_{11} \cdot (1-h_{11})]\cdot x_4 = \frac{\partial L}{\partial h_{11}} \cdot 0.2428 \cdot 0.1 w10L=h11La11h11w10a11=h11L[h11(1h11)]x4=h11L0.24280.1
其中, h 11 h_{11} h11 会接受 a 21 a_{21} a21 a 22 a_{22} a22 两个地方传来的误差,
∂ L ∂ h 11 = ∂ L ∂ a 21 ⋅ ∂ a 21 ∂ h 11 + ∂ L ∂ a 22 ⋅ ∂ a 22 ∂ h 11 = − 1 a 21 ⋅ w 13 + [ ( − 1 a 22 ) ⋅ w 14 ] ≈ − 0.8494 \frac{\partial L}{\partial h_{11}} = \frac{\partial L}{\partial a_{21}} \cdot \frac{\partial a_{21}}{\partial h_{11}} + \frac{\partial L}{\partial a_{22}} \cdot \frac{\partial a_{22}}{\partial h_{11}} = -\frac{1}{a_{21}}\cdot w_{13} + [(-\frac{1}{a_{22}})\cdot w_{14}] \approx - 0.8494 h11L=a21Lh11a21+a22Lh11a22=a211w13+[(a221)w14]0.8494
更新 w 10 w_{10} w10 的值为:
w 10 = w 10 − η ⋅ ∂ L ∂ w 10 = 0.1999 w_{10} = w_{10}-\eta \cdot \frac{\partial L}{\partial w_{10}} = 0.1999 w10=w10ηw10L=0.1999
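
The entire update can be verified with a short numpy sketch that follows the same convention as the hand computation above (per-output BCE gradients, without the 1/N factor):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.80, 0.55, 0.20, 0.10])
y = np.array([1.0, 0.0])
W1 = np.array([[0.10, 0.40, 0.35],
               [0.15, 0.20, 0.25],
               [0.05, 0.35, 0.40],
               [0.20, 0.25, 0.15]])
b1 = np.array([0.15, 0.10, 0.25])
W2 = np.array([[0.20, 0.40],
               [0.35, 0.15],
               [0.30, 0.50]])
b2 = np.array([0.30, 0.20])
eta = 0.01

# Forward pass
h1 = sigmoid(W1.T @ x + b1)   # [0.5848, 0.6514, 0.6819]
h2 = sigmoid(W2.T @ h1 + b2)  # [0.7005, 0.7053]

# Backward pass: dL/da2 = h2 - y for sigmoid + cross-entropy
delta2 = h2 - y                         # [-0.2995, 0.7053]
dW2 = np.outer(h1, delta2)              # dW2[0, 0] is dL/dw13
delta1 = (W2 @ delta2) * h1 * (1 - h1)
dW1 = np.outer(x, delta1)               # dW1[3, 0] is dL/dw10

print(W2[0, 0] - eta * dW2[0, 0])       # w13 -> ~0.2018
print(W1[3, 0] - eta * dW1[3, 0])       # w10 -> ~0.1999
```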

Programming Problems

1 Machine Learning with Dimensionality Reduction

Dataset: kddcup99_train.csv, kddcup99_test.csv

Data description: https://www.kdd.org/kdd-cup/view/kdd-cup-1999/Data. The goal is to use the 42-dimensional data to decide whether a packet is an attack. (Note: the data contains multiple types of attack; we treat them all as "attack" without further distinction, i.e. we only distinguish normal from attack, so during preprocessing all attack labels can be re-labeled to a single class. The list of attack types is at https://kdd.org/cupfiles/KDDCupData/1999/training_attack_types.)

Task: choose an SVM and combine it with the dimensionality-reduction methods PCA and LDA to implement reduction + classification.

Required output: the effect (improvement or degradation) of reducing to different dimensions with each method on the classification result, plus a plot of both methods' results after reducing to 3 dimensions (normal and attack samples in different colors).

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3d projection on older matplotlib

# Column names: the dataset has 42 columns (41 feature columns and 1 target column)
column_names = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes',
                'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot',
                'num_failed_logins', 'logged_in', 'num_compromised', 'root_shell',
                'su_attempted', 'num_root', 'num_file_creations', 'num_shells',
                'num_access_files', 'num_outbound_cmds', 'is_host_login',
                'is_guest_login', 'count', 'srv_count', 'serror_rate',
                'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate',
                'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count',
                'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate',
                'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate',
                'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
                'dst_host_srv_rerror_rate', 'label']

# Load the data, skipping malformed rows
train = pd.read_csv('kddcup99_train.csv', names=column_names, on_bad_lines='skip')
test = pd.read_csv('kddcup99_test.csv', names=column_names, on_bad_lines='skip')

# Subsample both sets to 1/100 of their original size
train = train.sample(frac=0.01)
test = test.sample(frac=0.01)

print("1. Data loading and preprocessing done")

# Encode the text columns
categorical_columns = ['protocol_type', 'service', 'flag']
for column in categorical_columns:
    combined_data = pd.concat([train[column], test[column]], axis=0)
    le = LabelEncoder()
    le.fit(combined_data)
    train[column] = le.transform(train[column])
    test[column] = le.transform(test[column])

X_train = train.drop(columns=['label'])
y_train = train['label']
y_train_relabel = y_train.apply(lambda x: 1 if x != 'normal.' else 0)  # all attack types -> 1
X_test = test.drop(columns=['label'])
y_test = test['label']
y_test_relabel = y_test.apply(lambda x: 1 if x != 'normal.' else 0)

print("2. Feature processing done")

# Dimensionality reduction; LDA is fitted on the original multi-class labels
# because its n_components is capped at (number of classes - 1)
def reduce_dimension(reduction_method, X_train, y_train, X_test, n_components):
    if reduction_method == 'PCA':
        reducer = PCA(n_components=n_components)
    elif reduction_method == 'LDA':
        reducer = LDA(n_components=n_components)
    X_train_reduced = reducer.fit_transform(X_train, y_train)
    X_test_reduced = reducer.transform(X_test)
    return X_train_reduced, X_test_reduced

# SVM classification
def classify(X_train, y_train, X_test, y_test):
    clf = SVC()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy, y_pred

# Reduce to each target dimension and classify
results = {'PCA': [], 'LDA': []}
for dim in range(3, 11):
    X_train_pca, X_test_pca = reduce_dimension('PCA', X_train, y_train, X_test, dim)
    accuracy_pca, _ = classify(X_train_pca, y_train_relabel, X_test_pca, y_test_relabel)
    results['PCA'].append((dim, accuracy_pca))

    X_train_lda, X_test_lda = reduce_dimension('LDA', X_train, y_train, X_test, dim)
    accuracy_lda, _ = classify(X_train_lda, y_train_relabel, X_test_lda, y_test_relabel)
    results['LDA'].append((dim, accuracy_lda))

print("PCA Results:")
print("Dimensions\tAccuracy")
for dim, acc in results['PCA']:
    print(dim, "\t", acc)

print("\nLDA Results:")
print("Dimensions\tAccuracy")
for dim, acc in results['LDA']:
    print(dim, "\t", acc)

# Visualize the reduction to 3 dimensions
X_train_pca_3d, _ = reduce_dimension('PCA', X_train, y_train, X_test, 3)
X_train_lda_3d, _ = reduce_dimension('LDA', X_train, y_train, X_test, 3)

def plot_3d(X, y, title):
    y = np.asarray(y)  # allow boolean masking of the numpy array below
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.set_title(title)
    colors = {0: 'b', 1: 'r'}  # 0: normal, 1: attack
    for label in np.unique(y):
        ax.scatter(X[y == label, 0], X[y == label, 1], X[y == label, 2],
                   c=colors[label], label=label, s=20)
    ax.legend()
    plt.show()

plot_3d(X_train_pca_3d, y_train_relabel, "PCA reduced to 3 dimensions")
plot_3d(X_train_lda_3d, y_train_relabel, "LDA reduced to 3 dimensions")
```

[Figure 1: PCA reduced to 3 dimensions]

[Figure 2: LDA reduced to 3 dimensions]

[Output: accuracy for each target dimension under PCA and LDA]

2 Summary of Deep Learning Training Methods

Dataset: kddcup99_Train.csv, kddcup99_Test.csv. Each row has 42 columns; the first 41 columns are used to predict whether the 42nd column is an attack.

Task: implement a simple neural network model to classify whether a packet is an attack. The network must have at least 5 layers, e.g. 41->36->24->12->6->1.

Use at least 2 activation functions, at least 2 parameter initialization methods, and at least 2 training methods (SGD, SGD+Momentum, Adam).

Train the model and evaluate the training results.

Required output:

1) A model description: number of layers, parameters of each layer, choice of activation functions, loss function, etc.;

2) For the different method combinations (at least 2), a plot of how the training error and test error change as the epochs increase. A sketch of the optimizer choices follows; the full implementation comes after it.
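
For the "at least 2 training methods" requirement, the three candidate optimizers differ only in how they are constructed. A minimal sketch (the `nn.Linear` stand-in model is just for illustration; the implementation below uses the 5-layer Net):

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(41, 1)  # stand-in model for illustration

opt_sgd = optim.SGD(net.parameters(), lr=0.01)                # plain SGD
opt_mom = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)  # SGD + Momentum
opt_adam = optim.Adam(net.parameters(), lr=0.01)              # Adam
```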

```python
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from colorama import Fore, Style

# Load data, skipping malformed rows
train_data = pd.read_csv('kddcup99_train.csv', header=None, on_bad_lines='skip')
test_data = pd.read_csv('kddcup99_test.csv', header=None, on_bad_lines='skip')

# Subsample both sets to 1/100 of their original size
train_data = train_data.sample(frac=0.01)
test_data = test_data.sample(frac=0.01)

# Encode the text columns (columns 1-3: protocol_type, service, flag)
categorical_columns = train_data.columns[1:4].tolist()
for column in categorical_columns:
    combined_data = pd.concat([train_data[column], test_data[column]], axis=0)
    le = LabelEncoder()
    le.fit(combined_data)
    train_data[column] = le.transform(train_data[column])
    test_data[column] = le.transform(test_data[column])

X_train = train_data.iloc[:, :-1]
y_train = train_data.iloc[:, -1]
y_train_relabel = y_train.apply(lambda x: 1 if x != 'normal.' else 0)  # all attack types -> 1
X_test = test_data.iloc[:, :-1]
y_test = test_data.iloc[:, -1]
y_test_relabel = y_test.apply(lambda x: 1 if x != 'normal.' else 0)

# Normalize X_train and X_test
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Convert to tensors
X_train = torch.tensor(X_train, dtype=torch.float)
y_train_relabel = torch.tensor(y_train_relabel.values, dtype=torch.float)
X_test = torch.tensor(X_test, dtype=torch.float)
y_test_relabel = torch.tensor(y_test_relabel.values, dtype=torch.float)

# Define the network: 41 -> 36 -> 24 -> 12 -> 6 -> 1
class Net(nn.Module):
    def __init__(self, activation):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(41, 36)
        self.fc2 = nn.Linear(36, 24)
        self.fc3 = nn.Linear(24, 12)
        self.fc4 = nn.Linear(12, 6)
        self.fc5 = nn.Linear(6, 1)
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'sigmoid':
            self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.activation(self.fc4(x))
        x = torch.sigmoid(self.fc5(x))  # final activation is sigmoid for binary classification
        return x

# Training function (full-batch gradient descent)
def train(net, X_train, y_train, X_test, y_test, optimizer, criterion, epochs=100):
    train_errors = []
    test_errors = []
    for epoch in range(epochs):
        optimizer.zero_grad()
        output = net(X_train).squeeze()  # squeeze to remove the extra dimension
        train_loss = criterion(output, y_train)
        train_loss.backward()
        optimizer.step()

        # Record errors; no gradient tracking needed for the test pass
        train_error = train_loss.item()
        with torch.no_grad():
            test_output = net(X_test).squeeze()
            test_loss = criterion(test_output, y_test)
        test_error = test_loss.item()
        train_errors.append(train_error)
        test_errors.append(test_error)

        if epoch % 10 == 0:
            print(Fore.GREEN + 'epoch: [{:3d}/{}], '.format(epoch, epochs)
                  + Fore.BLUE + 'train step: {}, '.format(epoch)
                  + Fore.RED + 'train_loss: {:.5f}, '.format(train_error)
                  + Fore.YELLOW + 'test_loss: {:.5f}'.format(test_error))
    print(Style.RESET_ALL)  # Reset the color to default
    return train_errors, test_errors

# Combination 1: Sigmoid activation, Xavier initialization, SGD optimizer
net = Net('sigmoid')
nn.init.xavier_uniform_(net.fc1.weight)
nn.init.xavier_uniform_(net.fc2.weight)
nn.init.xavier_uniform_(net.fc3.weight)
nn.init.xavier_uniform_(net.fc4.weight)
nn.init.xavier_uniform_(net.fc5.weight)
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.BCELoss()

# Train and plot errors
train_errors, test_errors = train(net, X_train, y_train_relabel, X_test, y_test_relabel, optimizer, criterion)

print("Model structure: ", net, "\n\n")
for name, param in net.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

plt.title('Sigmoid Activation, Xavier Initialization, SGD Optimizer')
plt.plot(train_errors, label='Train Error')
plt.plot(test_errors, label='Test Error')
plt.legend()
plt.show()

# Combination 2: ReLU activation, Kaiming initialization, Adam optimizer
net = Net('relu')
nn.init.kaiming_uniform_(net.fc1.weight)
nn.init.kaiming_uniform_(net.fc2.weight)
nn.init.kaiming_uniform_(net.fc3.weight)
nn.init.kaiming_uniform_(net.fc4.weight)
nn.init.kaiming_uniform_(net.fc5.weight)
optimizer = optim.Adam(net.parameters(), lr=0.01)

# Train and plot errors
train_errors, test_errors = train(net, X_train, y_train_relabel, X_test, y_test_relabel, optimizer, criterion)

print("Model structure: ", net, "\n\n")
for name, param in net.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

plt.title('ReLU Activation, Kaiming Initialization, Adam Optimizer')
plt.plot(train_errors, label='Train Error')
plt.plot(test_errors, label='Test Error')
plt.legend()
plt.show()
```

Output:

[Figure 1: train/test error curves for Sigmoid activation, Xavier initialization, SGD]

[Figure 2: train/test error curves for ReLU activation, Kaiming initialization, Adam]

```
epoch: [  0/100], train step: 0, train_loss: 0.83076, test_loss: 0.82476
epoch: [ 10/100], train step: 10, train_loss: 0.77907, test_loss: 0.77409
epoch: [ 20/100], train step: 20, train_loss: 0.73555, test_loss: 0.73146
epoch: [ 30/100], train step: 30, train_loss: 0.69899, test_loss: 0.69566
epoch: [ 40/100], train step: 40, train_loss: 0.66831, test_loss: 0.66563
epoch: [ 50/100], train step: 50, train_loss: 0.64258, test_loss: 0.64046
epoch: [ 60/100], train step: 60, train_loss: 0.62099, test_loss: 0.61933
epoch: [ 70/100], train step: 70, train_loss: 0.60285, test_loss: 0.60160
epoch: [ 80/100], train step: 80, train_loss: 0.58759, test_loss: 0.58668
epoch: [ 90/100], train step: 90, train_loss: 0.57472, test_loss: 0.57411

Model structure:  Net(
  (fc1): Linear(in_features=41, out_features=36, bias=True)
  (fc2): Linear(in_features=36, out_features=24, bias=True)
  (fc3): Linear(in_features=24, out_features=12, bias=True)
  (fc4): Linear(in_features=12, out_features=6, bias=True)
  (fc5): Linear(in_features=6, out_features=1, bias=True)
  (activation): Sigmoid()
)

Layer: fc1.weight | Size: torch.Size([36, 41]) | Values : tensor([[-0.1204,  0.1695, -0.0945,  ..., -0.2245,  0.1827,  0.0351],
        [-0.0199, -0.0828,  0.1850,  ..., -0.0928,  0.0844,  0.2253]], grad_fn=<SliceBackward0>)
Layer: fc1.bias | Size: torch.Size([36]) | Values : tensor([-0.0125, -0.0647], grad_fn=<SliceBackward0>)
Layer: fc2.weight | Size: torch.Size([24, 36]) | Values : tensor([[ 0.1901,  0.3076,  0.2780,  ..., -0.2757, -0.0122,  0.3061],
        [-0.2875,  0.0513, -0.2721,  ..., -0.0516, -0.1451, -0.0730]], grad_fn=<SliceBackward0>)
Layer: fc2.bias | Size: torch.Size([24]) | Values : tensor([ 0.0214, -0.1065], grad_fn=<SliceBackward0>)
Layer: fc3.weight | Size: torch.Size([12, 24]) | Values : tensor([[ 0.0663, -0.1093,  0.1332,  ..., -0.0244,  0.3968,  0.0536],
        [-0.3444,  0.3722,  0.1660,  ...,  0.2333, -0.2059, -0.1671]], grad_fn=<SliceBackward0>)
Layer: fc3.bias | Size: torch.Size([12]) | Values : tensor([ 0.0377, -0.0293], grad_fn=<SliceBackward0>)
Layer: fc4.weight | Size: torch.Size([6, 12]) | Values : tensor([[ 0.4014,  0.2608, -0.0577,  ...,  0.3276,  0.4545, -0.5888],
        [ 0.5549, -0.1166, -0.2757,  ..., -0.2449,  0.3914,  0.4241]], grad_fn=<SliceBackward0>)
Layer: fc4.bias | Size: torch.Size([6]) | Values : tensor([0.0023, 0.1753], grad_fn=<SliceBackward0>)
Layer: fc5.weight | Size: torch.Size([1, 6]) | Values : tensor([[-0.5094,  0.8143,  0.8610,  0.4556, -0.2517, -0.0298]], grad_fn=<SliceBackward0>)
Layer: fc5.bias | Size: torch.Size([1]) | Values : tensor([-0.0002], grad_fn=<SliceBackward0>)

epoch: [  0/100], train step: 0, train_loss: 0.77802, test_loss: 0.59542
epoch: [ 10/100], train step: 10, train_loss: 0.05965, test_loss: 0.04284
epoch: [ 20/100], train step: 20, train_loss: 0.00630, test_loss: 0.00698
epoch: [ 30/100], train step: 30, train_loss: 0.00557, test_loss: 0.00664
epoch: [ 40/100], train step: 40, train_loss: 0.00429, test_loss: 0.00585
epoch: [ 50/100], train step: 50, train_loss: 0.00323, test_loss: 0.00419
epoch: [ 60/100], train step: 60, train_loss: 0.00241, test_loss: 0.00331
epoch: [ 70/100], train step: 70, train_loss: 0.00185, test_loss: 0.00287
epoch: [ 80/100], train step: 80, train_loss: 0.00154, test_loss: 0.00289
epoch: [ 90/100], train step: 90, train_loss: 0.00137, test_loss: 0.00280

Model structure:  Net(
  (fc1): Linear(in_features=41, out_features=36, bias=True)
  (fc2): Linear(in_features=36, out_features=24, bias=True)
  (fc3): Linear(in_features=24, out_features=12, bias=True)
  (fc4): Linear(in_features=12, out_features=6, bias=True)
  (fc5): Linear(in_features=6, out_features=1, bias=True)
  (activation): ReLU()
)

Layer: fc1.weight | Size: torch.Size([36, 41]) | Values : tensor([[-0.2223,  0.2652,  0.2620,  ...,  0.4326,  0.0245,  0.3398],
        [-0.0233,  0.4540, -0.2002,  ...,  0.2023,  0.1309,  0.0611]], grad_fn=<SliceBackward0>)
Layer: fc1.bias | Size: torch.Size([36]) | Values : tensor([-0.0362,  0.0334], grad_fn=<SliceBackward0>)
Layer: fc2.weight | Size: torch.Size([24, 36]) | Values : tensor([[ 0.2063,  0.1689,  0.1686,  ..., -0.3144,  0.3093,  0.0863],
        [-0.1541,  0.3662, -0.0299,  ...,  0.0521,  0.1561,  0.1686]], grad_fn=<SliceBackward0>)
Layer: fc2.bias | Size: torch.Size([24]) | Values : tensor([-0.1722, -0.1973], grad_fn=<SliceBackward0>)
Layer: fc3.weight | Size: torch.Size([12, 24]) | Values : tensor([[-0.3522,  0.3949, -0.4170,  ..., -0.3149, -0.2185,  0.1856],
        [-0.0387,  0.3687, -0.3605,  ...,  0.4041,  0.0800,  0.5462]], grad_fn=<SliceBackward0>)
Layer: fc3.bias | Size: torch.Size([12]) | Values : tensor([-0.1063,  0.0016], grad_fn=<SliceBackward0>)
Layer: fc4.weight | Size: torch.Size([6, 12]) | Values : tensor([[ 0.3684,  0.3588,  0.4967,  ...,  0.1753, -0.0842,  0.5281],
        [ 0.0090, -0.4973,  0.6199,  ...,  0.8759,  0.0526, -0.3403]], grad_fn=<SliceBackward0>)
Layer: fc4.bias | Size: torch.Size([6]) | Values : tensor([-0.2995,  0.0266], grad_fn=<SliceBackward0>)
Layer: fc5.weight | Size: torch.Size([1, 6]) | Values : tensor([[-0.0259, -0.4790,  0.7276,  0.2180,  0.6982, -0.7128]], grad_fn=<SliceBackward0>)
Layer: fc5.bias | Size: torch.Size([1]) | Values : tensor([0.0468], grad_fn=<SliceBackward0>)
```
