LSTM (Sequence Tagging, Implemented from Scratch)

Table of Contents

  • 1. LSTM
    • 1.1 Computing one step by hand
  • Single LSTM cell
    • Single-layer LSTM
    • BPTT
  • 2. Sequence tagging

Sequence tagging in PyTorch, with the LSTM implemented from scratch.

import torch
import torch.nn as nn
def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}# 实际中通常使用更大的维度如32维, 64维.
# 这里我们使用小的维度, 为了方便查看训练过程中权重的变化.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6
{'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}
x=prepare_sequence(training_data[0][0],word_to_ix)
y=prepare_sequence(training_data[0][1],tag_to_ix)
print('x:',x)
print('y:',y)
x: tensor([0, 1, 2, 3, 4])
y: tensor([0, 1, 2, 0, 1])
def sigmoid(x):
    return 1 / (1 + torch.exp(-1 * x))

def tanh(x):
    return (torch.exp(x) - torch.exp(-1 * x)) / (torch.exp(x) + torch.exp(-1 * x))
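As a quick sanity check (not in the original post), the hand-rolled activations should agree with PyTorch's built-in implementations up to floating-point error:

# Illustrative check: compare the hand-rolled activations with torch.sigmoid / torch.tanh.
z = torch.randn(4)
print(torch.allclose(sigmoid(z), torch.sigmoid(z)))  # True
print(torch.allclose(tanh(z), torch.tanh(z)))        # True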
hidden_dim=6
embedding_dim=6
vocab_size = len(word_to_ix)

def init_hidden():
    # There is no hidden state at the start, so we initialize one.
    # See the PyTorch docs for why the dimensions are laid out this way;
    # they mean (num_layers*num_directions, batch_size, hidden_dim).
    return (torch.zeros(1, 1, hidden_dim),
            torch.zeros(1, 1, hidden_dim))
word_embeddings = nn.Embedding(vocab_size, embedding_dim)
embed=word_embeddings(x)
print(embed)
tensor([[-1.9627,  0.8135, -0.4169,  0.5599, -0.3018,  1.1061],[-0.3190,  1.0058, -0.7057,  0.1204,  1.4937,  0.0279],[-0.4799,  2.1392, -0.9231, -1.0999, -1.4840, -0.7990],[-1.0826,  1.0353,  0.4493,  1.1570,  0.2160,  0.7899],[ 1.2812,  1.0754,  0.7863,  0.6510, -1.1592, -0.4033]],grad_fn=<EmbeddingBackward>)
one_hot = torch.zeros(len(x), vocab_size).scatter_(1,x.reshape(len(x),1), 1)
print(one_hot)
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0.],[0., 1., 0., 0., 0., 0., 0., 0., 0.],[0., 0., 1., 0., 0., 0., 0., 0., 0.],[0., 0., 0., 1., 0., 0., 0., 0., 0.],[0., 0., 0., 0., 1., 0., 0., 0., 0.]])
embedding_matrix=torch.randn(vocab_size,embedding_dim)
my_embed=torch.matmul(one_hot,embedding_matrix)
print(my_embed)
tensor([[ 0.4017, -0.0790, -0.7208,  1.0096, -0.6415,  0.3977],[-1.1770,  0.9600, -0.4552,  0.5287, -1.2346, -1.1289],[ 1.4698,  0.9478,  0.4281, -0.0136, -0.8808,  0.6587],[ 1.4614,  0.1628,  0.4880,  1.5886,  1.0572,  0.0694],[ 1.1160, -0.9236,  0.1572,  0.8014,  0.9089, -0.0327]])
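Multiplying a one-hot matrix by the embedding matrix simply selects rows, so the same result can be obtained by direct indexing (a quick check, not in the original post):

# The one-hot matmul is equivalent to indexing rows of the embedding matrix.
print(torch.allclose(my_embed, embedding_matrix[x]))  # True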
# (seq_len, batch_size, embedding_size)
input_x=embed.view(len(x), 1, -1)
print(input_x)
tensor([[[-1.9627,  0.8135, -0.4169,  0.5599, -0.3018,  1.1061]],[[-0.3190,  1.0058, -0.7057,  0.1204,  1.4937,  0.0279]],[[-0.4799,  2.1392, -0.9231, -1.0999, -1.4840, -0.7990]],[[-1.0826,  1.0353,  0.4493,  1.1570,  0.2160,  0.7899]],[[ 1.2812,  1.0754,  0.7863,  0.6510, -1.1592, -0.4033]]],grad_fn=<ViewBackward>)

1. LSTM

1.1 Computing one step by hand

  • Cell i at time step t
    j ranges over the units of the previous time step
$$
f_i^{(t)}=\sigma\big(b_i^f+\Sigma_j U_{i,j}^f x_j^{(t)}+\Sigma_j W_{i,j}^f h_j^{(t-1)}\big)\\
g_i^{(t)}=\sigma\big(b_i^g+\Sigma_j U_{i,j}^g x_j^{(t)}+\Sigma_j W_{i,j}^g h_j^{(t-1)}\big)\\
q_i^{(t)}=\sigma\big(b_i^o+\Sigma_j U_{i,j}^o x_j^{(t)}+\Sigma_j W_{i,j}^o h_j^{(t-1)}\big)\\
s_i^{(t)}=f_i^{(t)}\,s_i^{(t-1)}+g_i^{(t)}\,\sigma\big(b_i^c+\Sigma_j U_{i,j}^c x_j^{(t)}+\Sigma_j W_{i,j}^c h_j^{(t-1)}\big)\\
h_i^{(t)}=\tanh\big(s_i^{(t)}\big)\,q_i^{(t)}
$$
Note that this formulation squashes the candidate state with σ rather than the tanh used in most LSTM write-ups; the code below follows the same convention, so its outputs will not match torch.nn.LSTM numerically.
# Forget gate parameters
# U: input-to-hidden weights, W: hidden-to-hidden weights, b: bias
Uf=torch.randn(embedding_dim,hidden_dim)
Wf=torch.randn(hidden_dim,hidden_dim)
bf=torch.randn(hidden_dim)
print(Uf)
print(Wf)
print(bf)
tensor([[-0.2412, -1.2818, -0.7232,  0.9796, -1.3831,  0.0280],[-2.2550,  1.0024,  0.3181,  2.4625,  0.8185, -0.1705],[ 0.6749, -1.4820,  0.1306, -0.0302,  0.1076, -0.4431],[-1.9521,  1.5941, -0.4877, -0.5115,  0.3042, -0.8965],[ 0.8267,  0.6762,  1.1087, -0.0376,  0.4959, -0.9688],[-0.2706,  0.6851,  0.8101,  0.3680,  0.1835, -0.4139]])
tensor([[ 1.1376, -1.2257, -0.3329, -0.1501,  1.0706, -1.1383],[-0.5685, -0.6473, -0.9684, -0.4290, -0.8083,  0.1783],[-0.3419, -1.3738,  0.0836, -0.3662, -0.2039,  0.0299],[ 0.4583, -0.4010, -0.9482, -0.1714,  0.0785, -0.5377],[ 0.7783,  0.4437, -2.0553, -1.8913,  0.8079,  0.7039],[-0.5302, -1.1906, -1.2803,  0.0609,  0.3618,  0.7094]])
tensor([ 1.6180,  0.4092, -1.1886, -1.1649,  0.7097,  1.4132])
h0,c0=init_hidden()
print(h0)
print(c0)
tensor([[[0., 0., 0., 0., 0., 0.]]])
tensor([[[0., 0., 0., 0., 0., 0.]]])
print(torch.matmul(input_x[0],Uf))
# Forget gate at time step 0
f1=sigmoid(bf+torch.matmul(input_x[0],Uf)+torch.matmul(h0,Wf))
print(f1)
tensor([[-3.2840,  5.3954,  1.9123,  0.2252,  3.5592, -0.6763]],grad_fn=<MmBackward>)
tensor([[[0.1590, 0.9970, 0.6734, 0.2810, 0.9862, 0.6763]]],grad_fn=<MulBackward0>)
torch.matmul(h0,Wf)
tensor([[[0., 0., 0., 0., 0., 0.]]])
# Input gate parameters
bg=torch.randn(hidden_dim)
Wg=torch.randn(hidden_dim,hidden_dim)
Ug=torch.randn(embedding_dim,hidden_dim)
print(bg)
print(Wg)
print(Ug)
tensor([-0.9991, -0.3109, -0.3376, -1.8703, -0.0876,  0.4118])
tensor([[ 0.1361,  0.8912,  0.3556, -1.1611, -0.4669, -0.7749],[-0.9517, -2.1878, -1.1335,  1.8934, -0.4701, -0.0386],[-0.2086, -0.0997,  0.0195, -0.4307,  0.2007, -0.3712],[-0.0860, -0.7646, -1.0500, -1.3939,  0.3060,  0.5810],[ 0.9782,  0.1691,  1.3593, -0.1176,  0.2451,  1.2866],[ 0.3426,  1.1758, -0.1679, -0.7304,  1.8132,  0.7703]])
tensor([[ 0.7740, -0.5909,  0.3731, -0.2821,  0.4309,  0.3201],[-0.0408, -2.3477, -0.0902, -0.6489, -0.6137, -0.6363],[-0.3889,  0.7760, -1.5003, -1.6583,  1.7034,  0.6059],[ 0.9344, -1.5214, -2.2810, -0.9084,  0.4917, -0.0436],[-0.3241,  0.2920, -1.4197, -0.7704,  1.3797,  1.0030],[ 0.4039, -1.4007,  1.1480, -0.5950,  0.2726, -0.3568]])
# Input gate at time step 0
g1=sigmoid(bg+torch.matmul(input_x[0],Ug)+torch.matmul(h0,Wg))
print(g1)
tensor([[[0.2106, 0.0204, 0.4759, 0.1103, 0.1211, 0.1534]]],grad_fn=<MulBackward0>)
# Cell state (candidate) parameters
bc=torch.randn(hidden_dim)
Wc=torch.randn(hidden_dim,hidden_dim)
Uc=torch.randn(embedding_dim,hidden_dim)
print(bc)
print(Wc)
print(Uc)
tensor([-1.4072,  0.0440,  0.4973,  2.0482,  0.2032,  0.2510])
tensor([[ 2.0180, -0.5751,  0.4657, -1.3219,  2.4918, -0.8496],[ 0.2287, -1.4079, -0.0104, -0.3973,  1.3936,  1.2032],[ 0.5597,  0.8178, -0.2663, -0.0518, -1.2287,  0.7666],[ 1.4284, -0.6757,  1.3944,  0.3908, -0.1043,  1.7851],[-0.2318,  0.1908, -0.9405, -1.3440, -2.0447, -2.2236],[ 0.7214, -0.5389,  1.0935, -0.4707, -0.6584,  0.8625]])
tensor([[ 0.2348,  0.7101, -0.2298,  0.4476,  1.2316,  0.3588],[ 0.9452, -0.3919, -0.1857,  0.5695, -0.7272, -1.2976],[ 0.3508, -0.3632,  0.9566, -0.8370, -2.0458, -1.2055],[-1.4784, -0.9333, -0.7207,  1.8996, -1.0026, -0.0988],[ 1.0030, -0.2087, -0.4728, -1.4157, -0.3052, -1.1199],[-1.7926, -1.1267, -0.9589, -0.9056, -0.8777,  0.2443]])
c1=f1*c0+g1*sigmoid(bc+torch.matmul(input_x[0],Uc)+torch.matmul(h0,Wc))
print(c1)
tensor([[[0.0027, 0.0008, 0.1353, 0.1017, 0.0039, 0.0596]]],grad_fn=<AddBackward0>)
# Output gate parameters
bo=torch.randn(hidden_dim)
Wo=torch.randn(hidden_dim,hidden_dim)
Uo=torch.randn(embedding_dim,hidden_dim)
print(bo)
print(Wo)
print(Uo)
tensor([-0.7430, -0.4823,  0.6030, -0.1274, -0.5860, -0.1610])
tensor([[-0.8334,  0.1386,  0.4369,  0.9919,  0.0499,  0.2537],[ 0.7339,  1.3104,  0.5500, -0.9005, -1.1566, -1.7843],[-1.6112, -1.0089, -1.0443,  0.3732, -0.6024, -1.1931],[-1.9338, -0.1763, -0.0256, -0.8732, -1.7940, -1.4747],[ 0.4316,  1.6072,  0.8072, -0.9294,  0.8270,  0.5840],[ 0.0676,  0.0690,  0.9222,  0.3463,  0.3679,  0.1482]])
tensor([[-0.9390, -1.6735,  0.2829,  1.0728, -0.6216,  0.2004],[-0.7808, -0.1753, -0.9838, -1.7960,  0.2015, -0.8450],[-0.0584, -0.1656, -0.3886,  0.1750,  0.3405, -0.0094],[ 0.3652, -0.3256,  0.3165, -1.9058, -0.0954,  1.0349],[-0.1895, -0.2673, -1.4944,  0.7692, -2.3686,  1.3873],[ 0.9085,  0.9621,  0.8830, -2.6961, -0.2800, -0.7214]])
# Output gate at time step 0
q1=sigmoid(bo+torch.matmul(input_x[0],Uo)+torch.matmul(h0,Wo))
print(q1)
tensor([[[8.5267e-01, 9.7567e-01, 7.3385e-01, 3.1952e-04, 7.3254e-01,1.3298e-01]]], grad_fn=<MulBackward0>)
# Hidden state
h1=tanh(c1)*q1
print(h1)
tensor([[[2.2688e-03, 7.6095e-04, 9.8698e-02, 3.2395e-05, 2.8849e-03,7.9154e-03]]], grad_fn=<MulBackward0>)

Single LSTM cell

def LSTM_Cell(input_x, h0, c0):
    f1 = sigmoid(bf + torch.matmul(input_x, Uf) + torch.matmul(h0, Wf))    # forget gate
    g1 = sigmoid(bg + torch.matmul(input_x, Ug) + torch.matmul(h0, Wg))    # input gate
    gc1 = sigmoid(bc + torch.matmul(input_x, Uc) + torch.matmul(h0, Wc))   # candidate state
    c1 = f1 * c0 + g1 * gc1                                                # cell state
    q1 = sigmoid(bo + torch.matmul(input_x, Uo) + torch.matmul(h0, Wo))    # output gate
    h1 = tanh(c1) * q1                                                     # hidden state
    return (h1, c1), f1, g1, q1, gc1
(h1,c1),_,_,_,_=LSTM_Cell(input_x[0],h0,c0)
print(h1)
print(c1)
tensor([[[2.2688e-03, 7.6095e-04, 9.8698e-02, 3.2395e-05, 2.8849e-03,7.9154e-03]]], grad_fn=<MulBackward0>)
tensor([[[0.0027, 0.0008, 0.1353, 0.1017, 0.0039, 0.0596]]],grad_fn=<AddBackward0>)

Single-layer LSTM

# forward pass over the whole sequence
def single_layer_LSTM(input_x):
    h0, c0 = init_hidden()
    h = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    c = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    f = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    g = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    q = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    gc = torch.zeros(input_x.shape[0], input_x.shape[1], hidden_dim)
    for i in range(len(input_x)):
        (h0, c0), f0, g0, q0, gc0 = LSTM_Cell(input_x[i], h0, c0)
        h[i] = h0
        c[i] = c0
        f[i] = f0
        g[i] = g0
        q[i] = q0
        gc[i] = gc0
    return h, (h0, c0), c, f, g, q, gc
o,(h1,c1),c,f,g,q,gc=single_layer_LSTM(input_x)
print(o)
print(h1)
print(c1)
tensor([[[2.2688e-03, 7.6095e-04, 9.8698e-02, 3.2395e-05, 2.8849e-03,7.9154e-03]],[[2.2140e-02, 7.2907e-03, 1.0052e-02, 2.2311e-02, 4.1039e-03,8.3458e-02]],[[6.5945e-03, 2.2127e-02, 3.3992e-01, 1.2278e-01, 2.0307e-01,1.2748e-03]],[[1.1699e-02, 3.3651e-02, 1.5326e-01, 1.8081e-04, 6.9607e-02,3.0697e-02]],[[7.7960e-03, 7.7988e-04, 1.2081e-01, 4.8651e-02, 1.8456e-01,3.7786e-02]]], grad_fn=<CopySlices>)
tensor([[[0.0078, 0.0008, 0.1208, 0.0487, 0.1846, 0.0378]]],grad_fn=<MulBackward0>)
tensor([[[0.1567, 0.0205, 0.1611, 0.3002, 0.2173, 0.2408]]],grad_fn=<AddBackward0>)
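For reference (an illustrative aside, not part of the original post), PyTorch's built-in nn.LSTM produces outputs of the same shapes; its gate parameterization differs from the formulation above (it uses tanh for the candidate state), so only the shapes, not the values, will agree:

ref_lstm = nn.LSTM(embedding_dim, hidden_dim)      # built-in reference, compared for shapes only
ref_o, (ref_h, ref_c) = ref_lstm(input_x, init_hidden())
print(ref_o.shape)   # torch.Size([5, 1, 6])
print(ref_h.shape)   # torch.Size([1, 1, 6])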

BPTT

Work backwards one time step at a time, starting from t = T.
$$
f_i^{(t)}=\sigma\big(b_i^f+\Sigma_j U_{i,j}^f x_j^{(t)}+\Sigma_j W_{i,j}^f h_j^{(t-1)}\big)\\
g_i^{(t)}=\sigma\big(b_i^g+\Sigma_j U_{i,j}^g x_j^{(t)}+\Sigma_j W_{i,j}^g h_j^{(t-1)}\big)\\
q_i^{(t)}=\sigma\big(b_i^o+\Sigma_j U_{i,j}^o x_j^{(t)}+\Sigma_j W_{i,j}^o h_j^{(t-1)}\big)\\
s_i^{(t)}=f_i^{(t)}\,s_i^{(t-1)}+g_i^{(t)}\,\sigma\big(b_i^c+\Sigma_j U_{i,j}^c x_j^{(t)}+\Sigma_j W_{i,j}^c h_j^{(t-1)}\big)\\
h_i^{(t)}=\tanh\big(s_i^{(t)}\big)\,q_i^{(t)}
$$
The derivation below ignores the batch dimension.

print(o.view(len(x),-1))
tensor([[2.2688e-03, 7.6095e-04, 9.8698e-02, 3.2395e-05, 2.8849e-03, 7.9154e-03],[2.2140e-02, 7.2907e-03, 1.0052e-02, 2.2311e-02, 4.1039e-03, 8.3458e-02],[6.5945e-03, 2.2127e-02, 3.3992e-01, 1.2278e-01, 2.0307e-01, 1.2748e-03],[1.1699e-02, 3.3651e-02, 1.5326e-01, 1.8081e-04, 6.9607e-02, 3.0697e-02],[7.7960e-03, 7.7988e-04, 1.2081e-01, 4.8651e-02, 1.8456e-01, 3.7786e-02]],grad_fn=<ViewBackward>)
print(torch.transpose(o.view(len(x),-1),1,0))
tensor([[2.2688e-03, 2.2140e-02, 6.5945e-03, 1.1699e-02, 7.7960e-03],[7.6095e-04, 7.2907e-03, 2.2127e-02, 3.3651e-02, 7.7988e-04],[9.8698e-02, 1.0052e-02, 3.3992e-01, 1.5326e-01, 1.2081e-01],[3.2395e-05, 2.2311e-02, 1.2278e-01, 1.8081e-04, 4.8651e-02],[2.8849e-03, 4.1039e-03, 2.0307e-01, 6.9607e-02, 1.8456e-01],[7.9154e-03, 8.3458e-02, 1.2748e-03, 3.0697e-02, 3.7786e-02]],grad_fn=<TransposeBackward0>)
print(y)
tagset_size=len(tag_to_ix)
tensor([0, 1, 2, 0, 1])
one_hot_y = torch.zeros(len(y), tagset_size).scatter_(1,y.reshape(len(y),1), 1)
dL_do=torch.tensor([[[  1.1719,   4.0198,  -0.1581,  -6.9059,  -4.1330,   5.0020]],[[  1.0842,  -0.5113,   0.2987,   0.7790,  -0.1800,   1.7739]],[[-16.1690, -10.2418,   9.0003,  10.4557,   6.8416, -34.2560]],[[  0.3115,   1.0683,  -0.0420,  -1.8353,  -1.0984,   1.3294]],[[  1.3049,  -0.6155,   0.3596,   0.9376,  -0.2166,   2.1351]]])
print(dL_do)
tensor([[[  1.1719,   4.0198,  -0.1581,  -6.9059,  -4.1330,   5.0020]],[[  1.0842,  -0.5113,   0.2987,   0.7790,  -0.1800,   1.7739]],[[-16.1690, -10.2418,   9.0003,  10.4557,   6.8416, -34.2560]],[[  0.3115,   1.0683,  -0.0420,  -1.8353,  -1.0984,   1.3294]],[[  1.3049,  -0.6155,   0.3596,   0.9376,  -0.2166,   2.1351]]])
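The dL_do values above are taken as given: they are the gradient of the tagging loss with respect to the LSTM outputs, produced by the linear layer and log-softmax used in Section 2. As a hedged sketch of where such an upstream gradient comes from (the names hidden2tag_demo and tagspace_demo are illustrative, and this will not reproduce the exact numbers above, which come from the author's own run):

# Sketch: with tagspace = o @ hidden2tag, log-softmax and an NLL loss summed over tokens,
# dL/d(tagspace) = softmax(tagspace) - onehot(y), and dL/do = dL/d(tagspace) @ hidden2tag^T.
hidden2tag_demo = torch.randn(hidden_dim, tagset_size)
tagspace_demo = torch.matmul(o.view(len(x), -1), hidden2tag_demo)
dL_dtagspace_demo = torch.softmax(tagspace_demo, dim=1) - one_hot_y
dL_do_demo = torch.matmul(dL_dtagspace_demo, hidden2tag_demo.t()).view(len(x), 1, -1)
print(dL_do_demo.shape)  # torch.Size([5, 1, 6])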

$$
h_i^{(t)}=\tanh\big(s_i^{(t)}\big)\,q_i^{(t)}\\
\frac{\partial L}{\partial q_i^{(t)}}=\frac{\partial L}{\partial h_i^{(t)}}\,\tanh\big(s_i^{(t)}\big)\\
\frac{\partial L}{\partial s_i^{(t)}}=\frac{\partial L}{\partial h_i^{(t)}}\,q_i^{(t)}\,\big(1-\tanh(s_i^{(t)})^2\big)
$$

print(tanh(c))
tensor([[[0.0027, 0.0008, 0.1345, 0.1014, 0.0039, 0.0595]],[[0.1275, 0.0195, 0.1282, 0.1240, 0.2368, 0.1144]],[[0.1172, 0.0422, 0.6908, 0.6798, 0.2083, 0.1435]],[[0.0253, 0.0420, 0.3672, 0.3021, 0.2065, 0.1015]],[[0.1554, 0.0205, 0.1597, 0.2915, 0.2140, 0.2363]]],grad_fn=<DivBackward0>)
dL_dq=torch.zeros(dL_do.shape)
dL_dq[-1]=tanh(c[-1])*dL_do[-1]
print(dL_dq)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.2028, -0.0126,  0.0574,  0.2733, -0.0463,  0.5045]]],grad_fn=<CopySlices>)
dL_ds=torch.zeros(dL_do.shape)
dL_ds[-1]=dL_do[-1]*(1-tanh(c[-1])*tanh(c[-1]))*q[-1]
print(dL_ds)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0639, -0.0234,  0.2650,  0.1432, -0.1783,  0.3224]]],grad_fn=<CopySlices>)

$$
q_i^{(t)}=\sigma\big(b_i^o+\Sigma_j U_{i,j}^o x_j^{(t)}+\Sigma_j W_{i,j}^o h_j^{(t-1)}\big)\\
\frac{\partial L}{\partial\big(b_i^o+\Sigma_j U_{i,j}^o x_j^{(t)}+\Sigma_j W_{i,j}^o h_j^{(t-1)}\big)}
=\frac{\partial L}{\partial b_i^{o}}
=\frac{\partial L}{\partial q_i^{(t)}}\,\sigma'
=\frac{\partial L}{\partial q_i^{(t)}}\,(1-\sigma)\,\sigma
=\frac{\partial L}{\partial q_i^{(t)}}\,q_i^{(t)}\,\big(1-q_i^{(t)}\big)\\
\frac{\partial L}{\partial W_{i,j}^{o}}=\frac{\partial L}{\partial q_i^{(t)}}\,\sigma'\,h_j^{(t-1)}\\
\frac{\partial L}{\partial U_{i,j}^{o}}=\frac{\partial L}{\partial q_i^{(t)}}\,\sigma'\,x_j^{(t)}
$$
Since there is only one LSTM layer here, we do not compute ∂q/∂x at this point.

$$
\frac{\partial L}{\partial h_j^{(t-1)}}=\Sigma_i \frac{\partial L}{\partial q_i^{(t)}}\,\sigma'\,W_{i,j}^o
$$
The linear projection layer after the LSTM already passes a gradient to h_j^{(t-1)}; the term above is the additional gradient flowing back through q. Further contributions arrive through the other gates as well.

dL_dqx=torch.zeros(dL_do.shape)
dL_dqx[-1]=dL_dq[-1]*q[-1]*(1-q[-1])
dL_dbo=torch.zeros(bo.shape)
dL_dbo += dL_dqx[-1, 0]
print(dL_dqx)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0097, -0.0005,  0.0106,  0.0380, -0.0055,  0.0678]]],grad_fn=<CopySlices>)

tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0525, -0.1445, -0.0005,  0.0053,  0.1187, -0.1321]]],grad_fn=<CopySlices>)
h_t_1=torch.zeros(o.shape)
h_t_1[1:]=o[:-1]
print(h_t_1[-1,0].reshape(hidden_dim,1))
print(dL_dbo[-1])
print(h_t_1[-1,0].reshape(hidden_dim,1)*dL_dbo[-1])
tensor([[0.0117],[0.0337],[0.1533],[0.0002],[0.0696],[0.0307]], grad_fn=<AsStridedBackward>)
tensor(0.0678, grad_fn=<SelectBackward>)
tensor([[7.9293e-04],[2.2807e-03],[1.0387e-02],[1.2254e-05],[4.7176e-03],[2.0805e-03]], grad_fn=<MulBackward0>)
dL_dWo = torch.zeros(Wo.shape)
# Note: this step multiplies by the scalar dL_dbo[-1]; the class implementation in
# Section 2 instead uses the full vector dL_dqx[-1] (an outer product), which matches
# the formula for dL/dW^o above. The same applies to dL_dUo below; the printed values
# here reflect the scalar version.
dL_dWo += h_t_1[-1, 0].reshape(hidden_dim, 1) * dL_dbo[-1]
print(dL_dWo)
tensor([[7.9293e-04, 7.9293e-04, 7.9293e-04, 7.9293e-04, 7.9293e-04, 7.9293e-04],[2.2807e-03, 2.2807e-03, 2.2807e-03, 2.2807e-03, 2.2807e-03, 2.2807e-03],[1.0387e-02, 1.0387e-02, 1.0387e-02, 1.0387e-02, 1.0387e-02, 1.0387e-02],[1.2254e-05, 1.2254e-05, 1.2254e-05, 1.2254e-05, 1.2254e-05, 1.2254e-05],[4.7176e-03, 4.7176e-03, 4.7176e-03, 4.7176e-03, 4.7176e-03, 4.7176e-03],[2.0805e-03, 2.0805e-03, 2.0805e-03, 2.0805e-03, 2.0805e-03, 2.0805e-03]],grad_fn=<AddBackward0>)
dL_do[-2]+=torch.matmul(dL_dqx[-1],torch.transpose(Wo,1,0))
print(dL_do)
tensor([[[  1.1719,   4.0198,  -0.1581,  -6.9059,  -4.1330,   5.0020]],[[  1.0842,  -0.5113,   0.2987,   0.7790,  -0.1800,   1.7739]],[[-16.1690, -10.2418,   9.0003,  10.4557,   6.8416, -34.2560]],[[  0.3626,   0.9318,  -0.1315,  -1.9774,  -1.0867,   1.3610]],[[  1.3049,  -0.6155,   0.3596,   0.9376,  -0.2166,   2.1351]]],grad_fn=<CopySlices>)
print(input_x[-1].reshape(embedding_dim,1))
print(dL_dbo[-1])
dL_dUo=torch.zeros(Uo.shape)
dL_dUo+=input_x[-1].reshape(embedding_dim,1)*dL_dbo[-1]
print(dL_dUo)
tensor([[ 1.2812],[ 1.0754],[ 0.7863],[ 0.6510],[-1.1592],[-0.4033]], grad_fn=<AsStridedBackward>)
tensor(0.0678, grad_fn=<SelectBackward>)
tensor([[ 0.0868,  0.0868,  0.0868,  0.0868,  0.0868,  0.0868],[ 0.0729,  0.0729,  0.0729,  0.0729,  0.0729,  0.0729],[ 0.0533,  0.0533,  0.0533,  0.0533,  0.0533,  0.0533],[ 0.0441,  0.0441,  0.0441,  0.0441,  0.0441,  0.0441],[-0.0786, -0.0786, -0.0786, -0.0786, -0.0786, -0.0786],[-0.0273, -0.0273, -0.0273, -0.0273, -0.0273, -0.0273]],grad_fn=<AddBackward0>)

$$
s_i^{(t)}=f_i^{(t)}\,s_i^{(t-1)}+g_i^{(t)}\,\sigma\big(b_i^c+\Sigma_j U_{i,j}^c x_j^{(t)}+\Sigma_j W_{i,j}^c h_j^{(t-1)}\big)\\
\frac{\partial L}{\partial f_i^{(t)}}=\frac{\partial L}{\partial s_i^{(t)}}\,s_i^{(t-1)}\\
\frac{\partial L}{\partial s_i^{(t-1)}}=\frac{\partial L}{\partial s_i^{(t)}}\,f_i^{(t)}\\
\frac{\partial L}{\partial g_i^{(t)}}=\frac{\partial L}{\partial s_i^{(t)}}\,\sigma\big(b_i^c+\Sigma_j U_{i,j}^c x_j^{(t)}+\Sigma_j W_{i,j}^c h_j^{(t-1)}\big)
$$

dL_dg=torch.zeros(dL_do.shape)
dL_dg[-1]=dL_ds[-1]*gc[-1]
print(dL_dg)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0160, -0.0140,  0.2014,  0.1427, -0.0542,  0.1234]]],grad_fn=<CopySlices>)
dL_df=torch.zeros(dL_do.shape)
dL_df[-1]=dL_ds[-1]*c[-2]
print(dL_df)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0016, -0.0010,  0.1021,  0.0447, -0.0374,  0.0328]]],grad_fn=<CopySlices>)
dL_ds[-2]+=dL_ds[-1]*f[-1]
print(dL_ds)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0039, -0.0039,  0.0058,  0.1272, -0.0662,  0.2722]],[[ 0.0639, -0.0234,  0.2650,  0.1432, -0.1783,  0.3224]]],grad_fn=<CopySlices>)

$$
gc_i^{(t)}=\sigma\big(b_i^c+\Sigma_j U_{i,j}^c x_j^{(t)}+\Sigma_j W_{i,j}^c h_j^{(t-1)}\big)\\
\frac{\partial L}{\partial gc_i^{(t)}}=\frac{\partial L}{\partial s_i^{(t)}}\,g_i^{(t)}\\
\frac{\partial L}{\partial b_i^c}=\frac{\partial L}{\partial gc_i^{(t)}}\,\sigma'\\
\frac{\partial L}{\partial W_{i,j}^c}=\frac{\partial L}{\partial gc_i^{(t)}}\,\sigma'\,h_j^{(t-1)}\\
\frac{\partial L}{\partial U_{i,j}^c}=\frac{\partial L}{\partial gc_i^{(t)}}\,\sigma'\,x_j^{(t)}\\
\frac{\partial L}{\partial h_j^{(t-1)}}=\Sigma_i \frac{\partial L}{\partial gc_i^{(t)}}\,\sigma'\,W_{i,j}^{c}
$$

dL_dgcx=torch.zeros(dL_do.shape)
dL_dgcx[-1]=dL_ds[-1]*g[-1]*gc[-1]*(1-gc[-1])
dL_dbc = torch.zeros(bc.shape)
dL_dbc += dL_dgcx[-1, 0]
print(dL_dgcx)
tensor([[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,0.0000e+00]],[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,0.0000e+00]],[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,0.0000e+00]],[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,0.0000e+00]],[[ 7.4212e-03, -1.2571e-04,  9.7080e-03,  1.1346e-05, -1.7320e-02,3.0807e-02]]], grad_fn=<CopySlices>)
dL_dWc=torch.zeros(Wc.shape)
dL_dUc=torch.zeros(Uc.shape)
i=-1
dL_dWc+=h_t_1[i, 0].reshape(hidden_dim, 1) * dL_dgcx[i]
dL_dUc += input_x[i].reshape(embedding_dim, 1) * dL_dgcx[i]
print(dL_dUc)
tensor([[ 9.5082e-03, -1.6107e-04,  1.2438e-02,  1.4536e-05, -2.2190e-02,3.9470e-02],[ 7.9811e-03, -1.3520e-04,  1.0440e-02,  1.2202e-05, -1.8626e-02,3.3131e-02],[ 5.8356e-03, -9.8853e-05,  7.6338e-03,  8.9215e-06, -1.3619e-02,2.4225e-02],[ 4.8314e-03, -8.1843e-05,  6.3202e-03,  7.3863e-06, -1.1276e-02,2.0056e-02],[-8.6024e-03,  1.4572e-04, -1.1253e-02, -1.3151e-05,  2.0076e-02,-3.5710e-02],[-2.9927e-03,  5.0695e-05, -3.9148e-03, -4.5752e-06,  6.9843e-03,-1.2423e-02]], grad_fn=<AddBackward0>)
dL_do[-2]+=torch.matmul(dL_dgcx[-1],torch.transpose(Wc,1,0))
print(dL_do)
tensor([[[  1.1719,   4.0198,  -0.1581,  -6.9059,  -4.1330,   5.0020]],[[  1.0842,  -0.5113,   0.2987,   0.7790,  -0.1800,   1.7739]],[[-16.1690, -10.2418,   9.0003,  10.4557,   6.8416, -34.2560]],[[  0.3128,   0.9465,  -0.0852,  -1.8964,  -1.1307,   1.4150]],[[  1.3049,  -0.6155,   0.3596,   0.9376,  -0.2166,   2.1351]]],grad_fn=<CopySlices>)

$$
f_i^{(t)}=\sigma\big(b_i^f+\Sigma_j U_{i,j}^f x_j^{(t)}+\Sigma_j W_{i,j}^f h_j^{(t-1)}\big)\\
g_i^{(t)}=\sigma\big(b_i^g+\Sigma_j U_{i,j}^g x_j^{(t)}+\Sigma_j W_{i,j}^g h_j^{(t-1)}\big)\\
\frac{\partial L}{\partial b_i^f}=\frac{\partial L}{\partial f_i^{(t)}}\,\sigma'\\
\frac{\partial L}{\partial W_{i,j}^f}=\frac{\partial L}{\partial f_i^{(t)}}\,\sigma'\,h_j^{(t-1)}\\
\frac{\partial L}{\partial U_{i,j}^f}=\frac{\partial L}{\partial f_i^{(t)}}\,\sigma'\,x_j^{(t)}\\
\frac{\partial L}{\partial h_j^{(t-1)}}\mathrel{+}=\Sigma_i \frac{\partial L}{\partial f_i^{(t)}}\,\sigma'\,W_{i,j}^{f}\\
\frac{\partial L}{\partial b_i^g}=\frac{\partial L}{\partial g_i^{(t)}}\,\sigma'\\
\frac{\partial L}{\partial W_{i,j}^g}=\frac{\partial L}{\partial g_i^{(t)}}\,\sigma'\,h_j^{(t-1)}\\
\frac{\partial L}{\partial U_{i,j}^g}=\frac{\partial L}{\partial g_i^{(t)}}\,\sigma'\,x_j^{(t)}\\
\frac{\partial L}{\partial h_j^{(t-1)}}\mathrel{+}=\Sigma_i \frac{\partial L}{\partial g_i^{(t)}}\,\sigma'\,W_{i,j}^{g}
$$

dL_dfx=torch.zeros(dL_do.shape)
dL_dbf=torch.zeros(bf.shape)
dL_dWf=torch.zeros(Wf.shape)
dL_dUf = torch.zeros(Uf.shape)

dL_dgx = torch.zeros(dL_do.shape)
dL_dbg=torch.zeros(bg.shape)
dL_dWg=torch.zeros(Wg.shape)
dL_dUg=torch.zeros(Ug.shape)

i=-1
#f
dL_dfx[i] = dL_df[i] * f[i] * (1 - f[i])
dL_dbf += dL_dfx[i, 0]
dL_dWf += h_t_1[i, 0].reshape(hidden_dim, 1) * dL_dfx[i]
dL_dUf += input_x[i].reshape(embedding_dim, 1) * dL_dfx[i]

# g
dL_dgx[i] = dL_dg[i] * g[i] * (1 - g[i])
dL_dbg += dL_dgx[i, 0]
dL_dWg += h_t_1[i, 0].reshape(hidden_dim, 1) * dL_dgx[i]
dL_dUg += input_x[i].reshape(embedding_dim, 1) * dL_dgx[i]
dL_do[i - 1] += torch.matmul(dL_dfx[i], torch.transpose(Wf, 1, 0))
dL_do[i - 1] += torch.matmul(dL_dgx[i], torch.transpose(Wg, 1, 0))
i=-1
dL_dx=torch.zeros(input_x.shape)
dL_dx[i]+=torch.matmul(dL_dqx[i], torch.transpose(Uo, 1, 0))
dL_dx[i]+=torch.matmul(dL_dgcx[i], torch.transpose(Uc, 1, 0))
dL_dx[i]+=torch.matmul(dL_dfx[i], torch.transpose(Uf, 1, 0))
dL_dx[i]+=torch.matmul(dL_dgx[i], torch.transpose(Ug, 1, 0))
print(dL_dx)
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],[[ 0.0747, -0.1784, -0.0532, -0.0891,  0.0476, -0.1091]]],grad_fn=<CopySlices>)
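Because the forward pass above is built entirely from differentiable torch operations, autograd can cross-check this last-step gradient (a hedged check, not in the original post): define the surrogate scalar sum(o * dL_do) and compare its gradient with respect to the embedded inputs; the last row should closely match dL_dx[-1] above.

# Cross-check the manual last-step gradient with autograd.
embed_chk = embed.detach().clone().requires_grad_(True)
o_chk, _, _, _, _, _, _ = single_layer_LSTM(embed_chk.view(len(x), 1, -1))
(o_chk * dL_do.detach()).sum().backward()
print(embed_chk.grad[-1])  # should closely match dL_dx[-1]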

2. Sequence tagging

import torch
import torch.nn as nn
class LSTMTag:
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, layers_num):
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.tagset_size = tagset_size
        self.layers_num = layers_num
        self.embedding_dim = embedding_dim
        self.embedding = torch.randn(self.vocab_size, self.embedding_dim)
        self.hidden = self.init_hidden()
        self.lstm = LSTM(self.embedding_dim, self.hidden_dim)
        self.lstm1 = LSTM(self.hidden_dim, self.hidden_dim)
        self.hidden2tag = torch.randn(self.hidden_dim, self.tagset_size)
        self.lr = 0.001

    def init_hidden(self):
        # There is no hidden state at the start, so we initialize one.
        # The dimensions mean (num_layers*num_directions, batch_size, hidden_dim);
        # see the PyTorch docs for details.
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def log_softmax(self, x):
        e = torch.exp(x)
        s = torch.sum(e, axis=1)
        return torch.log(e) - torch.log(s).reshape(x.shape[0], 1)

    def forward(self, x):
        self.one_hot = torch.zeros(len(x), self.vocab_size).scatter_(1, x.reshape(len(x), 1), 1)
        # embedding_matrix = torch.randn(self.vocab_size, self.embedding_dim)
        my_embed = torch.matmul(self.one_hot, self.embedding)
        self.o, (h, c) = self.lstm.single_layer_LSTM(my_embed.view(len(x), 1, -1), self.hidden)
        # A second stacked layer could be added here:
        # o1, (h1, c1) = self.lstm1.single_layer_LSTM(o, self.hidden)
        tagspace = torch.matmul(self.o.view(len(x), -1), self.hidden2tag)
        self.tag_score = self.log_softmax(tagspace)
        return self.tag_score

    def BP(self, y):
        # note: len(x) below refers to the global x; it equals len(y) for the sentence just passed to forward()
        one_hot_y = torch.zeros(len(y), self.tagset_size).scatter_(1, y.reshape(len(y), 1), -1.)
        self.Loss = one_hot_y * self.tag_score
        dL_dtagspace = torch.exp(self.Loss) - 1
        self.Loss = torch.sum(self.Loss, axis=1)
        d_hidden2tag = torch.matmul(torch.transpose(self.o.view(len(x), -1), 1, 0), dL_dtagspace)
        dL_do = torch.matmul(dL_dtagspace, torch.transpose(self.hidden2tag, 1, 0))
        dL_dembedding = self.lstm.BPTT(dL_do.view(len(x), 1, -1))
        dL_dEm = torch.matmul(torch.transpose(self.one_hot, 1, 0), dL_dembedding.view(len(y), -1))
        self.hidden2tag = self.hidden2tag - d_hidden2tag * self.lr
        self.embedding -= dL_dEm * self.lr
class LSTM:
    def __init__(self, embedding_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        self.embedding_dim = embedding_dim
        # forget gate
        self.Uf = torch.randn(self.embedding_dim, self.hidden_dim)
        self.Wf = torch.randn(self.hidden_dim, self.hidden_dim)
        self.bf = torch.randn(self.hidden_dim)
        # input gate
        self.Ug = torch.randn(self.embedding_dim, self.hidden_dim)
        self.Wg = torch.randn(self.hidden_dim, self.hidden_dim)
        self.bg = torch.randn(self.hidden_dim)
        # candidate state
        self.Uc = torch.randn(self.embedding_dim, self.hidden_dim)
        self.Wc = torch.randn(self.hidden_dim, self.hidden_dim)
        self.bc = torch.randn(self.hidden_dim)
        # output gate
        self.Uo = torch.randn(self.embedding_dim, self.hidden_dim)
        self.Wo = torch.randn(self.hidden_dim, self.hidden_dim)
        self.bo = torch.randn(self.hidden_dim)
        self.lr = 0.001

    def sigmoid(self, x):
        return 1 / (1 + torch.exp(-1 * x))

    def tanh(self, x):
        return (torch.exp(x) - torch.exp(-1 * x)) / (torch.exp(x) + torch.exp(-1 * x))

    def LSTM_Cell(self, input_x, h0, c0):
        f1 = self.sigmoid(self.bf + torch.matmul(input_x, self.Uf) + torch.matmul(h0, self.Wf))
        g1 = self.sigmoid(self.bg + torch.matmul(input_x, self.Ug) + torch.matmul(h0, self.Wg))
        gc0 = self.sigmoid(self.bc + torch.matmul(input_x, self.Uc) + torch.matmul(h0, self.Wc))
        c1 = f1 * c0 + g1 * gc0
        q1 = self.sigmoid(self.bo + torch.matmul(input_x, self.Uo) + torch.matmul(h0, self.Wo))
        h1 = self.tanh(c1) * q1
        return (h1, c1), f1, g1, q1, gc0

    # forward pass over the whole sequence
    def single_layer_LSTM(self, input_x, hidden):
        h0, c0 = hidden
        self.h = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        self.c = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        self.f = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        self.g = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        self.q = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        self.x = input_x
        self.gc = torch.zeros(input_x.shape[0], input_x.shape[1], self.hidden_dim)
        for i in range(len(input_x)):
            (h0, c0), f0, g0, q0, gc0 = self.LSTM_Cell(input_x[i], h0, c0)
            self.h[i] = h0
            self.c[i] = c0
            self.f[i] = f0
            self.g[i] = g0
            self.q[i] = q0
            self.gc[i] = gc0
        return self.h, (h0, c0)

    def BPTT(self, dL_do):
        dL_dq = torch.zeros(dL_do.shape)
        dL_ds = torch.zeros(dL_do.shape)
        dL_dqx = torch.zeros(dL_do.shape)
        # q (output gate)
        dL_dbo = torch.zeros(self.bo.shape)
        h_t_1 = torch.zeros(self.h.shape)
        h_t_1[1:] = self.h[:-1]
        c_t_1 = torch.zeros(self.c.shape)
        c_t_1[1:] = self.c[:-1]
        dL_dWo = torch.zeros(self.Wo.shape)
        dL_dUo = torch.zeros(self.Uo.shape)
        # s (cell state)
        dL_df = torch.zeros(dL_do.shape)
        dL_dg = torch.zeros(dL_do.shape)
        dL_dgcx = torch.zeros(dL_do.shape)
        # gc (candidate state)
        dL_dbc = torch.zeros(self.bc.shape)
        dL_dWc = torch.zeros(self.Wc.shape)
        dL_dUc = torch.zeros(self.Uc.shape)
        # f (forget gate)
        dL_dfx = torch.zeros(dL_do.shape)
        dL_dbf = torch.zeros(self.bf.shape)
        dL_dWf = torch.zeros(self.Wf.shape)
        dL_dUf = torch.zeros(self.Uf.shape)
        # g (input gate)
        dL_dgx = torch.zeros(dL_do.shape)
        dL_dbg = torch.zeros(self.bg.shape)
        dL_dWg = torch.zeros(self.Wg.shape)
        dL_dUg = torch.zeros(self.Ug.shape)
        dL_dx = torch.zeros(self.x.shape)
        for i in range(len(dL_do) - 1, -1, -1):
            dL_dq[i] = self.tanh(self.c[i]) * dL_do[i]
            dL_ds[i] += dL_do[i] * (1 - self.tanh(self.c[i]) * self.tanh(self.c[i])) * self.q[i]
            dL_dqx[i] = dL_dq[i] * self.q[i] * (1 - self.q[i])
            dL_dbo += dL_dqx[i, 0]
            dL_dWo += h_t_1[i, 0].reshape(self.hidden_dim, 1) * dL_dqx[i]
            dL_dUo += self.x[i].reshape(self.embedding_dim, 1) * dL_dqx[i]
            # s
            dL_df[i] = dL_ds[i] * c_t_1[i]
            dL_dg[i] = dL_ds[i] * self.gc[i]
            dL_dgcx[i] = dL_ds[i] * self.g[i] * self.gc[i] * (1 - self.gc[i])
            # gc
            dL_dbc += dL_dgcx[i, 0]
            dL_dWc += h_t_1[i, 0].reshape(self.hidden_dim, 1) * dL_dgcx[i]
            dL_dUc += self.x[i].reshape(self.embedding_dim, 1) * dL_dgcx[i]
            # f
            dL_dfx[i] = dL_df[i] * self.f[i] * (1 - self.f[i])
            dL_dbf += dL_dfx[i, 0]
            dL_dWf += h_t_1[i, 0].reshape(self.hidden_dim, 1) * dL_dfx[i]
            dL_dUf += self.x[i].reshape(self.embedding_dim, 1) * dL_dfx[i]
            # g
            dL_dgx[i] = dL_dg[i] * self.g[i] * (1 - self.g[i])
            dL_dbg += dL_dgx[i, 0]
            dL_dWg += h_t_1[i, 0].reshape(self.hidden_dim, 1) * dL_dgx[i]
            dL_dUg += self.x[i].reshape(self.embedding_dim, 1) * dL_dgx[i]
            if (i > 1):
                dL_do[i - 1] += torch.matmul(dL_dqx[i], torch.transpose(self.Wo, 1, 0))
                dL_do[i - 1] += torch.matmul(dL_dgcx[i], torch.transpose(self.Wc, 1, 0))
                dL_do[i - 1] += torch.matmul(dL_dfx[i], torch.transpose(self.Wf, 1, 0))
                dL_do[i - 1] += torch.matmul(dL_dgx[i], torch.transpose(self.Wg, 1, 0))
                dL_ds[i - 1] += dL_ds[i] * self.f[i]
            dL_dx[i] += torch.matmul(dL_dqx[i], torch.transpose(self.Uo, 1, 0))
            dL_dx[i] += torch.matmul(dL_dgcx[i], torch.transpose(self.Uc, 1, 0))
            dL_dx[i] += torch.matmul(dL_dfx[i], torch.transpose(self.Uf, 1, 0))
            dL_dx[i] += torch.matmul(dL_dgx[i], torch.transpose(self.Ug, 1, 0))
        self.Wo -= self.lr * dL_dWo
        self.bo -= self.lr * dL_dbo
        self.Uo -= self.lr * dL_dUo
        self.Wc -= self.lr * dL_dWc
        self.bc -= self.lr * dL_dbc
        self.Uc -= self.lr * dL_dUc
        self.Wf -= self.lr * dL_dWf
        self.bf -= self.lr * dL_dbf
        self.Uf -= self.lr * dL_dUf
        self.Wg -= self.lr * dL_dWg
        self.bg -= self.lr * dL_dbg
        self.Ug -= self.lr * dL_dUg
        return dL_dx
def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}# 实际中通常使用更大的维度如32维, 64维.
# 这里我们使用小的维度, 为了方便查看训练过程中权重的变化.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

model = LSTMTag(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix), 1)
print("训练前")
x=prepare_sequence(training_data[0][0],word_to_ix)
y=prepare_sequence(training_data[0][1],tag_to_ix)
print(torch.max(model.forward(x),axis=1))
# model.BP(y)
for epoch in range(30):
    for sentence, tags in training_data:
        x = prepare_sequence(sentence, word_to_ix)
        y = prepare_sequence(tags, tag_to_ix)
        model.forward(x)
        model.BP(y)
print("训练后")
x=prepare_sequence(training_data[0][0],word_to_ix)
y=prepare_sequence(training_data[0][1],tag_to_ix)
print(torch.max(model.forward(x),axis=1))
{'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}
Before training
torch.return_types.max(
values=tensor([-0.6463, -0.5826, -0.7066, -0.2778, -0.2951]),
indices=tensor([1, 1, 1, 1, 1]))
After training
torch.return_types.max(
values=tensor([-0.6426, -0.2794, -0.1518, -0.1473, -0.8550]),
indices=tensor([1, 1, 1, 1, 2]))
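After 30 epochs the hand-rolled tagger still labels most tokens as NN (index 1), whereas the gold tags for "The dog ate the apple" are [0, 1, 2, 0, 1]. For comparison, here is a minimal sketch (not part of the original post; class and variable names are illustrative) of the same tagger built on PyTorch's nn.Embedding, nn.LSTM and nn.Linear, trained with autograd:

class TorchTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        e = self.embed(sentence).view(len(sentence), 1, -1)
        o, _ = self.lstm(e)
        return torch.log_softmax(self.hidden2tag(o.view(len(sentence), -1)), dim=1)

ref_model = TorchTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
optimizer = torch.optim.SGD(ref_model.parameters(), lr=0.1)
for epoch in range(300):
    for sentence, tags in training_data:
        ref_model.zero_grad()
        scores = ref_model(prepare_sequence(sentence, word_to_ix))
        loss = nn.functional.nll_loss(scores, prepare_sequence(tags, tag_to_ix))
        loss.backward()
        optimizer.step()
print(torch.max(ref_model(prepare_sequence(training_data[0][0], word_to_ix)), dim=1))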
