05. Sequence Models W1. Recurrent Sequence Models (Assignments: RNN from scratch + dinosaur name generation)

Table of Contents

Quiz: see the reference blog post

Notes: 05. Sequence Models W1. Recurrent Sequence Models

Assignment 1: Building your Recurrent Neural Network

RNN models are very effective for sequence problems (such as NLP) because they have memory: they can retain information and pass it on to later time steps.

  • Import some packages
import numpy as np
from rnn_utils import *

1. RNN forward propagation

This is a basic RNN model whose input and output sequences have the same length.

1.1 RNN cell

# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """

    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above (just apply the formula)
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
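A quick shape check (a minimal sketch with random data and hypothetical sizes, assuming softmax is provided by rnn_utils imported above):

np.random.seed(1)
xt = np.random.randn(3, 10)        # (n_x, m)
a_prev = np.random.randn(5, 10)    # (n_a, m)
parameters = {"Waa": np.random.randn(5, 5), "Wax": np.random.randn(5, 3),
              "Wya": np.random.randn(2, 5), "ba": np.random.randn(5, 1),
              "by": np.random.randn(2, 1)}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)   # (5, 10)
print(yt_pred.shape)  # (2, 10)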

1.2 RNN forward pass

Repeat the cell above $T_x$ times, feeding the hidden state produced by each cell into the next cell as its input.

# GRADED FUNCTION: rnn_forward

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """

    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and Wy
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###

    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next (≈1 line)
    a_next = a0

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
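Again a minimal shape check (random data, hypothetical sizes, just to confirm the stacking over T_x behaves as expected):

np.random.seed(1)
x = np.random.randn(3, 10, 4)      # (n_x, m, T_x)
a0 = np.random.randn(5, 10)        # (n_a, m)
parameters = {"Waa": np.random.randn(5, 5), "Wax": np.random.randn(5, 3),
              "Wya": np.random.randn(2, 5), "ba": np.random.randn(5, 1),
              "by": np.random.randn(2, 1)}
a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)       # (5, 10, 4)
print(y_pred.shape)  # (2, 10, 4)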

The model above suffers from vanishing gradients, so each prediction is effectively based only on local (nearby) information.

Next we build the more complex LSTM model, which handles the vanishing-gradient problem better: it can store information and keep it for many later time steps.

2. LSTM network

  • Forget gate, $\Gamma_f^{\langle t \rangle} \in (0,1)$; when it is close to 0, the corresponding information is forgotten:
    $\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$
  • Update gate, which decides whether the current candidate is written into memory:
    $\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$
  • Cell update:
    $\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$

$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle}$

  • Output gate:
    $\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$

$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle})$

2.1 LSTM cell

# GRADED FUNCTION: lstm_cell_forward

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Implement a single forward step of the LSTM-cell as described in Figure (4)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the save gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the save gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc --  Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the focus gate, numpy array of shape (n_a, n_a + n_x)
                        bo --  Bias of the focus gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilda),
    c stands for the memory value
    """

    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    # Retrieve dimensions from shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    ### START CODE HERE ###
    # Concatenate a_prev and xt (≈3 lines)
    concat = np.concatenate((a_prev, xt), axis=0)
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    # Compute values for ft, it, cct, c_next, ot, a_next using the formulas given figure (4) (≈6 lines)
    ft = sigmoid(np.dot(Wf, concat) + bf)    # forget gate
    it = sigmoid(np.dot(Wi, concat) + bi)    # update gate
    cct = np.tanh(np.dot(Wc, concat) + bc)
    c_next = ft * c_prev + it * cct
    ot = sigmoid(np.dot(Wo, concat) + bo)    # output gate
    a_next = ot * np.tanh(c_next)

    # Compute prediction of the LSTM cell (≈1 line)
    yt_pred = softmax(np.dot(Wy, a_next) + by)
    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache
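A shape check for the LSTM cell (random data, hypothetical sizes; note that every gate matrix has shape (n_a, n_a + n_x) because it multiplies the concatenation of a_prev and xt):

np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
c_prev = np.random.randn(5, 10)
parameters = {"Wf": np.random.randn(5, 5 + 3), "bf": np.random.randn(5, 1),
              "Wi": np.random.randn(5, 5 + 3), "bi": np.random.randn(5, 1),
              "Wo": np.random.randn(5, 5 + 3), "bo": np.random.randn(5, 1),
              "Wc": np.random.randn(5, 5 + 3), "bc": np.random.randn(5, 1),
              "Wy": np.random.randn(2, 5), "by": np.random.randn(2, 1)}
a_next, c_next, yt, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape, c_next.shape, yt.shape)  # (5, 10) (5, 10) (2, 10)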

2.2 LSTM forward pass

# GRADED FUNCTION: lstm_forward

def lstm_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network using an LSTM-cell described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the save gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the save gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the focus gate, numpy array of shape (n_a, n_a + n_x)
                        bo -- Bias of the focus gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """

    # Initialize "caches", which will track the list of all the caches
    caches = []

    ### START CODE HERE ###
    # Retrieve dimensions from shapes of xt and Wy (≈2 lines)
    n_x, m, T_x = x.shape
    n_y, n_a = parameters['Wy'].shape

    # initialize "a", "c" and "y" with zeros (≈3 lines)
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    # Initialize a_next and c_next (≈2 lines)
    a_next = a0
    c_next = np.zeros((n_a, m))

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line)
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y[:, :, t] = yt
        # Save the value of the next cell state (≈1 line)
        c[:, :, t] = c_next
        # Append the cache into caches (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y, c, caches
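And the corresponding check over a whole sequence (a sketch reusing the parameters dictionary from the LSTM-cell check above):

np.random.seed(1)
x = np.random.randn(3, 10, 7)      # (n_x, m, T_x)
a0 = np.random.randn(5, 10)
a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape, y.shape, c.shape)   # (5, 10, 7) (2, 10, 7) (5, 10, 7)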

3. RNN backward propagation

Deep learning frameworks usually implement backpropagation for you automatically; below we take a brief look at how it works.

3.1 Basic RNN backward pass

def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN-cell (single time-step).

    Arguments:
    da_next -- Gradient of loss with respect to next hidden state
    cache -- python dictionary containing useful values (output of rnn_step_forward())

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradients of input data, of shape (n_x, m)
                        da_prev -- Gradients of previous hidden state, of shape (n_a, m)
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dba -- Gradients of bias vector, of shape (n_a, 1)
    """

    # Retrieve values from cache
    (a_next, a_prev, xt, parameters) = cache

    # Retrieve values from parameters
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ###
    # compute the gradient of tanh with respect to a_next (≈1 line)
    dtanh = (1 - a_next ** 2) * da_next

    # compute the gradient of the loss with respect to Wax (≈2 lines)
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)

    # compute the gradient with respect to Waa (≈2 lines)
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)

    # compute the gradient with respect to b (≈1 line)
    dba = np.sum(dtanh, axis=1, keepdims=True)

    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients
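A useful sanity check (not part of the graded notebook) is to compare the analytic dba against a numerical gradient. The sketch below treats sum(a_next * da_next) as the scalar loss, since its gradient with respect to ba is exactly the dba returned above; the helper name is hypothetical:

def numerical_dba(xt, a_prev, parameters, da_next, eps=1e-7):
    # central-difference estimate of d(sum(a_next * da_next)) / d(ba)
    grad = np.zeros_like(parameters["ba"])
    for i in range(parameters["ba"].shape[0]):
        p_plus = {k: v.copy() for k, v in parameters.items()}
        p_minus = {k: v.copy() for k, v in parameters.items()}
        p_plus["ba"][i, 0] += eps
        p_minus["ba"][i, 0] -= eps
        a_plus, _, _ = rnn_cell_forward(xt, a_prev, p_plus)
        a_minus, _, _ = rnn_cell_forward(xt, a_prev, p_minus)
        grad[i, 0] = np.sum((a_plus - a_minus) * da_next) / (2 * eps)
    return grad

np.random.seed(1)
xt, a_prev = np.random.randn(3, 10), np.random.randn(5, 10)
parameters = {"Wax": np.random.randn(5, 3), "Waa": np.random.randn(5, 5),
              "Wya": np.random.randn(2, 5), "ba": np.random.randn(5, 1),
              "by": np.random.randn(2, 1)}
a_next, _, cache = rnn_cell_forward(xt, a_prev, parameters)
da_next = np.random.randn(5, 10)
dba = rnn_cell_backward(da_next, cache)["dba"]
print(np.max(np.abs(dba - numerical_dba(xt, a_prev, parameters, da_next))))  # should be around 1e-8 or smaller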
  • Backward pass over the entire RNN
def rnn_backward(da, caches):
    """
    Implement the backward pass for a RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradient w.r.t. the input data, numpy-array of shape (n_x, m, T_x)
                        da0 -- Gradient w.r.t the initial hidden state, numpy-array of shape (n_a, m)
                        dWax -- Gradient w.r.t the input's weight matrix, numpy-array of shape (n_a, n_x)
                        dWaa -- Gradient w.r.t the hidden state's weight matrix, numpy-array of shape (n_a, n_a)
                        dba -- Gradient w.r.t the bias, of shape (n_a, 1)
    """

    ### START CODE HERE ###
    # Retrieve values from the first cache (t=1) of caches (≈2 lines)
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈6 lines)
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps
    for t in reversed(range(T_x)):
        # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches[t])
        # Retrieve derivatives from gradients (≈ 1 line)
        dxt, da_prevt, dWaxt, dWaat, dbat = gradients['dxt'], gradients['da_prev'], gradients['dWax'], gradients['dWaa'], gradients['dba']
        # Increment global derivatives w.r.t parameters by adding their derivative at time-step t (≈4 lines)
        dx[:, :, t] = dxt
        dWax = dWax + dWaxt
        dWaa = dWaa + dWaat
        dba = dba + dbat

    # Set da0 to the gradient of a which has been backpropagated through all time-steps (≈1 line)
    da0 = da_prevt
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients

3.2 LSTM backward pass

  • Gate derivatives:

$d\Gamma_o^{\langle t \rangle} = da_{next} * \tanh(c_{next}) * \Gamma_o^{\langle t \rangle} * (1-\Gamma_o^{\langle t \rangle})$

$d\tilde c^{\langle t \rangle} = dc_{next} * \Gamma_u^{\langle t \rangle} + \Gamma_o^{\langle t \rangle} (1-\tanh(c_{next})^2) * \Gamma_u^{\langle t \rangle} * da_{next} * \tilde c^{\langle t \rangle} * (1-\tanh(\tilde c)^2)$

$d\Gamma_u^{\langle t \rangle} = dc_{next} * \tilde c^{\langle t \rangle} + \Gamma_o^{\langle t \rangle} (1-\tanh(c_{next})^2) * \tilde c^{\langle t \rangle} * da_{next} * \Gamma_u^{\langle t \rangle} * (1-\Gamma_u^{\langle t \rangle})$

$d\Gamma_f^{\langle t \rangle} = dc_{next} * c_{prev} + \Gamma_o^{\langle t \rangle} (1-\tanh(c_{next})^2) * c_{prev} * da_{next} * \Gamma_f^{\langle t \rangle} * (1-\Gamma_f^{\langle t \rangle})$

  • Parameter derivatives:

$dW_f = d\Gamma_f^{\langle t \rangle} * \begin{pmatrix} a_{prev} \\ x_t \end{pmatrix}^T$

$dW_u = d\Gamma_u^{\langle t \rangle} * \begin{pmatrix} a_{prev} \\ x_t \end{pmatrix}^T$

$dW_c = d\tilde c^{\langle t \rangle} * \begin{pmatrix} a_{prev} \\ x_t \end{pmatrix}^T$

$dW_o = d\Gamma_o^{\langle t \rangle} * \begin{pmatrix} a_{prev} \\ x_t \end{pmatrix}^T$

To compute $db_f, db_u, db_c, db_o$, sum $d\Gamma_f^{\langle t \rangle}, d\Gamma_u^{\langle t \rangle}, d\tilde c^{\langle t \rangle}, d\Gamma_o^{\langle t \rangle}$ over the horizontal axis (axis=1), with keepdims=True.

$da_{prev} = W_f^T * d\Gamma_f^{\langle t \rangle} + W_u^T * d\Gamma_u^{\langle t \rangle} + W_c^T * d\tilde c^{\langle t \rangle} + W_o^T * d\Gamma_o^{\langle t \rangle}$

$dc_{prev} = dc_{next} * \Gamma_f^{\langle t \rangle} + \Gamma_o^{\langle t \rangle} * (1-\tanh(c_{next})^2) * \Gamma_f^{\langle t \rangle} * da_{next}$

$dx^{\langle t \rangle} = W_f^T * d\Gamma_f^{\langle t \rangle} + W_u^T * d\Gamma_u^{\langle t \rangle} + W_c^T * d\tilde c^{\langle t \rangle} + W_o^T * d\Gamma_o^{\langle t \rangle}$
Note: the formulas above do not quite match the reference code. In the code, each weight matrix W acts on the concatenation [a_prev; x_t], so da_prev is computed with the slices W[:, :n_a] and dx⟨t⟩ with the slices W[:, n_a:], rather than with the full W^T as written in the last two equations.

def lstm_cell_backward(da_next, dc_next, cache):
    """
    Implement the backward pass for the LSTM-cell (single time-step).

    Arguments:
    da_next -- Gradients of next hidden state, of shape (n_a, m)
    dc_next -- Gradients of next cell state, of shape (n_a, m)
    cache -- cache storing information from the forward pass

    Returns:
    gradients -- python dictionary containing:
                        dxt -- Gradient of input data at time-step t, of shape (n_x, m)
                        da_prev -- Gradient w.r.t. the previous hidden state, numpy array of shape (n_a, m)
                        dc_prev -- Gradient w.r.t. the previous memory state, of shape (n_a, m, T_x)
                        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        dWi -- Gradient w.r.t. the weight matrix of the input gate, numpy array of shape (n_a, n_a + n_x)
                        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
                        dWo -- Gradient w.r.t. the weight matrix of the save gate, numpy array of shape (n_a, n_a + n_x)
                        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
                        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
                        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
                        dbo -- Gradient w.r.t. biases of the save gate, of shape (n_a, 1)
    """

    # Retrieve information from "cache"
    (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters) = cache

    ### START CODE HERE ###
    # Retrieve dimensions from xt's and a_next's shape (≈2 lines)
    n_x, m = xt.shape
    n_a, m = a_next.shape

    # Compute gates related derivatives, their values can be found by looking carefully at equations (7) to (10) (≈4 lines)
    dot = da_next * np.tanh(c_next) * ot * (1 - ot)
    dcct = (dc_next * it + ot * (1 - np.tanh(c_next) ** 2) * it * da_next) * (1 - cct ** 2)
    dit = (dc_next * cct + ot * (1 - np.tanh(c_next) ** 2) * cct * da_next) * it * (1 - it)
    dft = (dc_next * c_prev + ot * (1 - np.tanh(c_next) ** 2) * c_prev * da_next) * ft * (1 - ft)

    # Compute parameters related derivatives. Use equations (11)-(14) (≈8 lines)
    concat = np.concatenate((a_prev, xt), axis=0)
    dWf = np.dot(dft, concat.T)
    dWi = np.dot(dit, concat.T)
    dWc = np.dot(dcct, concat.T)
    dWo = np.dot(dot, concat.T)
    dbf = np.sum(dft, axis=1, keepdims=True)
    dbi = np.sum(dit, axis=1, keepdims=True)
    dbc = np.sum(dcct, axis=1, keepdims=True)
    dbo = np.sum(dot, axis=1, keepdims=True)

    # Compute derivatives w.r.t previous hidden state, previous memory state and input. Use equations (15)-(17). (≈3 lines)
    da_prev = np.dot(parameters['Wf'][:, :n_a].T, dft) + np.dot(parameters['Wi'][:, :n_a].T, dit) \
              + np.dot(parameters['Wc'][:, :n_a].T, dcct) + np.dot(parameters['Wo'][:, :n_a].T, dot)
    dc_prev = dc_next * ft + ot * (1 - np.tanh(c_next) ** 2) * ft * da_next
    dxt = np.dot(parameters['Wf'][:, n_a:].T, dft) + np.dot(parameters['Wi'][:, n_a:].T, dit) \
          + np.dot(parameters['Wc'][:, n_a:].T, dcct) + np.dot(parameters['Wo'][:, n_a:].T, dot)
    ### END CODE HERE ###

    # Save gradients in dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dc_prev": dc_prev, "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients

3.3 LSTM backward pass over the whole network

def lstm_backward(da, caches):
    """
    Implement the backward pass for the RNN with LSTM-cell (over a whole sequence).

    Arguments:
    da -- Gradients w.r.t the hidden states, numpy-array of shape (n_a, m, T_x)
    dc -- Gradients w.r.t the memory states, numpy-array of shape (n_a, m, T_x)
    caches -- cache storing information from the forward pass (lstm_forward)

    Returns:
    gradients -- python dictionary containing:
                        dx -- Gradient of inputs, of shape (n_x, m, T_x)
                        da0 -- Gradient w.r.t. the previous hidden state, numpy array of shape (n_a, m)
                        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        dWi -- Gradient w.r.t. the weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
                        dWo -- Gradient w.r.t. the weight matrix of the save gate, numpy array of shape (n_a, n_a + n_x)
                        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
                        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
                        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
                        dbo -- Gradient w.r.t. biases of the save gate, of shape (n_a, 1)
    """

    # Retrieve values from the first cache (t=1) of caches.
    (caches, x) = caches
    (a1, c1, a0, c0, f1, i1, cc1, o1, x1, parameters) = caches[0]

    ### START CODE HERE ###
    # Retrieve dimensions from da's and x1's shapes (≈2 lines)
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize the gradients with the right sizes (≈12 lines)
    dx = np.zeros([n_x, m, T_x])
    da0 = np.zeros([n_a, m])
    da_prevt = np.zeros([n_a, 1])
    dc_prevt = np.zeros([n_a, 1])
    dWf = np.zeros([n_a, n_a + n_x])
    dWi = np.zeros([n_a, n_a + n_x])
    dWc = np.zeros([n_a, n_a + n_x])
    dWo = np.zeros([n_a, n_a + n_x])
    dbf = np.zeros([n_a, 1])
    dbi = np.zeros([n_a, 1])
    dbc = np.zeros([n_a, 1])
    dbo = np.zeros([n_a, 1])

    # loop back over the whole sequence
    for t in reversed(range(T_x)):
        # Compute all gradients using lstm_cell_backward
        gradients = lstm_cell_backward(da[:, :, t], dc_prevt, caches[t])
        # da_prevt, dc_prevt = gradients['da_prev'], gradients["dc_prev"]
        # Store or add the gradient to the parameters' previous step's gradient
        dx[:, :, t] = gradients['dxt']
        dWf = dWf + gradients['dWf']
        dWi = dWi + gradients['dWi']
        dWc = dWc + gradients['dWc']
        dWo = dWo + gradients['dWo']
        dbf = dbf + gradients['dbf']
        dbi = dbi + gradients['dbi']
        dbc = dbc + gradients['dbc']
        dbo = dbo + gradients['dbo']

    # Set the first activation's gradient to the backpropagated gradient da_prev.
    da0 = gradients['da_prev']
    ### END CODE HERE ###

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients

Assignment 2: Character-level language model: Dinosaur Island

The dinosaurs are back and it is your job to name them. Your assistant has collected a list of all the dinosaur names they could find and compiled them into this dataset.

To create new dinosaur names, you will build a character-level language model that generates new names. Your algorithm will learn the patterns in the different names and randomly generate new ones.

By completing this assignment you will learn:

  • How to store text data for processing with an RNN
  • How to synthesize data by sampling a prediction at each time step and passing it to the next RNN cell
  • How to build a character-level text-generation RNN
  • Why gradient clipping is important

Load some packages

import numpy as np
from utils import *
import random
from random import shuffle

1. Problem statement

1.1 Dataset and preprocessing

data = open('dinos.txt', 'r').read()
data = data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

Output:

There are 19909 total characters and 27 unique characters in your data.

The dinosaur names contain 26 unique letters, plus the newline character \n.

  • Build the character-to-index and index-to-character hash maps
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
print(char_to_ix)

Output:

{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
{'\n': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}

1.2 Model overview

Model structure:

  • Initialize the parameters
  • Run the optimization loop:
    1. Forward propagation to compute the loss
    2. Backward propagation to compute the gradients
    3. Gradient clipping to prevent exploding gradients
    4. Update the parameters using the gradients
  • Return the learned parameters

2. Building blocks

Block 1: gradient clipping, to prevent exploding gradients
Block 2: sampling, to generate characters

2.1 Clipping gradients in the optimization loop


Before updating the parameters, clip the gradients so that they stay within a fixed range; any value outside the range is replaced by the nearest endpoint of the interval.

numpy.clip(a, a_min, a_max, out=None), see https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html

### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''

    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients
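A quick check with random oversized gradients (a sketch; values far outside [-10, 10] should be squashed to the endpoints):

np.random.seed(3)
gradients = {"dWax": np.random.randn(5, 3) * 10, "dWaa": np.random.randn(5, 5) * 10,
             "dWya": np.random.randn(2, 5) * 10, "db": np.random.randn(5, 1) * 10,
             "dby": np.random.randn(2, 1) * 10}
gradients = clip(gradients, 10)
print(gradients["dWaa"].min(), gradients["dWaa"].max())  # both within [-10, 10]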

2.2 Sampling

Assume your model is already trained and you want to generate new text (characters).

Steps:

  1. Give the model a dummy input $x^{\langle 1 \rangle} = \vec{0}$ and $a^{\langle 0 \rangle} = \vec{0}$.

  2. Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$:

$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t \rangle} + b)$

$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y$

$\hat{y}^{\langle t+1 \rangle} = softmax(z^{\langle t+1 \rangle})$

  3. Pick a character according to the probability distribution $\hat{y}^{\langle t+1 \rangle}$; you can use np.random.choice.

An example:

np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())

  4. Overwrite $x^{\langle t+1 \rangle}$ with the one-hot encoding of the sampled character, and keep forward-propagating $x^{\langle t+1 \rangle}$ until you hit \n (the end-of-sequence marker, EOS).
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """

    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]

    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))

    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []

    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']

    while (idx != newline_character and counter != 50):
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # for grading purposes
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(list(range(vocab_size)), p=y.ravel())

        # Append the index to "indices"
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a"
        a_prev = a

        # for grading purposes
        seed += 1
        counter += 1

    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices
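To turn the sampled indices back into a readable name, the notebook calls print_sample; a minimal equivalent sketch (assuming parameters already holds trained weights and ix_to_char is the mapping built earlier):

sampled_indices = sample(parameters, char_to_ix, seed=0)
name = ''.join(ix_to_char[ix] for ix in sampled_indices)
print(name.strip().capitalize())   # one generated name per call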

3. Building the language model

3.1 Gradient descent

Functions that are already provided:

def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
  • The optimization step is as follows:
# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate=0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)

    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)

    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, maxValue=5)

    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)

    ### END CODE HERE ###

    return loss, gradients, a[len(X) - 1]
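A quick smoke test of a single optimization step (a sketch with random weights and arbitrary index sequences, assuming rnn_forward, rnn_backward and update_parameters from utils are loaded):

np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
parameters = {"Wax": np.random.randn(n_a, vocab_size), "Waa": np.random.randn(n_a, n_a),
              "Wya": np.random.randn(vocab_size, n_a), "b": np.random.randn(n_a, 1),
              "by": np.random.randn(vocab_size, 1)}
X = [12, 3, 5, 11, 22, 3]    # arbitrary character indices
Y = [4, 14, 11, 22, 25, 26]
loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate=0.01)
print("Loss =", loss)
print("a_last.shape =", a_last.shape)   # (100, 1)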

3.2 Training the model

Given the dataset of dinosaur names, use each line of the dataset (one name) as one training example.
Every 100 steps of stochastic gradient descent, sample 10 randomly chosen names to see how the algorithm is doing; remember to shuffle the dataset randomly.

When an example contains one dinosaur name, create the training pair $(X, Y)$:

index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]] 
Y = X[1:] + [char_to_ix["\n"]]

Y is the same as X but shifted one position to the left, with an end-of-sequence token \n appended at the end.
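For instance, with the char_to_ix mapping printed earlier ('a' → 1, 'b' → 2, 'c' → 3, '\n' → 0), a hypothetical name "abc" would give:

X = [None, 1, 2, 3]   # None stands for the zero input vector x⟨1⟩
Y = [1, 2, 3, 0]      # X shifted left by one, ending with the index of '\n'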

# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, dino_names=7, vocab_size=27):
    """
    Trains the model and generates dinosaur names.

    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration.
    vocab_size -- number of unique characters found in the text, size of the vocabulary

    Returns:
    parameters -- learned parameters
    """

    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size

    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)

    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)

    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]

    # Shuffle list of all dinosaur names
    shuffle(examples)

    # Initialize the hidden state
    a_prev = np.zeros((n_a, 1))

    # Optimization loop
    for j in range(num_iterations):

        ### START CODE HERE ###

        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix['\n']]

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate=0.01)

        ### END CODE HERE ###

        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate "n" characters thanks to sample() to check if the model is learning properly
        if j % 2000 == 0:

            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')

            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)

                seed += 1  # To get the same result for grading purposes, increment the seed by one.

            print('\n')

    return parameters
  • Run the model
parameters = model(data, ix_to_char, char_to_ix)

You should observe the model outputting random-looking characters at the first iteration.
After a few thousand iterations, the model should learn to generate reasonable-looking names.

  • Most of the later samples carry the suffix osaurus (a Greek/Latin root meaning "lizard")
Iteration: 0, Loss: 23.093929

Nkzxwtdmfqoeyhsqwasjjjvu
Kneb
Kzxwtdmfqoeyhsqwasjjjvu
Neb
Zxwtdmfqoeyhsqwasjjjvu
Eb
Xwtdmfqoeyhsqwasjjjvu

Iteration: 2000, Loss: 27.865115

Livtos
Hnba
Iwtos
Lca
Xuscandorawhus
Ba
Tos

Iteration: 4000, Loss: 25.632137

Livosaqrasaurus
Imacaipqia
Iwtosaurus
Lebagosan
Xusiangopdtipos
Acaipon
Torangosaurus

Iteration: 6000, Loss: 24.694657

Mhytosaurus
Imacaesaurus
Iustolmascatarosaurus
Macagptoia
Wustandosaurus
Baaerpe
Stoimatonyirosaurus

Iteration: 8000, Loss: 24.138770

Nhyusicheoravfpsadrenitochustelanfetalkang
Klecalosaurus
Lyusodomophxgshuaomimus
Ngaagosaurus
Xutognatoptkoroclingos
Eeahosaurus
Troenatoptloroclingos

Iteration: 10000, Loss: 23.604738

Ngyusichaosaurus
Inecamosaurus
Kytrodoninaweosanqosaurosaurus
Ncaadosaurus
Xustangosaurus
Caadosaurus
Trocheosaurus

Iteration: 12000, Loss: 23.576294

Mivustandosaurus
Inceaeus
Jyustandorix
Macacitadantithinviceyalosaurus
Xustanesaurus
Cabarsan
Trrangosaurus

Iteration: 14000, Loss: 23.446216

Ngyrosaurus
Kiecanosaurus
Lyuroknesaurus
Nebairopadrus
Xusrangpreusaurus
Daahosaurus
Torangosaurus

Iteration: 16000, Loss: 23.113554

Mewtosaurus
Inedahosaurus
Iwtroceplocuriosaurus
Macamosaurus
Xustangriasaurus
Cabarpelarops
Troceratosaurus

Iteration: 18000, Loss: 23.254092

Mevutoneosaurus
Inecaltona
Kyutollessaurus
Macaisteialus
Xustarchulultitan
Caaerta
Trodicticurotoknathus

Iteration: 20000, Loss: 23.110590

Onwutonganmaurosaurus
Lkehalosaurus
Lyutolidon
Omaakrong
Xwuterasaurus
Daakosaurus
Trokianlaus

Iteration: 22000, Loss: 22.879895

Lixsopelisaurus
Indaaerosaurus
Iwuskanesaurus
Lecacosaurus
Yuusangosaurus
Ccacosaurus
Trochenoguchosaurus

Iteration: 24000, Loss: 22.836100

Miwtosaurus
Kidiabrong
Lyuspangtomuqusgarihisialopupia
Macalosaurus
Ywurophosaurus
Edalosaurus
Tyrhimosaurus

Iteration: 26000, Loss: 22.734218

Levotolia
Ilaca
Kyusolegosaurus
Lacacisaurus
Wstrasaurus
Caaeosaurus
Surapignaveratapaldys

Iteration: 28000, Loss: 22.750129

Piwustaorathus
Ligabiskia
Lyvusaurus
Pecalosaurus
Xutolomisaurus
Egaiskia
Trocibisaurus

Iteration: 30000, Loss: 22.524480

Lixusaurus
Hicaaeros
Ivrpolopopaudus
Lebairus
Xuromelosaurus
Baaishaecitaurus
Surciinidus

Iteration: 32000, Loss: 22.514697

Mgxusoconltfus
Kiceadosaurus
Lyusteodon
Ngaberopa
Wusteodon
Cabbqukaclus
Surangosaurus

Iteration: 34000, Loss: 22.639142

Llytrodon
Ingaaeropechus
Ivstonnatopulorocophisairus
Lecagosaurus
Xusudolosaurus
Caadosaurus
Surangosaurus

Conclusions:

  • You can see that towards the end of training the algorithm has started to produce plausible dinosaur names.
  • At first it generated random characters, but by the end you can see dinosaur names with cool endings.
  • The model has also learned that dinosaur names often end in suffixes such as saurus ("lizard"), don, aura, and tor.

4. Writing like Shakespeare

Instead of learning from the dataset of dinosaur names, you can use a collection of Shakespeare's poetry. With LSTM cells you can learn longer-range dependencies that span many characters.

from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io

The model comes pre-trained; train it for one more epoch. When training finishes, you can run generate_output, which will prompt you for an input (< 40 characters). The poem will start from your sentence, and the model will complete the rest of the poem for you!

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
# Run this cell to try with different inputs without having to re-train the model 
generate_output()

Output:
Write the beginning of your poem, the Shakespeare machine will complete it. Your input is:

I typed: love is forever

I typed: love is forever  (with an extra trailing space)


Keras Team’s text generation https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py

Assignment 3: Improvise a Jazz Solo with an LSTM Network

Note: install the music21 package first with pip install music21

from __future__ import print_function
import IPython
import sys
from music21 import *
import numpy as np
from grammar import *
from qa import *
from preprocess import * 
from music_utils import *
from data_utils import *
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K

1. Problem statement

It is your friend's birthday and you would like to compose a piece of music for them, but you don't know music, so you will use an LSTM RNN to generate it.

1.1 Dataset

  • Listen to this audio clip:
IPython.display.Audio('./data/30s_seq.mp3')


Our music-generation system will use 78 unique values (tones). Run the following code to load the raw music data and preprocess it into numbers.

X, Y, n_values, indices_values = load_music_utils()
print('shape of X:', X.shape)
print('number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('total # of unique values:', n_values)
print('Shape of Y:', Y.shape)

Output:

shape of X: (60, 30, 78)
number of training examples: 60
Tx (length of sequence): 30
total # of unique values: 78
Shape of Y: (30, 60, 78)
  • X: of shape $(m, T_x, 78)$: m training examples, each with 30 musical values, and each value encoded as a 78-dimensional one-hot vector
  • Y: the same as X but shifted one step to the left, reshaped to $(T_y, m, 78)$ with $T_y = T_x$, which makes it easier to feed to the LSTM
  • n_values: the number of unique values in the dataset, 78
  • indices_values: dictionary mapping values to indices 0-77

1.2 Model overview


We use an LSTM with a 64-dimensional hidden state.

n_a = 64 

LSTM reference: https://keras.io/zh/layers/recurrent/#lstm

Dense reference: https://keras.io/zh/layers/core/#dense

reshapor = Reshape((1, 78))                        # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True)         # Used in Step 2.C
densor = Dense(n_values, activation='softmax')     # Used in Step 2.D

Steps to implement djmodel():

  1. Create an empty list outputs to store the output of the LSTM cell at every time step
  2. Loop over $t \in [1, T_x]$:
    A. Select the "t"-th time-step vector from X: x = Lambda(lambda x: x[:,t,:])(X)
    B. Reshape x to (1, 78) using the layer object reshapor = Reshape((1, 78))
    C. Run x through one step of the LSTM cell, initializing it with the previous hidden state a and cell state c: a, _, c = LSTM_cell(input_x, initial_state=[previous hidden state, previous cell state])
    D. Apply the Dense + softmax layer to the hidden state to get the output activations
    E. Append the prediction to outputs
# GRADED FUNCTION: djmodel

def djmodel(Tx, n_a, n_values):
    """
    Implement the model

    Arguments:
    Tx -- length of the sequence in a corpus
    n_a -- the number of activations used in our model
    n_values -- number of unique values in the music data

    Returns:
    model -- a keras model
    """

    # Define the input of your model with a shape
    X = Input(shape=(Tx, n_values))

    # Define s0, initial hidden state for the decoder LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0

    ### START CODE HERE ###
    # Step 1: Create empty list to append the outputs while you iterate (≈1 line)
    outputs = []

    # Step 2: Loop
    for t in range(Tx):
        # Step 2.A: select the "t"th time step vector from X.
        x = Lambda(lambda x: x[:, t, :])(X)
        # Step 2.B: Use reshapor to reshape x to be (1, n_values) (≈1 line)
        x = reshapor(x)
        # Step 2.C: Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.D: Apply densor to the hidden state output of LSTM_Cell
        out = densor(a)
        # Step 2.E: add the output to "outputs"
        outputs.append(out)

    # Step 3: Create model instance
    model = Model(inputs=[X, a0, c0], outputs=outputs)

    ### END CODE HERE ###

    return model

This test cell kept throwing an error that I could not get past, and I could not find the cause...

model = djmodel(Tx = 30 , n_a = 64, n_values = 78)

Error:

LinAlgError                               Traceback (most recent call last)
<ipython-input-7-57eb2d19469c> in <module>
----> 1 model = djmodel(Tx = 30 , n_a = 64, n_values = 78)

<ipython-input-6-7a17ca9b5b35> in djmodel(Tx, n_a, n_values)
     35         x = reshapor(x)
     36         # Step 2.C: Perform one step of the LSTM_cell
---> 37         a, _, c = LSTM_cell(x, initial_state=[a, c])
     38         # Step 2.D: Apply densor to the hidden state output of LSTM_Cell
     39         out = densor(a)

c:\program files\python37\lib\site-packages\keras\layers\recurrent.py in __call__(self, inputs, initial_state, constants, **kwargs)
    582             if 'constants' in kwargs:
    583                 kwargs.pop('constants')
--> 584             output = super(RNN, self).__call__(full_input, **kwargs)
    585             self.input_spec = original_input_spec
    586             return output

c:\program files\python37\lib\site-packages\keras\engine\base_layer.py in __call__(self, inputs, **kwargs)
    461                                          'You can build it manually via: '
    462                                          '`layer.build(batch_input_shape)`')
--> 463                 self.build(unpack_singleton(input_shapes))
    464                 self.built = True
    465

c:\program files\python37\lib\site-packages\keras\layers\recurrent.py in build(self, input_shape)
    500                 self.cell.build([step_input_shape] + constants_shape)
    501             else:
--> 502                 self.cell.build(step_input_shape)
    503
    504         # set or validate state_spec

c:\program files\python37\lib\site-packages\keras\layers\recurrent.py in build(self, input_shape)
   1923             initializer=self.recurrent_initializer,
   1924             regularizer=self.recurrent_regularizer,
-> 1925             constraint=self.recurrent_constraint)
   1926
   1927         if self.use_bias:

c:\program files\python37\lib\site-packages\keras\engine\base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint)
    277         if dtype is None:
    278             dtype = self.dtype
--> 279         weight = K.variable(initializer(shape, dtype=dtype),
    280                             dtype=dtype,
    281                             name=name,

c:\program files\python37\lib\site-packages\keras\initializers.py in __call__(self, shape, dtype)
    266             self.seed += 1
    267         a = rng.normal(0.0, 1.0, flat_shape)
--> 268         u, _, v = np.linalg.svd(a, full_matrices=False)
    269         # Pick the one with the correct shape.
    270         q = u if u.shape == flat_shape else v

<__array_function__ internals> in svd(*args, **kwargs)

c:\program files\python37\lib\site-packages\numpy\linalg\linalg.py in svd(a, full_matrices, compute_uv, hermitian)
   1624
   1625         signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1626         u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1627         u = u.astype(result_t, copy=False)
   1628         s = s.astype(_realType(result_t), copy=False)

c:\program files\python37\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
    104
    105 def _raise_linalgerror_svd_nonconvergence(err, flag):
--> 106     raise LinAlgError("SVD did not converge")
    107
    108 def _raise_linalgerror_lstsq(err, flag):

LinAlgError: SVD did not converge

I'm setting the music-generation part aside for now and moving on with the course. If anyone who hit the same error has solved it, please leave a comment below describing the fix, thanks!


My CSDN blog: https://michael.blog.csdn.net/

