RNN Introductory Notes

These notes summarize Hung-yi Lee's RNN lecture videos on YouTube and cover the following topics:

  • Features of RNNs
  • Common RNN variants (Simple RNN, LSTM)
  • Instability in RNN training
  • Keras implementations (fixed-length and variable-length input examples)

Recurrent Neural Network

Features of RNN

Unlike an ordinary neural network, a Recurrent Neural Network has memory. An RNN can remember its history, called the hidden state, and combine it with the current input to decide the output $a$.

As shown above, $x_1$ and $x_2$ are the two inputs, and the output of each hidden neuron is stored in the network after it is computed. In the next iteration, with another two inputs $x_1'$ and $x_2'$, the network combines these new inputs {$x_1'$, $x_2'$} with the stored history {$a_1$, $a_2$} and computes the final value $y$. In particular, $a_1$ and $a_2$ are history data and can be interpreted as memory, so we can say that an RNN has the ability to remember past states.
Here is an example to understand the RNN: assume all weights in the network are 1, there is no bias, and all activation functions are linear. We input the sequence [1, 1], [1, 1], [2, 2], … and watch the output.

  • First we need to initialize the memory blocks; here we set them all to 0. With input [1, 1], hidden neuron 1 computes: $x_1$ (1) + $x_2$ (1) + $a_1$ (0) = 2. The same holds for the other hidden neuron.
  • When the first iteration finishes, each memory block updates its value to the output of its hidden neuron. Now we input the second pair in the sequence, [1, 1], and the output becomes: $x_1$ (1) + $x_2$ (1) + $a_1$ (2) = 4.

Note: changing the order of the input sequence changes the output (the small sketch below reproduces the calculation).
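
The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that follows the simplified setting of these notes (all weights 1, no bias, linear activations, each hidden unit feeding back only its own stored value); the function name and the scalar output are illustrative choices, not from the lecture:

import numpy as np

def toy_rnn(sequence):
    """Run the toy RNN from the notes: weights all 1, no bias, linear activations."""
    memory = np.zeros(2)                 # a1, a2 initialized to 0
    outputs = []
    for x in sequence:                   # x is one [x1, x2] input pair
        hidden = x[0] + x[1] + memory    # each hidden unit adds its own stored value
        memory = hidden                  # memory blocks update to the hidden outputs
        outputs.append(hidden.sum())     # the output layer sums the hidden units
    return outputs

print(toy_rnn(np.array([[1, 1], [1, 1], [2, 2]])))   # [4, 8, 16]
print(toy_rnn(np.array([[2, 2], [1, 1], [1, 1]])))   # [8, 12, 16] -- order matters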

So the overall workflow of an RNN looks like:


Different forms of RNN

There are many different types of RNN; here we introduce three of the most common: the Elman network, the Jordan network, and the bidirectional RNN.

Elman Network and Jordan Network

The main difference between the two methods is: the Elman network stores the outputs of the hidden layers and feeds those stored values into the next step's computation; the Jordan network stores the network's output value and combines that output with the next input to compute the next value.

Bidirectional RNN

A bidirectional RNN trains both a forward and a backward model, and uses the combination of the two to decide the result at $x^t$, as shown below:

The advantage of this model is that the value at $x^t$ is decided from the entire sequence, not only the history: in a forward RNN the value at $x^t$ depends only on the states before $t$, while the backward model in a bidirectional RNN also takes into account the states from $t$ to $N$, where $N$ is the final step of the sequence.
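
In Keras, a bidirectional model can be built by wrapping a recurrent layer with the Bidirectional wrapper. This is a minimal sketch; the layer size and input shape are illustrative assumptions, not values from the lecture:

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense

model = Sequential()
# a forward LSTM and a backward LSTM both read the sequence; their outputs are combined
model.add(Bidirectional(LSTM(units=16), input_shape=(10, 8)))  # 10 time steps, 8 features per step
model.add(Dense(1, activation='sigmoid'))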

★ Long Short-term Memory (LSTM)

LSTM focuses on the structure of the memory cell, whose controller contains three parts: the Input Gate, the Output Gate, and the Forget Gate.

  • Input Gate: the memory cell stores values that remember the history; the Input Gate decides whether these values are updated at each step. New values from the current step can be written into the memory cell only when the Input Gate is open, and the state of the Input Gate is controlled by a control signal.

  • Output Gate: the output of the model is read from the values in the memory cell; the Output Gate, controlled by the Output Gate signal, decides whether these values can be output or not.

  • Forget Gate: the memory cell accumulates history, which means some of the stored data may become old and outdated. The Forget Gate, controlled by the Forget Gate signal, decides which data should be forgotten.

There are 4 inputs (the Input Gate signal, the Output Gate signal, the Forget Gate signal, and the input data) and 1 output in this model; the structure is shown below:

Here the input data $z$ passes through the activation function $g$ and becomes $g(z)$. The Input Gate signal $z_i$ passes through the activation function $f$ and becomes $f(z_i)$, so the contribution of the input is $f(z_i) \cdot g(z)$. $c$ is the value currently stored in the memory cell, $z_f$ is the Forget Gate signal, and the Forget Gate keeps $f(z_f) \cdot c$ of the old memory. The new memory value is $c' = f(z_i) \cdot g(z) + f(z_f) \cdot c$, and the final output is $a = f(z_o) \cdot h(c')$.

Note: the activation function $f$ is usually a sigmoid, whose output lies between 0 and 1 and expresses how open the gate is. For the Input Gate, if the value equals 1 the input $z$ is fully passed through the gate, while if the value equals 0 none of the input $z$ passes.

A clear worked example can be found here (29:50):

Now we know the structure of the memory cell. In an LSTM, each memory cell plays the role of one neuron, which means each neuron needs 4 different inputs. Typically we only have 1 input vector, and the 4 inputs are all computed from it (by multiplying it with 4 different weight vectors).

Then, in each neuron, the output is computed from these 4 inputs:

$$y^t = h\big[f(z_f)\cdot c_{t-1} + f(z_i)\cdot g(z)\big]\cdot f(z_o)$$

The picture below shows the computation logic:

That is not the whole of LSTM; some extra data is fed back as input. LSTM uses the output $y^t$ and the memory $c_t$ as inputs to the next step $t+1$, so the entire structure of LSTM looks like:
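
To make the data flow concrete, here is a minimal NumPy sketch of one LSTM step following the formula above, unrolled over a toy sequence. The weight matrices, the choice of sigmoid for $f$ and tanh for $g$ and $h$, and the dimensions are illustrative assumptions rather than values from the lecture:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, y_prev, c_prev, W_z, W_i, W_f, W_o):
    """One step: y_t = h[f(z_f)*c_{t-1} + f(z_i)*g(z)] * f(z_o)."""
    v = np.concatenate([x, y_prev])   # the previous output is fed back together with the input
    g_z = np.tanh(W_z @ v)            # g(z): transformed input data
    i_g = sigmoid(W_i @ v)            # f(z_i): Input Gate
    f_g = sigmoid(W_f @ v)            # f(z_f): Forget Gate
    o_g = sigmoid(W_o @ v)            # f(z_o): Output Gate
    c = f_g * c_prev + i_g * g_z      # new memory c_t
    y = o_g * np.tanh(c)              # output y_t = f(z_o) * h(c_t)
    return y, c

rng = np.random.default_rng(0)
W_z, W_i, W_f, W_o = (rng.normal(size=(3, 4 + 3)) for _ in range(4))  # input dim 4, cell dim 3
y, c = np.zeros(3), np.zeros(3)
for x in rng.normal(size=(5, 4)):     # a toy sequence of 5 input vectors
    y, c = lstm_step(x, y, c, W_z, W_i, W_f, W_o)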

GRU (Gated recurrent unit)

GRU is a variant based on LSTM that removes one gate (only 2 gates are used) but achieves comparable performance to LSTM.
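
In Keras a GRU layer is used the same way as an LSTM layer; a minimal sketch with illustrative sizes:

from keras.models import Sequential
from keras.layers import GRU, Dense

model = Sequential()
model.add(GRU(units=16, input_shape=(28, 28)))  # same usage pattern as LSTM(units=..., input_shape=...)
model.add(Dense(10, activation='softmax'))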

How to train RNN?

RNNs are also trained with backpropagation (backpropagation through time), with a slight difference: we need to sum the loss over all time steps.
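
In other words, if $L^t$ denotes the loss at step $t$ and the sequence has $T$ steps, the training objective is the summed loss

$$L = \sum_{t=1}^{T} L^t$$

and the gradient of $L$ is propagated back through every time step.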

Instability in RNN training

During RNN training, the loss often jumps sharply, as shown in the picture below:

What causes this strange behavior? If we visualize the loss as a function of $w_1$ and $w_2$, we find that the surface changes very sharply in places. When the orange point jumps from position 2 to position 3, it suddenly gets a very large loss.

The usual remedy for this problem is clipping: if the gradient is larger than a threshold, it is clipped to that threshold.
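
In Keras, clipping can be set directly on the optimizer; a hedged sketch (the learning rate and the threshold 1.0 are arbitrary illustrative values):

from keras.optimizers import SGD

# clipvalue clips each gradient component to [-1.0, 1.0];
# clipnorm=1.0 would instead rescale the whole gradient vector when its norm exceeds 1.0
optimizer = SGD(learning_rate=0.01, clipvalue=1.0)
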
This unstable loss surface is a characteristic of RNNs, but why? What exactly causes it? Here is a simple example showing how the output $y_t$ is affected by changing $w$.

Assume each layer has only one neuron, the input weight is 1, there is no bias, and $w$ is the transition weight (the output of one step is multiplied by $w$ before being fed into the next step).

$w = 1 \quad\rightarrow\quad y_{1000} = 1$
$w = 1.01 \quad\rightarrow\quad y_{1000} \approx 20000$

Even though $w$ changes only slightly, the output of the last step $y_{1000}$ changes enormously. This is because $w$ is multiplied many times (once per step), which magnifies the effect of the transition weight.

$w = 0.99 \quad\rightarrow\quad y_{1000} \approx 0$
$w = 0.01 \quad\rightarrow\quad y_{1000} \approx 0$

Even though the two values of $w$ are totally different, the output of the last step is almost the same.
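
The effect is easy to check numerically; a short sketch of the same calculation (treating $y_{1000}$ as $w$ raised to the 1000th power, as in the example):

for w in (1.0, 1.01, 0.99, 0.01):
    print(w, w ** 1000)   # 1.0 -> 1, 1.01 -> ~2.1e4, 0.99 -> ~4.3e-5, 0.01 -> 0.0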

Using LSTM to solve the problem: handling gradient vanishing

The difficulty with RNNs is that we don't know how to choose a suitable learning rate: a large $\eta$ is needed on the flat regions of the loss surface, but a small $\eta$ is needed in the sharply changing regions. LSTM helps by making those flat regions less flat (it mitigates gradient vanishing), so you can safely use a small $\eta$.

In a traditional RNN, the content of the memory is overwritten at every step, which means the earlier history is forgotten. In LSTM, the memory cell is not simply overwritten: the contribution of the current step is added to what was stored before, so the history can be remembered. Furthermore, as long as the forget gate stays open (i.e., the stored memory is not erased), the gradient flowing through the memory cell is not attenuated, so LSTM largely avoids gradient vanishing.
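
One way to see this (a standard argument, not spelled out above): from the memory update $c_t = f(z_f)\cdot c_{t-1} + f(z_i)\cdot g(z_t)$ we get

$$\frac{\partial c_t}{\partial c_{t-1}} = f(z_f)$$

so the gradient through the memory cell is scaled by the forget-gate value at each step; as long as the forget gate stays close to 1, this factor does not shrink the gradient over many steps.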

Implementing LSTM with Keras

To use Keras, we need to understand the following terms:

  • input_dim: dimension of the input vector.
  • time_steps: length of the sequence.
  • output_dim: dimension of the output. Notice that this is not the final output of the entire neural network; it is just the output of the LSTM layer (the LSTM may be only one part of the network, see the code below).

Say we want to predict the feeling ("happy" or "sad") of two sentences, "I am lucky!" and "I feel bad!". Then time_steps should be 3 (each sentence has 3 words), and output_dim should be 1 (0 expresses unhappy while 1 expresses happy). Because words have different lengths ('I' has 1 character while 'lucky' has 5), we cannot feed raw words as a fixed-dimension input. The solution is to use an Embedding layer that maps each word to a vector of a fixed dimension.
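
As a rough sketch of that idea for the two example sentences (the word indices, vocabulary size, embedding size, and layer sizes below are made-up illustrative values):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# "I am lucky" -> [1, 2, 3], "I feel bad" -> [1, 4, 5]  (hypothetical word indices)
model = Sequential()
model.add(Embedding(input_dim=6, output_dim=8, input_length=3))  # here input_dim is the vocabulary size
model.add(LSTM(units=16))                                        # reads the 3 time steps of 8-dim word vectors
model.add(Dense(1, activation='sigmoid'))                        # 1 = happy, 0 = unhappy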

LSTM for the Fixed-Length Input Problem

Here is a simple example that uses an LSTM on the MNIST problem (each 28×28 image is read as a sequence of 28 rows); the key code block is shown below:

from keras.models import Sequential
from keras.layers import LSTM, Dense

input_dim = 28   # MNIST image's width (features per time step)
time_steps = 28  # MNIST image's height (number of time steps)
output_dim = 28  # output dimension of the LSTM layer
model = Sequential()
model.add(LSTM(units=output_dim, input_shape=(time_steps, input_dim)))  # add LSTM layer
model.add(Dense(10, activation='softmax'))  # since there are 10 classes in MNIST
LSTM for the Variable-Length Input Problem

Because different sentences have different lengths, we need to add a Masking layer to handle the length problem.
First, we pad the samples with 0 so that they all have the same length. For example, here is a sample set:

X = [[1, 3, 2],  # sentence 1 - 3 words
     [2],        # sentence 2 - 1 word
     [3, 4]]     # sentence 3 - 2 words

We use Keras's built-in function to pad the samples to the same length:

X = keras.preprocessing.sequence.pad_sequences(X, maxlen=3, padding='post')

Now the matrix X changes to:

array([[1, 3, 2],   # sentence 1 - 3 words
       [2, 0, 0],   # sentence 2 - padded to 3 words
       [3, 4, 0]])  # sentence 3 - padded to 3 words

Then we add a Masking layer to filter out the padded 0 values, which means that time steps consisting entirely of 0 are skipped in the feed-forward computation.

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(3, 3)))  # Masking layer: skip time steps whose values are all 0
model.add(LSTM(units=output_dim))                     # LSTM layer (output_dim as defined in the MNIST example)
model.add(Dense(10, activation='softmax'))
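
For completeness, a hedged sketch of compiling and fitting this model (the dummy data, labels, and epoch count are placeholders; each sample is assumed to already be a 3×3 array of 3 time steps with 3 features each, matching input_shape above):

import numpy as np
from keras.utils import to_categorical

X_train = np.random.random((3, 3, 3))                          # 3 samples, 3 time steps, 3 features
y_train = to_categorical(np.array([0, 1, 2]), num_classes=10)  # placeholder class labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=1)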
