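Table of Contents
- Download the IMDb data
- Read the IMDb data
- Build the tokenizer
- Convert the reviews into lists of numbers
- Make the converted number lists the same length
- Add the embedding layer
- Build the multilayer perceptron model
- Add the flatten layer
- Add the hidden layer
- Add the output layer
- View the model summary
- Train the model
- Evaluate model accuracy
- Make predictions
- View prediction results on the test data
- Complete function
- IMDb sentiment analysis with an RNN model
- IMDb sentiment analysis with an LSTM model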
GitHub repository: https://github.com/fz861062923/Keras
Download the IMDb data
# Download site: http://ai.stanford.edu/~amaas/data/sentiment/
Read the IMDb data
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
Using TensorFlow backend.
# The reviews were scraped from the web, so HTML tags have to be stripped with a regular expression
import re
def remove_html(text):
    r = re.compile(r'<[^>]+>')
    return r.sub('', text)
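As a quick check of the tag-stripping pattern, the sketch below runs the same function on a made-up review string (the sample text is an assumption for illustration and is not taken from the dataset). Note that a tag is replaced with an empty string, so words separated only by `<br />` end up joined.

```python
import re

def remove_html(text):
    # Match anything shaped like an HTML tag: '<', one or more non-'>' characters, then '>'
    r = re.compile(r'<[^>]+>')
    return r.sub('', text)

sample = 'Great movie.<br /><br />Would watch again.'  # hypothetical review text
cleaned = remove_html(sample)
print(cleaned)
```

If joined words matter for later tokenization, substituting a single space instead of an empty string is one possible alternative.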
# Following the IMDb directory layout, read the files with a function
import os
def read_file(filetype):
    path = './aclImdb/'
    file_list = []
    positive = path + filetype + '/pos/'
    for f in os.listdir(positive):
        file_list += [positive + f]
    negative = path + filetype + '/neg/'
    for f in os.listdir(negative):
        file_list += [negative + f]
    print('filetype:', filetype, 'file_length:', len(file_list))
    # Both the train and test sets contain 12,500 positive and 12,500 negative reviews
    label = [1] * 12500 + [0] * 12500
    text = []
    for f_ in file_list:
        with open(f_, encoding='utf8') as f:
            text += [remove_html(''.join(f.readlines()))]
    return label, text
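The reader function hard-codes 12,500 positive and 12,500 negative labels, which only holds while the dataset is complete. A minimal sketch of a variant that derives each label from the folder a file came from is shown below, assuming the same `pos/` and `neg/` layout. The name `read_reviews` and its `root` parameter are hypothetical and not part of the original post, and the demo runs on a small temporary directory rather than the real data.

```python
import os
import re
import tempfile

TAG_RE = re.compile(r'<[^>]+>')

def read_reviews(root, filetype):
    """Return (labels, texts) for root/filetype/{pos,neg}. Hypothetical helper."""
    labels, texts = [], []
    for label, sub in ((1, 'pos'), (0, 'neg')):
        folder = os.path.join(root, filetype, sub)
        # os.listdir returns names in arbitrary order, so sort for reproducibility
        for name in sorted(os.listdir(folder)):
            with open(os.path.join(folder, name), encoding='utf8') as f:
                texts.append(TAG_RE.sub('', f.read()))
            labels.append(label)  # label follows the folder, not a fixed count
    return labels, texts

# Tiny demo on a temporary directory that mimics the aclImdb layout
with tempfile.TemporaryDirectory() as root:
    for sub, body in (('pos', 'Loved it.<br />'), ('neg', 'Dull.')):
        folder = os.path.join(root, 'train', sub)
        os.makedirs(folder)
        with open(os.path.join(folder, '0_1.txt'), 'w', encoding='utf8') as f:
            f.write(body)
    demo_labels, demo_texts = read_reviews(root, 'train')

print(demo_labels, demo_texts)
```

Because each label is appended alongside its text, the two lists stay aligned even if some files are missing.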
# x holds the labels and y holds the review text
x_train,y_train=read_file('train')
filetype: train file_length: 25000
x_test,y_test=read_file('test')
filetype: test file_length: 25000
y_train[0]
'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High\'s satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\'t!'
Build the tokenizer
For detailed usage, see the official documentation: https://keras.io/preprocessing/text/
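To make the idea of a size-limited vocabulary concrete, here is a plain-Python sketch of the general concept behind a word-index tokenizer: rank words by frequency, keep only the most frequent ones, and map each kept word to an integer. This illustrates the concept only and is not the Keras implementation. The names `build_word_index` and `to_sequence` are hypothetical, and the splitting rule (lowercase, whitespace) is a simplifying assumption.

```python
from collections import Counter

def build_word_index(texts, num_words):
    # Count every lowercase, whitespace-separated word across all texts
    counts = Counter(w for t in texts for w in t.lower().split())
    # Keep the num_words most frequent words; indices start at 1
    most_common = counts.most_common(num_words)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}

def to_sequence(text, word_index):
    # Words outside the vocabulary are dropped
    return [word_index[w] for w in text.lower().split() if w in word_index]

docs = ['good good movie', 'bad movie', 'good plot']  # hypothetical toy corpus
word_index = build_word_index(docs, num_words=2)
seq = to_sequence('good bad movie', word_index)
print(word_index, seq)
```

The exact cutoff and text filtering in a real tokenizer may differ from this sketch, so check the linked documentation for the precise behavior.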
token=Tokenizer(num_words