Course source: 人工智能实践:Tensorflow笔记2 (AI Practice: TensorFlow Notes 2)
Contents
- Preface
- 1. File overview
- 2. Replacing the load_data() function
- 3. Calling the generateds function
- 4. Results
- Summary
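Preface
Goal of this lecture: build your own dataset so you can solve problems in your own field.
We turn the images and label information we have on hand into npy files that can be loaded directly.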
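1. File overview
First, let's look at what our files look like:
Path: D:\python code\AI\class4\MNIST_FC\mnist_image_label\mnist_test_jpg_10000
Image files: grayscale images with white digits on a black background, 28x28 in size, where every pixel is an integer between 0 and 255.
Label file: each line holds an image name and its label, separated by a space.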
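As a minimal sketch of the label-file format described above, the snippet below splits one line into an image name and an integer label. The function name `parse_label_line` and the sample file name `0_5.jpg` are hypothetical and used only for illustration.

```python
def parse_label_line(line):
    """Split one label-file line into (image_name, label)."""
    # str.split() with no arguments splits on any whitespace and
    # discards the trailing newline.
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected 'name label', got: {line!r}")
    image_name, label = fields
    return image_name, int(label)


# Hypothetical sample line; real file names may differ.
print(parse_label_line("0_5.jpg 5\n"))  # ('0_5.jpg', 5)
```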
2. Replacing the load_data() function
Previously we imported the dataset like this (taking the mnist dataset as an example):
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
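Shapes of the variables after import:
Variable | Shape and meaning |
---|---|
x_train.shape | (60000, 28, 28): 3-D array holding the grayscale values of 60000 images, each 28 rows by 28 columns |
y_train.shape | (60000,): 1-D array holding the labels of the 60000 images |
x_test.shape | (10000, 28, 28): 3-D array holding the grayscale values of 10000 images, each 28 rows by 28 columns |
y_test.shape | (10000,): 1-D array holding the labels of the 10000 images |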
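Any replacement for load_data() has to return arrays with this same layout. The following is a minimal sketch of a sanity check, assuming NumPy is available; the helper name `check_dataset` is hypothetical and not part of the course code.

```python
import numpy as np


def check_dataset(x, y, image_shape=(28, 28)):
    """Return True if x is (N, 28, 28) and y is (N,) with the same N."""
    return (
        x.ndim == 3
        and x.shape[1:] == image_shape
        and y.ndim == 1
        and x.shape[0] == y.shape[0]
    )


# Dummy arrays standing in for a real dataset.
x_demo = np.zeros((5, 28, 28))
y_demo = np.zeros((5,), dtype=np.int64)
print(check_dataset(x_demo, y_demo))
```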
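We need to write our own function, generateds(image path, label file).
Looking at the dataset, what we need to do is append each image's grayscale data to an image list and each label to a label list.
The function is as follows: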
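from PIL import Image
import numpy as np

def generateds(path, txt):
    f = open(txt, 'r')  # open the label file read-only
    contents = f.readlines()  # read all lines
    f.close()  # close the file
    x, y_ = [], []  # empty lists for images and labels
    for content in contents:  # go through the file line by line
        value = content.split()  # split on whitespace: value[0] is the image name, value[1] is the label
        img_path = path + value[0]  # image directory + image name -> full image path
        img = Image.open(img_path)  # read the image
        img = np.array(img.convert('L'))  # convert to 8-bit grayscale, then to a NumPy array
        img = img / 255.  # normalize the data to [0, 1]
        x.append(img)  # append the normalized image to list x
        y_.append(value[1])  # append the label to list y_
        print('loading : ' + content)  # print a progress message
    x = np.array(x)  # list of images -> array
    y_ = np.array(y_)  # list of labels -> array
    y_ = y_.astype(np.int64)  # string labels -> 64-bit integers
    return x, y_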
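Apart from reading files, the core of generateds is normalizing each image and stacking the list into a single array. Here is a minimal sketch of that step, using synthetic arrays in place of real JPEG files (an assumption made so the example runs without any image files or PIL):

```python
import numpy as np

# Two synthetic 28x28 "images" with uint8 pixel values, plus string labels
# as they would come out of the label file.
raw_images = [np.full((28, 28), 255, dtype=np.uint8),
              np.zeros((28, 28), dtype=np.uint8)]
raw_labels = ["7", "0"]

x, y_ = [], []
for img, label in zip(raw_images, raw_labels):
    x.append(img / 255.)   # scale pixel values from [0, 255] to [0, 1]
    y_.append(label)

x = np.array(x)                      # list of (28, 28) arrays -> shape (2, 28, 28)
y_ = np.array(y_).astype(np.int64)   # string labels -> int64 labels

print(x.shape, x.dtype, y_)
```

Dividing by 255. turns the integer pixels into floating-point values, so the stacked array is no longer uint8.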
3. Calling the generateds function
Code that uses the function:
'''Added:
path to the training-set images
training-set label file
file that stores the training-set input features
file that stores the training-set labels
path to the test-set images
test-set label file
file that stores the test-set input features
file that stores the test-set labels'''
import os
import numpy as np
import tensorflow as tf

train_path = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_train_jpg_60000/'
train_txt = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_train_jpg_60000.txt'
x_train_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_x_train.npy'
y_train_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_y_train.npy'

test_path = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_test_jpg_10000/'
test_txt = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_test_jpg_10000.txt'
x_test_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_x_test.npy'
y_test_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_y_test.npy'

# Check whether the saved training and test files exist: if they do, load them directly;
# if not, call generateds to build the datasets
if os.path.exists(x_train_savepath) and os.path.exists(y_train_savepath) and os.path.exists(
        x_test_savepath) and os.path.exists(y_test_savepath):
    print('-------------Load Datasets-----------------')
    x_train_save = np.load(x_train_savepath)
    y_train = np.load(y_train_savepath)
    x_test_save = np.load(x_test_savepath)
    y_test = np.load(y_test_savepath)
    # restore the flattened rows to 28x28 images
    x_train = np.reshape(x_train_save, (len(x_train_save), 28, 28))
    x_test = np.reshape(x_test_save, (len(x_test_save), 28, 28))
else:
    print('-------------Generate Datasets-----------------')
    x_train, y_train = generateds(train_path, train_txt)
    x_test, y_test = generateds(test_path, test_txt)

    print('-------------Save Datasets-----------------')
    # flatten each image to one row before saving
    x_train_save = np.reshape(x_train, (len(x_train), -1))
    x_test_save = np.reshape(x_test, (len(x_test), -1))
    np.save(x_train_savepath, x_train_save)
    np.save(y_train_savepath, y_train)
    np.save(x_test_savepath, x_test_save)
    np.save(y_test_savepath, y_test)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()
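The caching logic above flattens each image to one row before saving and restores the 28x28 shape after loading. The following is a minimal sketch of that round trip, assuming a temporary directory and a made-up file name `x_demo.npy` instead of the real dataset paths:

```python
import os
import tempfile

import numpy as np

x = np.random.rand(4, 28, 28)  # stand-in for normalized images

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "x_demo.npy")  # hypothetical file name

    x_save = np.reshape(x, (len(x), -1))   # (4, 28, 28) -> (4, 784)
    np.save(save_path, x_save)

    x_loaded = np.load(save_path)           # shape (4, 784)
    x_restored = np.reshape(x_loaded, (len(x_loaded), 28, 28))

print(x_save.shape, x_restored.shape, np.array_equal(x, x_restored))
```

np.save can also store multi-dimensional arrays directly, so the flattening step is a choice made by the course code, not a requirement of the npy format.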
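4. Results
Once the dataset has been built, the neural network starts training on it.
The npy files you need now appear in the original folder.
Complete code: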
import tensorflow as tf
from PIL import Image
import numpy as np
import os

train_path = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_train_jpg_60000/'
train_txt = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_train_jpg_60000.txt'
x_train_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_x_train.npy'
y_train_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_y_train.npy'

test_path = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_test_jpg_10000/'
test_txt = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_test_jpg_10000.txt'
x_test_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_x_test.npy'
y_test_savepath = 'D:/python code/AI/class4/FASHION_FC/fashion_image_label/fashion_y_test.npy'


def generateds(path, txt):
    f = open(txt, 'r')
    contents = f.readlines()  # read line by line
    f.close()
    x, y_ = [], []
    for content in contents:
        value = content.split()  # split on whitespace into a list
        img_path = path + value[0]
        img = Image.open(img_path)
        img = np.array(img.convert('L'))
        img = img / 255.
        x.append(img)
        y_.append(value[1])
        print('loading : ' + content)

    x = np.array(x)
    y_ = np.array(y_)
    y_ = y_.astype(np.int64)
    return x, y_


if os.path.exists(x_train_savepath) and os.path.exists(y_train_savepath) and os.path.exists(
        x_test_savepath) and os.path.exists(y_test_savepath):
    print('-------------Load Datasets-----------------')
    x_train_save = np.load(x_train_savepath)
    y_train = np.load(y_train_savepath)
    x_test_save = np.load(x_test_savepath)
    y_test = np.load(y_test_savepath)
    x_train = np.reshape(x_train_save, (len(x_train_save), 28, 28))
    x_test = np.reshape(x_test_save, (len(x_test_save), 28, 28))
else:
    print('-------------Generate Datasets-----------------')
    x_train, y_train = generateds(train_path, train_txt)
    x_test, y_test = generateds(test_path, test_txt)

    print('-------------Save Datasets-----------------')
    x_train_save = np.reshape(x_train, (len(x_train), -1))
    x_test_save = np.reshape(x_test, (len(x_test), -1))
    np.save(x_train_savepath, x_train_save)
    np.save(y_train_savepath, y_train)
    np.save(x_test_savepath, x_test_save)
    np.save(y_test_savepath, y_test)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()
Summary
Course link: MOOC 人工智能实践:TensorFlow笔记2 (AI Practice: TensorFlow Notes 2)