Neural Networks from Scratch in Python (Part 19): A Real Dataset


  • Introduction
  • Data Preparation
  • Data Loading
  • Data Preprocessing
  • Data Shuffling
  • Batches
  • Training
  • Full Code Up to This Point



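If you explored deep learning before reading this book, you are probably familiar with (and perhaps tired of) the MNIST dataset, a collection of images of handwritten digits (0 through 9) at a resolution of 28x28 pixels. It is a relatively small dataset and relatively easy for models to learn. It became the "Hello World" of deep learning and was once a benchmark for machine learning algorithms. The problem is that reaching 99%+ accuracy on it has become extremely easy, which leaves little room for learning how various parameters affect a model's training. In 2017, however, a company called Zalando released a dataset named Fashion MNIST (https://arxiv.org/abs/1708.07747), a drop-in replacement for the regular MNIST dataset (https://github.com/zalandoresearch/fashion-mnist).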

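The Fashion MNIST dataset contains 60,000 training samples and 10,000 testing samples: 28x28-pixel images of 10 different clothing categories such as shoes, boots, shirts, and bags. We will look at some examples shortly, but first we need the actual data. Since the original dataset consists of binary files containing image data encoded in a specific format, we have prepared and hosted a preprocessed dataset for this book with the images saved in .png format. For images it is usually wise to use lossless compression, since lossy compression (JPEG, for example) alters the image data. The images are also grouped by label and split into training and testing groups. The samples are the images of clothing items, and the labels are the classifications. Here are the numeric labels and their descriptions: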
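For reference, the ten classes can be kept in a small lookup table. The class names below follow the README of the Fashion MNIST repository linked above; the dictionary name fashion_mnist_labels is only an illustrative choice and is not used elsewhere in this chapter.

```python
# Label index -> class name, following the Fashion MNIST repository README.
# The variable name is illustrative, not part of the chapter's code.
fashion_mnist_labels = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot',
}

# Folder names in the prepared dataset are these indices as strings
print(fashion_mnist_labels[7])
```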




URL = 'https://nnfs.io/datasets/fashion_mnist_images.zip'
FILE = 'fashion_mnist_images.zip'
FOLDER = 'fashion_mnist_images'


import os
import urllib
import urllib.request

if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE)


from zipfile import ZipFile

print('Unzipping images...')
with ZipFile(FILE) as zip_images:
    zip_images.extractall(FOLDER)


from zipfile import ZipFile
import os
import urllib
import urllib.request

URL = 'https://nnfs.io/datasets/fashion_mnist_images.zip'
FILE = 'fashion_mnist_images.zip'
FOLDER = 'fashion_mnist_images'

# Download the archive only if it is not already present
if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE)

print('Unzipping images...')
with ZipFile(FILE) as zip_images:
    zip_images.extractall(FOLDER)

print('Done!')


Downloading https://nnfs.io/datasets/fashion_mnist_images.zip and saving as fashion_mnist_images.zip...
Unzipping images...


In folder 7 we have non-boot shoes, or sneakers, as the dataset's creators classified them. For example:

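Converting images to grayscale (from three RGB channel values per pixel to a single black-to-white range of 0 to 255 per pixel) is common practice, though these images are already grayscale. Resizing images to normalize their dimensions is also common, but again, Fashion MNIST has already been processed so that every image has the same size (28x28).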



import os

labels = os.listdir('fashion_mnist_images/train')
print(labels)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


files = os.listdir('fashion_mnist_images/train/0')
print(files[:10])

['0000.png', '0001.png', '0002.png', '0003.png', '0004.png', '0005.png', '0006.png', '0007.png', '0008.png', '0009.png']
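Because every label folder holds the images of one class, counting the files per folder is a quick way to check whether a dataset is balanced. The sketch below builds a tiny throwaway folder tree so that it runs on its own; the helper name count_samples_per_label and the toy file counts are assumptions made for illustration. Pointing the helper at 'fashion_mnist_images/train' would apply the same check to the real data.

```python
import os
import tempfile

def count_samples_per_label(root):
    # Map each label folder name to the number of files it contains
    counts = {}
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if os.path.isdir(folder):
            counts[label] = len(os.listdir(folder))
    return counts

# Build a small, deliberately imbalanced toy tree: 3, 3 and 1 files
with tempfile.TemporaryDirectory() as root:
    for label, n_files in (('0', 3), ('1', 3), ('2', 1)):
        os.makedirs(os.path.join(root, label))
        for i in range(n_files):
            # Empty placeholder files stand in for images
            open(os.path.join(root, label, f'{i:04d}.png'), 'w').close()
    demo_counts = count_samples_per_label(root)

print(demo_counts)
```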



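Another option is to use class weights, giving the more frequent classes a weight below 1 when calculating loss. In practice, however, we have almost never seen this work well. With image data, another option is to augment the samples through cropping, rotation, and horizontal or vertical flipping. Before applying such transformations, make sure they produce valid samples that fit your objective. Fortunately, we do not need to worry about this, since the Fashion MNIST data are perfectly balanced. We will now explore the data by looking at an individual sample. To work with image data we will use the Python package containing OpenCV, the cv2 library, which you can install with pip/pip3: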

pip3 install opencv-python


import cv2
image_data = cv2.imread('fashion_mnist_images/train/7/0002.png', cv2.IMREAD_UNCHANGED)


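import numpy as np
# Widen the print width so each 28-pixel row fits on one line
np.set_printoptions(linewidth=200)
print(image_data)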


[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0  49 135 182 150  59   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  78 255 220 212 219 255 246 191 155  87   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0  57 206 215 203 191 203 212 216 217 220 211  15   0]
 [  0   0   0   0   0   0   0   0   0   0   1   0   0   0  58 231 220 210 199 209 218 218 217 208 200 215  56   0]
 [  0   0   0   0   1   2   0   0   4   0   0   0   0 145 213 207 199 187 203 210 216 217 215 215 206 215 130   0]
 [  0   0   0   0   1   2   4   0   0   0   3 105 225 205 190 201 210 214 213 215 215 212 211 208 205 207 218   0]
 [  1   5   7   0   0   0   0   0  52 162 217 189 174 157 187 198 202 217 220 223 224 222 217 211 217 201 247  65]
 [  0   0   0   0   0   0  21  72 185 189 171 171 185 203 200 207 208 209 214 219 222 222 224 215 218 211 212 148]
 [  0  70 114 129 145 159 179 196 172 176 185 196 199 206 201 210 212 213 216 218 219 217 212 207 208 200 198 173]
 [  0 122 158 184 194 192 193 196 203 209 211 211 215 218 221 222 226 227 227 226 226 223 222 216 211 208 216 185]
 [ 21   0   0  12  48  82 123 152 170 184 195 211 225 232 233 237 242 242 240 240 238 236 222 209 200 193 185 106]
 [ 26  47  54  18   5   0   0   0   0   0   0   0   0   0   2   4   6   9   9   8   9   6   6   4   2   0   0   0]
 [  0  10  27  45  55  59  57  50  44  51  58  62  65  56  54  57  59  61  60  63  68  67  66  73  77  74  65  39]
 [  0   0   0   0   4   9  18  23  26  25  23  25  29  37  38  37  39  36  29  31  33  34  28  24  20  14   7   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]


import matplotlib.pyplot as plt
plt.imshow(image_data)
plt.show()


import matplotlib.pyplot as plt
image_data = cv2.imread('fashion_mnist_images/train/4/0011.png', cv2.IMREAD_UNCHANGED)
plt.imshow(image_data)
plt.show()



import matplotlib.pyplot as plt
image_data = cv2.imread('fashion_mnist_images/train/4/0011.png', cv2.IMREAD_UNCHANGED)
# Render with a grayscale colormap instead of the default one
plt.imshow(image_data, cmap='gray')
plt.show()


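Now we can iterate over all of the samples and load them into the input data (X) and target (y) lists. First we scan the train folder which, as noted before, contains subfolders named 0 through 9 that also act as the sample labels. We iterate through these folders and the images inside them, appending each image to one list variable (named X) and its label to another list variable (named y), which together form our samples and ground-truth (target) labels: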

# Scan all the directories and create a list of labels
labels = os.listdir('fashion_mnist_images/train')
# Create lists for samples and labels
X = []
y = []
# For each label folder
for label in labels:
    # And for each image in given folder
    for file in os.listdir(os.path.join('fashion_mnist_images', 'train', label)):
        # Read the image
        image = cv2.imread(os.path.join('fashion_mnist_images/train', label, file), cv2.IMREAD_UNCHANGED)

        # And append it and a label to the lists
        X.append(image)
        y.append(label)


import numpy as np
import cv2
import os

# Loads a MNIST dataset
def load_mnist_dataset(dataset, path):

    # Scan all the directories and create a list of labels
    labels = os.listdir(os.path.join(path, dataset))

    # Create lists for samples and labels
    X = []
    y = []

    # For each label folder
    for label in labels:
        # And for each image in given folder
        for file in os.listdir(os.path.join(path, dataset, label)):
            # Read the image
            image = cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)

            # And append it and a label to the lists
            X.append(image)
            y.append(label)

    # Convert the data to proper numpy arrays and return
    return np.array(X), np.array(y).astype('uint8')

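Since X was defined as a list and we append images (represented as NumPy arrays) to it, we call np.array() at the end to convert X from a list into a proper NumPy array. We do the same with the labels (y): they are folder names, so a list of numeric strings, and .astype('uint8') tells NumPy to store them as integer (not float) values.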
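The conversion step can be tried in isolation. This is a minimal sketch in which blank 28x28 arrays stand in for images returned by cv2.imread and a hand-written list of strings stands in for the folder names; the names X_demo and y_demo are made up for the demonstration.

```python
import numpy as np

# Stand-ins for images read from disk: three 28x28 uint8 arrays
images = [np.zeros((28, 28), dtype=np.uint8) for _ in range(3)]
# Folder names are strings, exactly as os.listdir returns them
labels = ['0', '0', '7']

# A list of equally shaped 2D arrays becomes one 3D array
X_demo = np.array(images)
# Numeric strings are parsed into unsigned 8-bit integers
y_demo = np.array(labels).astype('uint8')

print(X_demo.shape, X_demo.dtype)
print(y_demo, y_demo.dtype)
```

Stacking only yields a regular 3D array when every image has the same shape, which is one reason the fixed 28x28 size of this dataset is convenient.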


# MNIST dataset (train + test)
def create_data_mnist(path):

    # Load both sets separately
    X, y = load_mnist_dataset('train', path)
    X_test, y_test = load_mnist_dataset('test', path)

    # And return all the data
    return X, y, X_test, y_test


import numpy as np
import cv2
import os

# Loads a MNIST dataset
def load_mnist_dataset(dataset, path):

    # Scan all the directories and create a list of labels
    labels = os.listdir(os.path.join(path, dataset))

    # Create lists for samples and labels
    X = []
    y = []

    # For each label folder
    for label in labels:
        # And for each image in given folder
        for file in os.listdir(os.path.join(path, dataset, label)):
            # Read the image
            image = cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)

            # And append it and a label to the lists
            X.append(image)
            y.append(label)

    # Convert the data to proper numpy arrays and return
    return np.array(X), np.array(y).astype('uint8')

# MNIST dataset (train + test)
def create_data_mnist(path):

    # Load both sets separately
    X, y = load_mnist_dataset('train', path)
    X_test, y_test = load_mnist_dataset('test', path)

    # And return all the data
    return X, y, X_test, y_test


# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')


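Next, we will scale the data (not the images themselves, but the numbers that represent them). Neural networks tend to work best with data in the range of 0 to 1 or -1 to 1. Here, the image data are in the range 0 to 255, and we have to decide how to scale them. This usually takes some experimentation and trial and error. For example, we could scale images to the range -1 to 1 by subtracting half of the maximum pixel value (255/2 = 127.5) from each pixel and then dividing by that same half. We could also scale to the range 0 to 1 by simply dividing the data by 255 (the maximum value). To start, we will scale to the range -1 to 1. Before doing so, we have to change the data type of the NumPy array, which is currently uint8 (unsigned integer, holding integer values from 0 to 255). If we do not, NumPy will convert the result to float64, whereas we intend to use float32 (32-bit floating point values). This is done by calling .astype(np.float32) on the NumPy array object. The labels remain unchanged: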

# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Scale features
X = (X.astype(np.float32) - 127.5) / 127.5
X_test = (X_test.astype(np.float32) - 127.5) / 127.5




print(X.min(), X.max())
-1.0 1.0
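To see both scaling options and the effect of the cast side by side, here is a small sketch on a three-pixel toy array; the variable names are illustrative only.

```python
import numpy as np

# Three pixel values covering the full uint8 range
pixels = np.array([[0, 127, 255]], dtype=np.uint8)

# Without an explicit cast, NumPy picks the result dtype on its own
print((pixels - 127.5).dtype)

# Casting first keeps the whole computation in float32
scaled_sym = (pixels.astype(np.float32) - 127.5) / 127.5   # range -1 to 1
scaled_unit = pixels.astype(np.float32) / 255              # range 0 to 1

print(scaled_sym.dtype, scaled_sym.min(), scaled_sym.max())
print(scaled_unit.dtype, scaled_unit.min(), scaled_unit.max())
```

Whichever range is chosen, the same transformation has to be applied to the training data, the test data, and any data used later for prediction.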


print(X.shape)
(60000, 28, 28)



example = np.array([[1,2],[3,4]])
flattened = example.reshape(-1)

print(example)
print(example.shape)
print(flattened)

[[1 2]
 [3 4]]
(2, 2)
[1 2 3 4]

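We could also use the array's .flatten() method, but our intention is different when working with a batch of samples. With samples, we want to keep all 60,000 of them, so we reshape the training data to (60000, -1). This tells NumPy that we want to keep 60,000 samples (the first dimension) and flatten the rest (-1 as the second dimension means all of a sample's data should go into this single dimension, forming a 1D array). The result is 60,000 samples with 784 features each, where 784 is 28·28. To do this, we take the number of samples from the training data (X.shape[0]) and the testing data (X_test.shape[0]) respectively and reshape both: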

# Reshape to vectors
X = X.reshape(X.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)


.reshape(X.shape[0], X.shape[1]*X.shape[2])
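The difference between flattening everything and flattening per sample is easy to check on a small stand-in array. The sketch below uses five made-up 28x28 "images"; the names batch, flat_all and flat_per_sample are assumptions for illustration.

```python
import numpy as np

# Toy "dataset": 5 images of 28x28 pixels with distinct values
batch = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# Flattening everything loses the sample boundaries
flat_all = batch.reshape(-1)

# Keeping the first dimension gives one 784-feature row per sample
flat_per_sample = batch.reshape(batch.shape[0], -1)

# The explicit shape gives the same result as using -1
flat_explicit = batch.reshape(batch.shape[0], batch.shape[1] * batch.shape[2])

print(flat_all.shape)
print(flat_per_sample.shape)
print(np.array_equal(flat_per_sample, flat_explicit))
```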



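Our dataset currently consists of samples and their target classifications in order, from 0 to 9. To illustrate this, we can query y at various positions. The first 6,000 labels are all 0. For example: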

print(y[0:10])
[0 0 0 0 0 0 0 0 0 0]


print(y[6000:6010])
[1 1 1 1 1 1 1 1 1 1]






keys = np.array(range(X.shape[0]))
print(keys[:10])
[0 1 2 3 4 5 6 7 8 9]


import nnfs
nnfs.init()

# Shuffle the index array in place
np.random.shuffle(keys)
print(keys[:10])
[ 3048 19563 58303  8870 40228 31488 21860 56864   845 25770]


X = X[keys]
y = y[keys]


print(y[:15])
[0 3 9 1 6 5 3 9 0 4 8 9 0 6 6]
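The reason for shuffling an index array, instead of shuffling X and y directly, is that both arrays must end up in the same new order. A small sketch on made-up data shows that the pairing survives; here each sample's first feature is its label times 10, so the match can be checked after shuffling.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the demo is repeatable

# Toy data: the first feature of each sample encodes its label (label * 10)
y_demo = np.array([0, 0, 1, 1, 2, 2])
X_demo = np.stack([y_demo * 10, y_demo * 10 + 1], axis=1)

# Shuffle indices once, then apply the same order to both arrays
keys = np.array(range(X_demo.shape[0]))
np.random.shuffle(keys)

X_shuffled = X_demo[keys]
y_shuffled = y_demo[keys]

print(keys)
print(y_shuffled)
# Labels recovered from the features still match the label array
print(X_shuffled[:, 0] // 10)
```

Calling np.random.shuffle on X and y separately would scramble them in two different orders and destroy the link between each sample and its label.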


import matplotlib.pyplot as plt
plt.imshow((X[4].reshape(28, 28))) # Reshape as image is a vector already










steps = X.shape[0] // BATCH_SIZE


if steps * BATCH_SIZE < X.shape[0]:
    steps += 1

A simple example shows why we add this 1:

batch_size = 2
X = [1, 2, 3, 4]
print(len(X) // batch_size)
X = [1, 2, 3, 4, 5]
print(len(X) // batch_size)
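Both integer divisions above give 2, yet the five-element list needs a third, smaller batch for its last item. The following sketch wraps the step calculation in a helper and lists the resulting batch sizes; the function names count_steps and batch_sizes are hypothetical and exist only for this illustration.

```python
def count_steps(n_samples, batch_size):
    # Integer division rounds down
    steps = n_samples // batch_size
    # Add one more step if a partial batch is left over
    if steps * batch_size < n_samples:
        steps += 1
    return steps

def batch_sizes(n_samples, batch_size):
    # Slice a dummy list the same way the training loop slices X
    data = list(range(n_samples))
    steps = count_steps(n_samples, batch_size)
    return [len(data[s * batch_size:(s + 1) * batch_size]) for s in range(steps)]

print(count_steps(4, 2), batch_sizes(4, 2))  # 2 [2, 2]
print(count_steps(5, 2), batch_sizes(5, 2))  # 3 [2, 2, 1]
print(count_steps(60000, 128))               # 469
```

Slicing past the end of a list or array is safe in Python, which is why the last batch simply comes out shorter without any special handling.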



import nnfs
from nnfs.datasets import spiral_data

nnfs.init()

# Create dataset
X, y = spiral_data(samples=100, classes=3)

EPOCHS = 10  # Train 10 times
BATCH_SIZE = 128  # We take 128 samples at once

# Calculate number of steps
steps = X.shape[0] // BATCH_SIZE
# Dividing rounds down. If there are some remaining data,
# but not a full batch, this won't include it.
# Add 1 to include the remaining samples in 1 more step.
if steps * BATCH_SIZE < X.shape[0]:
    steps += 1

for epoch in range(EPOCHS):
    for step in range(steps):
        batch_X = X[step*BATCH_SIZE:(step+1)*BATCH_SIZE]
        batch_y = y[step*BATCH_SIZE:(step+1)*BATCH_SIZE]

        # Now we perform forward pass, loss calculation,
        # backward pass and update parameters

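We loaded the dataset, set the number of epochs and the batch size, and then calculated the number of steps. Next come two loops: an outer one over the epochs and an inner one over the steps. At each step of each epoch we select a slice of the training data. Now that we know how to train the model in batches, we also want to know the training loss and accuracy per step and per epoch.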
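One way to get an epoch-level figure from per-batch results is to keep a running sum and a running sample count, then divide at the end. The sketch below uses made-up per-sample losses and also shows why simply averaging the batch means can be off when the last batch is smaller; all variable names here are illustrative.

```python
import numpy as np

# Made-up per-sample losses; the last batch will hold a single sample
sample_losses = np.array([1.0, 1.0, 1.0, 1.0, 5.0])
batch_size = 2

accumulated_sum = 0.0
accumulated_count = 0
batch_means = []

for start in range(0, len(sample_losses), batch_size):
    batch = sample_losses[start:start + batch_size]
    # Per-step loss: mean over this batch only
    batch_means.append(batch.mean())
    # Accumulate sum and count for the epoch-level loss
    accumulated_sum += batch.sum()
    accumulated_count += len(batch)

epoch_loss = accumulated_sum / accumulated_count
mean_of_batch_means = float(np.mean(batch_means))

print(epoch_loss)            # equals the mean over all samples
print(mean_of_batch_means)   # over-weights the small last batch
```

Dividing the accumulated sum by the accumulated count weights every sample equally, no matter how the samples were split into batches.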



        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)



    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()


    # Calculates accumulated loss
    def calculate_accumulated(self, *, include_regularization=False):

        # Calculate mean loss
        data_loss = self.accumulated_sum / self.accumulated_count

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()



    # Reset variables for accumulated loss
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):

        # 0 by default
        regularization_loss = 0

        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:

            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))

            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)

            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))

            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)

        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Calculates accumulated loss
    def calculate_accumulated(self, *, include_regularization=False):

        # Calculate mean loss
        data_loss = self.accumulated_sum / self.accumulated_count

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Reset variables for accumulated loss
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):

        # Get comparison results
        comparisons = self.compare(predictions, y)

        # Calculate an accuracy
        accuracy = np.mean(comparisons)

        # Add accumulated sum of matching values and sample count
        self.accumulated_sum += np.sum(comparisons)
        self.accumulated_count += len(comparisons)

        # Return accuracy
        return accuracy

    # Calculates accumulated accuracy
    def calculate_accumulated(self):

        # Calculate an accuracy
        accuracy = self.accumulated_sum / self.accumulated_count

        # Return accumulated accuracy
        return accuracy

    # Reset variables for accumulated accuracy
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0



    def train(self, X, y, *, epochs=1, batch_size=None, print_every=1, validation_data=None):


        # Default value if batch size is not set
        train_steps = 1

        # If there is validation data passed,
        # set default number of steps for validation as well
        if validation_data is not None:
            validation_steps = 1

            # For better readability
            X_val, y_val = validation_data


        # Calculate number of steps
        if batch_size is not None:
            train_steps = len(X) // batch_size
            # Dividing rounds down. If there are some remaining
            # data, but not a full batch, this won't include it
            # Add 1 to include this not full batch
            if train_steps * batch_size < len(X):
                train_steps += 1

            if validation_data is not None:
                validation_steps = len(X_val) // batch_size
                # Dividing rounds down. If there are some remaining
                # data, but not a full batch, this won't include it
                # Add 1 to include this not full batch
                if validation_steps * batch_size < len(X_val):
                    validation_steps += 1


        # Main training loop
        for epoch in range(1, epochs+1):

            # Print epoch number
            print(f'epoch: {epoch}')

            # Reset accumulated values in loss and accuracy objects
            self.loss.new_pass()
            self.accuracy.new_pass()

            # Iterate over steps
            for step in range(train_steps):


                # If batch size is not set -
                # train using one step and full dataset
                if batch_size is None:
                    batch_X = X
                    batch_y = y

                # Otherwise slice a batch
                else:
                    batch_X = X[step*batch_size:(step+1)*batch_size]
                    batch_y = y[step*batch_size:(step+1)*batch_size]


                # Perform the forward pass
                output = self.forward(batch_X, training=True)

                # Calculate loss
                data_loss, regularization_loss = self.loss.calculate(output, batch_y,
                                                                     include_regularization=True)
                loss = data_loss + regularization_loss

                # Get predictions and calculate an accuracy
                predictions = self.output_layer_activation.predictions(output)
                accuracy = self.accuracy.calculate(predictions, batch_y)

                # Perform backward pass
                self.backward(output, batch_y)

                # Optimize (update parameters)
                self.optimizer.pre_update_params()
                for layer in self.trainable_layers:
                    self.optimizer.update_params(layer)
                self.optimizer.post_update_params()

                # Print a summary
                if not step % print_every or step == train_steps - 1:
                    print(f'step: {step}, ' +
                          f'acc: {accuracy:.3f}, ' +
                          f'loss: {loss:.3f} (' +
                          f'data_loss: {data_loss:.3f}, ' +
                          f'reg_loss: {regularization_loss:.3f}), ' +
                          f'lr: {self.optimizer.current_learning_rate}')


            # Get and print epoch loss and accuracy
            epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
            epoch_loss = epoch_data_loss + epoch_regularization_loss
            epoch_accuracy = self.accuracy.calculate_accumulated()

            print(f'training, ' +
                  f'acc: {epoch_accuracy:.3f}, ' +
                  f'loss: {epoch_loss:.3f} (' +
                  f'data_loss: {epoch_data_loss:.3f}, ' +
                  f'reg_loss: {epoch_regularization_loss:.3f}), ' +
                  f'lr: {self.optimizer.current_learning_rate}')


            # If there is the validation data
            if validation_data is not None:

                # Reset accumulated values in loss
                # and accuracy objects
                self.loss.new_pass()
                self.accuracy.new_pass()

                # Iterate over steps
                for step in range(validation_steps):

                    # If batch size is not set -
                    # validate using one step and full dataset
                    if batch_size is None:
                        batch_X = X_val
                        batch_y = y_val

                    # Otherwise slice a batch
                    else:
                        batch_X = X_val[step*batch_size:(step+1)*batch_size]
                        batch_y = y_val[step*batch_size:(step+1)*batch_size]

                    # Perform the forward pass
                    output = self.forward(batch_X, training=False)

                    # Calculate the loss
                    self.loss.calculate(output, batch_y)

                    # Get predictions and calculate an accuracy
                    predictions = self.output_layer_activation.predictions(output)
                    self.accuracy.calculate(predictions, batch_y)

                # Get and print validation loss and accuracy
                validation_loss = self.loss.calculate_accumulated()
                validation_accuracy = self.accuracy.calculate_accumulated()

                print(f'validation, ' +
                      f'acc: {validation_accuracy:.3f}, ' +
                      f'loss: {validation_loss:.3f}')



    # Train the model
    # def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
    def train(self, X, y, *, epochs=1, batch_size=None, print_every=1, validation_data=None):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Default value if batch size is not being set
        train_steps = 1

        # If there is validation data passed,
        # set default number of steps for validation as well
        if validation_data is not None:
            validation_steps = 1

            # For better readability
            X_val, y_val = validation_data

        # Calculate number of steps
        if batch_size is not None:
            train_steps = len(X) // batch_size
            # Dividing rounds down. If there are some remaining
            # data, but not a full batch, this won't include it
            # Add `1` to include this not full batch
            if train_steps * batch_size < len(X):
                train_steps += 1

            if validation_data is not None:
                validation_steps = len(X_val) // batch_size
                # Dividing rounds down. If there are some remaining
                # data, but not a full batch, this won't include it
                # Add `1` to include this not full batch
                if validation_steps * batch_size < len(X_val):
                    validation_steps += 1

        # Main training loop
        for epoch in range(1, epochs+1):

            # Print epoch number
            print(f'epoch: {epoch}')

            # Reset accumulated values in loss and accuracy objects
            self.loss.new_pass()
            self.accuracy.new_pass()

            # Iterate over steps
            for step in range(train_steps):

                # If batch size is not set -
                # train using one step and full dataset
                if batch_size is None:
                    batch_X = X
                    batch_y = y

                # Otherwise slice a batch
                else:
                    batch_X = X[step*batch_size:(step+1)*batch_size]
                    batch_y = y[step*batch_size:(step+1)*batch_size]

                # Perform the forward pass
                output = self.forward(batch_X, training=True)

                # Calculate loss
                data_loss, regularization_loss = self.loss.calculate(output, batch_y, include_regularization=True)
                loss = data_loss + regularization_loss

                # Get predictions and calculate an accuracy
                predictions = self.output_layer_activation.predictions(output)
                accuracy = self.accuracy.calculate(predictions, batch_y)

                # Perform backward pass
                self.backward(output, batch_y)

                # Optimize (update parameters)
                self.optimizer.pre_update_params()
                for layer in self.trainable_layers:
                    self.optimizer.update_params(layer)
                self.optimizer.post_update_params()

                # Print a summary
                if not step % print_every or step == train_steps - 1:
                    print(f'step: {step}, ' +
                          f'acc: {accuracy:.3f}, ' +
                          f'loss: {loss:.3f} (' +
                          f'data_loss: {data_loss:.3f}, ' +
                          f'reg_loss: {regularization_loss:.3f}), ' +
                          f'lr: {self.optimizer.current_learning_rate}')

            # Get and print epoch loss and accuracy
            epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
            epoch_loss = epoch_data_loss + epoch_regularization_loss
            epoch_accuracy = self.accuracy.calculate_accumulated()

            print(f'training, ' +
                  f'acc: {epoch_accuracy:.3f}, ' +
                  f'loss: {epoch_loss:.3f} (' +
                  f'data_loss: {epoch_data_loss:.3f}, ' +
                  f'reg_loss: {epoch_regularization_loss:.3f}), ' +
                  f'lr: {self.optimizer.current_learning_rate}')

            # If there is the validation data
            if validation_data is not None:

                # Reset accumulated values in loss
                # and accuracy objects
                self.loss.new_pass()
                self.accuracy.new_pass()

                # Iterate over steps
                for step in range(validation_steps):

                    # If batch size is not set -
                    # validate using one step and full dataset
                    if batch_size is None:
                        batch_X = X_val
                        batch_y = y_val

                    # Otherwise slice a batch
                    else:
                        batch_X = X_val[step*batch_size:(step+1)*batch_size]
                        batch_y = y_val[step*batch_size:(step+1)*batch_size]

                    # Perform the forward pass
                    output = self.forward(batch_X, training=False)

                    # Calculate the loss
                    self.loss.calculate(output, batch_y)

                    # Get predictions and calculate an accuracy
                    predictions = self.output_layer_activation.predictions(output)
                    self.accuracy.calculate(predictions, batch_y)

                # Get and print validation loss and accuracy
                validation_loss = self.loss.calculate_accumulated()
                validation_accuracy = self.accuracy.calculate_accumulated()

                # Print a summary
                print(f'validation, ' +
                      f'acc: {validation_accuracy:.3f}, ' +
                      f'loss: {validation_loss:.3f}')



# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')


# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]

Then flatten each sample and scale to the range of -1 to 1:

# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5


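# Instantiate the model
model = Model()

# Add layers
# (Activation_ReLU and Activation_Softmax are the classes from the earlier chapters)
model.add(Layer_Dense(X.shape[1], 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 10))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
# (decay=5e-5 matches the learning rates printed in the log below)
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=5e-5),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=5, batch_size=128, print_every=100)
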
epoch: 1
step: 0, acc: 0.078, loss: 2.473 (data_loss: 2.473, reg_loss: 0.000), lr: 0.001
step: 100, acc: 0.766, loss: 0.527 (data_loss: 0.527, reg_loss: 0.000), lr: 0.0009950248756218907
step: 200, acc: 0.852, loss: 0.417 (data_loss: 0.417, reg_loss: 0.000), lr: 0.0009900990099009901
step: 300, acc: 0.797, loss: 0.511 (data_loss: 0.511, reg_loss: 0.000), lr: 0.0009852216748768474
step: 400, acc: 0.828, loss: 0.434 (data_loss: 0.434, reg_loss: 0.000), lr: 0.000980392156862745
step: 468, acc: 0.865, loss: 0.305 (data_loss: 0.305, reg_loss: 0.000), lr: 0.0009771350400625367
training, acc: 0.797, loss: 0.568 (data_loss: 0.568, reg_loss: 0.000), lr: 0.0009771350400625367
validation, acc: 0.843, loss: 0.445
epoch: 2
step: 0, acc: 0.859, loss: 0.377 (data_loss: 0.377, reg_loss: 0.000), lr: 0.0009770873027505008
step: 100, acc: 0.820, loss: 0.434 (data_loss: 0.434, reg_loss: 0.000), lr: 0.000972337012008362
step: 200, acc: 0.867, loss: 0.318 (data_loss: 0.318, reg_loss: 0.000), lr: 0.0009676326866321544
step: 300, acc: 0.867, loss: 0.430 (data_loss: 0.430, reg_loss: 0.000), lr: 0.0009629736626703259
step: 400, acc: 0.836, loss: 0.398 (data_loss: 0.398, reg_loss: 0.000), lr: 0.0009583592888974076
step: 468, acc: 0.906, loss: 0.248 (data_loss: 0.248, reg_loss: 0.000), lr: 0.0009552466924583273
training, acc: 0.857, loss: 0.397 (data_loss: 0.397, reg_loss: 0.000), lr: 0.0009552466924583273
validation, acc: 0.856, loss: 0.400
epoch: 3
step: 0, acc: 0.883, loss: 0.317 (data_loss: 0.317, reg_loss: 0.000), lr: 0.0009552010698251983
step: 100, acc: 0.852, loss: 0.355 (data_loss: 0.355, reg_loss: 0.000), lr: 0.0009506607091928891
step: 200, acc: 0.883, loss: 0.288 (data_loss: 0.288, reg_loss: 0.000), lr: 0.0009461633077869241
step: 300, acc: 0.898, loss: 0.391 (data_loss: 0.391, reg_loss: 0.000), lr: 0.0009417082587814295
step: 400, acc: 0.859, loss: 0.367 (data_loss: 0.367, reg_loss: 0.000), lr: 0.0009372949667260287
step: 468, acc: 0.906, loss: 0.219 (data_loss: 0.219, reg_loss: 0.000), lr: 0.000934317481080071
training, acc: 0.871, loss: 0.355 (data_loss: 0.355, reg_loss: 0.000), lr: 0.000934317481080071
validation, acc: 0.863, loss: 0.380
epoch: 4
step: 0, acc: 0.867, loss: 0.294 (data_loss: 0.294, reg_loss: 0.000), lr: 0.0009342738356612324
step: 100, acc: 0.852, loss: 0.323 (data_loss: 0.323, reg_loss: 0.000), lr: 0.0009299297903008323
step: 200, acc: 0.883, loss: 0.271 (data_loss: 0.271, reg_loss: 0.000), lr: 0.0009256259545517657
step: 300, acc: 0.914, loss: 0.369 (data_loss: 0.369, reg_loss: 0.000), lr: 0.0009213617727000506
step: 400, acc: 0.875, loss: 0.349 (data_loss: 0.349, reg_loss: 0.000), lr: 0.0009171366992250195
step: 468, acc: 0.906, loss: 0.197 (data_loss: 0.197, reg_loss: 0.000), lr: 0.0009142857142857143
training, acc: 0.878, loss: 0.331 (data_loss: 0.331, reg_loss: 0.000), lr: 0.0009142857142857143
validation, acc: 0.867, loss: 0.368
epoch: 5
step: 0, acc: 0.867, loss: 0.278 (data_loss: 0.278, reg_loss: 0.000), lr: 0.0009142439202779302
step: 100, acc: 0.867, loss: 0.297 (data_loss: 0.297, reg_loss: 0.000), lr: 0.0009100837277029487
step: 200, acc: 0.891, loss: 0.258 (data_loss: 0.258, reg_loss: 0.000), lr: 0.0009059612248595759
step: 300, acc: 0.891, loss: 0.340 (data_loss: 0.340, reg_loss: 0.000), lr: 0.0009018759018759019
step: 400, acc: 0.875, loss: 0.335 (data_loss: 0.335, reg_loss: 0.000), lr: 0.0008978272580355541
step: 468, acc: 0.917, loss: 0.186 (data_loss: 0.186, reg_loss: 0.000), lr: 0.0008950948800572861
training, acc: 0.885, loss: 0.312 (data_loss: 0.312, reg_loss: 0.000), lr: 0.0008950948800572861
validation, acc: 0.871, loss: 0.359



# # Shuffle the training dataset
# keys = np.array(range(X.shape[0]))
# np.random.shuffle(keys)
# X = X[keys]
# y = y[keys]


epoch: 1
step: 0, acc: 0.000, loss: 2.320 (data_loss: 2.320, reg_loss: 0.000), lr: 0.001
step: 100, acc: 0.000, loss: 3.763 (data_loss: 3.763, reg_loss: 0.000), lr: 0.0009950248756218907
step: 200, acc: 0.000, loss: 2.677 (data_loss: 2.677, reg_loss: 0.000), lr: 0.0009900990099009901
step: 300, acc: 1.000, loss: 0.421 (data_loss: 0.421, reg_loss: 0.000), lr: 0.0009852216748768474
step: 400, acc: 1.000, loss: 0.023 (data_loss: 0.023, reg_loss: 0.000), lr: 0.000980392156862745
step: 468, acc: 1.000, loss: 0.004 (data_loss: 0.004, reg_loss: 0.000), lr: 0.0009771350400625367
training, acc: 0.657, loss: 1.930 (data_loss: 1.930, reg_loss: 0.000), lr: 0.0009771350400625367
validation, acc: 0.109, loss: 5.800
epoch: 2
step: 0, acc: 0.000, loss: 3.527 (data_loss: 3.527, reg_loss: 0.000), lr: 0.0009770873027505008
step: 100, acc: 0.000, loss: 3.722 (data_loss: 3.722, reg_loss: 0.000), lr: 0.000972337012008362
step: 200, acc: 0.531, loss: 1.189 (data_loss: 1.189, reg_loss: 0.000), lr: 0.0009676326866321544
step: 300, acc: 0.961, loss: 0.504 (data_loss: 0.504, reg_loss: 0.000), lr: 0.0009629736626703259
step: 400, acc: 0.984, loss: 0.063 (data_loss: 0.063, reg_loss: 0.000), lr: 0.0009583592888974076
step: 468, acc: 1.000, loss: 0.004 (data_loss: 0.004, reg_loss: 0.000), lr: 0.0009552466924583273
training, acc: 0.746, loss: 1.066 (data_loss: 1.066, reg_loss: 0.000), lr: 0.0009552466924583273
validation, acc: 0.110, loss: 5.365
epoch: 3
step: 0, acc: 0.000, loss: 4.172 (data_loss: 4.172, reg_loss: 0.000), lr: 0.0009552010698251983
step: 100, acc: 0.336, loss: 1.288 (data_loss: 1.288, reg_loss: 0.000), lr: 0.0009506607091928891
step: 200, acc: 0.680, loss: 1.366 (data_loss: 1.366, reg_loss: 0.000), lr: 0.0009461633077869241
step: 300, acc: 1.000, loss: 0.017 (data_loss: 0.017, reg_loss: 0.000), lr: 0.0009417082587814295
step: 400, acc: 0.984, loss: 0.128 (data_loss: 0.128, reg_loss: 0.000), lr: 0.0009372949667260287
step: 468, acc: 1.000, loss: 0.001 (data_loss: 0.001, reg_loss: 0.000), lr: 0.000934317481080071
training, acc: 0.782, loss: 0.838 (data_loss: 0.838, reg_loss: 0.000), lr: 0.000934317481080071
validation, acc: 0.209, loss: 4.189
epoch: 4
step: 0, acc: 0.000, loss: 2.829 (data_loss: 2.829, reg_loss: 0.000), lr: 0.0009342738356612324
step: 100, acc: 0.031, loss: 2.136 (data_loss: 2.136, reg_loss: 0.000), lr: 0.0009299297903008323
step: 200, acc: 0.609, loss: 1.109 (data_loss: 1.109, reg_loss: 0.000), lr: 0.0009256259545517657
step: 300, acc: 0.984, loss: 0.097 (data_loss: 0.097, reg_loss: 0.000), lr: 0.0009213617727000506
step: 400, acc: 0.984, loss: 0.034 (data_loss: 0.034, reg_loss: 0.000), lr: 0.0009171366992250195
step: 468, acc: 1.000, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.0009142857142857143
training, acc: 0.813, loss: 0.719 (data_loss: 0.719, reg_loss: 0.000), lr: 0.0009142857142857143
validation, acc: 0.164, loss: 7.111
epoch: 5
step: 0, acc: 0.000, loss: 5.931 (data_loss: 5.931, reg_loss: 0.000), lr: 0.0009142439202779302
step: 100, acc: 0.781, loss: 0.784 (data_loss: 0.784, reg_loss: 0.000), lr: 0.0009100837277029487
step: 200, acc: 0.750, loss: 0.808 (data_loss: 0.808, reg_loss: 0.000), lr: 0.0009059612248595759
step: 300, acc: 0.984, loss: 0.133 (data_loss: 0.133, reg_loss: 0.000), lr: 0.0009018759018759019
step: 400, acc: 0.961, loss: 0.091 (data_loss: 0.091, reg_loss: 0.000), lr: 0.0008978272580355541
step: 468, acc: 1.000, loss: 0.002 (data_loss: 0.002, reg_loss: 0.000), lr: 0.0008950948800572861
training, acc: 0.860, loss: 0.544 (data_loss: 0.544, reg_loss: 0.000), lr: 0.0008950948800572861
validation, acc: 0.224, loss: 4.844


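The epoch accuracy is low because the model needs some time to learn each new label after a switch, and accuracy is low during that period. Validation accuracy is calculated after a given epoch's training finishes and, as we recall, the model has learned to predict only one label. During validation the model predicts the label it saw last, so accuracy stays close to 1/10, since our dataset contains 10 classes.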
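The effect of skipping the shuffle can be seen without training anything, just by counting how many distinct classes each batch contains. The sketch below builds a stand-in label array with the same layout as the training labels (6,000 samples per class, in order); the helper name classes_per_batch is hypothetical.

```python
import numpy as np

# Stand-in for the ordered training labels: 6,000 of each class, 0 to 9
y_sorted = np.repeat(np.arange(10), 6000)
batch_size = 128

def classes_per_batch(labels, batch_size):
    # Number of distinct labels in each consecutive batch
    return [len(np.unique(labels[i:i + batch_size]))
            for i in range(0, len(labels), batch_size)]

# Shuffled copy, using the same index-shuffling approach as the chapter
np.random.seed(0)
keys = np.array(range(len(y_sorted)))
np.random.shuffle(keys)
y_shuffled = y_sorted[keys]

sorted_counts = classes_per_batch(y_sorted, batch_size)
shuffled_counts = classes_per_batch(y_shuffled, batch_size)

# Ordered data: a batch never mixes more than two neighbouring classes
print(max(sorted_counts))
# Shuffled data: even the least varied batch mixes many classes
print(min(shuffled_counts))
```

With ordered labels the gradient at each step pulls the model toward one class at a time, which is consistent with the swings between 0.000 and 1.000 step accuracy in the log above.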


# Add layers
# (Activation_ReLU is the class from the earlier chapters)
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10, batch_size=128, print_every=100)

epoch: 1
step: 0, acc: 0.031, loss: 3.608 (data_loss: 3.608, reg_loss: 0.000), lr: 0.001
step: 100, acc: 0.773, loss: 0.542 (data_loss: 0.542, reg_loss: 0.000), lr: 0.0009090909090909091
step: 200, acc: 0.883, loss: 0.325 (data_loss: 0.325, reg_loss: 0.000), lr: 0.0008333333333333334
step: 300, acc: 0.875, loss: 0.439 (data_loss: 0.439, reg_loss: 0.000), lr: 0.0007692307692307692
step: 400, acc: 0.836, loss: 0.422 (data_loss: 0.422, reg_loss: 0.000), lr: 0.0007142857142857143
step: 468, acc: 0.875, loss: 0.291 (data_loss: 0.291, reg_loss: 0.000), lr: 0.000681198910081744
training, acc: 0.813, loss: 0.519 (data_loss: 0.519, reg_loss: 0.000), lr: 0.000681198910081744
validation, acc: 0.843, loss: 0.429
epoch: 2
step: 0, acc: 0.836, loss: 0.373 (data_loss: 0.373, reg_loss: 0.000), lr: 0.0006807351940095304
step: 100, acc: 0.812, loss: 0.387 (data_loss: 0.387, reg_loss: 0.000), lr: 0.0006373486297004461
step: 200, acc: 0.867, loss: 0.285 (data_loss: 0.285, reg_loss: 0.000), lr: 0.0005991611743559018
step: 300, acc: 0.891, loss: 0.384 (data_loss: 0.384, reg_loss: 0.000), lr: 0.0005652911249293386
step: 400, acc: 0.844, loss: 0.381 (data_loss: 0.381, reg_loss: 0.000), lr: 0.0005350454788657037
step: 468, acc: 0.896, loss: 0.214 (data_loss: 0.214, reg_loss: 0.000), lr: 0.0005162622612287042
training, acc: 0.866, loss: 0.370 (data_loss: 0.370, reg_loss: 0.000), lr: 0.0005162622612287042
validation, acc: 0.859, loss: 0.384
epoch: 3
step: 0, acc: 0.844, loss: 0.330 (data_loss: 0.330, reg_loss: 0.000), lr: 0.0005159958720330237
step: 100, acc: 0.852, loss: 0.328 (data_loss: 0.328, reg_loss: 0.000), lr: 0.0004906771344455348
step: 200, acc: 0.898, loss: 0.251 (data_loss: 0.251, reg_loss: 0.000), lr: 0.0004677268475210477
step: 300, acc: 0.883, loss: 0.345 (data_loss: 0.345, reg_loss: 0.000), lr: 0.00044682752457551384
step: 400, acc: 0.859, loss: 0.352 (data_loss: 0.352, reg_loss: 0.000), lr: 0.00042771599657827206
step: 468, acc: 0.917, loss: 0.185 (data_loss: 0.185, reg_loss: 0.000), lr: 0.0004156275976724854
training, acc: 0.880, loss: 0.329 (data_loss: 0.329, reg_loss: 0.000), lr: 0.0004156275976724854
validation, acc: 0.866, loss: 0.364
epoch: 4
step: 0, acc: 0.852, loss: 0.302 (data_loss: 0.302, reg_loss: 0.000), lr: 0.0004154549231408392
step: 100, acc: 0.875, loss: 0.278 (data_loss: 0.278, reg_loss: 0.000), lr: 0.00039888312724371757
step: 200, acc: 0.930, loss: 0.232 (data_loss: 0.232, reg_loss: 0.000), lr: 0.0003835826620636747
step: 300, acc: 0.898, loss: 0.310 (data_loss: 0.310, reg_loss: 0.000), lr: 0.0003694126339120798
step: 400, acc: 0.867, loss: 0.336 (data_loss: 0.336, reg_loss: 0.000), lr: 0.0003562522265764161
step: 468, acc: 0.917, loss: 0.177 (data_loss: 0.177, reg_loss: 0.000), lr: 0.00034782608695652176
training, acc: 0.890, loss: 0.304 (data_loss: 0.304, reg_loss: 0.000), lr: 0.00034782608695652176
validation, acc: 0.872, loss: 0.352
epoch: 5
step: 0, acc: 0.883, loss: 0.274 (data_loss: 0.274, reg_loss: 0.000), lr: 0.0003477051460361613
step: 100, acc: 0.898, loss: 0.256 (data_loss: 0.256, reg_loss: 0.000), lr: 0.00033602150537634406
step: 200, acc: 0.922, loss: 0.220 (data_loss: 0.220, reg_loss: 0.000), lr: 0.00032509752925877764
step: 300, acc: 0.914, loss: 0.283 (data_loss: 0.283, reg_loss: 0.000), lr: 0.00031486146095717883
step: 400, acc: 0.867, loss: 0.329 (data_loss: 0.329, reg_loss: 0.000), lr: 0.00030525030525030525
step: 468, acc: 0.917, loss: 0.171 (data_loss: 0.171, reg_loss: 0.000), lr: 0.0002990430622009569
training, acc: 0.896, loss: 0.287 (data_loss: 0.287, reg_loss: 0.000), lr: 0.0002990430622009569
validation, acc: 0.874, loss: 0.347
epoch: 6
step: 0, acc: 0.891, loss: 0.254 (data_loss: 0.254, reg_loss: 0.000), lr: 0.0002989536621823617
step: 100, acc: 0.914, loss: 0.241 (data_loss: 0.241, reg_loss: 0.000), lr: 0.00029027576197387516
step: 200, acc: 0.914, loss: 0.209 (data_loss: 0.209, reg_loss: 0.000), lr: 0.0002820874471086037
step: 300, acc: 0.922, loss: 0.267 (data_loss: 0.267, reg_loss: 0.000), lr: 0.00027434842249657066
step: 400, acc: 0.883, loss: 0.321 (data_loss: 0.321, reg_loss: 0.000), lr: 0.000267022696929239
step: 468, acc: 0.927, loss: 0.163 (data_loss: 0.163, reg_loss: 0.000), lr: 0.00026226068712300026
training, acc: 0.901, loss: 0.273 (data_loss: 0.273, reg_loss: 0.000), lr: 0.00026226068712300026
validation, acc: 0.877, loss: 0.343
epoch: 7
step: 0, acc: 0.898, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00026219192448872575
step: 100, acc: 0.922, loss: 0.225 (data_loss: 0.225, reg_loss: 0.000), lr: 0.00025549310168625444
step: 200, acc: 0.930, loss: 0.201 (data_loss: 0.201, reg_loss: 0.000), lr: 0.00024912805181863477
step: 300, acc: 0.922, loss: 0.259 (data_loss: 0.259, reg_loss: 0.000), lr: 0.0002430724355858046
step: 400, acc: 0.883, loss: 0.311 (data_loss: 0.311, reg_loss: 0.000), lr: 0.00023730422401518745
step: 468, acc: 0.927, loss: 0.159 (data_loss: 0.159, reg_loss: 0.000), lr: 0.00023353573096683791
training, acc: 0.906, loss: 0.262 (data_loss: 0.262, reg_loss: 0.000), lr: 0.00023353573096683791
validation, acc: 0.878, loss: 0.340
epoch: 8
step: 0, acc: 0.906, loss: 0.224 (data_loss: 0.224, reg_loss: 0.000), lr: 0.00023348120476301658
step: 100, acc: 0.906, loss: 0.214 (data_loss: 0.214, reg_loss: 0.000), lr: 0.00022815423226100847
step: 200, acc: 0.930, loss: 0.191 (data_loss: 0.191, reg_loss: 0.000), lr: 0.0002230649118893598
step: 300, acc: 0.922, loss: 0.253 (data_loss: 0.253, reg_loss: 0.000), lr: 0.00021819768710451667
step: 400, acc: 0.898, loss: 0.307 (data_loss: 0.307, reg_loss: 0.000), lr: 0.00021353833013025838
step: 468, acc: 0.927, loss: 0.156 (data_loss: 0.156, reg_loss: 0.000), lr: 0.00021048200378867611
training, acc: 0.909, loss: 0.252 (data_loss: 0.252, reg_loss: 0.000), lr: 0.00021048200378867611
validation, acc: 0.878, loss: 0.336
epoch: 9
step: 0, acc: 0.906, loss: 0.209 (data_loss: 0.209, reg_loss: 0.000), lr: 0.0002104377104377104
step: 100, acc: 0.922, loss: 0.202 (data_loss: 0.202, reg_loss: 0.000), lr: 0.0002061005770816158
step: 200, acc: 0.922, loss: 0.181 (data_loss: 0.181, reg_loss: 0.000), lr: 0.00020193861066235866
step: 300, acc: 0.922, loss: 0.250 (data_loss: 0.250, reg_loss: 0.000), lr: 0.0001979414093428345
step: 400, acc: 0.898, loss: 0.303 (data_loss: 0.303, reg_loss: 0.000), lr: 0.0001940993788819876
step: 468, acc: 0.938, loss: 0.151 (data_loss: 0.151, reg_loss: 0.000), lr: 0.00019157088122605365
training, acc: 0.912, loss: 0.244 (data_loss: 0.244, reg_loss: 0.000), lr: 0.00019157088122605365
validation, acc: 0.881, loss: 0.335
epoch: 10
step: 0, acc: 0.906, loss: 0.198 (data_loss: 0.198, reg_loss: 0.000), lr: 0.0001915341888527102
step: 100, acc: 0.930, loss: 0.193 (data_loss: 0.193, reg_loss: 0.000), lr: 0.00018793459875963167
step: 200, acc: 0.922, loss: 0.175 (data_loss: 0.175, reg_loss: 0.000), lr: 0.00018446781036709093
step: 300, acc: 0.922, loss: 0.245 (data_loss: 0.245, reg_loss: 0.000), lr: 0.00018112660749864155
step: 400, acc: 0.898, loss: 0.303 (data_loss: 0.303, reg_loss: 0.000), lr: 0.00017790428749332856
step: 468, acc: 0.938, loss: 0.144 (data_loss: 0.144, reg_loss: 0.000), lr: 0.00017577781683951485
training, acc: 0.915, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00017577781683951485
validation, acc: 0.881, loss: 0.334
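The `lr` column and the final `step: 468` line in the log above can be cross-checked by hand. The sketch below assumes the decay rule used in `Optimizer_Adam.pre_update_params` (the learning rate divided by `1 + decay * iterations`), a starting rate of 0.001, `decay=1e-3`, 60,000 training samples and `batch_size=128`. The helper name `decayed_lr` is ours and purely illustrative.

```python
# Values taken from the training call and the optimizer defaults
learning_rate = 0.001
decay = 1e-3
n_samples = 60000
batch_size = 128

# Number of steps per epoch: full batches plus one partial batch if needed
steps_per_epoch = n_samples // batch_size
if steps_per_epoch * batch_size < n_samples:
    steps_per_epoch += 1

# Size of the final, partial batch
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size


def decayed_lr(iterations):
    # Same rule as pre_update_params: lr * 1 / (1 + decay * iterations)
    return learning_rate * (1. / (1. + decay * iterations))


print('steps per epoch:', steps_per_epoch)
print('last batch size:', last_batch_size)
print('lr at step 100 of epoch 1:', decayed_lr(100))
print('lr at step 0 of epoch 2:', decayed_lr(steps_per_epoch))
```

With these assumptions there are 469 steps per epoch (indices 0 to 468), and the last batch holds only 96 samples. That would explain why the accuracies printed at step 468 look like multiples of 1/96, for example 0.917 ≈ 88/96.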



import numpy as np
import cv2
import os


# Loads a MNIST dataset
def load_mnist_dataset(dataset, path):

    # Scan all the directories and create a list of labels
    labels = os.listdir(os.path.join(path, dataset))

    # Create lists for samples and labels
    X = []
    y = []

    # For each label folder
    for label in labels:
        # And for each image in given folder
        for file in os.listdir(os.path.join(path, dataset, label)):
            # Read the image
            image = cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)

            # And append it and a label to the lists
            X.append(image)
            y.append(label)

    # Convert the data to proper numpy arrays and return
    return np.array(X), np.array(y).astype('uint8')


# MNIST dataset (train + test)
def create_data_mnist(path):

    # Load both sets separately
    X, y = load_mnist_dataset('train', path)
    X_test, y_test = load_mnist_dataset('test', path)

    # And return all the data
    return X, y, X_test, y_test


import nnfs
from nnfs.datasets import sine_data, spiral_data
import sys

nnfs.init()


# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):
        # Initialize weights and biases
        # self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        # Set regularization strength
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

    # Backward pass
    def backward(self, dvalues):
        # Gradients on parameters
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)

        # Gradients on regularization
        # L1 on weights
        if self.weight_regularizer_l1 > 0:
            dL1 = np.ones_like(self.weights)
            dL1[self.weights < 0] = -1
            self.dweights += self.weight_regularizer_l1 * dL1
        # L2 on weights
        if self.weight_regularizer_l2 > 0:
            self.dweights += 2 * self.weight_regularizer_l2 * self.weights
        # L1 on biases
        if self.bias_regularizer_l1 > 0:
            dL1 = np.ones_like(self.biases)
            dL1[self.biases < 0] = -1
            self.dbiases += self.bias_regularizer_l1 * dL1
        # L2 on biases
        if self.bias_regularizer_l2 > 0:
            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases

        # Gradient on values
        self.dinputs = np.dot(dvalues, self.weights.T)


# Dropout
class Layer_Dropout:

    # Init
    def __init__(self, rate):
        # Store rate, we invert it as for example for dropout
        # of 0.1 we need success rate of 0.9
        self.rate = 1 - rate

    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs

        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return

        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask

    # Backward pass
    def backward(self, dvalues):
        # Gradient on values
        self.dinputs = dvalues * self.binary_mask


# Input "layer"
class Layer_Input:

    # Forward pass
    def forward(self, inputs, training):
        self.output = inputs


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

    # Backward pass
    def backward(self, dvalues):
        # Since we need to modify original variable,
        # let's make a copy of values first
        self.dinputs = dvalues.copy()
        # Zero gradient where input values were negative
        self.dinputs[self.inputs <= 0] = 0

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

    # Backward pass
    def backward(self, dvalues):
        # Create uninitialized array
        self.dinputs = np.empty_like(dvalues)

        # Enumerate outputs and gradients
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            # Flatten output array
            single_output = single_output.reshape(-1, 1)
            # Calculate Jacobian matrix of the output
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Calculate sample-wise gradient
            # and add it to the array of sample gradients
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)


# Sigmoid activation
class Activation_Sigmoid:

    # Forward pass
    def forward(self, inputs, training):
        # Save input and calculate/save output
        # of the sigmoid function
        self.inputs = inputs
        self.output = 1 / (1 + np.exp(-inputs))

    # Backward pass
    def backward(self, dvalues):
        # Derivative - calculates from output of the sigmoid function
        self.dinputs = dvalues * (1 - self.output) * self.output

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1


# Linear activation
class Activation_Linear:

    # Forward pass
    def forward(self, inputs, training):
        # Just remember values
        self.inputs = inputs
        self.output = inputs

    # Backward pass
    def backward(self, dvalues):
        # derivative is 1, 1 * dvalues = dvalues - the chain rule
        self.dinputs = dvalues.copy()

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# SGD optimizer
class Optimizer_SGD:

    # Initialize optimizer - set settings,
    # learning rate of 1. is default for this optimizer
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If we use momentum
        if self.momentum:

            # If layer does not contain momentum arrays, create them
            # filled with zeros
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                # If there is no momentum array for weights
                # The array doesn't exist for biases yet either.
                layer.bias_momentums = np.zeros_like(layer.biases)

            # Build weight updates with momentum - take previous
            # updates multiplied by retain factor and update with
            # current gradients
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates

            # Build bias updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiases
            layer.bias_momentums = bias_updates

        # Vanilla SGD updates (as before momentum update)
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases

        # Update weights and biases using either
        # vanilla or momentum updates
        layer.weights += weight_updates
        layer.biases += bias_updates

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adagrad optimizer
class Optimizer_Adagrad:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=1., decay=0., epsilon=1e-7):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update cache with squared current gradients
        layer.weight_cache += layer.dweights**2
        layer.bias_cache += layer.dbiases**2

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# RMSprop optimizer
class Optimizer_RMSprop:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, rho=0.9):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.rho = rho

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update cache with squared current gradients
        layer.weight_cache = self.rho * layer.weight_cache + (1 - self.rho) * layer.dweights**2
        layer.bias_cache = self.rho * layer.bias_cache + (1 - self.rho) * layer.dbiases**2

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adam optimizer
class Optimizer_Adam:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update momentum with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
        # Get corrected momentum
        # self.iteration is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):

        # 0 by default
        regularization_loss = 0

        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)

        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Calculates accumulated loss
    def calculate_accumulated(self, *, include_regularization=False):
        # Calculate mean loss
        data_loss = self.accumulated_sum / self.accumulated_count
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Reset variables for accumulated loss
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])

        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]

        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():

    # # Creates activation and loss function objects
    # def __init__(self):
    #     self.activation = Activation_Softmax()
    #     self.loss = Loss_CategoricalCrossentropy()

    # # Forward pass
    # def forward(self, inputs, y_true):
    #     # Output layer's activation function
    #     self.activation.forward(inputs)
    #     # Set the output
    #     self.output = self.activation.output
    #     # Calculate and return loss value
    #     return self.loss.calculate(self.output, y_true)

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)

        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)

        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Calculate sample-wise loss
        sample_losses = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
        sample_losses = np.mean(sample_losses, axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)

        # Calculate gradient
        self.dinputs = -(y_true / clipped_dvalues - (1 - y_true) / (1 - clipped_dvalues)) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Squared Error loss
class Loss_MeanSquaredError(Loss):  # L2 loss

    # Forward pass
    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean((y_true - y_pred)**2, axis=-1)
        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Gradient on values
        self.dinputs = -2 * (y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss):  # L1 loss

    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)
        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Calculate gradient
        self.dinputs = np.sign(y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # Add accumulated sum of matching values and sample count
        self.accumulated_sum += np.sum(comparisons)
        self.accumulated_count += len(comparisons)
        # Return accuracy
        return accuracy

    # Calculates accumulated accuracy
    def calculate_accumulated(self):
        # Calculate an accuracy
        accuracy = self.accumulated_sum / self.accumulated_count
        # Return the accumulated accuracy
        return accuracy

    # Reset variables for accumulated accuracy
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y


# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):

    def __init__(self):
        # Create precision property
        self.precision = None

    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision


# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    # def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
    def train(self, X, y, *, epochs=1, batch_size=None, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)

        # Default value if batch size is not being set
        train_steps = 1

        # If there is validation data passed,
        # set default number of steps for validation as well
        if validation_data is not None:
            validation_steps = 1
            # For better readability
            X_val, y_val = validation_data

        # Calculate number of steps
        if batch_size is not None:
            train_steps = len(X) // batch_size
            # Dividing rounds down. If there are some remaining
            # data, but not a full batch, this won't include it
            # Add `1` to include this not full batch
            if train_steps * batch_size < len(X):
                train_steps += 1

            if validation_data is not None:
                validation_steps = len(X_val) // batch_size
                # Dividing rounds down. If there are some remaining
                # data, but not a full batch, this won't include it
                # Add `1` to include this not full batch
                if validation_steps * batch_size < len(X_val):
                    validation_steps += 1

        # Main training loop
        for epoch in range(1, epochs+1):
            # Print epoch number
            print(f'epoch: {epoch}')

            # Reset accumulated values in loss and accuracy objects
            self.loss.new_pass()
            self.accuracy.new_pass()

            # Iterate over steps
            for step in range(train_steps):
                # If batch size is not set -
                # train using one step and full dataset
                if batch_size is None:
                    batch_X = X
                    batch_y = y
                # Otherwise slice a batch
                else:
                    batch_X = X[step*batch_size:(step+1)*batch_size]
                    batch_y = y[step*batch_size:(step+1)*batch_size]

                # Perform the forward pass
                output = self.forward(batch_X, training=True)

                # Calculate loss
                data_loss, regularization_loss = self.loss.calculate(output, batch_y, include_regularization=True)
                loss = data_loss + regularization_loss

                # Get predictions and calculate an accuracy
                predictions = self.output_layer_activation.predictions(output)
                accuracy = self.accuracy.calculate(predictions, batch_y)

                # Perform backward pass
                self.backward(output, batch_y)

                # Optimize (update parameters)
                self.optimizer.pre_update_params()
                for layer in self.trainable_layers:
                    self.optimizer.update_params(layer)
                self.optimizer.post_update_params()

                # Print a summary
                if not step % print_every or step == train_steps - 1:
                    print(f'step: {step}, ' +
                          f'acc: {accuracy:.3f}, ' +
                          f'loss: {loss:.3f} (' +
                          f'data_loss: {data_loss:.3f}, ' +
                          f'reg_loss: {regularization_loss:.3f}), ' +
                          f'lr: {self.optimizer.current_learning_rate}')

            # Get and print epoch loss and accuracy
            epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
            epoch_loss = epoch_data_loss + epoch_regularization_loss
            epoch_accuracy = self.accuracy.calculate_accumulated()
            print(f'training, ' +
                  f'acc: {epoch_accuracy:.3f}, ' +
                  f'loss: {epoch_loss:.3f} (' +
                  f'data_loss: {epoch_data_loss:.3f}, ' +
                  f'reg_loss: {epoch_regularization_loss:.3f}), ' +
                  f'lr: {self.optimizer.current_learning_rate}')

            # If there is the validation data
            if validation_data is not None:
                # Reset accumulated values in loss
                # and accuracy objects
                self.loss.new_pass()
                self.accuracy.new_pass()

                # Iterate over steps
                for step in range(validation_steps):
                    # If batch size is not set -
                    # validate using one step and full dataset
                    if batch_size is None:
                        batch_X = X_val
                        batch_y = y_val
                    # Otherwise slice a batch
                    else:
                        batch_X = X_val[step*batch_size:(step+1)*batch_size]
                        batch_y = y_val[step*batch_size:(step+1)*batch_size]

                    # Perform the forward pass
                    output = self.forward(batch_X, training=False)
                    # Calculate the loss
                    self.loss.calculate(output, batch_y)
                    # Get predictions and calculate an accuracy
                    predictions = self.output_layer_activation.predictions(output)
                    self.accuracy.calculate(predictions, batch_y)

                # Get and print validation loss and accuracy
                validation_loss = self.loss.calculate_accumulated()
                validation_accuracy = self.accuracy.calculate_accumulated()
                # Print a summary
                print(f'validation, ' +
                      f'acc: {validation_accuracy:.3f}, ' +
                      f'loss: {validation_loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):
        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)

            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs

            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)

            return

        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)

        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)


# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')

# Shuffle the training dataset
# (shuffle the index array, then apply it to samples and labels alike)
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]

# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5

# Instantiate the model
model = Model()

# # Add layers
# model.add(Layer_Dense(X.shape[1], 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 10))
# model.add(Activation_Softmax())

# # Set loss, optimizer and accuracy objects
# model.set(
#     loss=Loss_CategoricalCrossentropy(),
#     optimizer=Optimizer_Adam(decay=5e-5),
#     accuracy=Accuracy_Categorical()
#     )

# # Finalize the model
# model.finalize()

# # Train the model
# model.train(X, y, validation_data=(X_test, y_test), epochs=5, batch_size=128, print_every=100)

# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10, batch_size=128, print_every=100)
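
The step counting and batch slicing inside `train` can be hard to follow in the middle of the full listing, so here is a minimal standalone sketch of just that logic. The helper names `count_steps` and `iterate_batches` are hypothetical and do not appear in the code above. The toy arrays are made up for illustration, and the shuffle uses `np.random.default_rng(...).permutation` in place of `np.random.shuffle` only so that the result is reproducible.

```python
import numpy as np


def count_steps(n_samples, batch_size):
    """Number of batches needed to cover n_samples (hypothetical helper).

    Floor division drops a trailing partial batch, so one extra step is
    added when samples remain, mirroring the check used in train().
    """
    steps = n_samples // batch_size
    if steps * batch_size < n_samples:
        steps += 1
    return steps


def iterate_batches(X, y, batch_size):
    """Yield consecutive (batch_X, batch_y) slices (hypothetical helper)."""
    for step in range(count_steps(len(X), batch_size)):
        start = step * batch_size
        # Slicing past the end of an array is safe: the last batch is
        # simply shorter than batch_size
        yield X[start:start + batch_size], y[start:start + batch_size]


# Toy data (assumption for illustration): 10 samples with 3 features each;
# row i starts with the value 3*i and carries the label i
X_demo = np.arange(30).reshape(10, 3)
y_demo = np.arange(10)

# Shuffle samples and labels with the same index permutation so that
# every sample keeps its own label
rng = np.random.default_rng(0)
keys = rng.permutation(len(X_demo))
X_demo, y_demo = X_demo[keys], y_demo[keys]

batch_sizes = [len(batch_X) for batch_X, _ in iterate_batches(X_demo, y_demo, 4)]
print(batch_sizes)              # [4, 4, 2]
print(count_steps(60000, 128))  # 469
```

With the 60,000 Fashion MNIST training samples and `batch_size=128`, this gives 468 full batches plus one final batch of 96 samples, which is why the last step has to be counted separately.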





