Table of Contents
- Assignment 1. Building your Deep Neural Network
- 1. Import packages
- 2. Main workflow of the algorithm
- 3. Initialization
- 3.1 Two-layer neural network
- 3.2 L-layer neural network
- 4. Forward propagation
- 4.1 Linear module
- 4.2 Linear-activation module
- 4.3 L-layer model
- 5. Cost function
- 6. Backward propagation
- 6.1 Linear module
- 6.2 Linear-activation module
- 6.3 L-layer model
- 6.4 Gradient descent: updating the parameters
- Assignment 2. Deep Neural Network Application: Image Classification
- 1. Import packages
- 2. Dataset
- 3. Building the model
- 3.1 Two-layer neural network
- 3.2 L-layer neural network
- 3.3 General methodology
- 4. Two-layer neural network
- 5. L-layer neural network
- 6. Results analysis
- 7. Test with your own image
Quiz: see the reference blog post.
Assignment 1. Building your Deep Neural Network
1. Import packages
```python
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)
```
2. Main workflow of the algorithm
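In outline: initialize the parameters for an $L$-layer network, then loop over forward propagation through the [LINEAR -> ACTIVATION] blocks, compute the cost, run backward propagation to get the gradients, and update the parameters with gradient descent. The helper functions below implement each of these pieces, and Assignment 2 reuses them to assemble the full model.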
3. Initialization
Related notes: 01. Neural Networks and Deep Learning, W4. Deep Neural Networks.
3.1 Two-layer neural network
Model structure: LINEAR -> RELU -> LINEAR -> SIGMOID
Weights: np.random.randn(shape) * 0.01
Biases: np.zeros(shape)
```python
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```
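A quick sanity check on the shapes (the layer sizes here are only illustrative):

```python
parameters = initialize_parameters(3, 2, 1)
print(parameters["W1"].shape, parameters["b1"].shape)  # (2, 3) (2, 1)
print(parameters["W2"].shape, parameters["b2"].shape)  # (1, 2) (1, 1)
```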
3.2 L-layer neural network
Model structure: [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID
```python
# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
```
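And for a deeper configuration (sizes illustrative), each Wl has shape (layer_dims[l], layer_dims[l-1]):

```python
parameters = initialize_parameters_deep([5, 4, 3])
# W1: (4, 5), b1: (4, 1), W2: (3, 4), b2: (3, 1)
```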
4. Forward propagation
4.1 Linear module
Vectorized formula:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$. Compute $Z$ and cache $A$, $W$, $b$.
```python
# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b"; stored for computing the backward pass efficiently
    """
    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
```
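A shape check with random inputs (sizes illustrative):

```python
np.random.seed(1)
A = np.random.randn(3, 2)  # 3 units in the previous layer, 2 examples
W = np.random.randn(1, 3)  # 1 unit in the current layer
b = np.random.randn(1, 1)
Z, cache = linear_forward(A, W, b)
print(Z.shape)  # (1, 2): one pre-activation value per example
```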
4.2 Linear-activation module
Compute the activation output $A$ and cache $Z$ (needed for backpropagation); the activation functions provided with the assignment return both:

$$A^{[l]} = g(Z^{[l]}) = g(W^{[l]} A^{[l-1]} + b^{[l]})$$

where $g$ is the activation function, e.g. ReLU or Sigmoid.
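sigmoid and relu are imported from dnn_utils_v2 and not shown in the notebook; a minimal sketch of what they compute, assuming the standard course implementations (each returns the activation plus Z as its cache):

```python
def sigmoid(Z):
    # Sigmoid activation: returns A and caches Z for the backward pass.
    A = 1 / (1 + np.exp(-Z))
    return A, Z

def relu(Z):
    # ReLU activation: returns A and caches Z for the backward pass.
    A = np.maximum(0, Z)
    return A, Z
```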
```python
# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert(A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
```
4.3 L-layer model
The first $L-1$ layers use ReLU; the final layer uses Sigmoid.
```python
# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_activation_forward() with "sigmoid" (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], 'relu')
        caches.append(cache)  # each layer's ((A_prev, W, b), Z)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], 'sigmoid')
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
```
We now have a complete forward pass: AL contains the predictions, so we can compute the cost.
5. Cost function
Compute the cross-entropy cost:

$$-\frac{1}{m} \sum_{i=1}^{m} \Big( y^{(i)} \log\big(a^{[L](i)}\big) + (1-y^{(i)}) \log\big(1-a^{[L](i)}\big) \Big)$$
```python
# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from AL and Y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / (-m)
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
```
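A small worked check of the formula (values illustrative): with AL = [[0.8, 0.9, 0.4]] and Y = [[1, 1, 0]], the cost is -(log 0.8 + log 0.9 + log 0.6) / 3 ≈ 0.2798.

```python
AL = np.array([[0.8, 0.9, 0.4]])
Y = np.array([[1, 1, 0]])
print(compute_cost(AL, Y))  # ≈ 0.2798
```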
6. Backward propagation
Compute the gradients of the cost function:
6.1 Linear module

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]}$$
```python
# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ### START CODE HERE ### (≈ 3 lines of code)
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    ### END CODE HERE ###

    assert(dA_prev.shape == A_prev.shape)
    assert(dW.shape == W.shape)
    assert(db.shape == b.shape)

    return dA_prev, dW, db
```
6.2 Linear-activation module

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$
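relu_backward and sigmoid_backward also come from dnn_utils_v2; a minimal sketch of what they compute, assuming the standard course implementations (each consumes the Z cached during the forward pass):

```python
def relu_backward(dA, cache):
    # dZ = dA * g'(Z) for ReLU: the gradient passes through only where Z > 0.
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def sigmoid_backward(dA, cache):
    # dZ = dA * g'(Z) for sigmoid, where g'(Z) = s * (1 - s).
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)
```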
```python
# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    return dA_prev, dW, db
```
6.3 L-layer model
```python
dAL = - np.divide(Y, AL) + np.divide(1 - Y, 1 - AL)
```
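This line is just the elementwise derivative of the cross-entropy cost with respect to the output activation:

$$\frac{\partial \mathcal{L}}{\partial a^{[L]}} = -\frac{y}{a^{[L]}} + \frac{1-y}{1-a^{[L]}}$$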
```python
# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = - np.divide(Y, AL) + np.divide(1 - Y, 1 - AL)
    ### END CODE HERE ###

    # Lth layer (SIGMOID -> LINEAR) gradients.
    # Inputs: "AL, Y, caches".
    # Outputs: "grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)]".
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[L-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')
    ### END CODE HERE ###

    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 2)], caches".
        # Outputs: "grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]".
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads['dA' + str(l + 2)], current_cache, 'relu')
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###

    return grads
```
6.4 Gradient descent: updating the parameters

$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$$

$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$$
```python
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(L):
        parameters["W" + str(l+1)] = parameters['W' + str(l+1)] - learning_rate * grads['dW' + str(l+1)]
        parameters["b" + str(l+1)] = parameters['b' + str(l+1)] - learning_rate * grads['db' + str(l+1)]
    ### END CODE HERE ###

    return parameters
```
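One full training step, wiring the helpers together (layer sizes and data illustrative):

```python
np.random.seed(1)
X = np.random.randn(5, 10)               # 5 input features, 10 examples
Y = (np.random.rand(1, 10) > 0.5) * 1.0  # random binary labels
parameters = initialize_parameters_deep([5, 4, 1])
AL, caches = L_model_forward(X, parameters)
grads = L_model_backward(AL, Y, caches)
parameters = update_parameters(parameters, grads, learning_rate=0.0075)
```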
Assignment 2. Deep Neural Network Application: Image Classification
Using the functions implemented above, build a deep neural network and predict whether an image is a cat.
1. Import packages
```python
import time
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
from dnn_app_utils_v2 import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)
```
2. Dataset
See 01. Neural Networks and Deep Learning, W2. Neural Network Basics (assignment: logistic regression image recognition).
We use the same dataset as that 01W2 assignment; logistic regression only reached 70% accuracy on it.
- Load the data
```python
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()
```
- Inspect the data
```python
# Example of a picture
index = 1
plt.imshow(train_x_orig[index])
print ("y = " + str(train_y[0,index]) + ". It's a " + classes[train_y[0,index]].decode("utf-8") + " picture.")
```
- Check the dataset dimensions
```python
# Explore your dataset
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]
m_test = test_x_orig.shape[0]

print ("Number of training examples: " + str(m_train))
print ("Number of testing examples: " + str(m_test))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_x_orig shape: " + str(train_x_orig.shape))
print ("train_y shape: " + str(train_y.shape))
print ("test_x_orig shape: " + str(test_x_orig.shape))
print ("test_y shape: " + str(test_y.shape))
```
```
Number of training examples: 209
Number of testing examples: 50
Each image is of size: (64, 64, 3)
train_x_orig shape: (209, 64, 64, 3)
train_y shape: (1, 209)
test_x_orig shape: (50, 64, 64, 3)
test_y shape: (1, 50)
```
- Vectorize the image data
```python
# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T  # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.

print ("train_x's shape: " + str(train_x.shape))
print ("test_x's shape: " + str(test_x.shape))
```
```
train_x's shape: (12288, 209)  # 12288 = 64 * 64 * 3
test_x's shape: (12288, 50)
```
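A tiny example of the reshape trick: each image is row-flattened into a single column, so the data matrix ends up with one column per example.

```python
x = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)  # 2 "images" of shape (2, 2, 3)
flat = x.reshape(x.shape[0], -1).T
print(flat.shape)  # (12, 2): 12 = 2*2*3 features, one column per image
```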
3. Building the model
3.1 Two-layer neural network
3.2 L-layer neural network
3.3 General methodology
- Initialize parameters / define hyperparameters
- Loop for n_iters iterations:
  - a. Forward propagation
  - b. Compute the cost function
  - c. Backward propagation
  - d. Update parameters (using the parameters and the gradients)
- Use the trained parameters to predict
4. Two-layer neural network
- Define constants
```python
### CONSTANTS DEFINING THE MODEL ####
n_x = 12288     # num_px * num_px * 3
n_h = 7         # number of hidden units
n_y = 1
layers_dims = (n_x, n_h, n_y)
```
- Build the model
```python
# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    np.random.seed(1)
    grads = {}
    costs = []      # to keep track of the cost
    m = X.shape[1]  # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    ### START CODE HERE ### (≈ 1 line of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID.
        # Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')
        A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')
        ### END CODE HERE ###

        # Compute cost
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(A2, Y)
        ### END CODE HERE ###

        # Initializing backward propagation
        dA2 = - np.divide(Y, A2) + np.divide(1 - Y, 1 - A2)

        # Backward propagation. Inputs: "dA2, cache2, cache1".
        # Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        ### START CODE HERE ### (≈ 2 lines of code)
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')
        ### END CODE HERE ###

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        ### START CODE HERE ### (approx. 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
- Train
```python
parameters = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True)
```
```
Cost after iteration 0: 0.693049735659989
Cost after iteration 100: 0.6464320953428849
Cost after iteration 200: 0.6325140647912678
Cost after iteration 300: 0.6015024920354665
Cost after iteration 400: 0.5601966311605747
Cost after iteration 500: 0.5158304772764729
Cost after iteration 600: 0.4754901313943325
Cost after iteration 700: 0.43391631512257495
Cost after iteration 800: 0.4007977536203887
Cost after iteration 900: 0.35807050113237976
Cost after iteration 1000: 0.33942815383664127
Cost after iteration 1100: 0.30527536361962654
Cost after iteration 1200: 0.2749137728213016
Cost after iteration 1300: 0.24681768210614846
Cost after iteration 1400: 0.19850735037466097
Cost after iteration 1500: 0.17448318112556657
Cost after iteration 1600: 0.1708076297809689
Cost after iteration 1700: 0.11306524562164715
Cost after iteration 1800: 0.09629426845937145
Cost after iteration 1900: 0.08342617959726863
Cost after iteration 2000: 0.07439078704319078
Cost after iteration 2100: 0.06630748132267933
Cost after iteration 2200: 0.0591932950103817
Cost after iteration 2300: 0.05336140348560554
Cost after iteration 2400: 0.04855478562877016
```
- Predict
Training set: Accuracy: 0.9999999999999998
```python
predictions_train = predict(train_x, train_y, parameters)
# Accuracy: 0.9999999999999998
```
Test set: Accuracy: 0.72, a bit better than logistic regression's 0.70.
```python
predictions_test = predict(test_x, test_y, parameters)
# Accuracy: 0.72
```
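predict is provided by dnn_app_utils_v2 and is not shown in the notebook; a minimal sketch of its behavior, assuming the standard course implementation (forward pass, threshold the sigmoid output at 0.5, report accuracy):

```python
def predict(X, y, parameters):
    # Run the trained L-layer model and compare thresholded predictions to y.
    probas, _ = L_model_forward(X, parameters)
    p = (probas > 0.5).astype(float)
    print("Accuracy: " + str(np.mean(p == y)))
    return p
```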
5. L-layer neural network
- Define constants for a 5-layer NN
```python
### CONSTANTS ###
layers_dims = [12288, 20, 7, 5, 1]  # 5-layer model
```
- Build the model
```python
# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []  # keep track of cost

    # Parameters initialization.
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
```
- Train
```python
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True)
```
```
Cost after iteration 0: 0.771749
Cost after iteration 100: 0.672053
Cost after iteration 200: 0.648263
Cost after iteration 300: 0.611507
Cost after iteration 400: 0.567047
Cost after iteration 500: 0.540138
Cost after iteration 600: 0.527930
Cost after iteration 700: 0.465477
Cost after iteration 800: 0.369126
Cost after iteration 900: 0.391747
Cost after iteration 1000: 0.315187
Cost after iteration 1100: 0.272700
Cost after iteration 1200: 0.237419
Cost after iteration 1300: 0.199601
Cost after iteration 1400: 0.189263
Cost after iteration 1500: 0.161189
Cost after iteration 1600: 0.148214
Cost after iteration 1700: 0.137775
Cost after iteration 1800: 0.129740
Cost after iteration 1900: 0.121225
Cost after iteration 2000: 0.113821
Cost after iteration 2100: 0.107839
Cost after iteration 2200: 0.102855
Cost after iteration 2300: 0.100897
Cost after iteration 2400: 0.092878
```
- Predict
Training set: Accuracy: 0.9856459330143539
```python
pred_train = predict(train_x, train_y, parameters)
# Accuracy: 0.9856459330143539
```
Test set: Accuracy: 0.8, better than both logistic regression (0.70) and the two-layer NN (0.72).
```python
pred_test = predict(test_x, test_y, parameters)
# Accuracy: 0.8
```
The next course covers, in a systematic way, how to tune hyperparameters to make the model perform even better.
6. Results analysis
```python
def print_mislabeled_images(classes, X, y, p):
    """
    Plots images where predictions and truth were different.
    X -- dataset
    y -- true labels
    p -- predictions
    """
    a = p + y
    mislabeled_indices = np.asarray(np.where(a == 1))  # 0+1 or 1+0, i.e. the wrong cases
    plt.rcParams['figure.figsize'] = (40.0, 40.0)  # set default size of plots
    num_images = len(mislabeled_indices[0])
    for i in range(num_images):
        index = mislabeled_indices[1][i]
        plt.subplot(2, num_images, i + 1)
        plt.imshow(X[:, index].reshape(64, 64, 3), interpolation='nearest')
        plt.axis('off')
        plt.title("Prediction: " + classes[int(p[0, index])].decode("utf-8") + " \n Class: " + classes[y[0, index]].decode("utf-8"))

print_mislabeled_images(classes, test_x, test_y, pred_test)
```
The misclassified images tend to involve:
- A cat body in an unusual position
- A cat against a background of a similar color
- An unusual cat color or breed
- Camera angle
- Brightness of the picture
- Scale variation (the cat is very large or very small in the image)
7. Test with your own image
```python
## START CODE HERE ##
my_image = "my_image.jpg"  # change this to the name of your image file
my_label_y = [1]           # the true class of your image (1 -> cat, 0 -> non-cat)
## END CODE HERE ##

fname = "images/" + my_image
image = Image.open(fname)
my_image = np.array(image.resize((num_px, num_px))).reshape((num_px*num_px*3, 1))
my_image = my_image/255.   # standardize to [0, 1], the same preprocessing as the training data
my_predicted_image = predict(my_image, my_label_y, parameters)

plt.imshow(image)
print ("y = " + str(np.squeeze(my_predicted_image)) + ", your L-layer model predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
```
```
Accuracy: 1.0
y = 1.0, your L-layer model predicts a "cat" picture.
```
My CSDN blog: https://michael.blog.csdn.net/