6.深度学习练习:Initialization

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。

 

课程链接:https://www.deeplearning.ai/deep-learning-specialization/

目录

1 - Neural Network model

2 - Zero initialization

3 - Random initialization(掌握)

4 - He initialization(理解)


To get started, run the following cell to load the packages and the planar dataset you will try to classify.

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()


1 - Neural Network model

You will use a 3-layer neural network (already implemented for you). Here are the initialization methods you will experiment with:

  • Zeros initialization -- setting initialization = "zeros" in the input argument.
  • Random initialization -- setting initialization = "random" in the input argument. This initializes the weights to large random values.
  • He initialization -- setting initialization = "he" in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015.

Instructions: Please quickly read over the code below, and run it. In the next part you will implement the three initialization methods that this model() calls.

def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):"""Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.Arguments:X -- input data, of shape (2, number of examples)Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)learning_rate -- learning rate for gradient descent num_iterations -- number of iterations to run gradient descentprint_cost -- if True, print the cost every 1000 iterationsinitialization -- flag to choose which initialization to use ("zeros","random" or "he")Returns:parameters -- parameters learnt by the model"""grads = {}costs = [] # to keep track of the lossm = X.shape[1] # number of exampleslayers_dims = [X.shape[0], 10, 5, 1]# Initialize parameters dictionary.if initialization == "zeros":parameters = initialize_parameters_zeros(layers_dims)elif initialization == "random":parameters = initialize_parameters_random(layers_dims)elif initialization == "he":parameters = initialize_parameters_he(layers_dims)# Loop (gradient descent)for i in range(0, num_iterations):# Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.a3, cache = forward_propagation(X, parameters)# Losscost = compute_loss(a3, Y)# Backward propagation.grads = backward_propagation(X, Y, cache)# Update parameters.parameters = update_parameters(parameters, grads, learning_rate)# Print the loss every 1000 iterationsif print_cost and i % 1000 == 0:print("Cost after iteration {}: {}".format(i, cost))costs.append(cost)# plot the lossplt.plot(costs)plt.ylabel('cost')plt.xlabel('iterations (per hundreds)')plt.title("Learning rate =" + str(learning_rate))plt.show()return parameters

2 - Zero initialization

There are two types of parameters to initialize in a neural network:

the weight matrices:$(W^{[1]}, W^{[2]}, W^{[3]}, ..., W^{[L-1]}, W^{[L]})$

the bias vectors:$(b^{[1]}, b^{[2]}, b^{[3]}, ..., b^{[L-1]}, b^{[L]})$

Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but lets try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

def initialize_parameters_zeros(layers_dims):"""Arguments:layer_dims -- python array (list) containing the size of each layer.Returns:parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])b1 -- bias vector of shape (layers_dims[1], 1)...WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])bL -- bias vector of shape (layers_dims[L], 1)"""parameters = {}L = len(layers_dims)            # number of layers in the networkfor l in range(1, L):parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))return parameters
parameters = initialize_parameters_zeros([3,2,1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))W1 = [[0. 0. 0.][0. 0. 0.]]
b1 = [[0.][0.]]
W2 = [[0. 0.]]
b2 = [[0.]]parameters = model(train_X, train_Y, initialization = "zeros")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
plt.title("Model with Zeros initialization")
axes = plt.gca()
axes.set_xlim([-1.5,1.5])
axes.set_ylim([-1.5,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))

The model is predicting 0 for every example.

In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, and you might as well be training a neural network with ?[?]=1 for every layer, and the network is no more powerful than a linear classifier such as logistic regression.

**What you should remember**: - The weights W^{[l]} should be initialized randomly to break symmetry. - It is however okay to initialize the biases b^{[l]}to zeros. Symmetry is still broken so long as W^{[l]}is initialized randomly.


3 - Random initialization(掌握)

To break symmetry, lets intialize the weights randomly. Following random initialization, each neuron can then proceed to learn a different function of its inputs. In this exercise, you will see what happens if the weights are intialized randomly, but to very large values.

Exercise: Implement the following function to initialize your weights to large random values (scaled by *10) and your biases to zeros. Use np.random.randn(..,..) * 10 for weights and np.zeros((.., ..)) for biases. We are using a fixed np.random.seed(..) to make sure your "random" weights match ours, so don't worry if running several times your code gives you always the same initial values for the parameters.

def initialize_parameters_random(layers_dims):"""Arguments:layer_dims -- python array (list) containing the size of each layer.Returns:parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])b1 -- bias vector of shape (layers_dims[1], 1)...WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])bL -- bias vector of shape (layers_dims[L], 1)"""np.random.seed(3)               # This seed makes sure your "random" numbers will be the as oursparameters = {}L = len(layers_dims)            # integer representing the number of layersfor l in range(1, L):parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))return parameters
parameters = model(train_X, train_Y, initialization = "random")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

**In summary**: - Initializing weights to very large random values does not work well. - Hopefully intializing with small random values does better. The important question is: how small should be these random values be? Lets find out in the next part!


4 - He initialization(理解)

Finally, try "He Initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights ?[?]W[l] of sqrt(1./layers_dims[l-1]) where He initialization would use sqrt(2./layers_dims[l-1]).)

Exercise: Implement the following function to initialize your parameters with He initialization.

Hint: This function is similar to the previous initialize_parameters_random(...). The only difference is that instead of multiplying np.random.randn(..,..) by 10, you will multiply it by \sqrt{\frac{2}{\text{dimension of the previous layer}}},which is what He initialization recommends for layers with a ReLU activation.

# GRADED FUNCTION: initialize_parameters_hedef initialize_parameters_he(layers_dims):"""Arguments:layer_dims -- python array (list) containing the size of each layer.Returns:parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])b1 -- bias vector of shape (layers_dims[1], 1)...WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])bL -- bias vector of shape (layers_dims[L], 1)"""np.random.seed(3)parameters = {}L = len(layers_dims) - 1 # integer representing the number of layersfor l in range(1, L + 1):### START CODE HERE ### (≈ 2 lines of code)parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2 / layers_dims[l-1])parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))### END CODE HERE ###return parameters

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/439854.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

【CodeForces - 602D】Lipshitz Sequence(思维,单调栈,斜率单调性)

题干: A function is called Lipschitz continuous if there is a real constant Ksuch that the inequality |f(x) - f(y)| ≤ K|x - y| holds for all . Well deal with a more... discrete version of this term. For an array , we define its Lipschi…

7.深度学习练习:Regularization

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1-Package 2 - Non-regularized model 3 - L2 Regularization(掌握) 4-Dropou…

深入详解JVM内存模型与JVM参数详细配置

本系列会持续更新。 JVM基本是BAT面试必考的内容,今天我们先从JVM内存模型开启详解整个JVM系列,希望看完整个系列后,可以轻松通过BAT关于JVM的考核。 BAT必考JVM系列专题 1.JVM内存模型 2.JVM垃圾回收算法 3.JVM垃圾回收器 4.JVM参数详解 5…

8.深度学习练习:Gradient Checking

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1) How does gradient checking work? 2) 1-dimensional gradient checking 3) N-dimensional gradie…

9.深度学习练习:Optimization Methods

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1 - Gradient Descent 2 - Mini-Batch Gradient descent 3 - Momentum 4 - Adam 5 - Model with dif…

一步步编写操作系统 22 硬盘操作方法

硬盘中的指令很多,各指令的用法也不同。有的指令直接往command寄存器中写就行了,有的还要在feature寄存器中写入参数,最权威的方法还是要去参考ATA手册。由于本书中用到的都是简单的指令,所以对此抽象出一些公共的步骤仅供参考之用…

10.深度学习练习:Convolutional Neural Networks: Step by Step(强烈推荐)

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1 - Packages 2 - Outline of the Assignment 3 - Convolutional Neural Networks 3.1 - Zero-Paddin…

一步步编写操作系统 23 重写主引导记录mbr

本节我们在之前MBR的基础上,做个稍微大一点的改进,经过这个改进后,我们的MBR可以读取硬盘。听上去这可是个大“手术”呢,我们要将之前学过的知识都用上啦。其实没那么大啦,就是加了个读写磁盘的函数而已,哈…

11.深度学习练习:Keras tutorial - the Happy House(推荐)

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ Welcome to the first assignment of week 2. In this assignment, you will: Learn to use Keras, a high-lev…

一步步编写操作系统 24 编写内核加载器

这一节的内容并不长,因为在进入保护模式之前,我们能做的不多,loader是要经过实模式到保护模式的过渡,并最终在保护模式下加载内核。本节只实现一个简单的loader,本loader只在实模式下工作,等学习了保护模式…

12.深度学习练习:Residual Networks(注定成为经典)

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1 - The problem of very deep neural networks 2 - Building a Residual Network 2.1 - The identity…

13.深度学习练习:Autonomous driving - Car detection(YOLO实战)

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ Welcome to your week 3 programming assignment. You will learn about object detection using the very pow…

14.深度学习练习:Face Recognition for the Happy House

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ Welcome to the first assignment of week 4! Here you will build a face recognition system. Many of the i…

java Integer 源码学习

转载自http://www.hollischuang.com/archives/1058 Integer 类在对象中包装了一个基本类型 int 的值。Integer 类型的对象包含一个 int 类型的字段。 此外,该类提供了多个方法,能在 int 类型和 String 类型之间互相转换,还提供了处理 int 类…

15.深度学习练习:Deep Learning Art: Neural Style Transfer

本文节选自吴恩达老师《深度学习专项课程》编程作业,在此表示感谢。 课程链接:https://www.deeplearning.ai/deep-learning-specialization/ 目录 1 - Problem Statement 2 - Transfer Learning 3 - Neural Style Transfer 3.1 - Computing the cont…

【2018icpc宁夏邀请赛现场赛】【Gym - 102222F】Moving On(Floyd变形,思维,离线处理)

https://nanti.jisuanke.com/t/41290 题干: Firdaws and Fatinah are living in a country with nn cities, numbered from 11 to nn. Each city has a risk of kidnapping or robbery. Firdawss home locates in the city uu, and Fatinahs home locates in the…

动手学PaddlePaddle(1):线性回归

你将学会: 机器学习的基本概念:假设函数、损失函数、优化算法数据怎么进行归一化处理paddlepaddle深度学习框架的一些基本知识如何用paddlepaddle深度学习框架搭建全连接神经网络参考资料:https://www.paddlepaddle.org.cn/documentation/doc…

Apollo进阶课程㉒丨Apollo规划技术详解——Motion Planning with Autonomous Driving

原文链接:进阶课程㉒丨Apollo规划技术详解——Motion Planning with Autonomous Driving 自动驾驶车辆的规划决策模块负责生成车辆的行驶行为,是体现车辆智慧水平的关键。规划决策模块中的运动规划环节负责生成车辆的局部运动轨迹,是决定车辆…

JVM核心之JVM运行和类加载全过程

为什么研究类加载全过程? 有助于连接JVM运行过程更深入了解java动态性(解热部署,动态加载),提高程序的灵活性类加载机制 JVM把class文件加载到内存,并对数据进行校验、解析和初始化,最终形成J…

【2018icpc宁夏邀请赛现场赛】【Gym - 102222A】Maximum Element In A Stack(动态的栈中查找最大元素)

https://nanti.jisuanke.com/t/41285 题干: As an ACM-ICPC newbie, Aishah is learning data structures in computer science. She has already known that a stack, as a data structure, can serve as a collection of elements with two operations: push, …