cs231n assignment 3 Extra Credit: Image Captioning with LSTMs

Table of Contents

  • Just show me the code
  • Extra Credit: Image Captioning with LSTMs
    • lstm_step_forward
      • Problem
      • Analysis
      • Code
      • Output
    • lstm_step_backward
      • Problem
      • Analysis
      • Code
      • Output
    • lstm_forward
      • Problem
      • Analysis
      • Code
      • Output
    • lstm_backward
      • Problem
      • Analysis
      • Code
      • Output
    • CaptioningRNN.loss
      • Analysis
      • Code
      • Output
    • Final output
    • Closing remarks

Just show me the code

Extra Credit: Image Captioning with LSTMs

lstm_step_forward

Problem

(screenshots: assignment prompt describing the LSTM step)
Combined with the lecture and the prompt above, this part asks us to implement the forward pass for a single LSTM timestep; the exact procedure is spelled out in the prompt and restated below.
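In equations (a restatement of the standard LSTM step, matching the code below; ⊙ denotes elementwise multiplication and σ the sigmoid):

    a = x W_x + h_{t-1} W_h + b \in \mathbb{R}^{N \times 4H}, \qquad a = [\,a_i \;|\; a_f \;|\; a_o \;|\; a_g\,]

    i = \sigma(a_i), \quad f = \sigma(a_f), \quad o = \sigma(a_o), \quad g = \tanh(a_g)

    c_t = f \odot c_{t-1} + i \odot g, \qquad h_t = o \odot \tanh(c_t)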

Analysis

See the comments in the code.

Code

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Note that a sigmoid() function has already been provided for you in this file.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Pre-activation: one matrix multiply produces all four gates at once.
    a = x.dot(Wx) + prev_h.dot(Wh) + b
    # Split the (N, 4H) pre-activation into the four gate pre-activations.
    ai, af, ao, ag = np.split(a, 4, axis=1)
    # Input, forget, output gates and the candidate cell value.
    i = sigmoid(ai)
    f = sigmoid(af)
    o = sigmoid(ao)
    g = np.tanh(ag)
    # New cell state: keep part of the old cell, add the gated candidate.
    next_c = f * prev_c + i * g
    # New hidden state: gated tanh of the new cell state.
    next_h = o * np.tanh(next_c)

    cache = (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #############################################################################
    #                               END OF YOUR CODE                            #
    #############################################################################

    return next_h, next_c, cache
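A minimal smoke test of the output shapes, as a sketch. It assumes the assignment's cs231n/rnn_layers.py (which also provides the sigmoid helper) is importable; the sizes are hypothetical, and the actual notebook instead compares the outputs against hard-coded reference values.

import numpy as np
from cs231n.rnn_layers import lstm_step_forward  # assumed module path from the assignment

np.random.seed(0)
N, D, H = 3, 4, 5                      # hypothetical sizes
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
prev_c = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)

next_h, next_c, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
assert next_h.shape == (N, H) and next_c.shape == (N, H)
assert np.all(np.abs(next_h) < 1.0)    # h = o * tanh(c), so every entry lies in (-1, 1)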

Output

(screenshot: notebook output for the lstm_step_forward check)

lstm_step_backward

Problem

(screenshot: assignment prompt for lstm_step_backward)

Compute the backward pass for a single LSTM timestep.

Analysis

The backward step needs the derivative of the sigmoid and the derivative of tanh; both are restated below. For a walkthrough of backpropagation itself, see the article linked here. With those two derivatives and the chain rule, every gradient in the code follows directly from the values cached during the forward pass.
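Written out (these local derivatives use only quantities cached in the forward pass, which is why the cache stores i, f, o, g, and next_c):

    \sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big), \qquad \tanh'(z) = 1 - \tanh^2(z)

    dc_t \leftarrow dc_t + dh_t \odot o \odot \big(1 - \tanh^2(c_t)\big), \qquad dc_{t-1} = dc_t \odot f

    da_i = dc_t \odot g \odot i(1 - i), \qquad da_f = dc_t \odot c_{t-1} \odot f(1 - f)

    da_o = dh_t \odot \tanh(c_t) \odot o(1 - o), \qquad da_g = dc_t \odot i \odot (1 - g^2)

    dx = da\,W_x^{\top}, \quad dh_{t-1} = da\,W_h^{\top}, \quad dW_x = x^{\top} da, \quad dW_h = h_{t-1}^{\top} da, \quad db = \textstyle\sum_n da_n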

Code

def lstm_step_backward(dnext_h, dnext_c, cache):
    """Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h) = cache

    # Total gradient on next_c: the upstream dnext_c plus the path through
    # next_h = o * tanh(next_c). Use + rather than += so the caller's array
    # is not modified in place.
    dnext_c = dnext_c + dnext_h * o * (1 - np.tanh(next_c) ** 2)
    # Gradient on the previous cell state (next_c = f * prev_c + i * g).
    dprev_c = dnext_c * f
    # Gradients on the gate pre-activations, using sigmoid'(z) = s(1 - s)
    # and tanh'(z) = 1 - tanh(z)^2 expressed via the cached outputs.
    dai = dnext_c * g * i * (1 - i)
    daf = dnext_c * prev_c * f * (1 - f)
    dao = dnext_h * np.tanh(next_c) * o * (1 - o)
    dag = dnext_c * i * (1 - g ** 2)
    # Reassemble the (N, 4H) pre-activation gradient.
    da = np.concatenate((dai, daf, dao, dag), axis=1)
    # Backprop through a = x Wx + prev_h Wh + b.
    dx = da.dot(Wx.T)
    dprev_h = da.dot(Wh.T)
    dWx = x.T.dot(da)
    dWh = prev_h.T.dot(da)
    db = np.sum(da, axis=0)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #############################################################################
    #                               END OF YOUR CODE                            #
    #############################################################################

    return dx, dprev_h, dprev_c, dWx, dWh, db
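A quick finite-difference sanity check, as a sketch: it compares a single entry of dWx against a numeric gradient computed with plain numpy, assuming the two functions above (and sigmoid) are in scope; the sizes are hypothetical. The notebook itself uses eval_numerical_gradient_array from the assignment's gradient-check utilities, which remains the authoritative test.

import numpy as np

np.random.seed(0)
N, D, H = 2, 3, 4
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
prev_c = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)
dnext_h = np.random.randn(N, H)
dnext_c = np.random.randn(N, H)

# Analytic gradient from the implementation above.
_, _, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
_, _, _, dWx_analytic, _, _ = lstm_step_backward(dnext_h, dnext_c, cache)

# Numeric gradient of L = sum(next_h * dnext_h) + sum(next_c * dnext_c)
# with respect to a single entry Wx[r, s], by central differences.
eps, r, s = 1e-6, 1, 2

def scalar_loss(W):
    nh, nc, _ = lstm_step_forward(x, prev_h, prev_c, W, Wh, b)
    return np.sum(nh * dnext_h) + np.sum(nc * dnext_c)

Wp, Wm = Wx.copy(), Wx.copy()
Wp[r, s] += eps
Wm[r, s] -= eps
numeric = (scalar_loss(Wp) - scalar_loss(Wm)) / (2 * eps)

print(abs(numeric - dWx_analytic[r, s]))  # expect a very small difference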

Output

(screenshot: notebook output for the lstm_step_backward check)

lstm_forward

Problem

(screenshots: assignment prompt for lstm_forward)

Now implement the forward pass of the LSTM over an entire input sequence.

Analysis

Nothing tricky here: loop over the T timesteps and call lstm_step_forward at each step, starting from an all-zero cell state and collecting each step's hidden state and cache.

Code

def lstm_forward(x, h0, Wx, Wh, b):
    """Forward pass for an LSTM over an entire sequence of data.

    We assume an input sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running the LSTM forward,
    we return the hidden states for all timesteps.

    Note that the initial cell state is passed as input, but the initial cell state is set to zero.
    Also note that the cell state is not returned; it is an internal variable to the LSTM and is not
    accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    (N, T, D) = x.shape
    (N, H) = h0.shape
    # The initial cell state is all zeros.
    c = np.zeros((N, H))
    # Hidden states for every timestep.
    h = np.zeros((N, T, H))
    # One cache entry per timestep.
    cache = []

    prev_h = h0
    prev_c = c
    # Walk the sequence one timestep at a time.
    for t in range(T):
        next_h, next_c, cache_t = lstm_step_forward(x[:, t, :], prev_h, prev_c, Wx, Wh, b)
        prev_h = next_h
        prev_c = next_c
        h[:, t, :] = next_h
        cache.append(cache_t)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #############################################################################
    #                               END OF YOUR CODE                            #
    #############################################################################

    return h, cache
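A shape sketch for the sequence version (hypothetical sizes; assumes lstm_forward is in scope):

import numpy as np

N, T, D, H = 2, 6, 5, 4                # hypothetical sizes
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)

h, cache = lstm_forward(x, h0, Wx, Wh, b)
assert h.shape == (N, T, H)
assert len(cache) == T                 # one cached tuple per timestep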

Output

(screenshot: notebook output for the lstm_forward check)

lstm_backward

Problem

(screenshot: assignment prompt for lstm_backward)

Analysis

If the single-step backward pass above made sense, this one should be no trouble: walk the timesteps in reverse, at each step add the upstream gradient dh[:, t, :] to the hidden-state gradient flowing back from the following timestep, and sum the weight gradients over all timesteps, since Wx, Wh, and b are shared across time (see the summary below).
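In symbols, the loop below accumulates the following (dW^{(t)} denotes the per-timestep gradient returned by lstm_step_backward, and dprev_h^{(t+1)} the hidden-state gradient it passes back from step t+1):

    dW_x = \sum_{t=1}^{T} dW_x^{(t)}, \qquad dW_h = \sum_{t=1}^{T} dW_h^{(t)}, \qquad db = \sum_{t=1}^{T} db^{(t)}

    \frac{\partial L}{\partial h_t} = dh[:, t, :] + dprev\_h^{(t+1)}, \qquad dh_0 = dprev\_h^{(1)}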

Code

def lstm_backward(dh, cache):
    """Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Recover the shapes from the first timestep's cache.
    (x, prev_h, prev_c, Wx, Wh, b, a, i, f, o, g, next_c, next_h) = cache[0]
    (N, T, H) = dh.shape
    (N, D) = x.shape

    # Initialize all gradients.
    dx = np.zeros((N, T, D))
    dnext_c = np.zeros((N, H))
    dnext_h = np.zeros((N, H))
    dWx = np.zeros((D, 4 * H))
    dWh = np.zeros((H, 4 * H))
    db = np.zeros(4 * H)

    # Walk the timesteps in reverse.
    for t in reversed(range(T)):
        # Add this timestep's upstream gradient to the gradient flowing
        # back from the following timestep.
        dnext_h += dh[:, t, :]
        dx[:, t, :], dnext_h, dnext_c, dWx_t, dWh_t, db_t = lstm_step_backward(dnext_h, dnext_c, cache[t])
        # The weights are shared across time, so their gradients accumulate.
        dWx += dWx_t
        dWh += dWh_t
        db += db_t

    # After the loop, the remaining hidden-state gradient belongs to h0.
    dh0 = dnext_h

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #############################################################################
    #                               END OF YOUR CODE                            #
    #############################################################################

    return dx, dh0, dWx, dWh, db

Output

(screenshot: notebook output for the lstm_backward check)

CaptioningRNN.loss

Analysis

I had already written this part earlier, so I am pasting the code here directly. If you have implemented the vanilla-RNN version of this loss before, the steps are easy to follow; the only difference is dispatching to the LSTM forward/backward functions when self.cell_type is "lstm".

Code

    def loss(self, features, captions):
        """Compute training-time loss for the RNN. We input image features and
        ground-truth captions for those images, and use an RNN (or LSTM) to compute
        loss and gradients on all parameters.

        Inputs:
        - features: Input image features, of shape (N, D)
        - captions: Ground-truth captions; an integer array of shape (N, T + 1) where
          each element is in the range 0 <= y[i, t] < V

        Returns a tuple of:
        - loss: Scalar loss
        - grads: Dictionary of gradients parallel to self.params
        """
        # Cut captions into two pieces: captions_in has everything but the last word
        # and will be input to the RNN; captions_out has everything but the first
        # word and this is what we will expect the RNN to generate. These are offset
        # by one relative to each other because the RNN should produce word (t+1)
        # after receiving word t. The first element of captions_in will be the START
        # token, and the first element of captions_out will be the first word.
        captions_in = captions[:, :-1]
        captions_out = captions[:, 1:]

        # You'll need this
        mask = captions_out != self._null

        # Weight and bias for the affine transform from image features to initial
        # hidden state
        W_proj, b_proj = self.params["W_proj"], self.params["b_proj"]

        # Word embedding matrix
        W_embed = self.params["W_embed"]

        # Input-to-hidden, hidden-to-hidden, and biases for the RNN
        Wx, Wh, b = self.params["Wx"], self.params["Wh"], self.params["b"]

        # Weight and bias for the hidden-to-vocab transformation.
        W_vocab, b_vocab = self.params["W_vocab"], self.params["b_vocab"]

        loss, grads = 0.0, {}
        ############################################################################
        # TODO: Implement the forward and backward passes for the CaptioningRNN.   #
        # In the forward pass you will need to do the following:                   #
        # (1) Use an affine transformation to compute the initial hidden state     #
        #     from the image features. This should produce an array of shape (N, H)#
        # (2) Use a word embedding layer to transform the words in captions_in     #
        #     from indices to vectors, giving an array of shape (N, T, W).         #
        # (3) Use either a vanilla RNN or LSTM (depending on self.cell_type) to    #
        #     process the sequence of input word vectors and produce hidden state  #
        #     vectors for all timesteps, producing an array of shape (N, T, H).    #
        # (4) Use a (temporal) affine transformation to compute scores over the    #
        #     vocabulary at every timestep using the hidden states, giving an      #
        #     array of shape (N, T, V).                                            #
        # (5) Use (temporal) softmax to compute loss using captions_out, ignoring  #
        #     the points where the output word is <NULL> using the mask above.     #
        #                                                                          #
        # Do not worry about regularizing the weights or their gradients!          #
        #                                                                          #
        # In the backward pass you will need to compute the gradient of the loss   #
        # with respect to all model parameters. Use the loss and grads variables   #
        # defined above to store loss and gradients; grads[k] should give the      #
        # gradients for self.params[k].                                            #
        #                                                                          #
        # Note also that you are allowed to make use of functions from layers.py   #
        # in your implementation, if needed.                                       #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # (1) Affine transform: image features -> initial hidden state.
        h0, cache_h0 = affine_forward(features, W_proj, b_proj)
        # (2) Word embedding: word indices -> word vectors.
        word_vector, cache_word_vector = word_embedding_forward(captions_in, W_embed)
        # (3) RNN or LSTM: word-vector sequence -> hidden-state sequence.
        if self.cell_type == "rnn":
            h, cache_h = rnn_forward(word_vector, h0, Wx, Wh, b)
        elif self.cell_type == "lstm":
            h, cache_h = lstm_forward(word_vector, h0, Wx, Wh, b)
        # (4) Temporal affine transform: hidden states -> vocabulary scores.
        scores, cache_scores = temporal_affine_forward(h, W_vocab, b_vocab)
        # (5) Temporal softmax loss, ignoring <NULL> positions via the mask.
        loss, dscores = temporal_softmax_loss(scores, captions_out, mask)

        # Backward pass, in reverse order.
        # (4) Temporal affine backward.
        dh, dW_vocab, db_vocab = temporal_affine_backward(dscores, cache_scores)
        # (3) RNN or LSTM backward.
        if self.cell_type == "rnn":
            dword_vector, dh0, dWx, dWh, db = rnn_backward(dh, cache_h)
        elif self.cell_type == "lstm":
            dword_vector, dh0, dWx, dWh, db = lstm_backward(dh, cache_h)
        # (2) Word embedding backward.
        dW_embed = word_embedding_backward(dword_vector, cache_word_vector)
        # (1) Affine backward.
        dfeatures, dW_proj, db_proj = affine_backward(dh0, cache_h0)

        # Store the gradients.
        grads["W_proj"] = dW_proj
        grads["b_proj"] = db_proj
        grads["W_embed"] = dW_embed
        grads["Wx"] = dWx
        grads["Wh"] = dWh
        grads["b"] = db
        grads["W_vocab"] = dW_vocab
        grads["b_vocab"] = db_vocab

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads

Output

(screenshot: notebook output for the CaptioningRNN.loss check)

Final output

(screenshots: final notebook output)

Closing remarks

Working through cs231n has given me a basic overall picture of deep learning, but it is still an introductory treatment; really learning the field takes much more sustained study. These assignments were quite fun. I now have a first impression of RNNs, but some parts are still fuzzy and I have not fully digested them yet.
