2G大小的GPU对深度学习的加速效果如何?

训练数据情况

总共42776张224*224*3张图片
Found 42776 files belonging to 9 classes.
Using 12833 files for training.

模型参数情况

Total params: 10,917,385
Trainable params: 10,860,745
Non-trainable params: 56,640

batch-size:12

GPU信息

    NVIDIA GeForce GT 730

    驱动程序版本:    27.21.14.6133
    驱动程序日期:    2021/1/19
    DirectX 版本:    12 (FL 11.0)
    物理位置:    PCI 总线 1、设备 0、功能 0

    利用率    11%
    专用 GPU 内存    0.3/2.0 GB
    共享 GPU 内存    0.0/31.9 GB
    GPU 内存    0.4/33.9 GB

训练情况分析

完全使用CPU进行训练的时候,每次训练大约需要2750s。

Epoch 1/65
2496/2496 [==============================] - 2937s 1s/step - loss: 0.4254 - accuracy: 0.8403 - val_loss: 0.3192 - val_accuracy: 0.8867
Epoch 2/65
2496/2496 [==============================] - 2756s 1s/step - loss: 0.2890 - accuracy: 0.8973 - val_loss: 0.4358 - val_accuracy: 0.8520
Epoch 3/65
2496/2496 [==============================] - 2737s 1s/step - loss: 0.2464 - accuracy: 0.9102 - val_loss: 0.2689 - val_accuracy: 0.9020

使用GPU加速进行训练的时候,每次训练的时间从2750s缩短到2100s左右,每次训练大约节省了650秒,效果也是比较明显的。

Epoch 1/65
2023-10-04 10:38:26.686146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-04 10:38:27.343524: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8101
2023-10-04 10:38:28.439803: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-04 10:38:29.088670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-04 10:38:31.502277: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 606.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:31.805129: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.084683: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 599.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.129001: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 620.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.738828: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 620.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.801711: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 592.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.034554: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 592.19MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.056645: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 599.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.099135: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.124441: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
1070/1070 [==============================] - 2120s 2s/step - loss: 0.5235 - accuracy: 0.8034 - val_loss: 0.5122 - val_accuracy: 0.8171
Epoch 2/65
1070/1070 [==============================] - 2060s 2s/step - loss: 0.3620 - accuracy: 0.8668 - val_loss: 2.9616 - val_accuracy: 0.4629
Epoch 3/65124/1070 [==>...........................] - ETA: 18:02 - loss: 0.3194 - accuracy: 0.8844
Process finished with exit code -1

使用更小的数据集效果分析

数据集

10249
Found 10249 files belonging to 16 classes.
Using 3075 files for training.

参数

Total params: 6,143,760
Trainable params: 6,113,168
Non-trainable params: 30,592

只使用CPU

Epoch 1/65
684/684 [==============================] - 758s 1s/step - loss: 1.1408 - accuracy: 0.5963 - val_loss: 3.0769 - val_accuracy: 0.2738
Epoch 2/65
684/684 [==============================] - 744s 1s/step - loss: 0.7745 - accuracy: 0.7173 - val_loss: 1.0438 - val_accuracy: 0.6369
Epoch 3/65
684/684 [==============================] - 769s 1s/step - loss: 0.6504 - accuracy: 0.7602 - val_loss: 0.8624 - val_accuracy: 0.6964

使用GPU

小数据集速度节省了接近50%

Epoch 1/65
2023-10-04 16:58:19.928226: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-04 16:58:20.236817: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8101
2023-10-04 16:58:20.791072: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-04 16:58:21.096985: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-04 16:58:23.704576: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 16:58:24.962633: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 16:58:24.987354: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
256/257 [============================>.] - ETA: 0s - loss: 1.3775 - accuracy: 0.51432023-10-04 17:01:47.489983: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:01:48.530265: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:01:48.550469: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
257/257 [==============================] - ETA: 0s - loss: 1.3776 - accuracy: 0.51452023-10-04 17:04:21.899587: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 626.56MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:04:23.391704: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:04:23.801583: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 615.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
257/257 [==============================] - 368s 1s/step - loss: 1.3776 - accuracy: 0.5145 - val_loss: 5.9301 - val_accuracy: 0.2432
Epoch 2/65
257/257 [==============================] - 376s 1s/step - loss: 1.0042 - accuracy: 0.6237 - val_loss: 1.0432 - val_accuracy: 0.6183
Epoch 3/65


 

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/97355.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Android之App跳转其他软件

文章目录 前言一、效果图二、实现步骤1.弹框xml(自己替换图标)2.弹框utils3.两个弹框动画4.封装方便调用5.调用6.长按事件方法7.跳转步骤8.复制utils 总结 前言 最近遇到一个需求,就是App内大面积需要长按复制并跳转指定App,没办法,只能埋头…

大语言模型之十六-基于LongLoRA的长文本上下文微调Llama-2

增加LLM上下文长度可以提升大语言模型在一些任务上的表现,这包括多轮长对话、长文本摘要、视觉-语言Transformer模型的高分辨4k模型的理解力以及代码生成、图像以及音频生成等。 对长上下文场景,在解码阶段,缓存先前token的Key和Value&#…

企业完善质量、环境、健康安全三体系认证的作用及其意义!

一、ISO三体系标准作用 ISO9001:质量管理体系,专门针对企业的质量管理,投标首选,很多大客户要求企业必备这项。 ISO14001:环境管理体系,针对企业的生产环境,排污,节能环保&#xf…

SpringCloud Alibaba - Sentinel 高级玩法,修改 Sentinel-dashboard 源码,实现 push 模式

目录 一、规则持久化 1.1、什么是规则持久化 1.1.1、使用背景 1.1.2、规则管理的三种模式 a)原始模式 b)pull 模式 c)push 模式 1.2、实现 push 模式 1.2.1、修改 order-service 服务,使其监听 Nacos 配置中心 1.2.2、修…

RocketMQ 5.0 新版版本新特性总结

1 架构变化 RocketMQ 5.0 架构上的变化主要是为了更好的走向云原生 RocketMQ 4.x 架构如下: Broker 向 Name Server 注册 Topic 路由信息,Producer 和 Consumer 则从 Name Server 获取路由信息,然后 Producer 根据路由信息向 Broker 发送消…

2023年中国石化行业节能减排发展措施分析:用精细化生产提高生产效率,降低能耗[图]

2022年,我国石油和化工行业克服诸多挑战取得了极其不易的经营业绩,行业生产基本稳定,营业收入和进出口总额增长较快,效益比上年略有下降但总额仍处高位。2022年,我国石油化工行业市场规模为191761.2亿元,同…

macbook电脑磁盘满了怎么删东西?

macbook是苹果公司的一款高性能笔记本电脑,受到很多用户的喜爱。但是,如果macbook的磁盘空间不足,可能会导致一些问题,比如无法开机、运行缓慢、应用崩溃等。那么,macbook磁盘满了无法开机怎么办,macbook磁…

Qt实现 图片处理器PictureEdit

目录 图片处理器PictureEdit1 创建工具栏2 打开图片3 显示图片4 灰度处理5 颜色反转6 马赛克 图片处理器PictureEdit 创建工程&#xff0c;添加资源文件 1 创建工具栏 widget.h中 #include <QWidget> #include<QPixmap> #include<QFileDialog> #include&l…

【赠书活动】如何让AI在企业多快好省的落地

&#x1f449;博__主&#x1f448;&#xff1a;米码收割机 &#x1f449;技__能&#x1f448;&#xff1a;C/Python语言 &#x1f449;公众号&#x1f448;&#xff1a;测试开发自动化【获取源码商业合作】 &#x1f449;荣__誉&#x1f448;&#xff1a;阿里云博客专家博主、5…

【LeetCode: 2034. 股票价格波动 | 有序表】

&#x1f680; 算法题 &#x1f680; &#x1f332; 算法刷题专栏 | 面试必备算法 | 面试高频算法 &#x1f340; &#x1f332; 越难的东西,越要努力坚持&#xff0c;因为它具有很高的价值&#xff0c;算法就是这样✨ &#x1f332; 作者简介&#xff1a;硕风和炜&#xff0c;…

[极客大挑战 2019]BabySQL 1

#做题方法# 进去之后做了简单的注入发现有错误回显&#xff0c;就进行注入发现过滤了sql语 后面进行了双写and payload&#xff1a; ?usernameadmin%27%20aandnd%20updatexml(1,concat(0x7e,dAtabase(),0x7e,version()),1)%20--&passwordadmi 接下来又 ?usernameadm…

yolov5及yolov7实战之剪枝

之前有讲过一次yolov5的剪枝&#xff1a;yolov5实战之模型剪枝_yolov5模型剪枝-CSDN博客 当时基于的是比较老的yolov5版本&#xff0c;剪枝对整个训练代码的改动也比较多。最近发现一个比较好用的剪枝库&#xff0c;可以在不怎么改动原有训练代码的情况下&#xff0c;实现剪枝的…

李沐深度学习记录5:13.Dropout

Dropout从零开始实现 import torch from torch import nn from d2l import torch as d2l# 定义Dropout函数 def dropout_layer(X, dropout):assert 0 < dropout < 1# 在本情况中&#xff0c;所有元素都被丢弃if dropout 1:return torch.zeros_like(X)# 在本情况中&…

超自动化加速落地,助力运营效率和用户体验显著提升|爱分析报告

RPA、iPaaS、AI、低代码、BPM、流程挖掘等在帮助企业实现自动化的同时&#xff0c;也在构建一座座“自动化烟囱”。自动化工具尚未融为一体&#xff0c;协同价值没有得到释放。Gartner于2019年提出超自动化&#xff08;Hyperautomation&#xff09;概念&#xff0c;主要从技术组…

【动手学深度学习】课程笔记 04 数据操作和数据预处理

目录 数据操作 N维数组样例 访问元素 数据操作实现 入门 运算符 广播机制 节省内存 转换为其他Python对象 数据预处理实现 数据操作 N维数组是机器学习和神经网路的主要数据结构。 N维数组样例 访问元素 数据操作实现 下面介绍一下本课程中需要用到的PyTorch相关操…

BJT晶体管

BJT晶体管也叫双极结型三极管&#xff0c;主要有PNP、NPN型两种&#xff0c;符号如下&#xff1a; 中间的是基极&#xff08;最薄&#xff0c;用于控制&#xff09;&#xff0c;带箭头的是发射极&#xff08;自由电子浓度高&#xff09;&#xff0c;剩下的就是集电极&#xff0…

no Go files in ...问题

golang项目&#xff0c;当我们微服务分模块开发时&#xff0c;习惯把main.go放在cmd目录下分模块放置&#xff0c;此时&#xff0c;我们在项目根目录下执行go test . 或go build . 时会报错“no Go files in ...”, 这是因为在.目录下找不到go程序&#xff0c;或者找不到程序入…

python之subprocess模块详解

介绍 subprocess是Python 2.4中新增的一个模块&#xff0c;它允许你生成新的进程&#xff0c;连接到它们的 input/output/error 管道&#xff0c;并获取它们的返回&#xff08;状态&#xff09;码。 这个模块的目的在于替换几个旧的模块和方法。 那么我们到底该用哪个模块、哪个…

SQL_ERROR_INFO: “Duplicate entry ‘9003‘ for key ‘examination_info.exam_id‘“

今天刷题的时候&#xff0c;往数据库中插入一条语句&#xff0c;但是这个语句已经存在于数据库中了&#xff0c;所以不能用insert into 语句来插入&#xff0c;应该使用replace into 来插入。 REPLACE INTO examination_info(exam_id,tag,difficulty,duration,release_time) V…

springboot整合pi支付开发

pi支付流程图&#xff1a; 使用Pi SDK功能发起支付由 Pi SDK 自动调用的回调函数&#xff08;让您的应用服务器知道它需要发出批准 API 请求&#xff09;从您的应用程序服务器到 Pi 服务器的 API 请求以批准付款&#xff08;让 Pi 服务器知道您知道此付款&#xff09;Pi浏览器向…