Ascend 910 Usage Notes

I. Compressing and extracting files

1. Compress a directory

tar -czvf UNITE-main.tar.gz ./UNITE-main/

2. Extract an archive

tar -xzvf UNITE-main.tar.gz
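
When the `tar` command is not convenient (for example inside a notebook), the same two steps can be sketched with Python's standard `tarfile` module. The helper names `compress_dir` and `extract_archive` are made up for this illustration.

```python
import tarfile
from pathlib import Path


def compress_dir(src_dir: str, archive_path: str) -> None:
    """Create a gzip-compressed tar archive, similar to `tar -czf`."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the top-level folder name inside the archive.
        tar.add(src, arcname=src.name)


def extract_archive(archive_path: str, dest_dir: str = ".") -> None:
    """Unpack an archive, similar to `tar -xf`; compression is auto-detected."""
    # Only extract archives you trust: members are written as stored.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest_dir)
```

For the example in this section, `compress_dir("UNITE-main", "UNITE-main.tar.gz")` would create the archive and `extract_archive("UNITE-main.tar.gz")` would unpack it into the current directory.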

II. Changing CUDA calls to NPU

data['label'] = data['label'].cuda()
data['instance'] = data['instance'].cuda()
data['image'] = data['image'].cuda()

Change to (the .npu() method becomes available once torch_npu has been imported):

data['label'] = data['label'].npu()
data['instance'] = data['instance'].npu()
data['image'] = data['image'].npu()
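
To avoid editing every `.cuda()` call by hand, one option is to pick the device once and move the whole batch with `.to(device)`. The following is a minimal sketch, not part of the original project: `pick_device` and `move_batch` are hypothetical helper names, and the sketch assumes that `torch_npu.npu.is_available()` reports whether an NPU can be used.

```python
def pick_device() -> str:
    """Return "npu", "cuda", or "cpu", depending on what is usable."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch itself is not installed
    try:
        import torch_npu  # importing it registers the NPU backend
        if torch_npu.npu.is_available():
            return "npu"
    except Exception:
        pass  # torch_npu missing or the Ascend runtime failed to load
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def move_batch(data: dict, device: str) -> dict:
    """Move every value that has a .to() method onto `device`."""
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in data.items()
    }
```

With these helpers, the three assignments above would collapse into `data = move_batch(data, pick_device())`, and the same code would keep running on a CUDA or CPU-only machine.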

III. Configuring environment variables

1. Create env.sh

touch env.sh

2. Open env.sh

vi env.sh

3. Add the environment variables

# Configure the CANN-related environment variables
CANN_INSTALL_PATH_CONF='/etc/Ascend/ascend_cann_install.info'
if [ -f $CANN_INSTALL_PATH_CONF ]; then
    DEFAULT_CANN_INSTALL_PATH=$(cat $CANN_INSTALL_PATH_CONF | grep Install_Path | cut -d "=" -f 2)
else
    DEFAULT_CANN_INSTALL_PATH="/usr/local/Ascend/"
fi
CANN_INSTALL_PATH=${1:-${DEFAULT_CANN_INSTALL_PATH}}
if [ -d ${CANN_INSTALL_PATH}/ascend-toolkit/latest ]; then
    source ${CANN_INSTALL_PATH}/ascend-toolkit/set_env.sh
else
    source ${CANN_INSTALL_PATH}/nnae/set_env.sh
fi
# Add the dependency libraries to the library search path
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/openblas/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib64/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/aarch64_64-linux-gnu
# Custom environment variables
export HCCL_WHITELIST_DISABLE=1
# Logging
export ASCEND_SLOG_PRINT_TO_STDOUT=0 # whether to print logs to the screen (0 = off); optional
export ASCEND_GLOBAL_LOG_LEVEL=3 # log level; common values: 1 = INFO, 3 = ERROR
export ASCEND_GLOBAL_EVENT_ENABLE=0 # event logging is disabled by default

Then save the file and quit vi by typing:

:wq!

4. Load the environment

source env.sh
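
After sourcing the file, a quick way to confirm that the settings reached the Python process is to compare a few variables against the values written in env.sh. This is only a sketch: `check_env` is a hypothetical helper, and `EXPECTED_VARS` simply repeats the values from the script above.

```python
import os

# Values copied from the env.sh shown in this section.
EXPECTED_VARS = {
    "HCCL_WHITELIST_DISABLE": "1",
    "ASCEND_SLOG_PRINT_TO_STDOUT": "0",
    "ASCEND_GLOBAL_LOG_LEVEL": "3",
    "ASCEND_GLOBAL_EVENT_ENABLE": "0",
}


def check_env(environ=None) -> dict:
    """Return {name: (expected, actual)} for every variable that differs."""
    env = os.environ if environ is None else environ
    return {
        name: (expected, env.get(name))
        for name, expected in EXPECTED_VARS.items()
        if env.get(name) != expected
    }
```

An empty result from `check_env()` means all four variables match. Python has to be started from the same shell in which `source env.sh` was run, because the exported variables are only inherited by child processes of that shell.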

IV. RuntimeError: ACL stream synchronize failed, error code:507018

E39999: Inner Error, Please contact support engineer!
E39999  Aicpu kernel execute failed, device_id=0, stream_id=0, task_id=6394, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        TraceBack (most recent call last):
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[0] PID[41411]:
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:22:01.273.951, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=3, task id=6394, task type=aicpu kernel, recently received task id=6406, recently send task id=6393, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:41:20.661.958, function=proc_aicpu_task_done, line=972, error code=0x2a
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/home/ma-user/work/SPADE-master/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 43, in forward
    input_semantics, real_image = self.preprocess_input(data)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 113, in preprocess_input
    data['label'] = data['label'].npu()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/device_guard.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/tensor_methods.py", line 66, in _npu
    return torch_npu._C.npu(self, *args, **kwargs)
RuntimeError: ACL stream synchronize failed, error code:507018
THPModule_npu_shutdown success.

My guess is that this happens because mixed precision was not enabled.

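V. Enabling mixed precision

1. Before building the network, import the AMP module from torch_npu

import time
import torch
import torch.nn as nn
import torch_npu
from torch.utils.data import DataLoader
from torch_npu.npu import amp    # import the AMP module

2. After defining the model and the optimizer, create the GradScaler used by AMP (CNN, device, train_data, and batch_size are assumed to be defined elsewhere)

model = CNN().to(device)
train_dataloader = DataLoader(train_data, batch_size=batch_size)    # define the DataLoader
loss_func = nn.CrossEntropyLoss().to(device)    # define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # define the optimizer
scaler = amp.GradScaler()    # create the GradScaler after the model and optimizer

3. Add the AMP-related code to the training loop to enable AMP

for epo in range(epochs):
    for imgs, labels in train_dataloader:
        imgs = imgs.to(device)
        labels = labels.to(device)
        with amp.autocast():
            outputs = model(imgs)    # forward pass
            loss = loss_func(outputs, labels)    # compute the loss
        optimizer.zero_grad()
        # scale the loss around backpropagation, then update the parameters
        scaler.scale(loss).backward()    # scale the loss and backpropagate
        scaler.step(optimizer)    # update the parameters (gradients are unscaled automatically)
        scaler.update()    # update the loss-scaling factor according to the dynamic loss scale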
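
The last step of the loop, `scaler.update()`, adjusts the loss-scaling factor dynamically. The toy class below sketches that idea in plain Python so the behavior can be followed without an NPU. It is not the torch_npu implementation: `ToyLossScaler` is a made-up name, and the default values are an assumption borrowed from the documented defaults of PyTorch's `torch.cuda.amp.GradScaler` (torch_npu's values may differ).

```python
class ToyLossScaler:
    """Toy model of dynamic loss scaling; not the torch_npu implementation."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def update(self, found_overflow: bool) -> bool:
        """Adjust the scale; return True if the optimizer step should run."""
        if found_overflow:
            # Gradients contained inf/NaN: shrink the scale and skip the step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A long run of clean steps: try a larger scale again.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True
```

The design intent is that the scale stays as large as possible, so that small float16 gradients do not underflow to zero, and backs off as soon as an overflow is detected.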

VI. Unknown error

E39999: Inner Error, Please contact support engineer!
E39999  An exception occurred during AICPU execution, stream_id:78, task_id:742, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:673]
        TraceBack (most recent call last):
        Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task.cc][LINE:1068]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        op[Minimum], The Minimum op dtype is not same, type1:DT_FLOAT16, type2:DT_FLOAT[FUNC:CheckTwoInputDtypeSame][FILE:util.cc][LINE:116]
        Verifying Minimum failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:135]
        Call InferShapeAndType for node:Minimum(Minimum) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
        process pass InferShapePass on node:Minimum failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:530]
        build graph failed, graph id:894, ret:1343242270[FUNC:BuildModel][FILE:ge_generator.cc][LINE:1484]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
DEVICE[0] PID[189368]:
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:12:22.763.259, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=3, task id=4347, task type=aicpu kernel, recently received task id=4354, recently send task id=4346, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:13:57.997.757, function=proc_aicpu_task_done, line=972, error code=0x2a
Aborted (core dumped)
(py38) [ma-user SPADE-master]$
Process ForkServerProcess-2:
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 268, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "<string>", line 2, in get
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 91 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
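
Logs like the one above are long, and the informative parts are easy to miss: here the fields `fault op_name=ScatterElements` and the message about the Minimum op receiving DT_FLOAT16 and DT_FLOAT inputs. The sketch below pulls those two kinds of fields out of a saved log. `summarize_ascend_log` is a hypothetical helper, and the regular expressions are written only against the wording seen in this log, so other CANN versions may format these messages differently. Whether the float16/float32 mismatch is the root cause (it may be related to mixed precision) is not confirmed.

```python
import re

# Patterns based only on the log text shown in this section.
FAULT_OP_RE = re.compile(r"fault op_name=(\w+)")
DTYPE_MISMATCH_RE = re.compile(
    r"op\[(\w+)\], The \w+ op dtype is not same, type1:(\w+), type2:(\w+)"
)


def summarize_ascend_log(text: str) -> dict:
    """Collect faulting operator names and dtype-mismatch reports."""
    return {
        # Unique operator names reported as the fault location.
        "fault_ops": sorted(set(FAULT_OP_RE.findall(text))),
        # Tuples of (operator, dtype of input 1, dtype of input 2).
        "dtype_mismatches": DTYPE_MISMATCH_RE.findall(text),
    }
```

Running it on the log in this section would report ScatterElements as the faulting operator and one mismatch for Minimum, which narrows down which calls in the model to inspect first.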
