Deploying Qwen2-7B-Instruct on Ascend 910B with Streaming Output [PyTorch Framework]: NPU Inference

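Contents

  • Background
    • torch_npu framework
    • MindSpore framework
    • MindNLP framework
  • Downloading the model
    • Outside China
    • Within China
  • Environment setup
  • Code adaptation (non-streaming)
    • Main
    • Branch
    • Results
  • Code adaptation (streaming)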

Background

torch_npu framework

No official adaptation.

MindSpore framework

No official adaptation.

MindNLP framework

Officially adapted, but extremely slow: roughly one character every 10 seconds.

Downloading the model

Outside China

Hugging Face

Within China

ModelScope

Environment setup

pip install transformers==4.39.2
pip3 install torch==2.1.0
pip3 install torch-npu==2.1.0.post4
pip3 install accelerate==0.24.1
pip3 install transformers-stream-generator==0.0.5
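
Since the steps below depend on exact versions (the library patch later refers to specific line numbers), it can help to confirm what actually got installed. The following is a minimal sketch, not part of the original setup; `PINNED` and `check_pins` are hypothetical names introduced here, and the version strings are copied from the pip commands above.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "transformers": "4.39.2",
    "torch": "2.1.0",
    "torch-npu": "2.1.0.post4",
    "accelerate": "0.24.1",
    "transformers-stream-generator": "0.0.5",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            problems[name] = (expected, found)
    return problems


for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty printout means every pin matches. The comparison is an exact string match, so a build with a local suffix would be reported as a mismatch even if it is functionally the same release.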

Code adaptation (non-streaming)

Main

import torch
import torch_npu  # importing torch_npu makes the torch.npu backend available

torch_device = "npu:1"  # card index, 0~7
torch.npu.set_device(torch.device(torch_device))
torch.npu.set_compile_mode(jit_compile=False)
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)

from transformers import AutoModelForCausalLM, AutoTokenizer

# device = "cuda" # the device to load the model onto
DEFAULT_CKPT_PATH = '/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct'
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_CKPT_PATH,
    torch_dtype=torch.float16,
    device_map=torch_device
).npu().eval()
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_CKPT_PATH)

while True:
    prompt = input("user:")
    if prompt == "exit":
        break
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    # Render the chat messages into the prompt string the model expects
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(torch_device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # generate() returns prompt + completion; keep only the newly generated tokens
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Qwen2-7B-Instruct:", response)
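
The list comprehension near the end of Main is easy to misread, so here is a small illustration of that one step using plain Python lists instead of tensors. The token ids are made up and `strip_prompt` is a hypothetical helper name; the slicing logic is the same as in the script.

```python
def strip_prompt(input_ids, generated_ids):
    """Drop the echoed prompt tokens from the front of each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token ids, for illustration only.
prompt_batch = [[101, 102, 103]]
output_batch = [[101, 102, 103, 7, 8, 9]]
print(strip_prompt(prompt_batch, output_batch))
```

Without this step, decoding the output would print the system prompt and the user's question back before the answer.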

Branch

Locate the Python interpreter of your virtual environment:

which python

Mine is /root/anaconda3/envs/sakura/bin/python
Under the same environment root, locate lib/python3.9/site-packages/transformers/generation/utils.py. For example:

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py
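
Instead of assembling the path by hand, the interpreter can report which file it would import. This is a small sketch; `module_path` is a hypothetical helper, and the demonstration call uses a standard-library module so that it runs anywhere. In the environment from this post you would pass "transformers.generation.utils" instead.

```python
import importlib.util


def module_path(module_name):
    """Return the file path recorded in the module's import spec, or None if no spec is found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


# Demonstrated on a standard-library module; swap in
# "transformers.generation.utils" inside the target environment.
print(module_path("json.decoder"))
```

Run it with the interpreter returned by `which python` so that the reported path belongs to the environment you intend to patch.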

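Go to line 2708 (in the transformers 4.39.2 installed above) and comment out lines 2708 through 2712.
Then add the following at line 2709: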

next_token_scores = outputs.logits[:, -1, :]

This is where the failure originates: if the pre-process distribution step runs, generation crashes with the following output.

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/logits_process.py:455: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
Traceback (most recent call last):
  File "/root/Qwen_test.py", line 63, in <module>
    generated_ids = model.generate(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 2736, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: Sync:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158 NPU error, error code is 507018
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
E39999: Inner Error!
E39999: 2024-07-02-14:14:50.735.070  An exception occurred during AICPU execution, stream_id:23, task_id:2750, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
        TraceBack (most recent call last):
        rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[1] PID[864803]:
EXCEPTION TASK:
  Exception info:TGID=864803, model id=65535, stream id=23, stream phase=SCHEDULE, task id=2750, task type=aicpu kernel, recently received task id=2750, recently send task id=2749, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2024-07-02-14:14:50.091.974, function=proc_aicpu_task_done, line=970, error code=0x2a
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999: 2024-07-02-14:14:50.743.702  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
        TraceBack (most recent call last):
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.745.695  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[... the same EH9999 warning block repeats seven more times, differing only in timestamp ...]
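
It is worth being clear about what the one-line patch trades away. Assuming the commented-out lines are the "pre-process distribution" calls to `logits_processor` and `logits_warper` (as they appear in `_sample` in transformers 4.39.2), the replacement line feeds the raw last-position logits straight into softmax and `torch.multinomial`, so sampling settings such as top-p no longer shape the distribution. The sketch below illustrates the difference in plain Python with made-up logits; `top_p_filter` is a simplified stand-in written for this illustration, not the transformers implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
raw = softmax(logits)  # what the patched code samples from
filtered = top_p_filter(raw, top_p=0.8)  # what a top-p warper would sample from
print("raw:     ", [round(p, 3) for p in raw])
print("filtered:", [round(p, 3) for p in filtered])
```

In the raw distribution every token keeps a nonzero chance of being picked, while the filtered one zeroes out the unlikely tail. With the patch in place, output may therefore be somewhat less focused than the model's default generation settings would give.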

Results

Finally, run the Main script.

Code adaptation (streaming)

To be continued.
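
Until the streaming part is written up, here is one possible direction, offered as an untested sketch rather than the author's solution. It reuses the prompt construction from Main and passes a `TextStreamer` from transformers to `generate()`, which prints decoded text to stdout as tokens arrive. `build_messages` and `stream_reply` are hypothetical helper names, and whether this runs cleanly on the NPU with the patched `utils.py` is an assumption that has not been verified here.

```python
def build_messages(prompt):
    """Chat messages in the same shape as the non-streaming Main script."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


def stream_reply(model, tokenizer, prompt, device, max_new_tokens=512):
    """Print the reply incrementally while generate() is still running."""
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import TextStreamer

    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    # skip_prompt hides the echoed prompt; skip_special_tokens is forwarded to decode().
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        model_inputs.input_ids,
        max_new_tokens=max_new_tokens,
        streamer=streamer,
    )


# Usage inside the Main loop (model, tokenizer and torch_device as defined there):
#   stream_reply(model, tokenizer, prompt, torch_device)
```

`TextStreamer` is the simplest option because it needs no extra thread. If the text has to be consumed by other code instead of printed (for example, sent over a web socket), `TextIteratorStreamer` is the usual alternative, at the cost of running `generate()` in a background thread.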
