昇腾910B部署Qwen2-7B-Instruct进行流式输出【pytorch框架】NPU推理

目录

  • 前情提要
    • torch_npu框架
    • mindsport框架
    • mindnlp框架
  • 下载模型
    • 国外
    • 国内
  • 环境设置
  • 代码适配(非流式)
    • Main
    • Branch
    • 结果展示
  • 代码适配(流式)

前情提要

torch_npu框架

官方未适配
在这里插入图片描述

mindsport框架

官方未适配
在这里插入图片描述

mindnlp框架

官方适配了,但是速度非常非常慢,10秒一个字
在这里插入图片描述

下载模型

国外

Hugging FaceHugging Face

国内

在这里插入图片描述modelscope

环境设置

pip install transformers==4.39.2
pip3 install torch==2.1.0
pip3 install torch-npu==2.1.0.post4
pip3 install accelerate==0.24.1
pip3 install transformers-stream-generator==0.0.5

代码适配(非流式)

Main

import torch
import torch_npu
import os
import platform
torch_device = "npu:1" # 0~7
torch.npu.set_device(torch.device(torch_device))
torch.npu.set_compile_mode(jit_compile=False)
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)
from transformers import AutoModelForCausalLM, AutoTokenizer
# device = "cuda" # the device to load the model onto
DEFAULT_CKPT_PATH = '/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct'
model = AutoModelForCausalLM.from_pretrained(DEFAULT_CKPT_PATH,torch_dtype=torch.float16,device_map=torch_device
).npu().eval()
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_CKPT_PATH)
while True:prompt = input("user:")if prompt == "exit":breakmessages = [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": prompt}]text = tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True)model_inputs = tokenizer([text], return_tensors="pt").to(torch_device)generated_ids = model.generate(model_inputs.input_ids,max_new_tokens=512)generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]print("Qwen2-7B-Instruct:",response)

Branch

找到自己虚拟环境

which python

我的是/root/anaconda3/envs/sakura/bin/python
找到/lib/python3.9/site-packages/transformers/generation/utils.py示例:

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py

找到第2708行,注释掉2708行~2712行
在2709行添加

next_token_scores = outputs.logits[:, -1, :]

示例:
在这里插入图片描述
出错就是在这里,如果进行了pre-process distribution,就会报错

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/logits_process.py:455: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
Traceback (most recent call last):File "/root/Qwen_test.py", line 63, in <module>generated_ids = model.generate(File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_contextreturn func(*args, **kwargs)File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1576, in generateresult = self._sample(File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 2736, in _samplenext_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: Sync:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158 NPU error, error code is 507018
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
E39999: Inner Error!
E39999: 2024-07-02-14:14:50.735.070  An exception occurred during AICPU execution, stream_id:23, task_id:2750, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]TraceBack (most recent call last):rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]DEVICE[1] PID[864803]:
EXCEPTION TASK:Exception info:TGID=864803, model id=65535, stream id=23, stream phase=SCHEDULE, task id=2750, task type=aicpu kernel, recently received task id=2750, recently send task id=2749, task phase=RUNMessage info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210Other info[0]:time=2024-07-02-14:14:50.091.974, function=proc_aicpu_task_done, line=970, error code=0x2a
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999: 2024-07-02-14:14:50.743.702  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]TraceBack (most recent call last):Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161](function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.745.695  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.747.300  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.814.377  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.816.023  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.817.628  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.819.236  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.820.843  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.822.422  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]TraceBack (most recent call last):(function npuSynchronizeDevice)

结果展示

最后运行Main文件
在这里插入图片描述

代码适配(流式)

未完待续

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/bicheng/41457.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

HTTP与HTTPS的主要区别

HTTP&#xff08;超文本传输协议&#xff09;与HTTPS&#xff08;超文本传输安全协议&#xff09;的主要区别在于安全性、数据传输方式、默认使用的端口以及对网站的影响。 一、安全性&#xff1a; HTTP是一种无加密的协议&#xff0c;数据在传输过程中以明文形式发送&#x…

InfluxDB时序数据库基本使用介绍

1、概要介绍 1.1、时序数据库使用场景 所谓时序数据库就是按照一定规则的时间序列进行数据读写操作的数据库。它们常被用于以下业务场景&#xff1a; 物联网IOT场景&#xff1a;可用于IOT设备的指标、状态监控数据存取。IT建设场景&#xff1a;可用于服务器、虚拟机、容器的…

等保测评需要什么SSL证书

在进行信息安全等级保护&#xff08;简称“等保”&#xff09;测评时&#xff0c;选择合适的HTTPS证书对于确保网站的安全性和合规性至关重要。以下是在等保测评中选择HTTPS证书时应考虑的因素&#xff1a; 国产证书&#xff1a; 等保测评倾向于使用国产品牌的SSL证书&#x…

上网行为管理系统是什么?有哪些好用的上网行为管理系统?

IT经理&#xff08;ITM&#xff09;: 大家好&#xff0c;今天我们聚在这里&#xff0c;是为了讨论一个对我们公司来说越来越重要的议题&#xff1a;上网行为管理系统&#xff08;WBS&#xff09;。我们知道&#xff0c;员工的网络使用已经不仅仅是个人行为&#xff0c;它直接影…

序列化Serializable

一、传输对象的方式 将对象从内存传输到磁盘进行保存&#xff0c;或者进行网络传输&#xff0c;有两种方式&#xff1a; 实现Serializable接口&#xff0c;直接传输对象转成json字符串后&#xff0c;进行字符串传输 二、直接传输对象 implements Serializable Data Equal…

resp 无法连接 redis 服务器

问题原因可能是&#xff1a;防火墙 防火墙对这个端口没有开放&#xff0c;所以主机访问不到 解决方法&#xff1a; 步骤1&#xff1a;开发指定端口号 #放通6379/tcp端口 firewall-cmd --zonepublic --permanent --add-port6379/tcp 步骤2&#xff1a;重启防火墙 firewall-c…

.netcore微服务——项目搭建

在.NET Core中&#xff0c;微服务是一种架构风格&#xff0c;它将应用程序构造为一组小型服务的集合&#xff0c;这些服务都通过HTTP-based API进行通信。每个服务都是独立部署的&#xff0c;可以用不同的编程语言编写&#xff0c;并且可以使用不同的数据存储技术。 微服务的主…

什么是网络抓取|常见用例和问题

你可能听说过数据被称为现代信息社会的新石油。由于线上信息量庞大&#xff0c;能够有效地收集和分析网页数据已经成为企业、研究人员和开发人员的关键技能。这就是网页抓取技术的用武之地。网页抓取&#xff0c;也称为网页数据提取&#xff0c;是一种强大的技术&#xff0c;能…

IDEA 2018提交Git之后撤销commit

1、选择项目——>右击git——>找到Repostiory——>执行rest head 2、编辑reset head 3、回退到上一个版本&#xff08;HEAD~1&#xff09;&#xff0c;点击reset即可&#xff0c;如果还想继续回滚&#xff0c;再次执行即可

Linux平台x86_64|aarch64架构如何实现轻量级RTSP服务

技术背景 我们在做Linux平台x86_64架构或aarch64架构的推送模块的时候&#xff0c;有公司提出这样的技术需求&#xff0c;希望在Linux平台&#xff0c;实现轻量级RTSP服务&#xff0c;实现对摄像头或屏幕对外RTSP拉流&#xff0c;同步到大屏上去。 技术实现 废话不多说&…

在 Baklib Experience 中实现混合 CMS 架构

“还记得 CMS 主要用于在网页上布局内容吗&#xff1f;当时&#xff0c;这满足了网站管理需求。然而&#xff0c;行业正在发生变化&#xff0c;数字体验平台 Baklib Digital Content Experience 正在引领潮流。继续阅读以了解如何以及详细了解可用于确保全渠道成功的两个原则。…

python笔记和练习----少儿编程课程

第1课&#xff1a; 认识新朋友-python 知识点&#xff1a; 1、在英文状态下编写Python语句。 2、内置函数print()将结果输出到标准的控制台上&#xff0c;它的基本语法格式如下&#xff1a; print("即将输出的内容") #输出的内容要用引号引起来&#xff0c;可…

遗漏知识点

什么是RAII&#xff1f; RAII是Resource Acquisition Is Initialization&#xff08;wiki上面翻译成 “资源获取就是初始化”&#xff09;的简称&#xff0c;是C语言的一种管理资源、避免泄漏的惯用法。利用的就是C构造的对象最终会被销毁的原则。RAII的做法是使用一个对象&am…

DC/AC电源模块在不同的电源类型之间进行转换

BOSHIDA DC/AC电源模块在不同的电源类型之间进行转换 电力转换是现代社会不可或缺的一部分&#xff0c;它使我们能够在不同的电源类型之间进行转换&#xff0c;从而满足各种设备和应用的需求。DC/AC电源模块是一种用于将直流电转换为交流电的设备&#xff0c;它在电子设备、太…

[单master节点k8s部署]19.监控系统构建(四)kube-state-metrics

kube-state-metrics 是一个Kubernetes的附加组件&#xff0c;它通过监听 Kubernetes API 服务器来收集和生成关于 Kubernetes 对象&#xff08;如部署、节点和Pod等&#xff09;的状态的指标。这些指标可供 Prometheus 进行抓取和存储&#xff0c;从而使你能够监控和分析Kubern…

软件是什么?一个软件到底是哪些部分组成的-软件到底有哪些分支呢?

https://doc.youyacao.com/117/2163 软件是什么&#xff1f;一个软件到底是哪些部分组成的-软件到底有哪些分支呢&#xff1f; 何为软件 软件定义 的本质是通过软件编程实现硬件资源的虚拟化、灵活、多样和定制化功能&#xff0c;以最大化系统运行效率和能量效率。它基于硬…

VirtualBox的windows server 2016设置主机和虚拟机共享文件夹

文章目录 安装步骤1. windows server 2016安装增强功能2.上述安装完成之后&#xff0c;需要做共享文件夹&#xff0c;在宿主机&#xff0c;新建一个test文件夹&#xff0c;做共享设置&#xff0c;如下图&#xff1a;3.然后打开虚拟机&#xff0c;设置文件共享 安装步骤 1. win…

kafka系列之消费后不提交offset情况的分析总结

概述 每当我们调用Kafka的poll()方法或者使用KafkaListener(其实底层也是poll()方法)时&#xff0c;它都会返回之前被写入Kafka的记录&#xff0c;即我们组中的消费者还没有读过的记录。 这意味着我们有一种方法可以跟踪该组消费者读取过的记录。 如前所述&#xff0c;Kafka的一…

Java数据结构-树的面试题

目录 一.谈谈树的种类 二.红黑树如何实现 三.二叉树的题目 1.求一个二叉树的高度&#xff0c;有两种方法。 2.寻找二叉搜索树当中第K大的值 3、查找与根节点距离K的节点 4.二叉树两个结点的公共最近公共祖先 本专栏全是博主自己收集的面试题&#xff0c;仅可参考&#xf…

如何在Qt使用uchardet库

如何在 Qt 中使用 uchardet 库 文章目录 如何在 Qt 中使用 uchardet 库一、简介二、uchardet库的下载三、在Qt中直接调用四、编译成库文件后调用4.1 编译工具下载4.2 uchardet源码编译4.3 测试编译文件4.4 Qt中使用 五、一些小问题5.1 测试文件存在的问题5.2 uchardet库相关 六…