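NLP (64): Computing Token Lengths for the LLaMA-2 Model with FastChat

Deploying the LLaMA-2 Model

In the article NLP (59): Deploying the Baichuan LLM with FastChat, the author introduced the FastChat framework and showed how to use it to deploy the Baichuan model.
This article deploys the LLaMA-2 70B model so that it is compatible with OpenAI-style API calls. The Dockerfile used for the deployment is as follows: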

FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04
RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat

The docker-compose.yml file is as follows:

version: "3.9"

services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./model:/root/model
    image: fastchat:latest
    ports:
      - "21002:21002"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
    entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "llama2-70b-chat", "--model-path", "/root/model/llama2/Llama-2-70b-chat-hf", "--num-gpus", "2", "--gpus", "0,1", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]

Once deployed, the model occupies two A100 GPUs, with about 66 GB of GPU memory used on each.
To test whether the deployment succeeded:

curl http://localhost:8000/v1/models

The output is as follows:

{
  "object": "list",
  "data": [
    {
      "id": "llama2-70b-chat",
      "object": "model",
      "created": 1691504717,
      "owned_by": "fastchat",
      "root": "llama2-70b-chat",
      "parent": null,
      "permission": [
        {
          "id": "modelperm-3XG6nzMAqfEkwfNqQ52fdv",
          "object": "model_permission",
          "created": 1691504717,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": true,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}

The LLaMA-2 70B model has been deployed successfully!
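
The same check can be scripted. Below is a minimal Python sketch, assuming the API server from the docker-compose file is reachable at http://localhost:8000. The helper names extract_model_ids and fetch_model_ids are made up for illustration and are not part of FastChat.

```python
import json
import urllib.request


def extract_model_ids(models_response: dict) -> list:
    """Return the "id" of every entry in a /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000") -> list:
    """Query the API server; this needs the deployment above to be running.

    The default base_url is an assumption taken from the port mapping
    in the docker-compose file.
    """
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return extract_model_ids(json.load(resp))
```

With the services running, fetch_model_ids() should return a list containing "llama2-70b-chat", the name passed to --model-names.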

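Computing the Prompt Token Length

The FastChat open-source project on GitHub provides an API for computing the token length of a prompt. It is implemented in fastchat/serve/model_worker.py and is called as follows: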

curl --location 'localhost:21002/count_token' \
--header 'Content-Type: application/json' \
--data '{"prompt": "What is your name?"}'

The output is as follows:

{
  "count": 6,
  "error_code": 0
}
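
For use from Python, the curl call above might be wrapped roughly as follows. This is only a sketch: it assumes the model worker is reachable at localhost:21002 and that the response carries the count and error_code fields shown above. build_count_token_request and count_tokens are hypothetical helper names, not FastChat functions.

```python
import json
import urllib.request


def build_count_token_request(prompt: str,
                              worker_url: str = "http://localhost:21002"):
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{worker_url}/count_token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_tokens(prompt: str, worker_url: str = "http://localhost:21002") -> int:
    """Send the request to a running model worker and return the token count."""
    request = build_count_token_request(prompt, worker_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        result = json.load(resp)
    # Treat any non-zero error_code as a failure (an assumption based on
    # the sample response, where 0 accompanies a successful count).
    if result.get("error_code", 0) != 0:
        raise RuntimeError(f"count_token failed: {result}")
    return result["count"]
```

Splitting request construction from the network call keeps the payload easy to inspect without a running worker.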

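Computing the Conversation Token Length

Computing the token length of a conversation in FastChat takes more work.
First, we need to fetch the conversation template of the LLaMA-2 70B model with the following API call: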

curl --location --request POST 'http://localhost:21002/worker_get_conv_template'

The output is as follows:

{'conv': {'messages': [],
          'name': 'llama-2',
          'offset': 0,
          'roles': ['[INST]', '[/INST]'],
          'sep': ' ',
          'sep2': ' </s><s>',
          'sep_style': 7,
          'stop_str': None,
          'stop_token_ids': [2],
          'system_message': 'You are a helpful, respectful and honest '
                            'assistant. Always answer as helpfully as '
                            'possible, while being safe. Your answers should '
                            'not include any harmful, unethical, racist, '
                            'sexist, toxic, dangerous, or illegal content. '
                            'Please ensure that your responses are socially '
                            'unbiased and positive in nature.\n'
                            '\n'
                            'If a question does not make any sense, or is not '
                            'factually coherent, explain why instead of '
                            "answering something not correct. If you don't "
                            "know the answer to a question, please don't share "
                            'false information.',
          'system_template': '[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n'}}

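FastChat's conversation file (fastchat/conversation.py) contains the code for processing conversations. It is not reproduced here; to use it, simply copy the whole file, which does not depend on any third-party module.
We need to turn the conversation, given in the OpenAI format, into the corresponding prompt. The input conversation (messages) is as follows: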

messages = [
    {"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."},
    {"role": "user", "content": "What is your name?"},
    {"role": "assistant", "content": " Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"},
    {"role": "user", "content": "How old are you?"},
    {"role": "assistant", "content": " Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999!"},
    {"role": "user", "content": "Where is your hometown?"},
]

The Python code is as follows:

# -*- coding: utf-8 -*-
# @place: Pudong, Shanghai
# @file: prompt.py
# @time: 2023/8/8 19:24
from conversation import Conversation, SeparatorStyle

messages = [
    {"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."},
    {"role": "user", "content": "What is your name?"},
    {"role": "assistant", "content": " Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"},
    {"role": "user", "content": "How old are you?"},
    {"role": "assistant", "content": " Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999!"},
    {"role": "user", "content": "Where is your hometown?"},
]

# Conversation template returned by the worker_get_conv_template API.
llama2_conv = {
    "conv": {
        "name": "llama-2",
        "system_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
        "system_message": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.",
        "roles": ["[INST]", "[/INST]"],
        "messages": [],
        "offset": 0,
        "sep_style": 7,
        "sep": " ",
        "sep2": " </s><s>",
        "stop_str": None,
        "stop_token_ids": [2],
    }
}

conv = llama2_conv['conv']
conv = Conversation(
    name=conv["name"],
    system_template=conv["system_template"],
    system_message=conv["system_message"],
    roles=conv["roles"],
    messages=list(conv["messages"]),  # prevent in-place modification
    offset=conv["offset"],
    sep_style=SeparatorStyle(conv["sep_style"]),
    sep=conv["sep"],
    sep2=conv["sep2"],
    stop_str=conv["stop_str"],
    stop_token_ids=conv["stop_token_ids"],
)

if isinstance(messages, str):
    prompt = messages
else:
    for message in messages:
        msg_role = message["role"]
        if msg_role == "system":
            conv.set_system_message(message["content"])
        elif msg_role == "user":
            conv.append_message(conv.roles[0], message["content"])
        elif msg_role == "assistant":
            conv.append_message(conv.roles[1], message["content"])
        else:
            raise ValueError(f"Unknown role: {msg_role}")
    # Add a blank message for the assistant.
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

print(repr(prompt))

The processed prompt is as follows:

"[INST] <<SYS>>\nYou are Jack, you are 20 years old, answer questions with humor.\n<</SYS>>\n\nWhat is your name?[/INST]  Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend! </s><s>[INST] How old are you? [/INST]  Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999! </s><s>[INST] Where is your hometown? [/INST]"

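Finally, we call the prompt token-counting API again (see the previous section, Computing the Prompt Token Length), which reports a token length of 199 for this conversation.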
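
For readers who want to see how the llama-2 template lays out such a prompt, here is a simplified, standalone sketch. It is not FastChat's implementation: build_llama2_prompt is a hypothetical function written for illustration, it hard-codes the separators from the template shown earlier, and its exact whitespace may differ from what a given FastChat version produces. For real token counts, use conversation.py itself.

```python
def build_llama2_prompt(messages, default_system=""):
    """Assemble an OpenAI-style message list into a llama-2 style prompt.

    Illustration only; the separators mirror the template fields shown
    earlier (system_template, roles, sep, sep2).
    """
    system = default_system
    turns = []
    for message in messages:
        role = message["role"]
        if role == "system":
            system = message["content"]
        elif role in ("user", "assistant"):
            turns.append((role, message["content"]))
        else:
            raise ValueError(f"Unknown role: {role}")

    # The system block is wrapped in <<SYS>> markers inside the first [INST].
    prompt = "[INST] "
    if system:
        prompt += f"<<SYS>>\n{system}\n<</SYS>>\n\n"

    for i, (role, content) in enumerate(turns):
        if role == "user":
            if i == 0:
                # The first user turn reuses the opening [INST] tag.
                prompt += content + " "
            else:
                prompt += "[INST] " + content + " "
        else:
            # Assistant turns end with the sep2 separator.
            prompt += "[/INST] " + content + " </s><s>"

    # Leave an open [/INST] so the model generates the next assistant reply.
    if turns and turns[-1][0] == "user":
        prompt += "[/INST]"
    return prompt


demo = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]
print(repr(build_llama2_prompt(demo)))
```

The point of the sketch is the shape of the result: one system block, alternating [INST] and [/INST] turns, and a trailing [/INST] for the pending reply. All of these pieces count toward the prompt's token length.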
We can verify the token length of the input conversation with the chat completion endpoint provided by FastChat (v1/chat/completions). The request is:

curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{"model": "llama2-70b-chat","messages": [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who'\''s asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let'\''s just say I'\''m older than a bottle of wine but younger than a bottle of whiskey. I'\''m like a fine cheese, getting better with age, but still young enough to party like it'\''s 1999!"}, {"role": "user", "content": "Where is your hometown?"}]
}'

The output is:

{
  "id": "chatcmpl-mQxcaQcNSNMFahyHS7pamA",
  "object": "chat.completion",
  "created": 1691506768,
  "model": "llama2-70b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Ha! My hometown? Well, that's a tough one. I'm like a bird, I don't have a nest, I just fly around and land wherever the wind takes me. But if you really want to know, I'm from a place called \"The Internet\". It's a magical land where memes and cat videos roam free, and the Wi-Fi is always strong. It's a beautiful place, you should visit sometime!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 199,
    "total_tokens": 302,
    "completion_tokens": 103
  }
}

Note that the returned prompt_tokens is 199, which matches the conversation token length we just computed!
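
This comparison can also be done in code. The snippet below is a small sketch using the usage values from the response above; check_usage is a hypothetical helper that confirms the three usage fields add up and that prompt_tokens equals the count obtained from the count_token API.

```python
def check_usage(usage: dict, expected_prompt_tokens: int) -> bool:
    """Return True if the usage block is consistent with our own count."""
    totals_add_up = (
        usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"]
    )
    return totals_add_up and usage["prompt_tokens"] == expected_prompt_tokens


# Values copied from the chat completion response above.
usage = {"prompt_tokens": 199, "total_tokens": 302, "completion_tokens": 103}
print(check_usage(usage, expected_prompt_tokens=199))  # True: 199 + 103 == 302
```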

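Summary

This article showed how to deploy the LLaMA-2 70B model with FastChat and walked through computing the token length of a prompt and of a conversation. I hope readers find it helpful.
One takeaway from the author: reading the source code really matters.
The author's personal blog is at https://percent4.github.io/ and visitors are welcome.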

References

1. NLP (59): Deploying the Baichuan LLM with FastChat: https://blog.csdn.net/jclian91/article/details/131650918
2. FastChat: https://github.com/lm-sys/FastChat
