手把手教你:LLama2原始权重转HF模型

LLama2是meta最新开源的语言大模型,训练数据集2万亿token,上下文长度由llama的2048扩展到4096,可以理解和生成更长的文本,包括7B、13B和70B三个模型,在各种基准集的测试上表现突出,该模型可用于研究和商业用途。

LLama2模型权重和tokenizer下载需要申请访问。

申请链接:https://ai.meta.com/resources/models-and-libraries/llama-downloads/

由于下载的原始LLama2模型权重文件不能直接调用huggingface的transformers库进行使用,如果要使用huggingface transformer训练LLaMA2,需要使用额外的转换脚本。

转换脚本:https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

现在huggingface上已发布了llama的hf版本,可以直接使用。

现在介绍LLama2模型的原始权重获取和转换脚本。

LLama2模型原始权重获取

在MetaAI申请通过后将会在邮件中提及到PRESIGNED_URL,运行download.sh,按照提示输入即可。

set -eread -p "Enter the URL from email: " PRESIGNED_URL 
echo ""
read -p "Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: " MODEL_SIZE  
TARGET_FOLDER="../target/file"             # where all files should end up
mkdir -p ${TARGET_FOLDER}if [[ $MODEL_SIZE == "" ]]; thenMODEL_SIZE="7B,13B,70B,7B-chat,13B-chat,70B-chat"
fiecho "Downloading LICENSE and Acceptable Usage Policy"
wget --continue ${PRESIGNED_URL/'*'/"LICENSE"} -O ${TARGET_FOLDER}"/LICENSE"
wget --continue ${PRESIGNED_URL/'*'/"USE_POLICY.md"} -O ${TARGET_FOLDER}"/USE_POLICY.md"echo "Downloading tokenizer"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"
wget --continue ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"
CPU_ARCH=$(uname -m)if [ "$CPU_ARCH" = "arm64" ]; then(cd ${TARGET_FOLDER} && md5 tokenizer_checklist.chk)else(cd ${TARGET_FOLDER} && md5sum -c tokenizer_checklist.chk)fifor m in ${MODEL_SIZE//,/ }
doif [[ $m == "7B" ]]; thenSHARD=0MODEL_PATH="llama-2-7b"elif [[ $m == "7B-chat" ]]; thenSHARD=0MODEL_PATH="llama-2-7b-chat"elif [[ $m == "13B" ]]; thenSHARD=1MODEL_PATH="llama-2-13b"elif [[ $m == "13B-chat" ]]; thenSHARD=1MODEL_PATH="llama-2-13b-chat"elif [[ $m == "70B" ]]; thenSHARD=7MODEL_PATH="llama-2-70b"elif [[ $m == "70B-chat" ]]; thenSHARD=7MODEL_PATH="llama-2-70b-chat"fiecho "Downloading ${MODEL_PATH}"mkdir -p ${TARGET_FOLDER}"/${MODEL_PATH}"for s in $(seq -f "0%g" 0 ${SHARD})dowget ${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/consolidated.${s}.pth"donewget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/params.json"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/params.json"wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/checklist.chk"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/checklist.chk"echo "Checking checksums"if [ "$CPU_ARCH" = "arm64" ]; then(cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5 checklist.chk)else(cd ${TARGET_FOLDER}"/${MODEL_PATH}" && md5sum -c checklist.chk)fi
done

运行download.sh:

sh download.sh

代码注释

# 导入包
import argparse
import gc
import json
import os
import shutil
import warnings
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer# 判断LlamaTokenizerFast是否可用,LlamaTokenizerFast可以加速tokenization
try:from transformers import LlamaTokenizerFast
except ImportError as e:warnings.warn(e)warnings.warn("The converted tokenizer will be the `slow` tokenizer. To use the fast, update your `tokenizers` library and re-run the tokenizer conversion")LlamaTokenizerFast = None# 不同版本的LLama模型的分片数目
NUM_SHARDS = {"7B": 1,"7Bf": 1,"13B": 2,"13Bf": 2,"34B": 4,"30B": 4,"65B": 8,"70B": 8,"70Bf": 8,
}# 计算中间层大小,优化计算效率
def compute_intermediate_size(n, ffn_dim_multiplier=1, multiple_of=256):return multiple_of * ((int(ffn_dim_multiplier * int(8 * n / 3)) + multiple_of - 1) // multiple_of)# 读取json文件
def read_json(path):with open(path, "r") as f:return json.load(f)# 写入json文件
def write_json(text, path):with open(path, "w") as f:json.dump(text, f)def write_model(model_path, input_base_path, model_size, tokenizer_path=None, safe_serialization=True):# 检查参数文件路径if not os.path.isfile(os.path.join(input_base_path, "params.json")):input_base_path = os.path.join(input_base_path, model_size)# 创建模型临时保存目录os.makedirs(model_path, exist_ok=True)tmp_model_path = os.path.join(model_path, "tmp")os.makedirs(tmp_model_path, exist_ok=True)# 读取参数params = read_json(os.path.join(input_base_path, "params.json"))num_shards = NUM_SHARDS[model_size]n_layers = params["n_layers"]n_heads = params["n_heads"]n_heads_per_shard = n_heads // num_shardsdim = params["dim"]dims_per_head = dim // n_headsbase = params.get("rope_theta", 10000.0)inv_freq = 1.0 / (base ** (torch.arange(0, dims_per_head, 2).float() / dims_per_head))if base > 10000.0:max_position_embeddings = 16384else:max_position_embeddings = 2048# 初始化tokenizertokenizer_class = LlamaTokenizer if LlamaTokenizerFast is None else LlamaTokenizerFastif tokenizer_path is not None:tokenizer = tokenizer_class(tokenizer_path)tokenizer.save_pretrained(model_path)vocab_size = tokenizer.vocab_size if tokenizer_path is not None else 32000# 处理键值对头信息if "n_kv_heads" in params:num_key_value_heads = params["n_kv_heads"]  # for GQA / MQAnum_local_key_value_heads = n_heads_per_shard // num_key_value_headskey_value_dim = dim // num_key_value_headselse:  # compatibility with other checkpointsnum_key_value_heads = n_headsnum_local_key_value_heads = n_heads_per_shardkey_value_dim = dim# 张量变换def permute(w, n_heads=n_heads, dim1=dim, dim2=dim):return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)print(f"Fetching all parameters from the checkpoint at {input_base_path}.")# 加载权重if num_shards == 1:loaded = torch.load(os.path.join(input_base_path, "consolidated.00.pth"), map_location="cpu")else:loaded = [torch.load(os.path.join(input_base_path, f"consolidated.{i:02d}.pth"), map_location="cpu")for i in range(num_shards)]param_count = 0index_dict = {"weight_map": {}}# 处理每一层的原始权重,并转化为bin文件for layer_i in range(n_layers):filename = f"pytorch_model-{layer_i + 1}-of-{n_layers + 1}.bin"if num_shards == 1:# Unshardedstate_dict = {f"model.layers.{layer_i}.self_attn.q_proj.weight": permute(loaded[f"layers.{layer_i}.attention.wq.weight"]),f"model.layers.{layer_i}.self_attn.k_proj.weight": permute(loaded[f"layers.{layer_i}.attention.wk.weight"]),f"model.layers.{layer_i}.self_attn.v_proj.weight": loaded[f"layers.{layer_i}.attention.wv.weight"],f"model.layers.{layer_i}.self_attn.o_proj.weight": loaded[f"layers.{layer_i}.attention.wo.weight"],f"model.layers.{layer_i}.mlp.gate_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w1.weight"],f"model.layers.{layer_i}.mlp.down_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w2.weight"],f"model.layers.{layer_i}.mlp.up_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w3.weight"],f"model.layers.{layer_i}.input_layernorm.weight": loaded[f"layers.{layer_i}.attention_norm.weight"],f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[f"layers.{layer_i}.ffn_norm.weight"],}else:# Sharded# Note that attention.w{q,k,v,o}, feed_fordward.w[1,2,3], attention_norm.weight and ffn_norm.weight share# the same storage object, saving attention_norm and ffn_norm will save other weights too, which is# redundant as other weights will be stitched from multiple shards. To avoid that, they are cloned.state_dict = {f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][f"layers.{layer_i}.attention_norm.weight"].clone(),f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[0][f"layers.{layer_i}.ffn_norm.weight"].clone(),}state_dict[f"model.layers.{layer_i}.self_attn.q_proj.weight"] = permute(torch.cat([loaded[i][f"layers.{layer_i}.attention.wq.weight"].view(n_heads_per_shard, dims_per_head, dim)for i in range(num_shards)],dim=0,).reshape(dim, dim))state_dict[f"model.layers.{layer_i}.self_attn.k_proj.weight"] = permute(torch.cat([loaded[i][f"layers.{layer_i}.attention.wk.weight"].view(num_local_key_value_heads, dims_per_head, dim)for i in range(num_shards)],dim=0,).reshape(key_value_dim, dim),num_key_value_heads,key_value_dim,dim,)state_dict[f"model.layers.{layer_i}.self_attn.v_proj.weight"] = torch.cat([loaded[i][f"layers.{layer_i}.attention.wv.weight"].view(num_local_key_value_heads, dims_per_head, dim)for i in range(num_shards)],dim=0,).reshape(key_value_dim, dim)state_dict[f"model.layers.{layer_i}.self_attn.o_proj.weight"] = torch.cat([loaded[i][f"layers.{layer_i}.attention.wo.weight"] for i in range(num_shards)], dim=1)state_dict[f"model.layers.{layer_i}.mlp.gate_proj.weight"] = torch.cat([loaded[i][f"layers.{layer_i}.feed_forward.w1.weight"] for i in range(num_shards)], dim=0)state_dict[f"model.layers.{layer_i}.mlp.down_proj.weight"] = torch.cat([loaded[i][f"layers.{layer_i}.feed_forward.w2.weight"] for i in range(num_shards)], dim=1)state_dict[f"model.layers.{layer_i}.mlp.up_proj.weight"] = torch.cat([loaded[i][f"layers.{layer_i}.feed_forward.w3.weight"] for i in range(num_shards)], dim=0)state_dict[f"model.layers.{layer_i}.self_attn.rotary_emb.inv_freq"] = inv_freqfor k, v in state_dict.items():index_dict["weight_map"][k] = filenameparam_count += v.numel()torch.save(state_dict, os.path.join(tmp_model_path, filename))# 处理最后一层权重,并保存filename = f"pytorch_model-{n_layers + 1}-of-{n_layers + 1}.bin"if num_shards == 1:state_dict = {"model.embed_tokens.weight": loaded["tok_embeddings.weight"],"model.norm.weight": loaded["norm.weight"],"lm_head.weight": loaded["output.weight"],}else:state_dict = {"model.norm.weight": loaded[0]["norm.weight"],"model.embed_tokens.weight": torch.cat([loaded[i]["tok_embeddings.weight"] for i in range(num_shards)], dim=1),"lm_head.weight": torch.cat([loaded[i]["output.weight"] for i in range(num_shards)], dim=0),}for k, v in state_dict.items():index_dict["weight_map"][k] = filenameparam_count += v.numel()torch.save(state_dict, os.path.join(tmp_model_path, filename))# 写入配置文件index_dict["metadata"] = {"total_size": param_count * 2}write_json(index_dict, os.path.join(tmp_model_path, "pytorch_model.bin.index.json"))ffn_dim_multiplier = params["ffn_dim_multiplier"] if "ffn_dim_multiplier" in params else 1multiple_of = params["multiple_of"] if "multiple_of" in params else 256config = LlamaConfig(hidden_size=dim,intermediate_size=compute_intermediate_size(dim, ffn_dim_multiplier, multiple_of),num_attention_heads=params["n_heads"],num_hidden_layers=params["n_layers"],rms_norm_eps=params["norm_eps"],num_key_value_heads=num_key_value_heads,vocab_size=vocab_size,rope_theta=base,max_position_embeddings=max_position_embeddings,)config.save_pretrained(tmp_model_path)# 释放内存空间,以便正确加载模型del state_dictdel loadedgc.collect()print("Loading the checkpoint in a Llama model.")# 从临时文件中加载模型model = LlamaForCausalLM.from_pretrained(tmp_model_path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)# 避免将此作为配置的一部分保存del model.config._name_or_pathmodel.config.torch_dtype = torch.float16print("Saving in the Transformers format.")# 保存LLama模型到指定的路径model.save_pretrained(model_path, safe_serialization=safe_serialization)# 删除临时文件中的所有内容shutil.rmtree(tmp_model_path)# 保存tokenizer
def write_tokenizer(tokenizer_path, input_tokenizer_path):# Initialize the tokenizer based on the `spm` modeltokenizer_class = LlamaTokenizer if LlamaTokenizerFast is None else LlamaTokenizerFastprint(f"Saving a {tokenizer_class.__name__} to {tokenizer_path}.")tokenizer = tokenizer_class(input_tokenizer_path)tokenizer.save_pretrained(tokenizer_path)def main():# 参数处理parser = argparse.ArgumentParser()parser.add_argument("--input_dir",help="Location of LLaMA weights, which contains tokenizer.model and model folders",)parser.add_argument("--model_size",choices=["7B", "7Bf", "13B", "13Bf", "30B", "34B", "65B", "70B", "70Bf", "tokenizer_only"],help="'f' models correspond to the finetuned versions, and are specific to the Llama2 official release. For more details on Llama2, checkout the original repo: https://huggingface.co/meta-llama",)parser.add_argument("--output_dir",help="Location to write HF model and tokenizer",)parser.add_argument("--safe_serialization", type=bool, help="Whether or not to save using `safetensors`.")args = parser.parse_args()spm_path = os.path.join(args.input_dir, "tokenizer.model")# 判断转换的对象if args.model_size != "tokenizer_only":write_model(model_path=args.output_dir,input_base_path=args.input_dir,model_size=args.model_size,safe_serialization=args.safe_serialization,tokenizer_path=spm_path,)else:write_tokenizer(args.output_dir, spm_path)if __name__ == "__main__":main()

脚本运行

python convert_llama_weights_to_hf.py --input_dir raw-llama2-7b --output_dir llama2_7b_hf

raw-llama2-7b文件夹内容:

llama2_7b_hf转换文件内容:

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/136456.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Docker实战

一、Docker安装 以下均以CentOS 7为例 1、安装Docker yum install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin 2、启动和校验 # 启动Docker systemctl start docker# 停止Docker systemctl stop docker# 重启 systemctl resta…

系统架构设计】计算机公共基础知识: 5 数学与经济管理

目录 一 图论应用 1 最小生成树 2 最短路径 3 网络与最大流量 二 运筹方法

【JMeter】定时器分类以及场景介绍

1. 定时器分类 固定定时器 作用:请求之间设置等待时间应用场景:查询商品列表后,去查看列表商品详情页。针对商品列表数据量比较大的,响应时间会比较长,就需要设置等待时间然后去查看商详 2.定时器的作用域&#xff1…

Laravel事件监听器

在Laravel中,事件监听器(Event Listeners)用于监听特定事件的发生,并在事件发生时执行相应的处理逻辑。下面是使用事件监听器的基本步骤: 创建事件:首先需要创建一个事件类,在该类中定义了事件…

10-26 maven配置

打开idea 打开setting 基于Idea创建idea项目 加载jar包:(一般需要自己去手动加入,本地仓库是没有的)

Linux 命令:PS(进程状态)

1. 写在前面 本文主要介绍:Linux 下常用命令 PS —— 进程状态; 公众号: 滑翔的纸飞机 2. PS — 介绍(进程状态) ps 命令:显示 Linux 系统中运行进程有关的信息。 rootdev:~# psPID TTY TIME C…

AlGaN/GaN HEMT 中缓冲区相关电流崩溃的缓冲区电位模拟表征

标题:Characterization of Buffer-Related Current Collapse by Buffer Potential Simulation in AlGaN/GaN HEMTs 来源:IEEE TRANSACTIONS ON ELECTRON DEVICES (18年) 摘要 - 在本文中,通过使用脉冲 I-V 测量和二维漂移扩散模拟研究了 Al…

Go语言的Json序列化与反序列化、Goto语法、Tcp Socket通信

目录标题 一、Json序列化与反序列化1. 序列化2. 反序列化 二、Goto语法三、Tcp Socket1. 单客户端发送信息到服务端2. 服务端客户端通信 一、Json序列化与反序列化 1. 序列化 package mainimport ("encoding/json""fmt")type Person struct {Name string…

Redis查看集群状态有节点显示fail,连接失败

故障现象 现有6台redis,为集群,3主3从,由于两台服务器故障重装系统之后进行重新安装2台redis,安装完成之后 查询当前redis状态,发现有存留的节点,但连接为fai。 故障恢复 查询fail节点的node_id redis-cli -c -a p…

商城系统分布式下单

一、锁定库存的sql select * from ware where id{id} and total-lock>0 update ware set locklock{num} where id{id} and total-lock>{num} 二、下单服务要用分布式事务,因为seat的二阶段提交要说很多资源,会造成处理变成串行化,高并发…

pg14-sql基础(二)-排序与条件

排序 SELECT employee_id, first_name, last_name, hire_date, salary FROM employees ORDER BY first_name; --按字母,默认升序 ORDER BY hire_date ASC; --升序 ORDER BY hire_date DESC; --降序SELECT employee_id, first_name, last_name, hire_date, salary F…

Stable Diffusion webui 源码调试(三)

Stable Diffusion webui 源码调试(三) 个人模型主页:LibLibai stable-diffusion-webui 版本:v1.4.1 内容更新随机,看心情调试代码~ shared 变量 shared变量,简直是一锅大杂烩,shared变量存放…

数据结构与算法C语言版学习笔记(3)-线性表的链式结构:链表

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档 文章目录 前言:回顾顺序表的优缺点:为什么要引入链式结构的线性表? 一、什么是链表?二、链表的分类①为什么要设置头节点&…

案例-注册页面(css)

html页面用css控制样式&#xff0c;画一个注册页面。 页面最终效果如下&#xff1a; 页面代码&#xff1a; <!DOCTYPE html> <html lang"en"> <head><meta charset"UTF-8"><title>注册页面</title> <style>*{…

excel表的筛选后自动求和

一般都使用subtotal函数。 通过看一个大佬的视频&#xff0c;发现可以有更简单的方法。 首先任意筛选数据(ctrlshiftl)&#xff0c; 然后选中需要求和的列的最下方的空白单元格&#xff0c;再按alt。 回车即可。 实质它还是用的subtotal函数

【技巧】并发读取Mysql数据保证读取到的数据不重复

【技巧】并发读取Mysql数据保证读取到的数据不重复 使用场景: 并发场景下, 保证不获取到重复的数据 思路: 先通过 MYSQL锁 去占位打标识,然后再去取数据 相当于几个人抢蛋糕, A先把蛋糕打上记号 蛋糕是A的, 然后再慢慢吃 表结构 表 t_userid name val used_flag 是否使用…

Jekyll框架编译GithubPages,提示没有docs

Jekyll Converters::Scss build issue: No such file or directory dir_chdir - /github/workspace/docs Error: No such file or directory dir_chdir - /github/workspace/docs 解决方案&#xff1a; 修改github page仓库中–> 设置—> pages 把里面的\docs&#xf…

Redis Java 开发简单示例

文章目录 一、概述二、Jedis 开发示例2.1 导入 maven 依赖2.2 使用连接池读写2.3 使用集群读写2.4 完整示例代码2.5 测试集群的搭建 三、Lettuce 开发示例3.1 导入 maven 依赖3.2 读写数据 四、Spring Boot Redis 开发示例4.1 导入 maven 依赖4.2 配置Redis服务地址4.3 基于 Re…

从零开始搭建React+TypeScript+webpack开发环境-性能优化

前言 当我们开发React应用时&#xff0c;性能始终是一个重要的考虑因素。随着应用规模的增长&#xff0c;React组件的数量和复杂性也会相应增加&#xff0c;这可能会导致性能问题的出现。在这篇博文中&#xff0c;我们将探讨如何通过一系列的技巧和最佳实践来优化React应用的性…

大厂秋招真题【模拟】阿里蚂蚁20231010秋招T2-奇偶操作

题目描述与示例 题目描述 小红有一个长度为n的数组a&#xff0c;她将对数组进行m次操作&#xff0c;每次操作有两种类型&#xff1a; 将数组中所有值为奇数的元素加上x将数组中所有值为偶数的元素加上x 请你输出m次操作后的数组 输入描述 第一行两个整数n和m&#xff0c;…