How do you evaluate a checkpoint from DeepSpeed training the same way as a Hugging Face model?

Steps

Let's start with an example:

[Image: file listing of the checkpoint directory]

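This is a checkpoint I obtained by training for 1728 steps with DeepSpeed in the open-instruct framework. Note that the examples below use step_1152; apart from the directory name, the procedure is exactly the same as for step_1728.

A checkpoint produced by DeepSpeed training usually cannot be loaded just by pointing at its save path, the way you would pass a model_name for a pretrained model from Hugging Face; a few extra steps are needed first.

The information and steps for loading and evaluating a DeepSpeed checkpoint are as follows: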

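1. DeepSpeed checkpoint structure

Based on the file structure in the image, the DeepSpeed checkpoint contains the following:

pytorch_model/: the directory holding the distributed weights (usually sharded weights).
random_states_*.pkl: the random number generator states saved during training, used for reproducibility.
scheduler.bin: the state of the learning rate scheduler.
latest: records the name (tag) of the most recent checkpoint, which zero_to_fp32.py reads to locate the weights.
zero_to_fp32.py: a script that merges the distributed weights (ZeRO partitioned weights) into a single full-precision (FP32) model weight file.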
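
To make the list above concrete, here is a minimal sketch that reports which of these entries exist in a checkpoint directory and what tag the latest file contains. The function name describe_checkpoint and the returned dictionary layout are my own choices for illustration; only the file names come from the listing above.

```python
from pathlib import Path

# Entries the article lists for a DeepSpeed checkpoint directory.
EXPECTED_ENTRIES = ["pytorch_model", "scheduler.bin", "latest", "zero_to_fp32.py"]


def describe_checkpoint(checkpoint_dir):
    """Report which expected entries exist and the tag stored in `latest`."""
    root = Path(checkpoint_dir)
    present = {name: (root / name).exists() for name in EXPECTED_ENTRIES}
    # Collect every file matching random_states_*.pkl.
    random_states = sorted(p.name for p in root.glob("random_states_*.pkl"))
    latest = root / "latest"
    # `latest` is a small text file; strip the trailing newline if present.
    tag = latest.read_text().strip() if latest.is_file() else None
    return {"present": present, "random_states": random_states, "latest_tag": tag}
```

Calling describe_checkpoint(".") from inside the step_1152 directory should show at a glance whether anything is missing before you try to merge the weights.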

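2. Can the path be loaded directly?

To load the model by passing the path the way you would pass a Hugging Face model_name, two things must be in place:

The distributed weights have been merged into a complete (FP32) weight file.
The model configuration files are present (mainly the model config and the tokenizer files; a typical set is config.json, special_tokens_map.json, tokenizer_config.json, and tokenizer.json).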
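
Both conditions can be checked mechanically. The sketch below lists which of the files named in this article are still missing from a directory; missing_hf_files is a hypothetical helper name, and the file list is simply the set used in this walkthrough (other models may ship different tokenizer files).

```python
from pathlib import Path

# The merged weight file plus the config and tokenizer files named above.
REQUIRED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def missing_hf_files(model_dir):
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has everything this article relies on; anything else tells you which step below is still outstanding.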

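3. Steps: merge the weights and evaluate

(1) Merge the weights

DeepSpeed's zero_to_fp32.py script merges the distributed weights into a single file.

Run the following command from inside the checkpoint directory:

python zero_to_fp32.py . pytorch_model.bin

. is the path to the checkpoint (here, the current directory).
pytorch_model.bin is the full FP32 weight file to write.

After it finishes you will have a complete model file (for example pytorch_model.bin) that can be used directly for evaluation.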
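
If you would rather drive the merge from Python (for example to convert several step_* directories in a loop), one option is to call the same script through subprocess. This is only a sketch: build_merge_command and merge_checkpoint are names I made up, and the arguments are exactly those of the command above, nothing more.

```python
import subprocess
import sys
from pathlib import Path


def build_merge_command(output_name="pytorch_model.bin"):
    """Build the same command as `python zero_to_fp32.py . pytorch_model.bin`."""
    return [sys.executable, "zero_to_fp32.py", ".", output_name]


def merge_checkpoint(checkpoint_dir, output_name="pytorch_model.bin"):
    """Run zero_to_fp32.py inside checkpoint_dir and return the output path."""
    checkpoint_dir = Path(checkpoint_dir)
    script = checkpoint_dir / "zero_to_fp32.py"
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is this a DeepSpeed checkpoint?")
    # cwd makes "." refer to the checkpoint directory, as in the shell command.
    subprocess.run(build_merge_command(output_name), cwd=checkpoint_dir, check=True)
    return checkpoint_dir / output_name
```

Because check=True is set, a failed merge raises an exception instead of silently leaving a partial file behind.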

This is only an example; the reference is the DeepSpeed ZeRO tutorial at https://www.deepspeed.ai/tutorials/zero/, which gives the following:

If you’d like to get the fp32 weights, we supply a special script that can do offline consolidation. It requires no configuration files or GPUs. Here is an example of its usage:
$ cd /path/to/checkpoint_dir
$ ./zero_to_fp32.py . pytorch_model.bin
Processing zero checkpoint at global_step1
Detected checkpoint of type zero stage 3, world_size: 2
Saving fp32 state dict to pytorch_model.bin (total_numel=60506624)
The zero_to_fp32.py script gets created automatically when you save a checkpoint.

Here is my actual run:

(peft_study) (base) xxx@test:~/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152$ python zero_to_fp32.py . pytorch_model.bin
[2025-01-01 03:54:14,913] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint './pytorch_model'
/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152/zero_to_fp32.py:146: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(f, map_location=device)
Detected checkpoint of type zero stage 3, world_size: 4
/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152/zero_to_fp32.py:98: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(file, map_location=device)
Parsing checkpoint created by deepspeed==0.15.0
Reconstructed Trainable fp32 state dict with 288 params 2614341888 elements
Saving fp32 state dict to pytorch_model.bin
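
The element count in the log gives a quick sanity check on the size of the merged file: FP32 uses 4 bytes per element, so the file should be roughly four times the element count in bytes. The small calculation below uses the 2614341888 elements reported above; it ignores the small overhead of the serialization format, so treat the result as an estimate.

```python
def fp32_size_bytes(num_elements):
    """Size of the raw tensor data when every element is a 4-byte float."""
    return num_elements * 4


numel = 2614341888  # "2614341888 elements" from the log above
size = fp32_size_bytes(numel)
print(size)                    # 10457367552 bytes
print(round(size / 2**30, 2))  # 9.74 (GiB)
```

So before running the merge it is worth checking that about 10 GB of free disk space is available for a model of this size.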

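(2) Check the model configuration

DeepSpeed does not generate the Hugging Face format files config.json, special_tokens_map.json, tokenizer_config.json, and tokenizer.json for you. You need to supply matching files yourself.

The files to provide are shown in the image below:

[Image: the configuration and tokenizer files to provide]

For example, if you trained with a gemma-2-2b style architecture, you can copy over its configuration files. They usually live under a path like ~/.cache/huggingface/hub/models--google--gemma-2-2b/snapshots/c5ebcd40d208330abc697524c919956e692655cf. Note that the files in a downloaded snapshots directory are symbolic links pointing to the real files under blobs, whose names are generally hashes. A copy that preserves the links (for example cp -r) gives you links rather than file contents, so either dereference the links when copying (for example cp -L) or create the files by hand and paste the contents in.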
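
The symlink issue can also be handled from Python, because shutil.copyfile follows symbolic links by default and therefore writes the real file contents. A minimal sketch, assuming a POSIX system; copy_config_files is a hypothetical helper and the file list is the one used in this article:

```python
import shutil
from pathlib import Path

CONFIG_FILES = [
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def copy_config_files(snapshot_dir, checkpoint_dir, names=CONFIG_FILES):
    """Copy config/tokenizer files, resolving symlinks to real contents."""
    snapshot_dir, checkpoint_dir = Path(snapshot_dir), Path(checkpoint_dir)
    copied = []
    for name in names:
        src = snapshot_dir / name
        # exists() follows symlinks, so broken links are skipped.
        if src.exists():
            # copyfile follows symlinks by default, so the destination
            # is a regular file holding the blob's contents.
            shutil.copyfile(src, checkpoint_dir / name)
            copied.append(name)
    return copied
```

The return value lists what was actually copied, which makes it easy to spot a file that the snapshot did not contain.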


Example configuration file:

{
  "architectures": ["Gemma2ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": 50.0,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "final_logit_softcapping": 30.0,
  "head_dim": 256,
  "hidden_act": "gelu_pytorch_tanh",
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2304,
  "initializer_range": 0.02,
  "intermediate_size": 9216,
  "max_position_embeddings": 8192,
  "model_type": "gemma2",
  "num_attention_heads": 8,
  "num_hidden_layers": 26,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "use_cache": true,
  "vocab_size": 256000
}

Name this file config.json and put it in the same directory as pytorch_model.bin.
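
Since the file is created by hand, a quick check that it parses as JSON and names the expected architecture can save a confusing error later. The sketch below is only an illustration: check_config is a made-up helper, and the two fields it inspects (model_type and architectures) are taken from the example configuration above.

```python
import json
from pathlib import Path


def check_config(model_dir, expected_model_type="gemma2"):
    """Return a list of problems found in model_dir/config.json."""
    # json.loads raises an error if the pasted file is not valid JSON.
    config = json.loads((Path(model_dir) / "config.json").read_text())
    problems = []
    if config.get("model_type") != expected_model_type:
        problems.append(f"model_type is {config.get('model_type')!r}")
    if not config.get("architectures"):
        problems.append("architectures is missing or empty")
    return problems
```

An empty list means the two fields look as expected; it does not prove that every other value matches the trained weights.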

(3) Load the model with Hugging Face Transformers

You can load the merged model with the following code:

import os
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the checkpoint; expand "~" because from_pretrained does not do it
model_path = os.path.expanduser("~/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152")

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")

# Load the tokenizer (make sure it matches the model)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Generation test
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Of course, if like me you use a separate evaluation framework such as olmes, just set MODEL_NAME to your own checkpoint path:

export CUDA_VISIBLE_DEVICES="4,5"
MODEL_NAME=/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b_backup/step_1152
# meta-llama/Llama-3.1-8B
BATCH_SIZE=4
echo "Running the command and measuring execution time..."
TASK_NAME_02=ifeval::tulu
OUTPUT_DIR_02=eval-gemma-2-2b-ifeval::tulu
mkdir -p "$OUTPUT_DIR_02"
{ time olmes \
    --model "$MODEL_NAME" \
    --task "$TASK_NAME_02" \
    --batch-size "$BATCH_SIZE" \
    --output-dir "$OUTPUT_DIR_02" \
    --model-args '{"max_length": 4096}'; } 2>&1 | tee "$OUTPUT_DIR_02/time_output.log"
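
When evaluating several checkpoints it can be handy to assemble the same olmes invocation in Python. The sketch below only builds the argument list; it uses exactly the flags that appear in the shell script above and nothing else, and build_olmes_command is a name I invented for illustration.

```python
import json


def build_olmes_command(model, task, batch_size, output_dir, max_length=4096):
    """Build the olmes argument list used in the shell script above."""
    return [
        "olmes",
        "--model", model,
        "--task", task,
        "--batch-size", str(batch_size),
        "--output-dir", output_dir,
        # --model-args takes a JSON string, as in the shell version.
        "--model-args", json.dumps({"max_length": max_length}),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), once per checkpoint directory.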

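That is all it takes to evaluate a locally saved DeepSpeed checkpoint.

4. Other notes

If you trained with DeepSpeed ZeRO Stage 3, you do not need DeepSpeed at evaluation time; loading the merged weights is enough.

Summary

Merge the weight files with zero_to_fp32.py.
Provide the matching model configuration files (such as config.json).
Load the model with Hugging Face's from_pretrained method.
Make sure the tokenizer matches the model.

Once these steps are done, you can evaluate your model the same way you would a Hugging Face model.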

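The directory contains several .bin files: which one gets loaded?

I asked GPT-4o:

I have merged the weights into pytorch_model.bin and put the tokenizer configuration in place, but there are now two .bin files, pytorch_model.bin and scheduler.bin. When I evaluate the checkpoint the way I would a Hugging Face model, how does it decide which .bin file to load?

At evaluation time, Hugging Face's from_pretrained method loads only the model's main weight file (pytorch_model.bin, or another file under one of the standard names); it does not load the learning rate scheduler file (scheduler.bin).

The details, and how to make sure the right file is loaded, are as follows:

1. Hugging Face's default loading logic

When you call `from_pretrained`, Hugging Face looks in the given path for:
- A model weight file with a standard name: pytorch_model.bin or model.safetensors (or, for sharded weights, the matching index file pytorch_model.bin.index.json or model.safetensors.index.json).
- A configuration file (config.json).
If the directory holds several .bin files, only the one with a standard weight file name (such as pytorch_model.bin) is loaded. Other files such as scheduler.bin are ignored: they are not part of the model weights but hold scheduler state from training.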
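
The lookup described above can be pictured with a few lines of Python. This is a simplified illustration of lookup by file name, not the actual transformers implementation; in particular the order of the candidate names is my assumption, and find_weight_file is a hypothetical helper.

```python
from pathlib import Path

# Standard weight file names; the order here is an assumption for illustration.
STANDARD_WEIGHT_NAMES = [
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
]


def find_weight_file(model_dir):
    """Return the first standard weight file name found, or None."""
    root = Path(model_dir)
    for name in STANDARD_WEIGHT_NAMES:
        if (root / name).is_file():
            return name
    # Files such as scheduler.bin never match, however many .bin files exist.
    return None
```

The point is that selection is by exact name rather than by extension, which is why an extra scheduler.bin in the directory is harmless.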

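2. The role of `scheduler.bin`

scheduler.bin stores the state of the learning rate scheduler, so that training can resume where it left off after an interruption.
It is not needed for inference or evaluation; it is used only when resuming training from a checkpoint.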

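3. Making sure the weights load correctly

You only need to make sure that:

The distributed weights have been merged into pytorch_model.bin.
A config.json file that defines the model architecture is present.
The correct tokenizer files (such as tokenizer.json and tokenizer_config.json) are present.
The path is given explicitly when loading.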

Example code:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the checkpoint directory
model_path = "./sft_gemma_2b_backup/step_1152"

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Inference test
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Here, from_pretrained automatically reads pytorch_model.bin and config.json and ignores other files such as scheduler.bin.

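4. When there are several `.bin` files

If the directory contains several .bin files and pytorch_model.bin is the main weight file, there is no risk of Hugging Face loading the wrong one. It works as follows:

It looks for a model file with a standard name (such as pytorch_model.bin).
It loads those weights into the architecture described by config.json.

scheduler.bin is not part of the model, so Hugging Face never tries to load it.

Summary

scheduler.bin is an auxiliary training file and is not loaded during evaluation or inference.
Having pytorch_model.bin and config.json in the same directory is enough for the model to load correctly.
from_pretrained automatically ignores files that are irrelevant to evaluation.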

What if the file is not named pytorch_model.bin?

I followed up:

What if the file is not named pytorch_model.bin but something else, for example gemma2_combined.bin?

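If your weight file has a custom name such as gemma2_combined.bin instead of pytorch_model.bin, from_pretrained will not find it automatically. You need to load the weights from that file explicitly.

Here is how to load a weight file with a custom name:

1. Solution: load the weight file explicitly

from_pretrained does accept a state_dict argument, but in the transformers 4.x releases, when it is given a directory path it still looks for a weight file with a standard name in that directory and fails if there is none, so state_dict alone does not solve the naming problem. A more dependable route is to build the model from config.json and load the weights with load_state_dict.

Example code:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Checkpoint directory and the custom-named weight file inside it
model_path = "./sft_gemma_2b_backup/step_1152"
custom_weight_file = f"{model_path}/gemma2_combined.bin"

# Load the config (read from config.json) and the tokenizer
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Build the model from the config, then load the custom-named weights
state_dict = torch.load(custom_weight_file, map_location="cpu")
model = AutoModelForCausalLM.from_config(config)
result = model.load_state_dict(state_dict, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
model.eval()  # from_config leaves the model in training mode

# Inference test
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Key points:

AutoModelForCausalLM.from_config(config) builds the architecture without loading any weights.
torch.load(custom_weight_file, map_location="cpu") loads the weights manually, and load_state_dict copies them into the model. Check the printed missing and unexpected keys; apart from weights that are tied to another parameter, both lists should be empty.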

2. Alternative: rename the weight file

If you would rather keep using Hugging Face's automatic loading, simply rename the file to pytorch_model.bin in the same directory. Then there is no need to load the state_dict manually.

Example command:

mv ./sft_gemma_2b_backup/step_1152/gemma2_combined.bin ./sft_gemma_2b_backup/step_1152/pytorch_model.bin

Then run the code directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./sft_gemma_2b_backup/step_1152"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
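
The rename step can also be done from Python, with a guard so that an existing pytorch_model.bin is never overwritten by accident. A minimal sketch; rename_to_standard is a hypothetical helper name.

```python
from pathlib import Path


def rename_to_standard(model_dir, custom_name, standard_name="pytorch_model.bin"):
    """Rename a custom weight file to the standard name without clobbering."""
    root = Path(model_dir)
    src, dst = root / custom_name, root / standard_name
    if dst.exists():
        # Refuse to replace a weight file that is already there.
        raise FileExistsError(f"{dst} already exists")
    src.rename(dst)
    return dst
```

For example, rename_to_standard("./sft_gemma_2b_backup/step_1152", "gemma2_combined.bin") has the same effect as the mv command above.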

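3. Common errors and fixes

Error 1: `RuntimeError: Error(s) in loading state_dict ... Unexpected key(s) in state_dict`

Cause:

The weight file does not match the configuration file (config.json), for example because the weights were produced by LoRA or another custom fine-tuning method. load_state_dict raises this error when loading strictly, which is its default; with strict=False the same keys are reported in unexpected_keys instead.

Fix: make sure the model architecture defined in config.json matches the weight file. If a custom fine-tuning method such as LoRA was used, load the weights with the corresponding library or method.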
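
To see where a mismatch comes from, it helps to compare the key names the model expects with the key names in the weight file. The sketch below does this with plain sets; compare_keys is a hypothetical helper, and the key names in the usage note are made-up examples in the style of LoRA adapters, not taken from a real checkpoint.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return keys missing from the checkpoint and keys the model does not expect."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }
```

In practice you would pass model.state_dict().keys() and state_dict.keys(). If the unexpected list is full of names such as layers.0.q_proj.lora_A.weight, the file holds adapter weights rather than a full model.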

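Error 2: `OSError` (no weight file found in the directory)

Cause:

from_pretrained cannot find a model weight file under a standard name in the given directory.

Fix: check that the path is correct, then either rename the weight file to pytorch_model.bin or load it manually.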

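Summary

If the weight file has another name (such as gemma2_combined.bin), load it explicitly into a state_dict and load that into the model yourself.
For convenience, you can instead rename the weight file to pytorch_model.bin and make sure it sits in the same directory as config.json.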

Postscript

Completed in Shanghai at 12:57 on January 1, 2025, with the help of the GPT-4o model.
