Problem: Various fine-tuning issues · THUDM/ChatGLM3 · Discussion #253 · GitHub
Traceback (most recent call last):
  File "/opt/projects/chatglm3-test/scripts/finetune.py", line 171, in <module>
    main()
  File "/opt/projects/chatglm3-test/scripts/finetune.py", line 137, in main
    print(train_dataset[0]['input_ids'])
  File "/opt/projects/chatglm3-test/scripts/preprocess_utils.py", line 127, in __getitem__
    a_ids = self.tokenizer.encode(text=data_item['prompt'], add_special_tokens=True, truncation=True,
KeyError: 'prompt'
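Actually, looking at the code at the corresponding line of preprocess_utils.py explains it: for the plain chat model, the script does not read data in the officially documented format shown below: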
```json
[
  {
    "conversations": [
      {
        "role": "system",
        "content": "<system prompt text>"
      },
      {
        "role": "user",
        "content": "<user prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response text>"
      },
      // ... Multi Turn
      {
        "role": "user",
        "content": "<user prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response text>"
      }
    ]
  }
  // ...
]
```
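The "prompt" key does not exist in this format, which is why the lookup at line 127 raises the KeyError. The latest official fine-tuning script has already been [...]; I'll try it another day.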
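
In the meantime, here is a rough sketch of how one might check which format a record is in and, if needed, flatten a "conversations" record into prompt-style pairs. The helper names `detect_format` and `conversations_to_pairs` are hypothetical, and the `"response"` key is my assumption (only `"prompt"` appears in the traceback), so check what your copy of preprocess_utils.py actually reads before relying on it.

```python
import json


def detect_format(record):
    """Classify one training record by the top-level key it carries."""
    if "conversations" in record:
        return "conversations"
    if "prompt" in record:
        return "prompt"
    return "unknown"


def conversations_to_pairs(record, prompt_key="prompt", response_key="response"):
    """Flatten a multi-turn record into one pair per user -> assistant exchange.

    System messages and earlier turns are dropped, so each pair loses its
    conversational context. The default key names are assumptions.
    """
    pairs = []
    pending_user = None
    for turn in record["conversations"]:
        if turn["role"] == "user":
            pending_user = turn["content"]
        elif turn["role"] == "assistant" and pending_user is not None:
            pairs.append({prompt_key: pending_user, response_key: turn["content"]})
            pending_user = None
    return pairs


if __name__ == "__main__":
    sample = {
        "conversations": [
            {"role": "system", "content": "<system prompt text>"},
            {"role": "user", "content": "<user prompt text>"},
            {"role": "assistant", "content": "<assistant response text>"},
        ]
    }
    print(detect_format(sample))
    for pair in conversations_to_pairs(sample):
        print(json.dumps(pair, ensure_ascii=False))
```

Running `detect_format` over the first few records of the training file before launching finetune.py should show right away whether the data and the script agree on the format. Flattening is lossy for multi-turn data, so it is only a stopgap if the script cannot read "conversations" directly.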