This article is based on the video tutorial from 赋范课堂: "Fine-tune QwQ-32B efficiently with only 20 GB of VRAM! Four fine-tuning tools explained in depth! Knowledge injection + Q&A-style fine-tuning, DeepSeek R1-style reasoning-model fine-tuning + hands-on CoT dataset creation to build your own custom LLM!"
https://www.bilibili.com/video/BV1YoQoYQEwF/
Course materials: https://kq4b3vgg5b.feishu.cn/wiki/LxI9wmuFmiaLCkkoiCIcKvOan7Q
This article has been edited and abridged from that material.
赋范课堂 offers excellent courses, which I recommend watching.
Table of Contents
- 1. Basic Preparation
  - 1) Install unsloth
  - 2) Install and register wandb
  - 3) Download the model
    - Install huggingface_hub
    - Use screen to keep a persistent session
    - Set a domestic mirror for model downloads
    - Download the model
    - Change the default model download location
- 2. Model Invocation Tests
  - Calling via modelscope
  - Calling via Ollama
  - Calling via vLLM
  - Request test
- 3. Download the Fine-tuning Datasets
  - Download the NuminaMath CoT dataset
  - Download the medical-o1-reasoning-SFT dataset
- 4. Load the Model
- 5. Tests Before Fine-tuning
  - Basic Q&A test
  - Complex question test
  - Medical Q&A with the original model
- 6. Minimum Viable Experiment
  - Define the prompt template
  - Define the dataset processing function
  - Prepare the data
  - Start fine-tuning
  - Fine-tuning notes
    - Related libraries
    - Fine-tuning **parameter breakdown**
      - ① The `SFTTrainer` part
      - ② The `TrainingArguments` part
  - Set up wandb and start fine-tuning
  - Check the results
  - Merge the model
  - Save as GGUF
- 7. Full Efficient Fine-tuning Experiment
  - Test
1. Basic Preparation
1) Install unsloth
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
2) Install and register wandb
wandb is similar to TensorBoard, but more stable.
Register: https://wandb.ai/site
API Key : https://wandb.ai/ezcode/t0322?product=models
For registration and usage details, see: https://blog.csdn.net/lovechris00/article/details/146437418
Install the library
pip install wandb
Log in and enter your API key
wandb login
3) Download the model
https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit
Install huggingface_hub
pip install huggingface_hub
Use screen to keep a persistent session
Downloading the model can take 0.5-1 hours; a persistent session keeps the download from being interrupted if the terminal window is closed.
Install screen
sudo apt install screen
screen -S qwq
Set a domestic mirror for model downloads
On Linux, add the following environment variable to ~/.bashrc:
export HF_ENDPOINT='https://hf-mirror.com'
Download the model
huggingface-cli download --resume-download unsloth/QwQ-32B-unsloth-bnb-4bit
Change the default model download location
By default, models are downloaded to ~/.cache/huggingface/hub/. If you want to store them somewhere else, set the HF_HOME variable:
export HF_HOME="/root/xx/HF_download"
2. Model Invocation Tests
Calling via modelscope
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Calling via Ollama
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)
print(response.choices[0].message.content)
Model registration
Check whether the registration succeeded:
ollama list
Make a request with the openai library (the same code as shown at the top of this subsection).
Calling via vLLM
vllm serve /root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit \
--quantization bitsandbytes \
--load-format bitsandbytes \
--max-model-len 2048
Request test
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    model="/root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit",
    messages=messages,
)
print(response.choices[0].message.content)
3. Download the Fine-tuning Datasets
Reasoning-model response structure and fine-tuning dataset structure requirements
The QwQ-32B model is similar to DeepSeek R1: its reasoning shows up directly in the reply, which contains both a reasoning part and a final answer part. The reasoning part is delimited by `<think>` / `</think>` tags (special markers injected during model training).
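For illustration, here is a minimal sketch of splitting such a reply into its reasoning and answer parts (the reply string below is made up for the example, not actual model output):

# A made-up reply for illustration; real replies wrap the reasoning in <think>...</think>
reply = "<think>\nFirst recall the definition, then check each case ...\n</think>\nThe final answer goes here."
reasoning, _, answer = reply.partition("</think>")
reasoning = reasoning.replace("<think>", "").strip()
print("Reasoning:", reasoning)
print("Answer:", answer.strip())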
Download the NuminaMath CoT dataset
https://huggingface.co/datasets/AI-MO/NuminaMath-CoT
huggingface-cli download AI-MO/NuminaMath-CoT --repo-type dataset
Besides the NuminaMath CoT dataset, there are also APPs (a coding dataset), TACO (a coding dataset), long_form_thought_data_5k (a general Q&A dataset), and others. They are all CoT datasets and can all be used for reasoning-model fine-tuning. For an introduction to these datasets, see the open course "Model Distillation with DeepSeek R1: A Hands-on Introduction to Model Distillation" | https://www.bilibili.com/video/BV1X1FoeBEgW/
Download the medical-o1-reasoning-SFT dataset
https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
huggingface-cli download FreedomIntelligence/medical-o1-reasoning-SFT --repo-type dataset
You can also download it with the Python datasets library:
from datasets import load_dataset

# Downloading only the first 500 examples is enough for this experiment
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:500]", trust_remote_code=True)

# Inspect the dataset
dataset[0]
4. Load the Model
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
GPU memory used at this point: 22016 MB
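If you want to confirm this from inside Python rather than with nvidia-smi, a quick check (not part of the original notebook) is:

import torch

# Report current GPU memory usage in MB
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")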
5. Tests Before Fine-tuning
Inspect the model:
>>> model
Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 5120, padding_idx=151654)(layers): ModuleList((0): Qwen2DecoderLayer(...(62): Qwen2DecoderLayer(...)(63): Qwen2DecoderLayer(...)(norm): Qwen2RMSNorm((5120,), eps=1e-05)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)
Tokenizer info:
>>> tokenizer
Qwen2TokenizerFast(name_or_path='unsloth/QwQ-32B-unsloth-bnb-4bit', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),...151667: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151668: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)
Basic Q&A test
# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Q&A prompt template used for the test
prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
***
### Instruction:
你是一名助人为乐的助手。
***
### Question:
{}
***
### Response:
<think>{}"""

question = "你好,好久不见!"
prompt = [prompt_style_chat.format(question, "")]

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=2048,
    use_cache=True,
)
# GPU usage rises to 22412 MB
'''
>>> outputs
tensor([[ 14880, 112672, 46944, 112449, 111423, 36407, 60548, 67949, 105051, ... 35946, 106128, 99245, 101037, 11319, 144236, 151645]], device='cuda:0')
'''
response = tokenizer.batch_decode(outputs)
# response --> ['请写出一个恰当的回答来完成当前对话任务。\n***\n### Instruction:\n你是一名助人为乐的助手。\n***\n### Question:\n你好,好久不见!\n***\n### Response:\n<think>:\n好的,用户发来问候“你好,好久不见!”,我需要回应并延续对话。首先,应该友好回应他们的问候,比如“你好!确实很久没联系了,希望你一切都好!”这样既回应了对方,也表达了关心。接下来,可能需要询问对方近况,或者引导对话继续下去。比如可以问:“最近有什么新鲜事吗?或者你有什么需要帮助的吗?”这样可以让对话更自然,也符合助人为乐的角色设定。还要注意语气要亲切,保持口语化,避免过于正式。另外,用户可能希望得到情感上的回应,所以需要体现出关心和愿意帮助的态度。检查有没有语法错误,确保句子流畅。最后,确定回应简洁但足够友好,符合对话的流程。\n</think>\n\n你好!确实好久不见了,希望你一切都好!最近有什么新鲜事分享,或者需要我帮忙什么吗?😊<|im_end|>']
print(response[0].split("### Response:")[1])
Complex question test
question = "请证明根号2是无理数。"

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22552 MiB
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
Medical Q&A with the original model
# Define a new prompt template
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>{}"""

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22842 MiB
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])
6. Minimum Viable Experiment
Now let's try fine-tuning the model.
For this dataset, we can either fine-tune on a subset of the original data, or feed in the full data and iterate over it several times.
For most fine-tuning work, it is best to start with a minimum viable experiment: fine-tune on a small amount of data first and observe the effect.
If the run completes successfully and shows a clear effect, then scale up to more data and a larger fine-tuning run.
Define the prompt template
import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'
Define the dataset processing function
This function transforms the medical-o1-reasoning-SFT dataset: the Complex_CoT column and the Response column are spliced into the template and the end-of-text token is appended:
def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
Prepare the data
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
'''
{'Question': 'A 61-year-old ... contractions?',
 'Complex_CoT': "Okay, let's ... incontinence.",
 'Response': 'Cystometry in ... the test.'}
'''

# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Inspect the result
dataset["text"][0]
'''
Below is an instruction that ... response.
***
### Instruction:
You are a medical ... medical question.
***
### Question:
A 61-year-old woman ... contractions?
***
### Response:
<think>
Okay,...Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry ... is primarily related to physical e
'''
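Before training, it is worth a quick sanity check that every formatted sample really ends with the EOS token. A small verification snippet (assumption: `dataset` and `tokenizer` are the objects created above; this is not part of the original notebook):

# Verify that the EOS token was appended to every training text
assert all(t.endswith(tokenizer.eos_token) for t in dataset["text"])
print(len(dataset), "examples, all terminated with", tokenizer.eos_token)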
Start fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
Fine-tuning notes
This code uses `SFTTrainer` to perform supervised fine-tuning (SFT), and applies to model fine-tuning within the `transformers` and Unsloth ecosystems:
Related libraries
- `SFTTrainer` (from the `trl` library): `trl` (Transformer Reinforcement Learning) is a Hugging Face library that provides supervised fine-tuning (SFT) and reinforcement-learning (RLHF) functionality. `SFTTrainer` is used for supervised fine-tuning and works well with low-rank adaptation methods such as LoRA.
- `TrainingArguments` (from the `transformers` library): defines the training hyperparameters, such as batch size, learning rate, optimizer, and number of training steps.
- `is_bfloat16_supported()` (from `unsloth`): checks whether the current GPU supports `bfloat16` (BF16), returning `True` if it does and `False` otherwise. `bfloat16` is a more efficient numeric format that performs particularly well on newer GPUs such as the NVIDIA A100/H100.
Fine-tuning parameter breakdown
① The `SFTTrainer` part
Parameter | Purpose |
---|---|
model=model | The pretrained model to fine-tune |
tokenizer=tokenizer | The tokenizer used to process the text data |
train_dataset=dataset | The training dataset |
dataset_text_field="text" | Which dataset column contains the training text (built in formatting_prompts_func) |
max_seq_length=max_seq_length | Maximum sequence length, i.e. the maximum number of tokens per input |
dataset_num_proc=2 | Number of parallel processes for data preprocessing, to speed up data loading |
② The `TrainingArguments` part
Parameter | Purpose |
---|---|
per_device_train_batch_size=2 | Training batch size per GPU/device (a small value suits large models) |
gradient_accumulation_steps=4 | Gradient accumulation steps (effective batch size = 2 × 4 = 8) |
warmup_steps=5 | Warmup steps (the learning rate starts low and ramps up) |
max_steps=60 | Maximum number of training steps (caps the run; here roughly 60 × 8 = 480 samples are consumed) |
learning_rate=2e-4 | Learning rate (2e-4 = 0.0002; controls the size of weight updates) |
fp16=not is_bfloat16_supported() | Use fp16 (16-bit floats) if the GPU does not support bfloat16 |
bf16=is_bfloat16_supported() | Enable bfloat16 if the GPU supports it (more stable training) |
logging_steps=10 | Log training metrics every 10 steps |
optim="adamw_8bit" | Use adamw_8bit (8-bit AdamW optimizer) to reduce GPU memory usage |
weight_decay=0.01 | Weight decay (L2 regularization) to prevent overfitting |
lr_scheduler_type="linear" | Learning-rate schedule (linear decay) |
seed=3407 | Random seed (for reproducibility) |
output_dir="outputs" | Output directory for training artifacts |
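As a quick sanity check on the numbers in the table, the effective batch size and the amount of data a `max_steps`-limited run consumes can be computed directly (just the arithmetic, not part of the original notebook):

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 2 * 4 = 8
samples_consumed = effective_batch_size * max_steps                               # 8 * 60 = 480
print(effective_batch_size, samples_consumed)  # 8 480, i.e. just under one pass over the 500-sample subset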
Set up wandb and start fine-tuning
import wandb
wandb.login(key="8c7...242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset')

# Start fine-tuning
trainer_stats = trainer.train()
If you run into CUDA out of memory, adjust the parameters as needed.
You can try the following code (for testing only; the fine-tuning quality is not guaranteed):
import torch
torch.cuda.empty_cache()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from unsloth import FastLanguageModel

max_seq_length = 1024
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:200]", trust_remote_code=True)

# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Set up the fine-tuning run
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=20,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

import wandb
wandb.login(key="8c7b98e4f525793b228b04fcc3596acd9e7242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset')

# Start fine-tuning
trainer_stats = trainer.train()
Check the results
After fine-tuning finishes, unsloth automatically updates the model weights (in memory), so the fine-tuned model can be called directly without manually merging the weights first:
trainer_stats
# TrainOutput(global_step=60, training_loss=1.3152311007181803, metrics={'train_runtime': 709.9004, 'train_samples_per_second': 0.676, 'train_steps_per_second': 0.085, 'total_flos': 6.676294205826048e+16, 'train_loss': 1.3152311007181803})

# Switch to inference mode
FastLanguageModel.for_inference(model)

# Check the Q&A quality again
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

inputs = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
Merge the model
save_path = 'QwQ-Medical-COT-Tiny'
model.save_pretrained_merged(save_path, tokenizer, save_method = "merged_4bit",)
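After merging, the saved directory can be reloaded like any local checkpoint. A minimal sketch (assuming the merge above succeeded and `QwQ-Medical-COT-Tiny` is in the working directory; not part of the original notebook):

from unsloth import FastLanguageModel

# Reload the merged 4-bit checkpoint saved above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "QwQ-Medical-COT-Tiny",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)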
Save as GGUF
This makes it convenient to run inference with Ollama.
Exporting and merging takes a while (roughly 20 minutes).
save_path = 'QwQ-Medical-COT-Tiny-GGUF'
model.save_pretrained_gguf(save_path, tokenizer, quantization_method = "q4_k_m")
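To serve the exported GGUF with Ollama, one common approach is a Modelfile that points at the .gguf file plus `ollama create`. The file name and model tag below are assumptions for illustration; check the actual .gguf name produced in the output directory:

# Contents of Modelfile (adjust the path to what save_pretrained_gguf actually produced):
#   FROM ./QwQ-Medical-COT-Tiny-GGUF/unsloth.Q4_K_M.gguf

ollama create qwq-medical-cot -f Modelfile
ollama run qwq-medical-cot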
7. Full Efficient Fine-tuning Experiment
Finally, fine-tune on the full dataset to improve the fine-tuning result.
# Set up the training prompt template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

# Load the full dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

# Wrap the model with LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Set num_train_epochs to 3, i.e. three passes over the dataset:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Map (num_proc=2): 0%| | 0/25371 [00:00<?, ? examples/s]
trainer_stats = trainer.train()
[ 389/9513 13:44 < 5:24:01, 0.47 it/s, Epoch 0.12/3]
Step | Training Loss |
---|---|
10 | 1.285900 |
20 | 1.262500 |
… | … |
370 | 1.201200 |
380 | 1.215600 |
The whole training run here took roughly 15 hours in total.
trainer_stats
TrainOutput(global_step=9513, training_loss=1.0824475168592858, metrics={'train_runtime': 20193.217, 'train_samples_per_second': 3.769, 'train_steps_per_second': 0.471, 'total_flos': 2.7936033274397737e+18, 'train_loss': 1.0824475168592858, 'epoch': 2.9992117294655527})
Test
Testing with the following two questions, both give good answers:
question = "A 61-year-old ... contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

question = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
2025-03-22 (Sat)