This article is based on the video tutorial from 赋范课堂: "Fine-tune QwQ-32B efficiently with only 20 GB of VRAM! Four fine-tuning tools explained in depth! Knowledge injection + Q&A-style fine-tuning, DeepSeek R1-style reasoning-model fine-tuning + hands-on CoT dataset creation to build your own custom LLM!"
https://www.bilibili.com/video/BV1YoQoYQEwF/
Course materials: https://kq4b3vgg5b.feishu.cn/wiki/LxI9wmuFmiaLCkkoiCIcKvOan7Q
This article has been edited and abridged from that material.
赋范课堂 offers excellent courses, which I recommend watching.
Table of Contents
- 1. Basic Preparation
  - 1) Install unsloth
  - 2) Install and register wandb
  - 3) Download the model
    - Install huggingface_hub
    - Use screen to keep a persistent session
    - Set a domestic mirror for model downloads
    - Download the model
    - Change the default model download location
- 2. Model Invocation Tests
  - Calling via modelscope
  - Calling via Ollama
  - Calling via vLLM
  - Request test
- 3. Download the Fine-tuning Datasets
  - Download the NuminaMath CoT dataset
  - Download the medical-o1-reasoning-SFT dataset
- 4. Load the Model
- 5. Tests Before Fine-tuning
  - Basic Q&A test
  - Complex question test
  - Medical Q&A with the original model
- 6. Minimum Viable Experiment
  - Define the prompt template
  - Define the dataset processing function
  - Prepare the data
  - Start fine-tuning
  - Fine-tuning notes
    - Related libraries
    - Fine-tuning **parameter breakdown**
      - ① The `SFTTrainer` part
      - ② The `TrainingArguments` part
  - Set up wandb and start fine-tuning
  - Check the results
  - Merge the model
  - Save as GGUF
- 7. Full Efficient Fine-tuning Experiment
  - Test
1. Basic Preparation
1) Install unsloth
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
2) Install and register wandb
wandb is similar to TensorBoard, but more stable.
Register: https://wandb.ai/site
API Key : https://wandb.ai/ezcode/t0322?product=models
For registration and usage details, see: https://blog.csdn.net/lovechris00/article/details/146437418
Install the library
pip install wandb
Log in and enter your API key
wandb login
3) Download the model
https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit
Install huggingface_hub
pip install huggingface_hub
Use screen to keep a persistent session
Downloading the model can take 0.5-1 hours; a persistent session keeps the download from being interrupted if the terminal window is closed.
Install screen
sudo apt install screen
screen -S qwq
Set a domestic mirror for model downloads
On Linux, add the following environment variable to ~/.bashrc:
export HF_ENDPOINT='https://hf-mirror.com'
Download the model
huggingface-cli download --resume-download unsloth/QwQ-32B-unsloth-bnb-4bit
Change the default model download location
By default, models are downloaded to ~/.cache/huggingface/hub/. If you want to store them somewhere else, set the HF_HOME variable:
export HF_HOME="/root/xx/HF_download"
2. Model Invocation Tests
Calling via modelscope
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Calling via Ollama
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    messages=messages,
    model='qwq-32b-bnb',
)
print(response.choices[0].message.content)
Model registration
Check whether the registration succeeded:
ollama list
Make a request with the openai library (the same code as shown at the top of this subsection).
Calling via vLLM
vllm serve /root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit \
--quantization bitsandbytes \
--load-format bitsandbytes \
--max-model-len 2048
Request test
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
prompt = "你好,好久不见!"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    model="/root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit",
    messages=messages,
)
print(response.choices[0].message.content)
3. Download the Fine-tuning Datasets
Reasoning-model response structure and fine-tuning dataset structure requirements
The QwQ-32B model is similar to DeepSeek R1: its reasoning shows up directly in the reply, which contains both a reasoning part and a final answer part. The reasoning part is delimited by `<think>` / `</think>` tags (special markers injected during model training).
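For illustration, here is a minimal sketch of splitting such a reply into its reasoning and answer parts (the reply string below is made up for the example, not actual model output):

# A made-up reply for illustration; real replies wrap the reasoning in <think>...</think>
reply = "<think>\nFirst recall the definition, then check each case ...\n</think>\nThe final answer goes here."
reasoning, _, answer = reply.partition("</think>")
reasoning = reasoning.replace("<think>", "").strip()
print("Reasoning:", reasoning)
print("Answer:", answer.strip())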
Download the NuminaMath CoT dataset
https://huggingface.co/datasets/AI-MO/NuminaMath-CoT
huggingface-cli download AI-MO/NuminaMath-CoT --repo-type dataset
Besides the NuminaMath CoT dataset, there are also APPs (a coding dataset), TACO (a coding dataset), long_form_thought_data_5k (a general Q&A dataset), and others. They are all CoT datasets and can all be used for reasoning-model fine-tuning. For an introduction to these datasets, see the open course "Model Distillation with DeepSeek R1: A Hands-on Introduction to Model Distillation" | https://www.bilibili.com/video/BV1X1FoeBEgW/
Download the medical-o1-reasoning-SFT dataset
https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
huggingface-cli download FreedomIntelligence/medical-o1-reasoning-SFT --repo-type dataset
You can also download it with the Python datasets library:
from datasets import load_dataset

# Downloading only the first 500 examples is enough for this experiment
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:500]", trust_remote_code=True)

# Inspect the dataset
dataset[0]
4. Load the Model
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
GPU memory used at this point: 22016 MB
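If you want to confirm this from inside Python rather than with nvidia-smi, a quick check (not part of the original notebook) is:

import torch

# Report current GPU memory usage in MB
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")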
5. Tests Before Fine-tuning
Inspect the model:
>>> model
Qwen2ForCausalLM((model): Qwen2Model((embed_tokens): Embedding(152064, 5120, padding_idx=151654)(layers): ModuleList((0): Qwen2DecoderLayer(...(62): Qwen2DecoderLayer(...)(63): Qwen2DecoderLayer(...)(norm): Qwen2RMSNorm((5120,), eps=1e-05)(rotary_emb): LlamaRotaryEmbedding())(lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)
Tokenizer info:
>>> tokenizer
Qwen2TokenizerFast(name_or_path='unsloth/QwQ-32B-unsloth-bnb-4bit', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),...151667: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),151668: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)
Basic Q&A test
# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Q&A prompt template used for the test
prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
***
### Instruction:
你是一名助人为乐的助手。
***
### Question:
{}
***
### Response:
<think>{}"""

question = "你好,好久不见!"
prompt = [prompt_style_chat.format(question, "")]

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=2048,
    use_cache=True,
)
# GPU usage rises to 22412 MB
'''
>>> outputs
tensor([[ 14880, 112672, 46944, 112449, 111423, 36407, 60548, 67949, 105051, ... 35946, 106128, 99245, 101037, 11319, 144236, 151645]], device='cuda:0')
'''
response = tokenizer.batch_decode(outputs)
# response --> ['请写出一个恰当的回答来完成当前对话任务。\n***\n### Instruction:\n你是一名助人为乐的助手。\n***\n### Question:\n你好,好久不见!\n***\n### Response:\n<think>:\n好的,用户发来问候“你好,好久不见!”,我需要回应并延续对话。首先,应该友好回应他们的问候,比如“你好!确实很久没联系了,希望你一切都好!”这样既回应了对方,也表达了关心。接下来,可能需要询问对方近况,或者引导对话继续下去。比如可以问:“最近有什么新鲜事吗?或者你有什么需要帮助的吗?”这样可以让对话更自然,也符合助人为乐的角色设定。还要注意语气要亲切,保持口语化,避免过于正式。另外,用户可能希望得到情感上的回应,所以需要体现出关心和愿意帮助的态度。检查有没有语法错误,确保句子流畅。最后,确定回应简洁但足够友好,符合对话的流程。\n</think>\n\n你好!确实好久不见了,希望你一切都好!最近有什么新鲜事分享,或者需要我帮忙什么吗?😊<|im_end|>']
print(response[0].split("### Response:")[1])
Complex question test
question = "请证明根号2是无理数。"

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22552 MiB
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
Medical Q&A with the original model
# Define a new prompt template
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>{}"""

question_1 = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
question_2 = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU usage: 22842 MiB
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])
6. Minimum Viable Experiment
Now let's try fine-tuning the model.
For this dataset, we can either fine-tune on a subset of the original data, or feed in the full data and iterate over it several times.
For most fine-tuning work, it is best to start with a minimum viable experiment: fine-tune on a small amount of data first and observe the effect.
If the run completes successfully and shows a clear effect, then scale up to more data and a larger fine-tuning run.
Define the prompt template
import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'
Define the dataset processing function
This function transforms the medical-o1-reasoning-SFT dataset: the Complex_CoT column and the Response column are spliced into the template and the end-of-text token is appended:
def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
Prepare the data
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
'''
{'Question': 'A 61-year-old ... contractions?',
 'Complex_CoT': "Okay, let's ... incontinence.",
 'Response': 'Cystometry in ... the test.'}
'''

# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Inspect the result
dataset["text"][0]
'''
Below is an instruction that ... response.
***
### Instruction:
You are a medical ... medical question.
***
### Question:
A 61-year-old woman ... contractions?
***
### Response:
<think>
Okay,...Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry ... is primarily related to physical e
'''
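Before training, it is worth a quick sanity check that every formatted sample really ends with the EOS token. A small verification snippet (assumption: `dataset` and `tokenizer` are the objects created above; this is not part of the original notebook):

# Verify that the EOS token was appended to every training text
assert all(t.endswith(tokenizer.eos_token) for t in dataset["text"])
print(len(dataset), "examples, all terminated with", tokenizer.eos_token)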
Start fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
Fine-tuning notes
This code uses `SFTTrainer` to perform supervised fine-tuning (SFT), and applies to model fine-tuning within the `transformers` and Unsloth ecosystems:
Related libraries
- `SFTTrainer` (from the `trl` library): `trl` (Transformer Reinforcement Learning) is a Hugging Face library that provides supervised fine-tuning (SFT) and reinforcement-learning (RLHF) functionality. `SFTTrainer` is used for supervised fine-tuning and works well with low-rank adaptation methods such as LoRA.
- `TrainingArguments` (from the `transformers` library): defines the training hyperparameters, such as batch size, learning rate, optimizer, and number of training steps.
- `is_bfloat16_supported()` (from `unsloth`): checks whether the current GPU supports `bfloat16` (BF16), returning `True` if it does and `False` otherwise. `bfloat16` is a more efficient numeric format that performs particularly well on newer GPUs such as the NVIDIA A100/H100.
Fine-tuning parameter breakdown
① The `SFTTrainer` part
Parameter | Purpose |
---|---|
model=model | The pretrained model to fine-tune |
tokenizer=tokenizer | The tokenizer used to process the text data |
train_dataset=dataset | The training dataset |
dataset_text_field="text" | Which dataset column contains the training text (built in formatting_prompts_func) |
max_seq_length=max_seq_length | Maximum sequence length, i.e. the maximum number of tokens per input |
dataset_num_proc=2 | Number of parallel processes for data preprocessing, to speed up data loading |
② The `TrainingArguments` part
Parameter | Purpose |
---|---|
per_device_train_batch_size=2 | Training batch size per GPU/device (a small value suits large models) |
gradient_accumulation_steps=4 | Gradient accumulation steps (effective batch size = 2 × 4 = 8) |
warmup_steps=5 | Warmup steps (the learning rate starts low and ramps up) |
max_steps=60 | Maximum number of training steps (caps the run; here roughly 60 × 8 = 480 samples are consumed) |
learning_rate=2e-4 | Learning rate (2e-4 = 0.0002; controls the size of weight updates) |
fp16=not is_bfloat16_supported() | Use fp16 (16-bit floats) if the GPU does not support bfloat16 |
bf16=is_bfloat16_supported() | Enable bfloat16 if the GPU supports it (more stable training) |
logging_steps=10 | Log training metrics every 10 steps |
optim="adamw_8bit" | Use adamw_8bit (8-bit AdamW optimizer) to reduce GPU memory usage |
weight_decay=0.01 | Weight decay (L2 regularization) to prevent overfitting |
lr_scheduler_type="linear" | Learning-rate schedule (linear decay) |
seed=3407 | Random seed (for reproducibility) |
output_dir="outputs" | Output directory for training artifacts |
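As a quick sanity check on the numbers in the table, the effective batch size and the amount of data a `max_steps`-limited run consumes can be computed directly (just the arithmetic, not part of the original notebook):

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 2 * 4 = 8
samples_consumed = effective_batch_size * max_steps                               # 8 * 60 = 480
print(effective_batch_size, samples_consumed)  # 8 480, i.e. just under one pass over the 500-sample subset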
Set up wandb and start fine-tuning
import wandb
wandb.login(key="8c7...242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset')

# Start fine-tuning
trainer_stats = trainer.train()
If you run into CUDA out of memory, adjust the parameters as needed.
You can try the following code (for testing only; the fine-tuning quality is not guaranteed):
import torch
torch.cuda.empty_cache()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from unsloth import FastLanguageModel

max_seq_length = 1024
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # '<|im_end|>'

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[0:200]", trust_remote_code=True)

# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Set up the fine-tuning run
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=20,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

import wandb
wandb.login(key="8c7b98e4f525793b228b04fcc3596acd9e7242bd")
run = wandb.init(project='Fine-tune-QwQ-32B-4bit on Medical COT Dataset')

# Start fine-tuning
trainer_stats = trainer.train()
Check the results
After fine-tuning finishes, unsloth automatically updates the model weights (in memory), so the fine-tuned model can be called directly without manually merging the weights first:
trainer_stats
# TrainOutput(global_step=60, training_loss=1.3152311007181803, metrics={'train_runtime': 709.9004, 'train_samples_per_second': 0.676, 'train_steps_per_second': 0.085, 'total_flos': 6.676294205826048e+16, 'train_loss': 1.3152311007181803})

# Switch to inference mode
FastLanguageModel.for_inference(model)

# Check the Q&A quality again
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

inputs = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
Merge the model
save_path = 'QwQ-Medical-COT-Tiny'
model.save_pretrained_merged(save_path, tokenizer, save_method = "merged_4bit",)
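After merging, the saved directory can be reloaded like any local checkpoint. A minimal sketch (assuming the merge above succeeded and `QwQ-Medical-COT-Tiny` is in the working directory; not part of the original notebook):

from unsloth import FastLanguageModel

# Reload the merged 4-bit checkpoint saved above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "QwQ-Medical-COT-Tiny",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)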
Save as GGUF
This makes it convenient to run inference with Ollama.
Exporting and merging takes a while (roughly 20 minutes).
save_path = 'QwQ-Medical-COT-Tiny-GGUF'
model.save_pretrained_gguf(save_path, tokenizer, quantization_method = "q4_k_m")
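To serve the exported GGUF with Ollama, one common approach is a Modelfile that points at the .gguf file plus `ollama create`. The file name and model tag below are assumptions for illustration; check the actual .gguf name produced in the output directory:

# Contents of Modelfile (adjust the path to what save_pretrained_gguf actually produced):
#   FROM ./QwQ-Medical-COT-Tiny-GGUF/unsloth.Q4_K_M.gguf

ollama create qwq-medical-cot -f Modelfile
ollama run qwq-medical-cot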
7. Full Efficient Fine-tuning Experiment
Finally, fine-tune on the full dataset to improve the fine-tuning result.
# Set up the training prompt template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

# Load the full dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

# Wrap the model with LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Set num_train_epochs to 3, i.e. three passes over the dataset:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Map (num_proc=2): 0%| | 0/25371 [00:00<?, ? examples/s]
trainer_stats = trainer.train()
[ 389/9513 13:44 < 5:24:01, 0.47 it/s, Epoch 0.12/3]
Step | Training Loss |
---|---|
10 | 1.285900 |
20 | 1.262500 |
… | … |
370 | 1.201200 |
380 | 1.215600 |
The whole training run here took roughly 15 hours in total.
trainer_stats
TrainOutput(global_step=9513, training_loss=1.0824475168592858, metrics={'train_runtime': 20193.217, 'train_samples_per_second': 3.769, 'train_steps_per_second': 0.471, 'total_flos': 2.7936033274397737e+18, 'train_loss': 1.0824475168592858, 'epoch': 2.9992117294655527})
Test
Testing with the following two questions, both give good answers:
question = "A 61-year-old ... contractions?"

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

question = "Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, with a past medical history of hypercholesterolemia and coronary artery disease, elevated troponin I levels, and tachycardia, what is the most likely coronary artery involved based on this presentation?"
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
2025-03-22 (Sat)