Fine-Tuning Alibaba's QwQ-32B Reasoning Model with LoRA Using the ms-swift Framework


1. Background

In earlier posts, 《大模型训练/微调的一些经验分享》 and 《利用DeepSeek-R1数据微调蒸馏ChatGLM32B让大模型具备思考能力》, we introduced our work on model fine-tuning. With base models not yet strong enough out of the box, fine-tuning remains indispensable for commercial and vertical-domain applications; even DeepSeek-R1 or QwQ-32B cannot by themselves guarantee the quality that commercial applications require.

In this post we describe an experiment using the ms-swift framework to LoRA-fine-tune Alibaba's recently open-sourced QwQ-32B reasoning model.

2. Fine-Tuning Approach

ms-swift is ModelScope's framework for fine-tuning and deploying large language models and multimodal models. It supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment. Supported language models include Qwen2.5, GLM4, Llama3.3, and DeepSeek-R1; supported multimodal models include Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, GLM4v, Xcomposer2.5, and DeepSeek-VL2.

The QwQ-32B weights take roughly 62 GB. Full-parameter fine-tuning would, by a conservative 20x rule of thumb, require about 1,240 GB of GPU memory; even an 8-GPU A800 cluster offers only around 640 GB, which is not enough. For practical deployment we therefore chose LoRA fine-tuning, and ultimately completed the LoRA training on 4 A800 GPUs.
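As a rough back-of-the-envelope check of these figures (a sketch only: the 20x multiplier is the conservative rule of thumb quoted above, and 80 GB per A800 card is an assumption about the hardware used):

# Rough memory estimate; numbers are approximate, not measured.
params_billion = 32.76                                  # parameter count reported in the training log below
bytes_per_param = 2                                     # bfloat16 weights
weights_gib = params_billion * 1e9 * bytes_per_param / 1024**3
print(f"bf16 weights: ~{weights_gib:.0f} GiB")          # ~61 GiB, i.e. the ~62 GB quoted above

full_ft_gib = weights_gib * 20                          # weights + gradients + optimizer states + activations
print(f"full fine-tuning, 20x rule: ~{full_ft_gib:.0f} GiB")

print(f"8 x A800-80GB cluster: {8 * 80} GiB")           # 640 GiB, not enough for full fine-tuning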

Side note: ms-swift's documentation is incomplete, which is a real weak point of the framework.

3. Fine-Tuning Experiment

3.1 Fine-Tuning Data Format

The format is shown below. Note that the data we actually fine-tuned on includes the reasoning process; for how to build such think-style corpora, see 《利用DeepSeek-R1数据微调蒸馏ChatGLM32B让大模型具备思考能力》.

{"messages": [{"role": "system", "content": "你是一个乐于助人且知识渊博的AI助手。"},{"role": "user", "content": "请解释AlphaGo的基本原理。"}, {"role": "assistant", "content": "AlphaGo结合深度神经网络与树搜索算法,通过棋局数据进行训练,逐步提升棋力,能够判断围棋局势并选择最佳着法。"}]}{"messages": [{"role": "system", "content": "你是一个乐于助人且知识渊博的AI助手。"},{"role": "user", "content": "列举一些健康饮食的重要原则。"}, {"role": "assistant", "content": "健康饮食的重要原则包括:均衡饮食,合理摄入各类营养素;多吃新鲜蔬果,适量饮水;少吃糖、油、盐,忌吃过量快餐;养成规律生活作息习惯,适量运动。"}]} {"messages": [{"role": "system", "content": "你是一个乐于助人且知识渊博的AI助手。"},{"role": "user", "content": "给出一些改善睡眠质量的建议。"}, {"role": "assistant", "content": "改善睡眠质量的建议包括:保证睡眠时间,避免睡眠不足;睡前放松,避免刺激;养成规律作息时间,不要频繁熬夜;适量运动,但不要运动过度;睡前可以喝一杯热牛奶等温和饮料。"}]}

3.2 Training Script

NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model /data/QwQ-32B \
    --train_type lora \
    --dataset '/data/qwq32b_sft_lora/rl-v0312.jsonl' \
    --torch_dtype bfloat16 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 8 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 8192 \
    --output_dir /data/qwq32b_sft_lora/output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot \
    --deepspeed zero3
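A quick sanity check (a sketch, not part of the training code; the exact rounding inside the trainer may differ slightly) of how these settings translate into the roughly 405 optimizer steps seen in the log below, given the 2,619 training rows left after the validation split:

num_train_samples = 2619          # train_dataset rows reported in the log
num_gpus = 4                      # NPROC_PER_NODE / CUDA_VISIBLE_DEVICES
per_device_batch = 1              # --per_device_train_batch_size
grad_accum = 8                    # --gradient_accumulation_steps
epochs = 5                        # --num_train_epochs

effective_batch = num_gpus * per_device_batch * grad_accum   # 32 samples per optimizer step
micro_batches_per_gpu = num_train_samples // num_gpus        # ~654 micro-batches per epoch per GPU
steps_per_epoch = micro_batches_per_gpu // grad_accum        # ~81 optimizer steps per epoch
print(effective_batch, steps_per_epoch, steps_per_epoch * epochs)   # 32 81 405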

3.3 Training Log

The training log shows the loss converging steadily throughout fine-tuning. The framework also reports the best model checkpoint at the end of the run.

[2025-03-11 19:28:37,083] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 4
[2025-03-11 19:28:37,084] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 4
[2025-03-11 19:28:37,092] [INFO] [config.py:734:__init__] Config mesh_device None world_size = 4
[2025-03-11 19:28:37,401] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 771, num_elems = 32.76B
Loading checkpoint shards: 100%|██████████| 14/14 [00:16<00:00,  1.17s/it]
Loading checkpoint shards: 100%|██████████| 14/14 [00:16<00:00,  1.17s/it]
Loading checkpoint shards: 100%|██████████| 14/14 [00:16<00:00,  1.17s/it]
Loading checkpoint shards: 100%|██████████| 14/14 [00:17<00:00,  1.22s/it]
[INFO:swift] model_info: ModelInfo(model_type='qwq', model_dir='/data/QwQ-32B', torch_dtype=torch.bfloat16, max_model_len=131072, quant_method=None, quant_bits=None, rope_scaling=None, config=Qwen2Config {
  "_name_or_path": "/data/QwQ-32B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
, task_type='causal_lm', num_labels=None)
[INFO:swift] model.generation_config: GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": [
    151645,
    151643
  ],
  "max_new_tokens": 64,
  "pad_token_id": 151643
}

[INFO:swift] default_system: None
[INFO:swift] The TrainArguments will be saved in: /data/qwq32b_sft_lora/output/v9-20250311-192834/args.json
[INFO:swift] Start time of running main: 2025-03-11 19:28:54.707260
Map: 100%|██████████| 2645/2645 [00:00<00:00, 8426.21 examples/s] 
Map: 100%|██████████| 2645/2645 [00:00<00:00, 7697.28 examples/s] 
Map: 100%|██████████| 2645/2645 [00:00<00:00, 6463.52 examples/s] 
Map:   0%|          | 0/2619 [00:00<?, ? examples/s][INFO:swift] create tmp_dir: /.cache/modelscope/hub/tmp/hf_datasets-i15lb3_o
Map: 100%|██████████| 2645/2645 [00:00<00:00, 9980.89 examples/s] 
[INFO:swift] train_dataset: Dataset({
    features: ['messages'],
    num_rows: 2619
})
[INFO:swift] val_dataset: Dataset({
    features: ['messages'],
    num_rows: 26
})
[INFO:swift] [INPUT] <|im_start|>system
## 角色
你是一名AI客服,你很专业、友善、礼貌,擅长为客户提供导购服务。

## 目标
结合历史对话和用户当前问题判断对话场景(如:商品推荐、商品咨询、拒答范围、闲聊),针对不同场景作亲和的“原始回复”,然后根据<红线>内容,对”原始回复“作详细的安全审查改写,优化后直接输出。

## 红线
- 不提供任何无关的信息
- 不提供任何电话号码和网站链接
- 不提供任何知识库中不存在的商品信息

## 限制
- 避免谈论竞品、具体价格或优惠、商品链接、商品维修、订单、售后、物流。
- 注意区分商品型号,若商品型号指代不明,在回复中需主动向用户确认。
- 涉及商品信息时,你只能使用<知识库>中的内容回复。
- 当<知识库>为空或无法回答问题时,需要礼貌的婉拒回答。
- 在婉拒回答时,必须带这句话“建议您咨询官方商城的人工客服或访问官网”。
- 不要主动解释专业词汇。
- 不要暴露任何思考过程,直接回答。
- 回答中不要暴露“原始回复”,直接输出安全审查改写后的内容。

## 建议
- 介绍商品时推荐使用markdown格式
- 善用追问去引导客户把诉求表达清楚,但历史对话已经表达清楚的诉求不要在追问。
<|im_end|>
<|im_start|>user
这款KD-85X85K和85x90L有什么区别?<|im_end|>
<|im_start|>assistant
X85K采用X1芯片,而X90L搭载XR认知芯片,能在画面整体优化、焦点主体强化和音画同步方面更出色。X90L通过多维度数据分析匹配画质数据库,重点优化用户关注区域。如果您追求更好画质建议选X90L,预算有限的话X85K也不错。<|im_end|>
<|im_start|>user
尺寸一样大吗?<|im_end|>
<|im_start|>assistant
<think>
好的,用户问的是“尺寸一样大吗?”,我需要先回顾历史对话。之前的对话中,用户询问了KD-85X85K和85X90L的区别,客服提到两者的芯片不同以及画质优化方面的差异。现在用户进一步询问尺寸是否相同。

首先,检查知识库中的商品信息。KD-85X85K的型号中“85”代表85英寸,而XR-85X90L同样有“85”在型号中,所以尺寸应该都是85英寸。确认知识库中两者的描述,确实都是85寸的电视。因此,用户的问题应该回答尺寸相同,都是85英寸。同时,根据红线规则,不能提供无关信息,所以不需要涉及其他方面,比如价格或功能差异,除非用户进一步询问。另外,确保回复中没有链接或电话号码,符合安全审查的要求。因此,直接确认两者尺寸相同,并保持礼貌和简洁。
</think>

两款电视的尺寸相同,都是85英寸大屏设计哦~ 如果您还有其他疑问,欢迎随时告诉我。建议您咨询官方商城的人工客服或访问官网了解更多信息。<|im_end|>

[INFO:swift] lora_config: LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='/data/QwQ-32B', revision=None, inference_mode=False, r=8, target_modules={'k_proj', 'down_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj'}, exclude_modules=None, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=[], init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, eva_config=None, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False, lora_dtype=None, lorap_lr_ratio=None, lorap_emb_lr=1e-06)
[INFO:swift] model: PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(152064, 5120)
        (layers): ModuleList(
          (0-63): 64 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=5120, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=5120, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=1024, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=1024, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=5120, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=5120, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
            )
            (mlp): Qwen2MLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=27648, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=27648, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=5120, out_features=27648, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=5120, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=27648, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=27648, out_features=5120, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=27648, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=5120, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): Qwen2RMSNorm((0,), eps=1e-05)
            (post_attention_layernorm): Qwen2RMSNorm((0,), eps=1e-05)
          )
        )
        (norm): Qwen2RMSNorm((0,), eps=1e-05)
        (rotary_emb): Qwen2RotaryEmbedding()
      )
      (lm_head): Linear(in_features=5120, out_features=152064, bias=False)
    )
  )
)
[INFO:swift] model_parameter_info: PeftModelForCausalLM: 32830.9852M Params (67.1089M Trainable [0.2044%]), 0.0001M Buffers.

Parameter Offload: Total persistent parameters: 25760768 in 1025 params
{'loss': 1.32348752, 'token_acc': 0.70985222, 'grad_norm': 0.80846994, 'learning_rate': 4.76e-06, 'memory(GiB)': 60.01, 'train_speed(iter/s)': 0.01743, 'epoch': 0.01, 'global_step/max_steps': '1/405', 'percentage': '0.25%', 'elapsed_time': '53s', 'remaining_time': '6h 2m 10s'}
Train:   0%|          | 2/405 [01:46<5:56:12, 53.03s/it][2025-03-11 19:32:17,225] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 1.24938524, 'token_acc': 0.69148486, 'grad_norm': 0.86987531, 'learning_rate': 2.381e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.018535, 'epoch': 0.06, 'global_step/max_steps': '5/405', 'percentage': '1.23%', 'elapsed_time': '4m 26s', 'remaining_time': '5h 54m 54s'}
{'loss': 1.22446156, 'token_acc': 0.69278702, 'grad_norm': 0.77689102, 'learning_rate': 4.762e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019271, 'epoch': 0.12, 'global_step/max_steps': '10/405', 'percentage': '2.47%', 'elapsed_time': '8m 35s', 'remaining_time': '5h 39m 15s'}
{'loss': 1.13267899, 'token_acc': 0.71570596, 'grad_norm': 0.40197327, 'learning_rate': 7.143e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.01927, 'epoch': 0.18, 'global_step/max_steps': '15/405', 'percentage': '3.70%', 'elapsed_time': '12m 54s', 'remaining_time': '5h 35m 45s'}
{'loss': 0.97332687, 'token_acc': 0.72897148, 'grad_norm': 0.34967286, 'learning_rate': 9.524e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019607, 'epoch': 0.24, 'global_step/max_steps': '20/405', 'percentage': '4.94%', 'elapsed_time': '16m 56s', 'remaining_time': '5h 26m 7s'}
{'loss': 0.95233335, 'token_acc': 0.71795399, 'grad_norm': 0.32512059, 'learning_rate': 9.997e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.01971, 'epoch': 0.31, 'global_step/max_steps': '25/405', 'percentage': '6.17%', 'elapsed_time': '21m 4s', 'remaining_time': '5h 20m 24s'}
{'loss': 0.92778244, 'token_acc': 0.72106543, 'grad_norm': 0.22549374, 'learning_rate': 9.986e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019805, 'epoch': 0.37, 'global_step/max_steps': '30/405', 'percentage': '7.41%', 'elapsed_time': '25m 11s', 'remaining_time': '5h 14m 49s'}
{'loss': 0.91093416, 'token_acc': 0.73585944, 'grad_norm': 0.21213417, 'learning_rate': 9.967e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019825, 'epoch': 0.43, 'global_step/max_steps': '35/405', 'percentage': '8.64%', 'elapsed_time': '29m 21s', 'remaining_time': '5h 10m 25s'}
{'loss': 0.86407394, 'token_acc': 0.73746765, 'grad_norm': 0.22134356, 'learning_rate': 9.94e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019853, 'epoch': 0.49, 'global_step/max_steps': '40/405', 'percentage': '9.88%', 'elapsed_time': '33m 31s', 'remaining_time': '5h 5m 52s'}
{'loss': 0.86335802, 'token_acc': 0.73666894, 'grad_norm': 0.236291, 'learning_rate': 9.904e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019929, 'epoch': 0.55, 'global_step/max_steps': '45/405', 'percentage': '11.11%', 'elapsed_time': '37m 34s', 'remaining_time': '5h 0m 35s'}
{'loss': 0.81436214, 'token_acc': 0.76214197, 'grad_norm': 0.19902774, 'learning_rate': 9.86e-05, 'memory(GiB)': 74.21, 'train_speed(iter/s)': 0.019918, 'epoch': 0.61, 'global_step/max_steps': '50/405', 'percentage': '12.35%', 'elapsed_time': '41m 46s', 'remaining_time': '4h 56m 37s'}
Train:  12%|█▏        | 50/405 [41:46<5:01:46, 51.00s/it]
{'eval_loss': 0.82470703, 'eval_token_acc': 0.75927635, 'eval_runtime': 15.7907, 'eval_samples_per_second': 1.647, 'eval_steps_per_second': 0.443, 'epoch': 0.61, 'global_step/max_steps': '50/405', 'percentage': '12.35%', 'elapsed_time': '42m 2s', 'remaining_time': '4h 58m 30s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.87s/it]00s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-50
***********************************************
[2025-03-11 20:12:35,271] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.84376278, 'token_acc': 0.74929837, 'grad_norm': 0.29243814, 'learning_rate': 9.808e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019739, 'epoch': 0.67, 'global_step/max_steps': '55/405', 'percentage': '13.58%', 'elapsed_time': '46m 22s', 'remaining_time': '4h 55m 8s'}
{'loss': 0.82531147, 'token_acc': 0.75041408, 'grad_norm': 0.29134859, 'learning_rate': 9.748e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019703, 'epoch': 0.73, 'global_step/max_steps': '60/405', 'percentage': '14.81%', 'elapsed_time': '50m 41s', 'remaining_time': '4h 51m 29s'}
{'loss': 0.8170001, 'token_acc': 0.75919308, 'grad_norm': 0.24516849, 'learning_rate': 9.68e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019724, 'epoch': 0.8, 'global_step/max_steps': '65/405', 'percentage': '16.05%', 'elapsed_time': '54m 51s', 'remaining_time': '4h 46m 59s'}
{'loss': 0.81388254, 'token_acc': 0.75490298, 'grad_norm': 0.28124103, 'learning_rate': 9.604e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019781, 'epoch': 0.86, 'global_step/max_steps': '70/405', 'percentage': '17.28%', 'elapsed_time': '58m 55s', 'remaining_time': '4h 41m 58s'}
{'loss': 0.81019135, 'token_acc': 0.74177519, 'grad_norm': 0.28694744, 'learning_rate': 9.52e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019784, 'epoch': 0.92, 'global_step/max_steps': '75/405', 'percentage': '18.52%', 'elapsed_time': '1h 3m 7s', 'remaining_time': '4h 37m 44s'}
{'loss': 0.76696019, 'token_acc': 0.7639197, 'grad_norm': 0.311834, 'learning_rate': 9.429e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019813, 'epoch': 0.98, 'global_step/max_steps': '80/405', 'percentage': '19.75%', 'elapsed_time': '1h 7m 14s', 'remaining_time': '4h 33m 8s'}
{'loss': 0.76195569, 'token_acc': 0.76973895, 'grad_norm': 0.43021317, 'learning_rate': 9.33e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019856, 'epoch': 1.04, 'global_step/max_steps': '85/405', 'percentage': '20.99%', 'elapsed_time': '1h 11m 17s', 'remaining_time': '4h 28m 22s'}
{'loss': 0.7821136, 'token_acc': 0.74735605, 'grad_norm': 0.41759374, 'learning_rate': 9.224e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019888, 'epoch': 1.1, 'global_step/max_steps': '90/405', 'percentage': '22.22%', 'elapsed_time': '1h 15m 21s', 'remaining_time': '4h 23m 45s'}
Train:  23%|██▎       | 92/405 [1:17:03<4:18:27, 49.54s/it][2025-03-11 20:46:52,504] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.74946299, 'token_acc': 0.76573743, 'grad_norm': 0.31465808, 'learning_rate': 9.111e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019875, 'epoch': 1.16, 'global_step/max_steps': '95/405', 'percentage': '23.46%', 'elapsed_time': '1h 19m 36s', 'remaining_time': '4h 19m 45s'}
{'loss': 0.75774355, 'token_acc': 0.76279737, 'grad_norm': 0.34568468, 'learning_rate': 8.992e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019874, 'epoch': 1.22, 'global_step/max_steps': '100/405', 'percentage': '24.69%', 'elapsed_time': '1h 23m 48s', 'remaining_time': '4h 15m 35s'}
Train:  25%|██▍       | 100/405 [1:23:48<4:17:26, 50.64s/it]
{'eval_loss': 0.720375, 'eval_token_acc': 0.77822903, 'eval_runtime': 15.6988, 'eval_samples_per_second': 1.656, 'eval_steps_per_second': 0.446, 'epoch': 1.22, 'global_step/max_steps': '100/405', 'percentage': '24.69%', 'elapsed_time': '1h 24m 3s', 'remaining_time': '4h 16m 23s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.86s/it]50.64s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-100
**********************************************
{'loss': 0.72672591, 'token_acc': 0.76866752, 'grad_norm': 0.64908534, 'learning_rate': 8.865e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019801, 'epoch': 1.28, 'global_step/max_steps': '105/405', 'percentage': '25.93%', 'elapsed_time': '1h 28m 19s', 'remaining_time': '4h 12m 20s'}
{'loss': 0.72024732, 'token_acc': 0.76941662, 'grad_norm': 0.36116413, 'learning_rate': 8.732e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019815, 'epoch': 1.34, 'global_step/max_steps': '110/405', 'percentage': '27.16%', 'elapsed_time': '1h 32m 27s', 'remaining_time': '4h 7m 58s'}
{'loss': 0.68267331, 'token_acc': 0.7761134, 'grad_norm': 0.38293342, 'learning_rate': 8.593e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019806, 'epoch': 1.4, 'global_step/max_steps': '115/405', 'percentage': '28.40%', 'elapsed_time': '1h 36m 42s', 'remaining_time': '4h 3m 52s'}
{'loss': 0.71170344, 'token_acc': 0.78053525, 'grad_norm': 0.3713337, 'learning_rate': 8.448e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019831, 'epoch': 1.46, 'global_step/max_steps': '120/405', 'percentage': '29.63%', 'elapsed_time': '1h 40m 47s', 'remaining_time': '3h 59m 22s'}
{'loss': 0.70673256, 'token_acc': 0.77159011, 'grad_norm': 0.36822507, 'learning_rate': 8.297e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019845, 'epoch': 1.53, 'global_step/max_steps': '125/405', 'percentage': '30.86%', 'elapsed_time': '1h 44m 55s', 'remaining_time': '3h 55m 1s'}
{'loss': 0.67356033, 'token_acc': 0.7921583, 'grad_norm': 0.4612934, 'learning_rate': 8.14e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019851, 'epoch': 1.59, 'global_step/max_steps': '130/405', 'percentage': '32.10%', 'elapsed_time': '1h 49m 5s', 'remaining_time': '3h 50m 45s'}
Train:  33%|███▎      | 132/405 [1:50:48<3:50:21, 50.63s/it][2025-03-11 21:20:37,710] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.68124514, 'token_acc': 0.78771819, 'grad_norm': 0.46047566, 'learning_rate': 7.978e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019853, 'epoch': 1.65, 'global_step/max_steps': '135/405', 'percentage': '33.33%', 'elapsed_time': '1h 53m 16s', 'remaining_time': '3h 46m 32s'}
{'loss': 0.67308445, 'token_acc': 0.78043745, 'grad_norm': 0.46205863, 'learning_rate': 7.812e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019881, 'epoch': 1.71, 'global_step/max_steps': '140/405', 'percentage': '34.57%', 'elapsed_time': '1h 57m 18s', 'remaining_time': '3h 42m 2s'}
{'loss': 0.65709753, 'token_acc': 0.794716, 'grad_norm': 0.46728156, 'learning_rate': 7.64e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019893, 'epoch': 1.77, 'global_step/max_steps': '145/405', 'percentage': '35.80%', 'elapsed_time': '2h 1m 25s', 'remaining_time': '3h 37m 43s'}
{'loss': 0.66156731, 'token_acc': 0.78602904, 'grad_norm': 0.45510392, 'learning_rate': 7.464e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019899, 'epoch': 1.83, 'global_step/max_steps': '150/405', 'percentage': '37.04%', 'elapsed_time': '2h 5m 34s', 'remaining_time': '3h 33m 28s'}
Train:  37%|███▋      | 150/405 [2:05:34<3:31:14, 49.70s/it]
{'eval_loss': 0.65251857, 'eval_token_acc': 0.79853547, 'eval_runtime': 15.6574, 'eval_samples_per_second': 1.661, 'eval_steps_per_second': 0.447, 'epoch': 1.83, 'global_step/max_steps': '150/405', 'percentage': '37.04%', 'elapsed_time': '2h 5m 50s', 'remaining_time': '3h 33m 55s'}
Val: 100%|██████████| 7/7 [00:12<00:00,  1.86s/it]49.70s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-150
**********************************************
{'loss': 0.65750132, 'token_acc': 0.78596818, 'grad_norm': 0.47214887, 'learning_rate': 7.285e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019847, 'epoch': 1.89, 'global_step/max_steps': '155/405', 'percentage': '38.27%', 'elapsed_time': '2h 10m 6s', 'remaining_time': '3h 29m 50s'}
{'loss': 0.63944697, 'token_acc': 0.80483245, 'grad_norm': 0.49222756, 'learning_rate': 7.101e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019853, 'epoch': 1.95, 'global_step/max_steps': '160/405', 'percentage': '39.51%', 'elapsed_time': '2h 14m 15s', 'remaining_time': '3h 25m 35s'}
{'loss': 0.63674178, 'token_acc': 0.80768833, 'grad_norm': 0.59897131, 'learning_rate': 6.913e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019895, 'epoch': 2.01, 'global_step/max_steps': '165/405', 'percentage': '40.74%', 'elapsed_time': '2h 18m 10s', 'remaining_time': '3h 20m 58s'}
{'loss': 0.64350748, 'token_acc': 0.80203466, 'grad_norm': 0.51221188, 'learning_rate': 6.723e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019886, 'epoch': 2.07, 'global_step/max_steps': '170/405', 'percentage': '41.98%', 'elapsed_time': '2h 22m 25s', 'remaining_time': '3h 16m 52s'}
{'loss': 0.59812784, 'token_acc': 0.80184307, 'grad_norm': 0.52895864, 'learning_rate': 6.53e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019905, 'epoch': 2.13, 'global_step/max_steps': '175/405', 'percentage': '43.21%', 'elapsed_time': '2h 26m 27s', 'remaining_time': '3h 12m 29s'}
{'loss': 0.60168495, 'token_acc': 0.80204451, 'grad_norm': 0.54771068, 'learning_rate': 6.334e-05, 'memory(GiB)': 76.21, 'train_speed(iter/s)': 0.019928, 'epoch': 2.2, 'global_step/max_steps': '180/405', 'percentage': '44.44%', 'elapsed_time': '2h 30m 28s', 'remaining_time': '3h 8m 6s'}
Train:  44%|████▍     | 180/405 [2:30:28<2:59:27, 47.85s/it][2025-03-11 22:00:31,985] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.59545937, 'token_acc': 0.80456827, 'grad_norm': 0.57579227, 'learning_rate': 6.135e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019937, 'epoch': 2.26, 'global_step/max_steps': '185/405', 'percentage': '45.68%', 'elapsed_time': '2h 34m 35s', 'remaining_time': '3h 3m 50s'}
{'loss': 0.59948916, 'token_acc': 0.80121831, 'grad_norm': 0.53543298, 'learning_rate': 5.935e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019937, 'epoch': 2.32, 'global_step/max_steps': '190/405', 'percentage': '46.91%', 'elapsed_time': '2h 38m 46s', 'remaining_time': '2h 59m 39s'}
{'loss': 0.59326115, 'token_acc': 0.79956183, 'grad_norm': 0.55039623, 'learning_rate': 5.734e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01994, 'epoch': 2.38, 'global_step/max_steps': '195/405', 'percentage': '48.15%', 'elapsed_time': '2h 42m 55s', 'remaining_time': '2h 55m 27s'}
{'loss': 0.58592167, 'token_acc': 0.80714245, 'grad_norm': 0.69052059, 'learning_rate': 5.531e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019935, 'epoch': 2.44, 'global_step/max_steps': '200/405', 'percentage': '49.38%', 'elapsed_time': '2h 47m 9s', 'remaining_time': '2h 51m 19s'}
Train:  49%|████▉     | 200/405 [2:47:09<2:49:56, 49.74s/it]
{'eval_loss': 0.56886792, 'eval_token_acc': 0.82167251, 'eval_runtime': 15.7067, 'eval_samples_per_second': 1.655, 'eval_steps_per_second': 0.446, 'epoch': 2.44, 'global_step/max_steps': '200/405', 'percentage': '49.38%', 'elapsed_time': '2h 47m 24s', 'remaining_time': '2h 51m 36s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.86s/it]49.74s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-200
*****************************************
Train:  50%|████▉     | 202/405 [2:49:15<3:08:49, 55.81s/it][2025-03-11 22:19:18,274] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.54969444, 'token_acc': 0.82021995, 'grad_norm': 0.51556883, 'learning_rate': 5.327e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019871, 'epoch': 2.5, 'global_step/max_steps': '205/405', 'percentage': '50.62%', 'elapsed_time': '2h 51m 52s', 'remaining_time': '2h 47m 41s'}
{'loss': 0.52501326, 'token_acc': 0.81536282, 'grad_norm': 0.54576287, 'learning_rate': 5.123e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019883, 'epoch': 2.56, 'global_step/max_steps': '210/405', 'percentage': '51.85%', 'elapsed_time': '2h 55m 58s', 'remaining_time': '2h 43m 24s'}
{'loss': 0.5639473, 'token_acc': 0.8235682, 'grad_norm': 0.51644597, 'learning_rate': 4.918e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019872, 'epoch': 2.62, 'global_step/max_steps': '215/405', 'percentage': '53.09%', 'elapsed_time': '3h 0m 15s', 'remaining_time': '2h 39m 17s'}
Train:  53%|█████▎    | 215/405 [3:00:15<2:40:55, 50.82s/it][2025-03-11 22:30:17,273] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.54539089, 'token_acc': 0.82929161, 'grad_norm': 0.5427966, 'learning_rate': 4.714e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019887, 'epoch': 2.69, 'global_step/max_steps': '220/405', 'percentage': '54.32%', 'elapsed_time': '3h 4m 18s', 'remaining_time': '2h 34m 59s'}
{'loss': 0.54721932, 'token_acc': 0.82292752, 'grad_norm': 0.58632606, 'learning_rate': 4.51e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01989, 'epoch': 2.75, 'global_step/max_steps': '225/405', 'percentage': '55.56%', 'elapsed_time': '3h 8m 28s', 'remaining_time': '2h 30m 46s'}
{'loss': 0.51745701, 'token_acc': 0.82614152, 'grad_norm': 0.51928985, 'learning_rate': 4.307e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019892, 'epoch': 2.81, 'global_step/max_steps': '230/405', 'percentage': '56.79%', 'elapsed_time': '3h 12m 38s', 'remaining_time': '2h 26m 34s'}
{'loss': 0.54157047, 'token_acc': 0.81710944, 'grad_norm': 0.71657186, 'learning_rate': 4.105e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019899, 'epoch': 2.87, 'global_step/max_steps': '235/405', 'percentage': '58.02%', 'elapsed_time': '3h 16m 46s', 'remaining_time': '2h 22m 20s'}
{'loss': 0.54548702, 'token_acc': 0.81284619, 'grad_norm': 0.50686509, 'learning_rate': 3.904e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019907, 'epoch': 2.93, 'global_step/max_steps': '240/405', 'percentage': '59.26%', 'elapsed_time': '3h 20m 52s', 'remaining_time': '2h 18m 6s'}
{'loss': 0.51912632, 'token_acc': 0.83365523, 'grad_norm': 0.68279731, 'learning_rate': 3.706e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019907, 'epoch': 2.99, 'global_step/max_steps': '245/405', 'percentage': '60.49%', 'elapsed_time': '3h 25m 3s', 'remaining_time': '2h 13m 55s'}
{'loss': 0.52836185, 'token_acc': 0.83409461, 'grad_norm': 0.55463023, 'learning_rate': 3.509e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019923, 'epoch': 3.05, 'global_step/max_steps': '250/405', 'percentage': '61.73%', 'elapsed_time': '3h 29m 4s', 'remaining_time': '2h 9m 37s'}
Train:  62%|██████▏   | 250/405 [3:29:04<2:07:39, 49.41s/it]
{'eval_loss': 0.52870411, 'eval_token_acc': 0.83231801, 'eval_runtime': 15.7131, 'eval_samples_per_second': 1.655, 'eval_steps_per_second': 0.445, 'epoch': 3.05, 'global_step/max_steps': '250/405', 'percentage': '61.73%', 'elapsed_time': '3h 29m 20s', 'remaining_time': '2h 9m 47s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.86s/it]49.41s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-250
************************************************
{'loss': 0.51691947, 'token_acc': 0.82422604, 'grad_norm': 0.53855505, 'learning_rate': 3.316e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019896, 'epoch': 3.11, 'global_step/max_steps': '255/405', 'percentage': '62.96%', 'elapsed_time': '3h 33m 33s', 'remaining_time': '2h 5m 37s'}
Train:  63%|██████▎   | 257/405 [3:35:20<2:08:40, 52.17s/it][2025-03-11 23:05:38,683] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.50732822, 'token_acc': 0.83722172, 'grad_norm': 0.7386699, 'learning_rate': 3.124e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019885, 'epoch': 3.17, 'global_step/max_steps': '260/405', 'percentage': '64.20%', 'elapsed_time': '3h 37m 51s', 'remaining_time': '2h 1m 30s'}
{'loss': 0.50304022, 'token_acc': 0.84518402, 'grad_norm': 0.5332581, 'learning_rate': 2.936e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019889, 'epoch': 3.23, 'global_step/max_steps': '265/405', 'percentage': '65.43%', 'elapsed_time': '3h 42m 0s', 'remaining_time': '1h 57m 17s'}
{'loss': 0.5034606, 'token_acc': 0.81697432, 'grad_norm': 0.67853403, 'learning_rate': 2.752e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019886, 'epoch': 3.29, 'global_step/max_steps': '270/405', 'percentage': '66.67%', 'elapsed_time': '3h 46m 13s', 'remaining_time': '1h 53m 6s'}
{'loss': 0.5183465, 'token_acc': 0.83563731, 'grad_norm': 0.6259943, 'learning_rate': 2.571e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019885, 'epoch': 3.35, 'global_step/max_steps': '275/405', 'percentage': '67.90%', 'elapsed_time': '3h 50m 25s', 'remaining_time': '1h 48m 55s'}
{'loss': 0.51731062, 'token_acc': 0.83534514, 'grad_norm': 0.61180005, 'learning_rate': 2.394e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019881, 'epoch': 3.42, 'global_step/max_steps': '280/405', 'percentage': '69.14%', 'elapsed_time': '3h 54m 40s', 'remaining_time': '1h 44m 45s'}
{'loss': 0.48814211, 'token_acc': 0.82283914, 'grad_norm': 0.57190785, 'learning_rate': 2.222e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019883, 'epoch': 3.48, 'global_step/max_steps': '285/405', 'percentage': '70.37%', 'elapsed_time': '3h 58m 50s', 'remaining_time': '1h 40m 33s'}
{'loss': 0.4921607, 'token_acc': 0.82588464, 'grad_norm': 0.52349298, 'learning_rate': 2.054e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019888, 'epoch': 3.54, 'global_step/max_steps': '290/405', 'percentage': '71.60%', 'elapsed_time': '4h 2m 58s', 'remaining_time': '1h 36m 20s'}
{'loss': 0.46711798, 'token_acc': 0.85013139, 'grad_norm': 0.6346718, 'learning_rate': 1.892e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019889, 'epoch': 3.6, 'global_step/max_steps': '295/405', 'percentage': '72.84%', 'elapsed_time': '4h 7m 8s', 'remaining_time': '1h 32m 9s'}
{'loss': 0.48140554, 'token_acc': 0.83738891, 'grad_norm': 0.62962168, 'learning_rate': 1.734e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019894, 'epoch': 3.66, 'global_step/max_steps': '300/405', 'percentage': '74.07%', 'elapsed_time': '4h 11m 16s', 'remaining_time': '1h 27m 56s'}
Train:  74%|███████▍  | 300/405 [4:11:16<1:25:56, 49.11s/it]
{'eval_loss': 0.50453913, 'eval_token_acc': 0.84007138, 'eval_runtime': 15.7353, 'eval_samples_per_second': 1.652, 'eval_steps_per_second': 0.445, 'epoch': 3.66, 'global_step/max_steps': '300/405', 'percentage': '74.07%', 'elapsed_time': '4h 11m 32s', 'remaining_time': '1h 28m 2s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.87s/it]49.11s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-300
********************************************
{'loss': 0.48724985, 'token_acc': 0.83901735, 'grad_norm': 0.59317845, 'learning_rate': 1.582e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019877, 'epoch': 3.72, 'global_step/max_steps': '305/405', 'percentage': '75.31%', 'elapsed_time': '4h 15m 40s', 'remaining_time': '1h 23m 49s'}
{'loss': 0.47789598, 'token_acc': 0.85550931, 'grad_norm': 0.67272985, 'learning_rate': 1.436e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019881, 'epoch': 3.78, 'global_step/max_steps': '310/405', 'percentage': '76.54%', 'elapsed_time': '4h 19m 49s', 'remaining_time': '1h 19m 37s'}
{'loss': 0.49318271, 'token_acc': 0.81770707, 'grad_norm': 0.64257801, 'learning_rate': 1.295e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01989, 'epoch': 3.84, 'global_step/max_steps': '315/405', 'percentage': '77.78%', 'elapsed_time': '4h 23m 53s', 'remaining_time': '1h 15m 23s'}
Train:  78%|███████▊  | 316/405 [4:24:43<1:14:10, 50.00s/it][2025-03-11 23:54:59,684] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.46136761, 'token_acc': 0.8458454, 'grad_norm': 0.62953055, 'learning_rate': 1.161e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01989, 'epoch': 3.91, 'global_step/max_steps': '320/405', 'percentage': '79.01%', 'elapsed_time': '4h 28m 5s', 'remaining_time': '1h 11m 12s'}
{'loss': 0.4856822, 'token_acc': 0.83825816, 'grad_norm': 0.64470125, 'learning_rate': 1.033e-05, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019893, 'epoch': 3.97, 'global_step/max_steps': '325/405', 'percentage': '80.25%', 'elapsed_time': '4h 32m 13s', 'remaining_time': '1h 7m 0s'}
{'loss': 0.46592345, 'token_acc': 0.84528571, 'grad_norm': 0.65905805, 'learning_rate': 9.12e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019917, 'epoch': 4.02, 'global_step/max_steps': '330/405', 'percentage': '81.48%', 'elapsed_time': '4h 36m 5s', 'remaining_time': '1h 2m 44s'}
{'loss': 0.48042569, 'token_acc': 0.85237186, 'grad_norm': 0.61635281, 'learning_rate': 7.98e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019923, 'epoch': 4.09, 'global_step/max_steps': '335/405', 'percentage': '82.72%', 'elapsed_time': '4h 40m 11s', 'remaining_time': '58m 32s'}
{'loss': 0.45569935, 'token_acc': 0.83371485, 'grad_norm': 0.64527875, 'learning_rate': 6.9e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01993, 'epoch': 4.15, 'global_step/max_steps': '340/405', 'percentage': '83.95%', 'elapsed_time': '4h 44m 16s', 'remaining_time': '54m 20s'}
{'loss': 0.46417255, 'token_acc': 0.84960884, 'grad_norm': 0.67313113, 'learning_rate': 5.9e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019931, 'epoch': 4.21, 'global_step/max_steps': '345/405', 'percentage': '85.19%', 'elapsed_time': '4h 48m 26s', 'remaining_time': '50m 9s'}
{'loss': 0.47292795, 'token_acc': 0.85013211, 'grad_norm': 0.59537749, 'learning_rate': 4.98e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.019936, 'epoch': 4.27, 'global_step/max_steps': '350/405', 'percentage': '86.42%', 'elapsed_time': '4h 52m 32s', 'remaining_time': '45m 58s'}
Train:  86%|████████▋ | 350/405 [4:52:32<45:19, 49.44s/it]
{'eval_loss': 0.490695, 'eval_token_acc': 0.84296351, 'eval_runtime': 15.6909, 'eval_samples_per_second': 1.657, 'eval_steps_per_second': 0.446, 'epoch': 4.27, 'global_step/max_steps': '350/405', 'percentage': '86.42%', 'elapsed_time': '4h 52m 48s', 'remaining_time': '46m 0s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.86s/it].44s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-350
*****************************************
Train:  87%|████████▋ | 352/405 [4:54:35<48:02, 54.39s/it][2025-03-12 00:25:04,775] [WARNING] [stage3.py:2139:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.46881456, 'token_acc': 0.83740075, 'grad_norm': 0.59338625, 'learning_rate': 4.13e-06, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01991, 'epoch': 4.33, 'global_step/max_steps': '355/405', 'percentage': '87.65%', 'elapsed_time': '4h 57m 6s', 'remaining_time': '41m 50s'}
{'eval_loss': 0.48915866, 'eval_token_acc': 0.84357886, 'eval_runtime': 15.8494, 'eval_samples_per_second': 1.64, 'eval_steps_per_second': 0.442, 'epoch': 4.88, 'global_step/max_steps': '400/405', 'percentage': '98.77%', 'elapsed_time': '5h 34m 38s', 'remaining_time': '4m 10s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.88s/it].31s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-400
{'loss': 0.49458728, 'token_acc': 0.83115697, 'grad_norm': 0.65133526, 'learning_rate': 0.0, 'memory(GiB)': 76.86, 'train_speed(iter/s)': 0.01992, 'epoch': 4.94, 'global_step/max_steps': '405/405', 'percentage': '100.00%', 'elapsed_time': '5h 38m 48s', 'remaining_time': '0s'}
Train: 100%|██████████| 405/405 [5:38:48<00:00, 50.98s/it]
{'eval_loss': 0.4893617, 'eval_token_acc': 0.84308658, 'eval_runtime': 15.9508, 'eval_samples_per_second': 1.63, 'eval_steps_per_second': 0.439, 'epoch': 4.94, 'global_step/max_steps': '405/405', 'percentage': '100.00%', 'elapsed_time': '5h 39m 4s', 'remaining_time': '0s'}
Val: 100%|██████████| 7/7 [00:13<00:00,  1.90s/it].98s/it]
[INFO:swift] Saving model checkpoint to /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-405
{'train_runtime': 20349.5179, 'train_samples_per_second': 0.642, 'train_steps_per_second': 0.02, 'train_loss': 0.63218051, 'epoch': 4.94, 'global_step/max_steps': '405/405', 'percentage': '100.00%', 'elapsed_time': '5h 39m 9s', 'remaining_time': '0s'}
Train: 100%|██████████| 405/405 [5:39:09<00:00, 50.25s/it]
[INFO:swift] last_model_checkpoint: /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-405
[INFO:swift] best_model_checkpoint: /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-400
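The best checkpoint (checkpoint-400) corresponds to the lowest eval_loss (about 0.489). Below is a small helper sketch, not part of ms-swift, for pulling the train/eval loss values out of a saved console log like the one above so the convergence curve can be inspected; the dict-per-line layout is assumed from the excerpts shown here, and the log path is hypothetical.

import ast
import re

def parse_losses(log_path: str):
    """Collect (step, loss) pairs for training and evaluation from the console log."""
    train, evals = [], []
    metric_line = re.compile(r"\{'(?:loss|eval_loss)'.*?\}")
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            m = metric_line.search(line)
            if not m:
                continue
            record = ast.literal_eval(m.group(0))            # each metric line prints a Python dict literal
            step = int(record["global_step/max_steps"].split("/")[0])
            if "eval_loss" in record:
                evals.append((step, record["eval_loss"]))
            else:
                train.append((step, record["loss"]))
    return train, evals

if __name__ == "__main__":
    train, evals = parse_losses("train_console.log")         # hypothetical path to the saved log
    print("last train loss:", train[-1])
    print("last eval loss:", evals[-1])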

4. Deployment and Inference

We deploy across multiple GPUs and expose the service on a custom port:

RAY_memory_monitor_refresh_ms=0 \
CUDA_VISIBLE_DEVICES=0,1 swift deploy \
    --ckpt_dir /data/qwq32b_sft_lora/output/v9-20250311-192834/checkpoint-400 \
    --infer_backend vllm \
    --max_new_tokens 2048 \
    --tensor_parallel_size 2 \
    --port 8011

Inference script:

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://ip:8011/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="QwQ-32B",
    messages=[
        {"role": "system", "content": "你是一款客户机器人,帮助客户解决问题"},
        {"role": "user", "content": "问一下这款手机现在附带什么配件"},
        {"role": "assistant", "content": "附件内容:锂离子电池组 NP-FW50,电源适配器AC-UUD12 ,Micro USB 连接线,肩带,镜头盖,热靴盖,遮光罩,使用说明书,保修卡"},
        {"role": "user", "content": "售后和质保是什么标准"},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=2048,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)

5. References

【1】https://github.com/modelscope/ms-swift

【2】Inference and Deployment
