Self-Diagnosis and Treatment of COVID-19 and Influenza with Traditional Chinese Medicine Based on Baichuan2 (LLM Fine-Tuning + Gradio)

I. Project Overview

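This project uses the large-model toolkit provided by PaddleNLP to fine-tune Baichuan2-7b/13b. LoRA training is performed on the *Collection of Effective TCM Case Records for COVID-19, Influenza, Mycoplasma and Other Infections* (《中医治疗新冠流感支原体感染等有效病历集》), so that the model learns to diagnose and treat COVID-19, influenza and other upper respiratory infections using Traditional Chinese Medicine (TCM).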
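LoRA (Low-Rank Adaptation) keeps the pretrained weight frozen and trains only a small low-rank update. The NumPy sketch below illustrates the idea; it is not PaddleNLP's implementation, and the layer sizes, rank `r=8` and `alpha=16` are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA linear layer (sketch): y = x @ W + (alpha / r) * x @ A @ B."""

    def __init__(self, in_dim, out_dim, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weight (random here, standing in for a base-model weight).
        self.W = rng.standard_normal((in_dim, out_dim)) * 0.02
        # Trainable low-rank factors. B starts at zero, so the layer initially
        # behaves exactly like the frozen base layer.
        self.A = rng.standard_normal((in_dim, r)) * 0.01
        self.B = np.zeros((r, out_dim))
        self.scaling = alpha / r

    def __call__(self, x):
        return x @ self.W + self.scaling * (x @ self.A @ self.B)

    def trainable_params(self):
        # Only A and B are updated during fine-tuning.
        return self.A.size + self.B.size

    def full_params(self):
        return self.W.size


layer = LoRALinear(1024, 1024, r=8)
x = np.ones((2, 1024))
print(np.allclose(layer(x), x @ layer.W))             # True
print(layer.trainable_params(), layer.full_params())  # 16384 1048576
```

With these sizes the LoRA factors hold about 1.6% of the parameters of the full 1024×1024 weight, which is why LoRA fine-tuning needs far less memory for gradients and optimizer state than full fine-tuning.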

II. PaddleNLP

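The PaddlePaddle (飞桨) large-model toolkit in PaddleNLP is built around three design goals: a one-stop experience, top performance, and ecosystem compatibility. It provides a unified workflow for pre-training, fine-tuning (including SFT and PEFT), quantization and inference of mainstream LLMs, helping developers customize large language models quickly, cheaply and with a low barrier to entry. PaddleNLP supports SFT, LoRA, Prefix Tuning and other fine-tuning strategies for several mainstream LLMs through a unified, efficient fine-tuning pipeline: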

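  1. Unified training entry point. The toolkit's fine-tuning pipeline works with mainstream LLMs; users only need to edit a configuration file to fine-tune different models on a single GPU or on multiple GPUs (with 4D-parallel distributed strategies).

  2. Efficient data handling and distributed strategies. The Zero Padding optimization reduces the proportion of pad tokens, improving training efficiency by up to 100%. PaddleNLP's own combination of PEFT with low-bit quantization and distributed parallelism greatly lowers the hardware needed for fine-tuning: a 10B-parameter-class model can be fine-tuned on a single A100 80G GPU, and a 100B-parameter-class model on a single machine with 8 × A100 80G.

  3. Multi-turn dialogue. A unified dialogue template and efficient multi-turn training are supported; see the multi-turn dialogue documentation for details.
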
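To see why removing padding matters, the sketch below compares the share of pad tokens when every sample is padded to a fixed length with the share when several samples are packed into each fixed-length row (greedy first-fit decreasing). This illustrates the general idea behind packing only; it is not PaddleNLP's Zero Padding code, and the sample lengths and `max_len=512` are made-up assumptions. A real packing implementation also needs an attention mask (or equivalent) so that samples sharing a row do not attend to each other; this sketch only counts tokens.

```python
def pad_ratio_padded(lengths, max_len):
    """Fraction of pad tokens when every sample is padded to max_len."""
    return 1 - sum(lengths) / (len(lengths) * max_len)


def pack_first_fit(lengths, max_len):
    """Greedy first-fit-decreasing packing of samples into rows of max_len tokens."""
    rows = []  # each row is a list of sample lengths
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"sample of length {n} exceeds max_len={max_len}")
        for row in rows:
            if sum(row) + n <= max_len:
                row.append(n)
                break
        else:
            rows.append([n])  # no existing row has room, open a new one
    return rows


def pad_ratio_packed(lengths, max_len):
    """Fraction of pad tokens after packing samples into shared rows."""
    rows = pack_first_fit(lengths, max_len)
    return 1 - sum(lengths) / (len(rows) * max_len)


lengths = [100, 300, 50, 500, 200, 120]
print(round(pad_ratio_padded(lengths, 512), 3))  # 0.587
print(round(pad_ratio_packed(lengths, 512), 3))  # 0.173
```

For intuition about the "up to 100%" figure: if half of the tokens in every batch were padding, eliminating them would roughly double the number of useful tokens processed per step.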

III. Baichuan2-7b/13b-chat

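The Baichuan 2 series is the latest work (at the time of writing) from Baichuan Intelligent Technology (百川智能) in deep learning, and the fine-tuned models perform strongly on a range of tasks. Open-sourcing these models gives developers a powerful tool for building more efficient and more accurate AI applications in many scenarios.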

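The Baichuan 2 series is fully open source, and it goes further than most on free commercial use (under the terms of its community license). This substantially fills a gap in China's open-source ecosystem and gives Chinese developers an open-source LLM that handles Chinese-language scenarios well.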

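Baichuan 2 models are also efficient: the quantized version of the 13-billion-parameter Baichuan2-13b can run fast inference even on a laptop with a consumer-grade GPU. For these reasons we chose the Baichuan 2 series as the base model for this project.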
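A back-of-the-envelope calculation shows why quantization matters for running a 13B model on consumer hardware. The sketch below estimates the memory needed for the weights alone at different bit widths, assuming a nominal 13e9 parameters; it ignores the KV cache, activations, quantization scales and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits):
    """Approximate memory (GiB) for model weights only at the given bit width."""
    return num_params * bits / 8 / 1024**3


for bits in (16, 8, 4):
    print(f"13B @ {bits}-bit: {weight_memory_gib(13e9, bits):.1f} GiB")
# 13B @ 16-bit: 24.2 GiB
# 13B @ 8-bit: 12.1 GiB
# 13B @ 4-bit: 6.1 GiB
```

At 16 bits the weights alone exceed the memory of typical laptop GPUs, while a 4-bit version comes down to roughly 6 GiB.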

IV. Training Data

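The *Collection of Effective TCM Case Records for COVID-19, Influenza, Mycoplasma and Other Infections* (《中医治疗新冠流感支原体感染等有效病历集》) was compiled by 云中医 (Yun Zhongyi). It gathers effective TCM diagnosis and treatment records for the upper respiratory infections that have been widespread recently, including colds, coughs and similar conditions caused by pathogens such as COVID-19, influenza A, mycoplasma, adenovirus and respiratory syncytial virus. The records were processed to de-emphasize the original prescriptions and prescription drugs and to add treatment options based on over-the-counter (OTC) Chinese patent medicines and home dietary therapy. This avoids issues of medical qualification and potential disputes, and makes the data better suited to self-diagnosis and treatment of common mild cases. Each record has two fields: `case` is the case record, and `diagnosis` is the diagnosis and prescription extracted from it. An example: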

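    {
      "case": "Patient: male, 45 years old, presenting with COVID-19 infection. Recently developed aversion to cold, absence of sweating and upper back pain, together with fever, body aches and headache. The back pain is severe and interferes with daily life. The patient also has clear nasal discharge, nasal congestion, fatigue, hoarseness and loss of appetite. Pale tongue with white coating; tight pulse. Based on the main symptoms and how they relate, this is considered a Gegen Tang pattern. Gegen Tang (Pueraria Decoction) is a classic TCM formula used mainly for wind-cold colds and is particularly effective for aversion to cold, absence of sweating and upper back pain. In summary, after COVID-19 infection the patient shows aversion to cold, absence of sweating, upper back pain, fever, body aches and headache, consistent with a Gegen Tang pattern. Treatment with Gegen Tang is recommended.",
      "diagnosis": "Diagnosis: Taiyang-Yangming cold damage. Suggested prescription: Gegen Tang. Suggested Chinese patent medicine: Gegen Tang granules, Fenghan Ganmao granules or Ganmao soft capsules. Suggested dietary therapy: scallion-white and ginger soup."
    }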

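PaddleNLP expects fine-tuning data with one dictionary (JSON object) per line, each containing the following fields:

src: str or List(str), the model's input instruction or prompt, i.e. the task the model should perform.

tgt: str or List(str), the model's output.

The training data therefore has to be converted into this format before training.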
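The conversion can be sketched in Python as follows, assuming the source records are stored as JSON Lines with `case` and `diagnosis` fields as in the example above; the case text becomes `src` and the diagnosis becomes `tgt`. The file names (`cases.jsonl`, `train.json`, `dev.json`), the 10% dev split and the optional `prefix` prompt are assumptions for illustration; check the PaddleNLP fine-tuning documentation for the dataset layout your version expects.

```python
import json
import random
from pathlib import Path


def to_src_tgt(record, prefix=""):
    """Map one {"case", "diagnosis"} record to the {"src", "tgt"} format."""
    case = record["case"].strip()
    diagnosis = record["diagnosis"].strip()
    if not case or not diagnosis:
        raise ValueError("record has an empty 'case' or 'diagnosis' field")
    # prefix is an optional, hypothetical instruction prepended to the case text
    return {"src": prefix + case, "tgt": diagnosis}


def convert_lines(lines, prefix=""):
    """Convert JSON Lines strings to src/tgt dicts, skipping blank lines."""
    return [to_src_tgt(json.loads(line), prefix) for line in lines if line.strip()]


def split_train_dev(samples, dev_ratio=0.1, seed=42):
    """Shuffle deterministically and split off a dev set (at least one sample if possible)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    if dev_ratio > 0 and len(shuffled) > 1:
        n_dev = max(1, n_dev)
    return shuffled[n_dev:], shuffled[:n_dev]


def write_jsonl(rows, path):
    """Write one JSON object per line; ensure_ascii=False keeps Chinese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def convert_file(in_path, out_dir, dev_ratio=0.1, prefix="", seed=42):
    """Read case records from in_path and write train.json / dev.json into out_dir."""
    with open(in_path, encoding="utf-8") as f:
        samples = convert_lines(f, prefix)
    train, dev = split_train_dev(samples, dev_ratio, seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    write_jsonl(train, out_dir / "train.json")
    write_jsonl(dev, out_dir / "dev.json")
    return {"train.json": len(train), "dev.json": len(dev)}
```

For example, `convert_file("cases.jsonl", "data")` would write `data/train.json` and `data/dev.json` and return the number of samples in each. For multi-turn dialogue, `src` and `tgt` would instead be lists of strings, one entry per turn.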

V. Environment Setup

1. Get and install the latest PaddleNLP

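# Clone the latest version directly from GitHub. If network access is a problem, clone from Gitee instead
# (the Gitee mirror may lag behind the latest version, so GitHub is preferred).
#!git clone https://gitee.com/PaddlePaddle/PaddleNLP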
!git clone https://github.com/PaddlePaddle/PaddleNLP.git
Cloning into 'PaddleNLP'...
remote: Enumerating objects: 60471, done.
remote: Counting objects: 100% (578/578), done.
remote: Compressing objects: 100% (423/423), done.
remote: Total 60471 (delta 271), reused 382 (delta 144), pack-reused 59893
Receiving objects: 100% (60471/60471), 97.72 MiB | 15.36 MiB/s, done.
Resolving deltas: 100% (41419/41419), done.

# Install the locally cloned version.
!pip install -r PaddleNLP/requirements.txt
!pip install -e ./PaddleNLP
Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/, https://pypi.tuna.tsinghua.edu.cn/simple/
Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/, https://pypi.tuna.tsinghua.edu.cn/simple/
Obtaining file:///home/aistudio/PaddleNLP
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: paddlenlp
  Building editable for paddlenlp (pyproject.toml) ... done
  Created wheel for paddlenlp: filename=paddlenlp-2.6.1.post0-0.editable-py3-none-any.whl size=15186 sha256=d63900491865a4c53fb8126468b30096bf5f9f684b281a3a5413724d608a6f40
  Stored in directory: /tmp/pip-ephem-wheel-cache-dxh_79d_/wheels/ef/67/51/d39210219524142315c8b4babdd3bb2610f53d4d50639f381e
Successfully built paddlenlp
Installing collected packages: paddlenlp
  Attempting uninstall: paddlenlp
    Found existing installation: paddlenlp 2.6.1.post0
    Uninstalling paddlenlp-2.6.1.post0:
      Successfully uninstalled paddlenlp-2.6.1.post0
Successfully installed paddlenlp-2.6.1.post0

In [3]

# Check that the installation succeeded; restart the kernel afterwards so the new version is actually used
!pip list|grep paddlenlp
paddlenlp                  2.6.1.post0  /home/aistudio/PaddleNLP

2. Get the Baichuan2-7B/13B-Chat model. AI Studio already hosts the Baichuan2 series, so a model can be loaded directly with the from_aistudio=True parameter:

from paddlenlp.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("aistudio/Baichuan2-7B-Chat", from_aistudio=True)

For local deployment, however, we clone the model first. The 7B model is used here; choose whichever version suits your needs.

In [9]

# Cloning directly from AI Studio is the fastest option
!git clone http://git.aistudio.baidu.com/aistudio/Baichuan2-7B-Chat.git
Cloning into 'Baichuan2-7B-Chat'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 75 (delta 30), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (75/75), 13.65 KiB | 873.00 KiB/s, done.
Filtering content: 100% (9/9), 3.96 GiB | 8.93 MiB/s, done.
Encountered 6 files that may not have been copied correctly on Windows:
  model-00003-of-00004.safetensors
  model_state-00003-of-00004.pdparams
  model_state-00001-of-00004.pdparams
  model_state-00002-of-00004.pdparams
  model-00002-of-00004.safetensors
  model-00001-of-00004.safetensors
See: `git lfs help smudge` for more details.
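The warning above concerns Windows checkouts. On any platform, however, an incomplete Git LFS fetch can leave small pointer files where the weights should be, and this only shows up later as a load error. The following minimal sketch checks for that. It assumes the standard `model.safetensors.index.json` layout that the loading logs in this project refer to, in which `weight_map` maps each parameter name to its shard file. `find_unfetched_shards` is a hypothetical helper and is not part of PaddleNLP.

```python
import json
from pathlib import Path

# A Git LFS pointer file begins with this line instead of binary tensor data.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_unfetched_shards(model_dir):
    """Return the shards listed in model.safetensors.index.json that are
    missing or are still Git LFS pointer files instead of real weights."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # weight_map: parameter name -> shard file name (many names share a shard)
    shard_names = sorted(set(index["weight_map"].values()))
    problems = []
    for name in shard_names:
        path = model_dir / name
        if not path.exists():
            problems.append(name)
            continue
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(name)
    return problems
```

For example, `find_unfetched_shards("/home/aistudio/Baichuan2-7B-Chat")` should return an empty list. Any shard it reports can be re-downloaded by running `git lfs pull` inside the cloned directory. The check covers only the `.safetensors` shards, not the `.pdparams` files.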

VI. Data Preparation

1. Convert the training data to the format required for training

In [5]

import json
import os
from sklearn.model_selection import train_test_split

# Read the source JSON file (a list of {"case": ..., "diagnosis": ...} records)
with open('data/data254538/RecentColdMedicalCase.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Split the dataset into a training set and a dev (validation) set
train, dev = train_test_split(data, test_size=0.1, random_state=42)

# Convert to the src/tgt format required for training, one JSON object per line
os.makedirs('TrainData', exist_ok=True)
for filename, subset in (('TrainData/train.json', train), ('TrainData/dev.json', dev)):
    with open(filename, 'w', encoding='utf-8') as f:
        for item in subset:
            temp = {'src': item['case'], 'tgt': item['diagnosis']}
            json.dump(temp, f, ensure_ascii=False)
            f.write('\n')
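After the conversion, it is worth confirming that each file really contains one JSON object per line with the `src`/`tgt` fields described earlier. The minimal sketch below assumes single-turn data in which both fields are plain strings, as in this project. The format also allows lists of strings for multi-turn data, and this check would reject those. `check_src_tgt_file` is a hypothetical helper.

```python
import json


def check_src_tgt_file(path):
    """Verify that every non-blank line is a JSON object whose 'src' and 'tgt'
    fields are non-empty strings; return the number of records."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. at the end of the file
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            for key in ("src", "tgt"):
                value = record.get(key)
                if not isinstance(value, str) or not value:
                    raise ValueError(f"{path}:{lineno}: field {key!r} is missing or empty")
            count += 1
    return count
```

Running it on `TrainData/train.json` and `TrainData/dev.json` should give the same record counts that PaddleNLP reports when it loads the data for training. In the run logged in this notebook, those counts are 1,848 and 206.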

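2. Edit the fine-tuning parameters. /home/aistudio/PaddleNLP/llm/llama/lora_argument.json contains preset LoRA fine-tuning parameters, so they do not need to be passed on the command line. Edit the file directly. The main changes are the first two entries, the model path and the data path; adjust the other parameters as needed, following the comments.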

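{
# Built-in name of the pretrained model, or the directory containing it (default: facebook/llama-7b)
"model_name_or_path": "/home/aistudio/Baichuan2-7B-Chat",
# Directory containing the training data
"dataset_name_or_path": "/home/aistudio/TrainData",
# Directory where model parameters are saved
"output_dir": "./checkpoints/llama_lora_ckpts",
# Training batch size per device
"per_device_train_batch_size": 4,
# Number of steps to accumulate gradients over, which enlarges the batch size. Effective batch_size = per_device_train_batch_size * gradient_accumulation_steps
"gradient_accumulation_steps": 4,
# Evaluation batch size per device
"per_device_eval_batch_size": 8,
# Number of prediction steps whose outputs are accumulated before being moved to the CPU
"eval_accumulation_steps": 16,
# Total number of training epochs (if not an integer, the fractional part of the last epoch is run before training stops)
"num_train_epochs": 3,
# Learning rate for parameter updates
"learning_rate": 3e-04,
# Number of learning-rate warmup steps
"warmup_steps": 30,
# Interval, in steps, between training log entries
"logging_steps": 1,
# Evaluation strategy: once per epoch ("epoch"), every eval_steps steps ("steps"), or never ("no")
"evaluation_strategy": "epoch",
# Checkpoint saving strategy
"save_strategy": "epoch",
# Maximum length of the input context (src)
"src_length": 1024,
# Maximum total length of input plus output (src + tgt)
"max_length": 2048,
# Train and run inference in float16 precision
"fp16": true,
# float16 training mode; O2 means pure float16 training
"fp16_opt_level": "O2",
# Whether to train the model
"do_train": true,
# Whether to evaluate the model
"do_eval": true,
# Whether to disable tqdm progress bars
"disable_tqdm": true,
# Whether to load the best model at the end of training
"load_best_model_at_end": true,
# Whether to call model.generate during evaluation (default: false)
"eval_with_do_generation": false,
# Evaluation metric used to compare checkpoints, such as loss or accuracy
"metric_for_best_model": "accuracy",
# Whether to use recomputation (activation checkpointing), which trades extra compute for lower GPU memory
"recompute": true,
# Maximum number of checkpoints to keep
"save_total_limit": 1,
# Tensor (model) parallel degree
"tensor_parallel_degree": 1,
# Pipeline parallel degree (number of pipeline stages)
"pipeline_parallel_degree": 1,
# Whether to use LoRA
"lora": true,
# Whether to use Zero Padding
"zero_padding": false,
# Whether to use the Flash Attention mechanism
"use_flash_attention": false
}

The # comments are explanations only. JSON does not allow comments, so the actual lora_argument.json must contain only the keys and values, or it will fail to parse.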
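Because standard JSON does not allow comments, one way to keep the explanations and still produce a file that parses is to generate `lora_argument.json` from Python. The sketch below mirrors the values listed above. `write_lora_config` and `LORA_ARGS` are hypothetical names; the target path in the usage note is the file this project edits.

```python
import json

# Mirrors the configuration above; explanations live in Python comments instead.
LORA_ARGS = {
    "model_name_or_path": "/home/aistudio/Baichuan2-7B-Chat",  # local model directory
    "dataset_name_or_path": "/home/aistudio/TrainData",        # holds train.json / dev.json
    "output_dir": "./checkpoints/llama_lora_ckpts",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,   # effective batch size = 4 * 4 = 16
    "per_device_eval_batch_size": 8,
    "eval_accumulation_steps": 16,
    "num_train_epochs": 3,
    "learning_rate": 3e-04,
    "warmup_steps": 30,
    "logging_steps": 1,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "src_length": 1024,                 # max input (src) length
    "max_length": 2048,                 # max src + tgt length
    "fp16": True,
    "fp16_opt_level": "O2",
    "do_train": True,
    "do_eval": True,
    "disable_tqdm": True,
    "load_best_model_at_end": True,
    "eval_with_do_generation": False,
    "metric_for_best_model": "accuracy",
    "recompute": True,                  # activation recomputation to save GPU memory
    "save_total_limit": 1,
    "tensor_parallel_degree": 1,
    "pipeline_parallel_degree": 1,
    "lora": True,
    "zero_padding": False,
    "use_flash_attention": False,
}


def write_lora_config(path, **overrides):
    """Write LORA_ARGS, with any keyword overrides applied, to path as plain JSON."""
    config = {**LORA_ARGS, **overrides}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    return config
```

For example, `write_lora_config("/home/aistudio/PaddleNLP/llm/llama/lora_argument.json")` writes the single-GPU configuration. Passing `tensor_parallel_degree=2` produces the variant needed for the two-GPU command in the training step. This replaces the whole file, so any preset keys that are not listed in `LORA_ARGS` would be dropped.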

VII. Training

1. Before training, test what the original model can do

In [3]

import json
import paddle
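import get_result  # helper module in the project directory (get_result.py); not part of PaddleNLP
from paddlenlp.transformers import AutoModelForCausalLM,LlamaTokenizer
# Load the model and its weights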
model = AutoModelForCausalLM.from_pretrained('/home/aistudio/Baichuan2-7B-Chat',dtype="float16",tensor_parallel_degree=0,tensor_parallel_rank=0,)
model.eval()
tokenizer = LlamaTokenizer.from_pretrained('/home/aistudio/Baichuan2-7B-Chat')
result=get_result.generate(model,tokenizer,"我感冒了,有点咳嗽,发热,头疼,有口渴但是小便不利")
print(result)
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-12-27 12:58:44,513] [    INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'.
[2023-12-27 12:58:44,514] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json
[2023-12-27 12:58:44,518] [    INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json
W1227 12:58:44.522776  2705 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1227 12:58:44.524257  2705 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████| 4/4 [03:48<00:00, 57.18s/it]
[2023-12-27 13:02:48,099] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-27 13:02:48,100] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-27 13:02:48,106] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json
 。”
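Based on the symptoms you describe, this may be a cold caused by an external wind-cold attack. It is a common illness, and some medicines can relieve the symptoms and speed recovery. However, before starting any medication, be sure to consult a professional doctor, because each person's condition and constitution differ and may call for a different treatment plan or dosage. Here are some suggestions for your reference:
1. Rest more and drink plenty of water to help the body flush out toxins; avoid spicy and greasy foods to reduce irritation of the respiratory tract; keep indoor air circulating so that overly dry air does not worsen throat discomfort and similar symptoms</s>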

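The original model's answer is rather generic. It gives no precise diagnosis of the specific condition and no particularly effective treatment plan. Next, we fine-tune it on the training data.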
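LoRA keeps the pretrained weights frozen. For each adapted weight matrix of shape d_in × d_out, it trains two low-rank factors, A (d_in × r) and B (r × d_out), which adds r·(d_in + d_out) parameters. As a back-of-the-envelope check, the sketch below counts these parameters under three inputs:

- adapters on the q/k/v/o attention projections and the gate/up/down MLP projections of all 32 decoder layers;
- Baichuan2-7B's hidden size of 4096 and MLP size of 11008;
- the `lora_rank` of 8 that the training log reports.

The resulting count matches the 19,988,480 trainable parameters (about 0.27% of the model) shown in that log. The list of adapted layers is an assumption inferred from this match, not something read from PaddleNLP's source.

```python
# Assumed LoRA-adapted linear layers per decoder layer, as (d_in, d_out),
# for Baichuan2-7B (hidden size 4096, MLP size 11008, 32 layers).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

ADAPTED_LAYERS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank, layers=ADAPTED_LAYERS, num_layers=NUM_LAYERS):
    """Each adapted d_in x d_out weight gets A (d_in x r) and B (r x d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in layers.values())
    return per_layer * num_layers


trainable = lora_trainable_params(rank=8)
print(f"trainable LoRA parameters: {trainable:,}")  # 19,988,480
print(f"share of ~7.53e9 total: {trainable / 7.53e9:.2%}")  # 0.27%
```

The count grows linearly with the rank. Doubling `lora_rank` to 16 would double the trainable parameters and the adapter's optimizer state, while the frozen base weights stay the same.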

2. Run fine-tuning

Before running the training below, restart the kernel to release GPU memory; otherwise there will not be enough GPU memory.
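You can estimate the length of the run from the configuration before launching it. The sketch below makes two assumptions, both consistent with the total of 345 optimization steps in the training log:

- Optimizer steps per epoch equal the number of micro-batches divided by `gradient_accumulation_steps`, rounded down. The micro-batch count itself is rounded up because `dataloader_drop_last` is False.
- The `LINEAR` scheduler ramps the learning rate up over `warmup_steps` and then decays it linearly to zero.

`run_size` and `linear_warmup_lr` are hypothetical helpers for a single device.

```python
import math


def run_size(num_examples, per_device_batch, grad_accum, epochs):
    """Optimizer steps per epoch and in total, for single-device training."""
    micro_batches = math.ceil(num_examples / per_device_batch)  # last partial batch kept
    steps_per_epoch = micro_batches // grad_accum               # incomplete window dropped
    return steps_per_epoch, steps_per_epoch * epochs


def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


steps_per_epoch, total_steps = run_size(1848, per_device_batch=4, grad_accum=4, epochs=3)
print(steps_per_epoch, total_steps)  # 115 345
for step in (1, 10, 25, 30, total_steps):
    print(step, linear_warmup_lr(step, 3e-4, 30, total_steps))

# The logged ppl is exp(loss); the first step's loss of 4.04572821 gives about 57.15
print(round(math.exp(4.04572821), 2))  # 57.15
```

With these settings, the learning rate rises by 1e-5 per step during warmup: 1e-5 at step 1, 1e-4 at step 10 and 2.5e-4 at step 25. This matches the per-step log lines, and 1/115 ≈ 0.0087 matches the logged epoch value after the first step.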

In [1]

%cd ~/PaddleNLP/llm/
# Single-GPU training
!python  finetune_generation.py ./llama/lora_argument.json
# Distributed training:
# set tensor_parallel_degree in lora_argument.json to 2, then run
# !python -u -m paddle.distributed.launch --gpus "0,1" finetune_generation.py ./llama/lora_argument.json
/home/aistudio/PaddleNLP/llm
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/IPython/core/magics/osm.py:393: UserWarning: using bookmarks requires you to install the `pickleshare` library.
  bkms = self.shell.db.get('bookmarks', {})
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-12-26 17:37:10,698] [    INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-12-26 17:37:10,699] [    INFO] - ============================================================
[2023-12-26 17:37:10,699] [    INFO] -      Model Configuration Arguments      
[2023-12-26 17:37:10,699] [    INFO] - paddle commit id              : 3a1b1659a405a044ce806fbe027cc146f1193e6d
[2023-12-26 17:37:10,699] [    INFO] - paddlenlp commit id           : 942865f52b42cd6e0666a19af316f32e151694eb.dirty
[2023-12-26 17:37:10,699] [    INFO] - aistudio_repo_id              : None
[2023-12-26 17:37:10,699] [    INFO] - aistudio_repo_license         : Apache License 2.0
[2023-12-26 17:37:10,699] [    INFO] - aistudio_repo_private         : True
[2023-12-26 17:37:10,699] [    INFO] - aistudio_token                : None
[2023-12-26 17:37:10,699] [    INFO] - from_aistudio                 : False
[2023-12-26 17:37:10,699] [    INFO] - lora                          : True
[2023-12-26 17:37:10,699] [    INFO] - lora_path                     : None
[2023-12-26 17:37:10,699] [    INFO] - lora_rank                     : 8
[2023-12-26 17:37:10,700] [    INFO] - model_name_or_path            : /home/aistudio/Baichuan2-7B-Chat
[2023-12-26 17:37:10,700] [    INFO] - neftune                       : False
[2023-12-26 17:37:10,700] [    INFO] - neftune_noise_alpha           : 5.0
[2023-12-26 17:37:10,700] [    INFO] - num_prefix_tokens             : 128
[2023-12-26 17:37:10,700] [    INFO] - prefix_tuning                 : False
[2023-12-26 17:37:10,700] [    INFO] - save_to_aistudio              : False
[2023-12-26 17:37:10,700] [    INFO] - use_flash_attention           : False
[2023-12-26 17:37:10,700] [    INFO] - weight_blocksize              : 64
[2023-12-26 17:37:10,700] [    INFO] - weight_double_quant           : False
[2023-12-26 17:37:10,700] [    INFO] - weight_double_quant_block_size: 256
[2023-12-26 17:37:10,700] [    INFO] - weight_quantize_algo          : None
[2023-12-26 17:37:10,700] [    INFO] - 
[2023-12-26 17:37:10,700] [    INFO] - ============================================================
[2023-12-26 17:37:10,700] [    INFO] -       Data Configuration Arguments      
[2023-12-26 17:37:10,700] [    INFO] - paddle commit id              : 3a1b1659a405a044ce806fbe027cc146f1193e6d
[2023-12-26 17:37:10,700] [    INFO] - paddlenlp commit id           : 942865f52b42cd6e0666a19af316f32e151694eb.dirty
[2023-12-26 17:37:10,700] [    INFO] - chat_template                 : None
[2023-12-26 17:37:10,700] [    INFO] - dataset_name_or_path          : /home/aistudio/TrainData
[2023-12-26 17:37:10,701] [    INFO] - eval_with_do_generation       : False
[2023-12-26 17:37:10,701] [    INFO] - intokens                      : None
[2023-12-26 17:37:10,701] [    INFO] - lazy                          : False
[2023-12-26 17:37:10,701] [    INFO] - max_length                    : 2048
[2023-12-26 17:37:10,701] [    INFO] - save_generation_output        : False
[2023-12-26 17:37:10,701] [    INFO] - src_length                    : 1024
[2023-12-26 17:37:10,701] [    INFO] - task_name                     : None
[2023-12-26 17:37:10,701] [    INFO] - task_name_or_path             : None
[2023-12-26 17:37:10,701] [    INFO] - zero_padding                  : False
[2023-12-26 17:37:10,701] [    INFO] - 
[2023-12-26 17:37:10,701] [    INFO] - ============================================================
[2023-12-26 17:37:10,701] [    INFO] -      Quant Configuration Arguments      
[2023-12-26 17:37:10,701] [    INFO] - paddle commit id              : 3a1b1659a405a044ce806fbe027cc146f1193e6d
[2023-12-26 17:37:10,701] [    INFO] - paddlenlp commit id           : 942865f52b42cd6e0666a19af316f32e151694eb.dirty
[2023-12-26 17:37:10,701] [    INFO] - do_gptq                       : False
[2023-12-26 17:37:10,701] [    INFO] - do_ptq                        : False
[2023-12-26 17:37:10,701] [    INFO] - do_qat                        : False
[2023-12-26 17:37:10,702] [    INFO] - gptq_step                     : 8
[2023-12-26 17:37:10,702] [    INFO] - ptq_step                      : 32
[2023-12-26 17:37:10,702] [    INFO] - quant_type                    : a8w8
[2023-12-26 17:37:10,702] [    INFO] - shift                         : False
[2023-12-26 17:37:10,702] [    INFO] - shift_all_linears             : False
[2023-12-26 17:37:10,702] [    INFO] - shift_sampler                 : ema
[2023-12-26 17:37:10,702] [    INFO] - shift_step                    : 32
[2023-12-26 17:37:10,702] [    INFO] - smooth                        : False
[2023-12-26 17:37:10,702] [    INFO] - smooth_all_linears            : False
[2023-12-26 17:37:10,702] [    INFO] - smooth_k_piece                : 3
[2023-12-26 17:37:10,702] [    INFO] - smooth_piecewise_search       : False
[2023-12-26 17:37:10,703] [    INFO] - smooth_sampler                : none
[2023-12-26 17:37:10,703] [    INFO] - smooth_search_piece           : False
[2023-12-26 17:37:10,703] [    INFO] - smooth_step                   : 32
[2023-12-26 17:37:10,703] [    INFO] - 
[2023-12-26 17:37:10,703] [    INFO] - ============================================================
[2023-12-26 17:37:10,703] [    INFO] -    Generation Configuration Arguments   
[2023-12-26 17:37:10,703] [    INFO] - paddle commit id              : 3a1b1659a405a044ce806fbe027cc146f1193e6d
[2023-12-26 17:37:10,703] [    INFO] - paddlenlp commit id           : 942865f52b42cd6e0666a19af316f32e151694eb.dirty
[2023-12-26 17:37:10,703] [    INFO] - top_k                         : 1
[2023-12-26 17:37:10,703] [    INFO] - top_p                         : 1.0
[2023-12-26 17:37:10,703] [    INFO] - 
[2023-12-26 17:37:10,703] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True
[2023-12-26 17:37:10,704] [    INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load '/home/aistudio/Baichuan2-7B-Chat'.
[2023-12-26 17:37:10,704] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json
[2023-12-26 17:37:10,705] [    INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'.
[2023-12-26 17:37:10,706] [    INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json
W1226 17:37:10.709461 26242 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1226 17:37:10.710600 26242 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 4/4 [04:16<00:00, 64.11s/it]
[2023-12-26 17:41:56,850] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-26 17:41:56,850] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-26 17:41:56,853] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json
[2023-12-26 17:41:56,853] [    INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load '/home/aistudio/Baichuan2-7B-Chat'.
Downloading data files: 100%|███████████████████| 1/1 [00:00<00:00, 7436.71it/s]
Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1189.87it/s]
Generating train split: 1848 examples [00:00, 108999.65 examples/s]
Downloading data files: 100%|██████████████████| 1/1 [00:00<00:00, 11214.72it/s]
Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1536.38it/s]
Generating train split: 206 examples [00:00, 77987.78 examples/s]
[2023-12-26 17:42:21,202] [    INFO] - Frozen parameters: 7.51e+09 || Trainable parameters:2.00e+07 || Total parameters:7.53e+09|| Trainable:0.27%
[2023-12-26 17:42:21,202] [    INFO] - The global seed is set to 42, local seed is set to 43 and random seed is set to 42.
[2023-12-26 17:42:21,238] [    INFO] - Using half precision
[2023-12-26 17:42:21,268] [    INFO] - ============================================================
[2023-12-26 17:42:21,268] [    INFO] -     Training Configuration Arguments    
[2023-12-26 17:42:21,268] [    INFO] - paddle commit id              : 3a1b1659a405a044ce806fbe027cc146f1193e6d
[2023-12-26 17:42:21,268] [    INFO] - paddlenlp commit id           : 942865f52b42cd6e0666a19af316f32e151694eb.dirty
[2023-12-26 17:42:21,268] [    INFO] - _no_sync_in_gradient_accumulation: True
[2023-12-26 17:42:21,268] [    INFO] - adam_beta1                    : 0.9
[2023-12-26 17:42:21,268] [    INFO] - adam_beta2                    : 0.999
[2023-12-26 17:42:21,268] [    INFO] - adam_epsilon                  : 1e-08
[2023-12-26 17:42:21,268] [    INFO] - amp_custom_black_list         : None
[2023-12-26 17:42:21,268] [    INFO] - amp_custom_white_list         : None
[2023-12-26 17:42:21,268] [    INFO] - amp_master_grad               : False
[2023-12-26 17:42:21,269] [    INFO] - autotuner_benchmark           : False
[2023-12-26 17:42:21,269] [    INFO] - benchmark                     : False
[2023-12-26 17:42:21,269] [    INFO] - bf16                          : False
[2023-12-26 17:42:21,269] [    INFO] - bf16_full_eval                : False
[2023-12-26 17:42:21,269] [    INFO] - current_device                : gpu:0
[2023-12-26 17:42:21,269] [    INFO] - data_parallel_rank            : 0
[2023-12-26 17:42:21,269] [    INFO] - dataloader_drop_last          : False
[2023-12-26 17:42:21,269] [    INFO] - dataloader_num_workers        : 0
[2023-12-26 17:42:21,269] [    INFO] - dataset_rank                  : 0
[2023-12-26 17:42:21,269] [    INFO] - dataset_world_size            : 1
[2023-12-26 17:42:21,269] [    INFO] - device                        : gpu
[2023-12-26 17:42:21,269] [    INFO] - disable_tqdm                  : True
[2023-12-26 17:42:21,269] [    INFO] - distributed_dataloader        : False
[2023-12-26 17:42:21,269] [    INFO] - do_eval                       : True
[2023-12-26 17:42:21,269] [    INFO] - do_export                     : False
[2023-12-26 17:42:21,269] [    INFO] - do_predict                    : False
[2023-12-26 17:42:21,269] [    INFO] - do_train                      : True
[2023-12-26 17:42:21,269] [    INFO] - eval_accumulation_steps       : 16
[2023-12-26 17:42:21,270] [    INFO] - eval_batch_size               : 8
[2023-12-26 17:42:21,270] [    INFO] - eval_steps                    : None
[2023-12-26 17:42:21,270] [    INFO] - evaluation_strategy           : IntervalStrategy.EPOCH
[2023-12-26 17:42:21,270] [    INFO] - flatten_param_grads           : False
[2023-12-26 17:42:21,270] [    INFO] - force_reshard_pp              : False
[2023-12-26 17:42:21,270] [    INFO] - fp16                          : True
[2023-12-26 17:42:21,270] [    INFO] - fp16_full_eval                : False
[2023-12-26 17:42:21,270] [    INFO] - fp16_opt_level                : O2
[2023-12-26 17:42:21,270] [    INFO] - gradient_accumulation_steps   : 4
[2023-12-26 17:42:21,270] [    INFO] - greater_is_better             : True
[2023-12-26 17:42:21,270] [    INFO] - hybrid_parallel_topo_order    : None
[2023-12-26 17:42:21,270] [    INFO] - ignore_data_skip              : False
[2023-12-26 17:42:21,270] [    INFO] - ignore_load_lr_and_optim      : False
[2023-12-26 17:42:21,270] [    INFO] - label_names                   : None
[2023-12-26 17:42:21,270] [    INFO] - lazy_data_processing          : True
[2023-12-26 17:42:21,270] [    INFO] - learning_rate                 : 0.0003
[2023-12-26 17:42:21,270] [    INFO] - load_best_model_at_end        : True
[2023-12-26 17:42:21,270] [    INFO] - load_sharded_model            : False
[2023-12-26 17:42:21,270] [    INFO] - local_process_index           : 0
[2023-12-26 17:42:21,271] [    INFO] - local_rank                    : -1
[2023-12-26 17:42:21,271] [    INFO] - log_level                     : -1
[2023-12-26 17:42:21,271] [    INFO] - log_level_replica             : -1
[2023-12-26 17:42:21,271] [    INFO] - log_on_each_node              : True
[2023-12-26 17:42:21,271] [    INFO] - logging_dir                   : ./checkpoints/llama_lora_ckpts/runs/Dec26_17-37-10_jupyter-3484865-7331292
[2023-12-26 17:42:21,271] [    INFO] - logging_first_step            : False
[2023-12-26 17:42:21,271] [    INFO] - logging_steps                 : 1
[2023-12-26 17:42:21,271] [    INFO] - logging_strategy              : IntervalStrategy.STEPS
[2023-12-26 17:42:21,271] [    INFO] - logical_process_index         : 0
[2023-12-26 17:42:21,271] [    INFO] - lr_end                        : 1e-07
[2023-12-26 17:42:21,271] [    INFO] - lr_scheduler_type             : SchedulerType.LINEAR
[2023-12-26 17:42:21,271] [    INFO] - max_evaluate_steps            : -1
[2023-12-26 17:42:21,271] [    INFO] - max_grad_norm                 : 1.0
[2023-12-26 17:42:21,271] [    INFO] - max_steps                     : -1
[2023-12-26 17:42:21,271] [    INFO] - metric_for_best_model         : accuracy
[2023-12-26 17:42:21,271] [    INFO] - minimum_eval_times            : None
[2023-12-26 17:42:21,271] [    INFO] - no_cuda                       : False
[2023-12-26 17:42:21,271] [    INFO] - num_cycles                    : 0.5
[2023-12-26 17:42:21,272] [    INFO] - num_train_epochs              : 3
[2023-12-26 17:42:21,272] [    INFO] - optim                         : OptimizerNames.ADAMW
[2023-12-26 17:42:21,272] [    INFO] - optimizer_name_suffix         : None
[2023-12-26 17:42:21,272] [    INFO] - output_dir                    : ./checkpoints/llama_lora_ckpts
[2023-12-26 17:42:21,272] [    INFO] - overwrite_output_dir          : False
[2023-12-26 17:42:21,272] [    INFO] - past_index                    : -1
[2023-12-26 17:42:21,272] [    INFO] - per_device_eval_batch_size    : 8
[2023-12-26 17:42:21,272] [    INFO] - per_device_train_batch_size   : 4
[2023-12-26 17:42:21,272] [    INFO] - pipeline_parallel_config      : 
[2023-12-26 17:42:21,272] [    INFO] - pipeline_parallel_degree      : -1
[2023-12-26 17:42:21,272] [    INFO] - pipeline_parallel_rank        : 0
[2023-12-26 17:42:21,272] [    INFO] - power                         : 1.0
[2023-12-26 17:42:21,272] [    INFO] - prediction_loss_only          : False
[2023-12-26 17:42:21,272] [    INFO] - process_index                 : 0
[2023-12-26 17:42:21,272] [    INFO] - recompute                     : True
[2023-12-26 17:42:21,272] [    INFO] - remove_unused_columns         : True
[2023-12-26 17:42:21,272] [    INFO] - report_to                     : ['visualdl']
[2023-12-26 17:42:21,272] [    INFO] - resume_from_checkpoint        : None
[2023-12-26 17:42:21,273] [    INFO] - run_name                      : ./checkpoints/llama_lora_ckpts
[2023-12-26 17:42:21,273] [    INFO] - save_on_each_node             : False
[2023-12-26 17:42:21,273] [    INFO] - save_sharded_model            : False
[2023-12-26 17:42:21,273] [    INFO] - save_steps                    : 500
[2023-12-26 17:42:21,273] [    INFO] - save_strategy                 : IntervalStrategy.EPOCH
[2023-12-26 17:42:21,273] [    INFO] - save_total_limit              : 1
[2023-12-26 17:42:21,273] [    INFO] - scale_loss                    : 32768
[2023-12-26 17:42:21,273] [    INFO] - seed                          : 42
[2023-12-26 17:42:21,273] [    INFO] - sep_parallel_degree           : -1
[2023-12-26 17:42:21,273] [    INFO] - sharding                      : []
[2023-12-26 17:42:21,273] [    INFO] - sharding_degree               : -1
[2023-12-26 17:42:21,273] [    INFO] - sharding_parallel_config      : 
[2023-12-26 17:42:21,273] [    INFO] - sharding_parallel_degree      : -1
[2023-12-26 17:42:21,273] [    INFO] - sharding_parallel_rank        : 0
[2023-12-26 17:42:21,273] [    INFO] - should_load_dataset           : True
[2023-12-26 17:42:21,273] [    INFO] - should_load_sharding_stage1_model: False
[2023-12-26 17:42:21,273] [    INFO] - should_log                    : True
[2023-12-26 17:42:21,273] [    INFO] - should_save                   : True
[2023-12-26 17:42:21,273] [    INFO] - should_save_model_state       : True
[2023-12-26 17:42:21,274] [    INFO] - should_save_sharding_stage1_model: False
[2023-12-26 17:42:21,274] [    INFO] - skip_memory_metrics           : True
[2023-12-26 17:42:21,274] [    INFO] - skip_profile_timer            : True
[2023-12-26 17:42:21,274] [    INFO] - tensor_parallel_config        : 
[2023-12-26 17:42:21,274] [    INFO] - tensor_parallel_degree        : -1
[2023-12-26 17:42:21,274] [    INFO] - tensor_parallel_rank          : 0
[2023-12-26 17:42:21,274] [    INFO] - to_static                     : False
[2023-12-26 17:42:21,274] [    INFO] - train_batch_size              : 4
[2023-12-26 17:42:21,274] [    INFO] - unified_checkpoint            : False
[2023-12-26 17:42:21,274] [    INFO] - use_auto_parallel             : False
[2023-12-26 17:42:21,274] [    INFO] - use_hybrid_parallel           : False
[2023-12-26 17:42:21,274] [    INFO] - warmup_ratio                  : 0.0
[2023-12-26 17:42:21,274] [    INFO] - warmup_steps                  : 30
[2023-12-26 17:42:21,274] [    INFO] - weight_decay                  : 0.0
[2023-12-26 17:42:21,274] [    INFO] - weight_name_suffix            : None
[2023-12-26 17:42:21,274] [    INFO] - world_size                    : 1
[2023-12-26 17:42:21,274] [    INFO] - 
[2023-12-26 17:42:21,274] [    INFO] - Starting training from resume_from_checkpoint : None
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/distributed/parallel.py:411: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
  warnings.warn(
[2023-12-26 17:42:21,280] [    INFO] - ***** Running training *****
[2023-12-26 17:42:21,280] [    INFO] -   Num examples = 1,848
[2023-12-26 17:42:21,280] [    INFO] -   Num Epochs = 3
[2023-12-26 17:42:21,281] [    INFO] -   Instantaneous batch size per device = 4
[2023-12-26 17:42:21,281] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 16
[2023-12-26 17:42:21,281] [    INFO] -   Gradient Accumulation steps = 4
[2023-12-26 17:42:21,281] [    INFO] -   Total optimization steps = 345
[2023-12-26 17:42:21,281] [    INFO] -   Total num train samples = 5,544
[2023-12-26 17:42:21,285] [    INFO] -   Number of trainable parameters = 19,988,480 (per device)
[2023-12-26 17:42:24,950] [    INFO] - loss: 4.04572821, learning_rate: 1e-05, global_step: 1, interval_runtime: 3.6647, interval_samples_per_second: 4.365975805930098, interval_steps_per_second: 0.27287348787063115, ppl: 57.15279011734303, epoch: 0.0087
[2023-12-26 17:42:28,307] [    INFO] - loss: 4.65756416, learning_rate: 2e-05, global_step: 2, interval_runtime: 3.3567, interval_samples_per_second: 4.766609671416571, interval_steps_per_second: 0.2979131044635357, ppl: 105.37908269415595, epoch: 0.0173
[2023-12-26 17:42:31,806] [    INFO] - loss: 4.39336109, learning_rate: 3e-05, global_step: 3, interval_runtime: 3.4991, interval_samples_per_second: 4.572590091463571, interval_steps_per_second: 0.2857868807164732, ppl: 80.91191469147887, epoch: 0.026
[2023-12-26 17:42:35,023] [    INFO] - loss: 4.2095747, learning_rate: 4e-05, global_step: 4, interval_runtime: 3.2167, interval_samples_per_second: 4.974087316240644, interval_steps_per_second: 0.31088045726504027, ppl: 67.32789916460032, epoch: 0.0346
[2023-12-26 17:42:38,280] [    INFO] - loss: 4.45522022, learning_rate: 5e-05, global_step: 5, interval_runtime: 3.2576, interval_samples_per_second: 4.911592329911189, interval_steps_per_second: 0.3069745206194493, ppl: 86.07510421755674, epoch: 0.0433
[2023-12-26 17:42:42,170] [    INFO] - loss: 4.172194, learning_rate: 6e-05, global_step: 6, interval_runtime: 3.89, interval_samples_per_second: 4.113123033615344, interval_steps_per_second: 0.257070189600959, ppl: 64.85759368161546, epoch: 0.0519
[2023-12-26 17:42:45,784] [    INFO] - loss: 3.75121832, learning_rate: 7e-05, global_step: 7, interval_runtime: 3.6141, interval_samples_per_second: 4.427085699961078, interval_steps_per_second: 0.2766928562475674, ppl: 42.57291785460257, epoch: 0.0606
[2023-12-26 17:42:49,598] [    INFO] - loss: 3.57292008, learning_rate: 8e-05, global_step: 8, interval_runtime: 3.8137, interval_samples_per_second: 4.195439390785131, interval_steps_per_second: 0.2622149619240707, ppl: 35.62045601509179, epoch: 0.0693
[2023-12-26 17:42:52,626] [    INFO] - loss: 3.01917839, learning_rate: 9e-05, global_step: 9, interval_runtime: 3.028, interval_samples_per_second: 5.2839536578947826, interval_steps_per_second: 0.3302471036184239, ppl: 20.47446274838995, epoch: 0.0779
[2023-12-26 17:42:55,893] [    INFO] - loss: 2.75773215, learning_rate: 0.0001, global_step: 10, interval_runtime: 3.2669, interval_samples_per_second: 4.897542422550416, interval_steps_per_second: 0.306096401409401, ppl: 15.764051874163764, epoch: 0.0866
[2023-12-26 17:42:59,152] [    INFO] - loss: 2.5989778, learning_rate: 0.00011, global_step: 11, interval_runtime: 3.2584, interval_samples_per_second: 4.9104358222759465, interval_steps_per_second: 0.30690223889224666, ppl: 13.449982433667916, epoch: 0.0952
[2023-12-26 17:43:02,654] [    INFO] - loss: 2.21501446, learning_rate: 0.00012, global_step: 12, interval_runtime: 3.5021, interval_samples_per_second: 4.5686416499291065, interval_steps_per_second: 0.28554010312056916, ppl: 9.161541586542349, epoch: 0.1039
[2023-12-26 17:43:05,641] [    INFO] - loss: 2.03604507, learning_rate: 0.00013, global_step: 13, interval_runtime: 2.9872, interval_samples_per_second: 5.356214143645703, interval_steps_per_second: 0.33476338397785643, ppl: 7.660253444848598, epoch: 0.1126
[2023-12-26 17:43:09,544] [    INFO] - loss: 1.90918612, learning_rate: 0.00014, global_step: 14, interval_runtime: 3.9035, interval_samples_per_second: 4.09884021807836, interval_steps_per_second: 0.2561775136298975, ppl: 6.7475948306385, epoch: 0.1212
[2023-12-26 17:43:13,583] [    INFO] - loss: 1.76850057, learning_rate: 0.00015, global_step: 15, interval_runtime: 4.0385, interval_samples_per_second: 3.9618929230574502, interval_steps_per_second: 0.24761830769109064, ppl: 5.862057024120994, epoch: 0.1299
[2023-12-26 17:43:17,162] [    INFO] - loss: 1.59435368, learning_rate: 0.00016, global_step: 16, interval_runtime: 3.5791, interval_samples_per_second: 4.470455789035147, interval_steps_per_second: 0.2794034868146967, ppl: 4.925144823605828, epoch: 0.1385
[2023-12-26 17:43:20,310] [    INFO] - loss: 1.55910134, learning_rate: 0.00017, global_step: 17, interval_runtime: 3.1482, interval_samples_per_second: 5.082190660424854, interval_steps_per_second: 0.3176369162765534, ppl: 4.754546603849948, epoch: 0.1472
[2023-12-26 17:43:24,120] [    INFO] - loss: 1.37142038, learning_rate: 0.00018, global_step: 18, interval_runtime: 3.8102, interval_samples_per_second: 4.199265432108036, interval_steps_per_second: 0.26245408950675225, ppl: 3.940944360515859, epoch: 0.1558
[2023-12-26 17:43:27,430] [    INFO] - loss: 1.26009345, learning_rate: 0.00019, global_step: 19, interval_runtime: 3.3092, interval_samples_per_second: 4.835006761991377, interval_steps_per_second: 0.30218792262446104, ppl: 3.5257509533974374, epoch: 0.1645
[2023-12-26 17:43:30,616] [    INFO] - loss: 1.38265204, learning_rate: 0.0002, global_step: 20, interval_runtime: 3.1861, interval_samples_per_second: 5.021890906010552, interval_steps_per_second: 0.3138681816256595, ppl: 3.985457216342121, epoch: 0.1732
[2023-12-26 17:43:34,073] [    INFO] - loss: 1.34700322, learning_rate: 0.00021, global_step: 21, interval_runtime: 3.4576, interval_samples_per_second: 4.627547107516889, interval_steps_per_second: 0.28922169421980554, ppl: 3.845882978897622, epoch: 0.1818
[2023-12-26 17:43:37,613] [    INFO] - loss: 0.96020913, learning_rate: 0.00022, global_step: 22, interval_runtime: 3.5402, interval_samples_per_second: 4.519494023607096, interval_steps_per_second: 0.2824683764754435, ppl: 2.6122427146223246, epoch: 0.1905
[2023-12-26 17:43:41,826] [    INFO] - loss: 0.81633461, learning_rate: 0.00023, global_step: 23, interval_runtime: 4.2123, interval_samples_per_second: 3.798402662854955, interval_steps_per_second: 0.2374001664284347, ppl: 2.262192803692768, epoch: 0.1991
[2023-12-26 17:43:46,109] [    INFO] - loss: 0.92583209, learning_rate: 0.00024, global_step: 24, interval_runtime: 4.2829, interval_samples_per_second: 3.735805341596222, interval_steps_per_second: 0.23348783384976388, ppl: 2.523967554998823, epoch: 0.2078
[2023-12-26 17:43:49,823] [    INFO] - loss: 0.97212011, learning_rate: 0.00025, global_step: 25, interval_runtime: 3.7141, interval_samples_per_second: 4.307892351580294, interval_steps_per_second: 0.26924327197376835, ppl: 2.643543124577833, epoch: 0.2165
[2023-12-26 17:43:53,004] [    INFO] - loss: 0.68968725, learning_rate: 0.00026, global_step: 26, interval_runtime: 3.1813, interval_samples_per_second: 5.029367627441282, interval_steps_per_second: 0.3143354767150801, ppl: 1.9930920962051093, epoch: 0.2251
[2023-12-26 17:43:56,862] [    INFO] - loss: 0.75413704, learning_rate: 0.00027, global_step: 27, interval_runtime: 3.8575, interval_samples_per_second: 4.1477499141974175, interval_steps_per_second: 0.2592343696373386, ppl: 2.1257762717033786, epoch: 0.2338
[2023-12-26 17:43:59,763] [    INFO] - loss: 0.63414562, learning_rate: 0.00028, global_step: 28, interval_runtime: 2.9012, interval_samples_per_second: 5.514889879483934, interval_steps_per_second: 0.34468061746774586, ppl: 1.885410596015647, epoch: 0.2424
[2023-12-26 17:44:03,739] [    INFO] - loss: 0.6446268, learning_rate: 0.00029, global_step: 29, interval_runtime: 3.9758, interval_samples_per_second: 4.024363191163546, interval_steps_per_second: 0.2515226994477216, ppl: 1.9052758476273477, epoch: 0.2511
[2023-12-26 17:44:07,650] [    INFO] - loss: 0.63658696, learning_rate: 0.0003, global_step: 30, interval_runtime: 3.9109, interval_samples_per_second: 4.091122564876725, interval_steps_per_second: 0.25569516030479533, ppl: 1.8900191475517596, epoch: 0.2597
[2023-12-26 17:44:11,223] [    INFO] - loss: 0.56769204, learning_rate: 0.000299, global_step: 31, interval_runtime: 3.5735, interval_samples_per_second: 4.477359595438927, interval_steps_per_second: 0.27983497471493296, ppl: 1.7641906676844925, epoch: 0.2684
[2023-12-26 17:44:14,480] [    INFO] - loss: 0.51316339, learning_rate: 0.0002981, global_step: 32, interval_runtime: 3.2572, interval_samples_per_second: 4.912153169838064, interval_steps_per_second: 0.307009573114879, ppl: 1.6705675015668517, epoch: 0.2771
[2023-12-26 17:44:17,453] [    INFO] - loss: 0.54714298, learning_rate: 0.0002971, global_step: 33, interval_runtime: 2.9726, interval_samples_per_second: 5.382444256551639, interval_steps_per_second: 0.33640276603447744, ppl: 1.7283081464920642, epoch: 0.2857
[2023-12-26 17:44:20,106] [    INFO] - loss: 0.5057705, learning_rate: 0.0002962, global_step: 34, interval_runtime: 2.6522, interval_samples_per_second: 6.03280133930277, interval_steps_per_second: 0.3770500837064231, ppl: 1.6582627197822182, epoch: 0.2944
[2023-12-26 17:44:22,814] [    INFO] - loss: 0.44090223, learning_rate: 0.0002952, global_step: 35, interval_runtime: 2.7086, interval_samples_per_second: 5.907133986294971, interval_steps_per_second: 0.3691958741434357, ppl: 1.5541087497017636, epoch: 0.303
[2023-12-26 17:44:26,301] [    INFO] - loss: 0.41087997, learning_rate: 0.0002943, global_step: 36, interval_runtime: 3.4869, interval_samples_per_second: 4.588559640353671, interval_steps_per_second: 0.2867849775221044, ppl: 1.508144323130449, epoch: 0.3117
[2023-12-26 17:44:29,741] [    INFO] - loss: 0.36454743, learning_rate: 0.0002933, global_step: 37, interval_runtime: 3.4403, interval_samples_per_second: 4.65081652284074, interval_steps_per_second: 0.29067603267754627, ppl: 1.439862222225052, epoch: 0.3203
[2023-12-26 17:44:33,123] [    INFO] - loss: 0.34224176, learning_rate: 0.0002924, global_step: 38, interval_runtime: 3.3821, interval_samples_per_second: 4.730738726062611, interval_steps_per_second: 0.29567117037891316, ppl: 1.40810067878726, epoch: 0.329
[2023-12-26 17:44:36,322] [    INFO] - loss: 0.40164879, learning_rate: 0.0002914, global_step: 39, interval_runtime: 3.1989, interval_samples_per_second: 5.001789822888476, interval_steps_per_second: 0.31261186393052975, ppl: 1.4942864321684426, epoch: 0.3377
[2023-12-26 17:44:39,941] [    INFO] - loss: 0.34734392, learning_rate: 0.0002905, global_step: 40, interval_runtime: 3.6195, interval_samples_per_second: 4.420510938298477, interval_steps_per_second: 0.27628193364365483, ppl: 1.415303392821156, epoch: 0.3463
[2023-12-26 17:44:43,331] [    INFO] - loss: 0.34797683, learning_rate: 0.0002895, global_step: 41, interval_runtime: 3.3899, interval_samples_per_second: 4.719858802727873, interval_steps_per_second: 0.29499117517049206, ppl: 1.4161994360189456, epoch: 0.355
[2023-12-26 17:44:46,994] [    INFO] - loss: 0.3465372, learning_rate: 0.0002886, global_step: 42, interval_runtime: 3.6628, interval_samples_per_second: 4.368268092655978, interval_steps_per_second: 0.2730167557909986, ppl: 1.4141620996819957, epoch: 0.3636
[2023-12-26 17:44:50,066] [    INFO] - loss: 0.35139573, learning_rate: 0.0002876, global_step: 43, interval_runtime: 3.0719, interval_samples_per_second: 5.20858310791659, interval_steps_per_second: 0.3255364442447869, ppl: 1.4210495666020952, epoch: 0.3723
[2023-12-26 17:44:53,690] [    INFO] - loss: 0.3359192, learning_rate: 0.0002867, global_step: 44, interval_runtime: 3.6242, interval_samples_per_second: 4.414716444232063, interval_steps_per_second: 0.27591977776450394, ppl: 1.3992259627854928, epoch: 0.381
[2023-12-26 17:44:57,252] [    INFO] - loss: 0.332609, learning_rate: 0.0002857, global_step: 45, interval_runtime: 3.5619, interval_samples_per_second: 4.492029541467871, interval_steps_per_second: 0.28075184634174194, ppl: 1.3946019025079608, epoch: 0.3896
[2023-12-26 17:45:00,691] [    INFO] - loss: 0.30754462, learning_rate: 0.0002848, global_step: 46, interval_runtime: 3.4393, interval_samples_per_second: 4.6521041994702985, interval_steps_per_second: 0.29075651246689366, ppl: 1.360081493984319, epoch: 0.3983
[2023-12-26 17:45:04,107] [    INFO] - loss: 0.2745533, learning_rate: 0.0002838, global_step: 47, interval_runtime: 3.4151, interval_samples_per_second: 4.685137446638551, interval_steps_per_second: 0.29282109041490945, ppl: 1.3159427119464535, epoch: 0.4069
[2023-12-26 17:45:07,275] [    INFO] - loss: 0.33314824, learning_rate: 0.0002829, global_step: 48, interval_runtime: 3.168, interval_samples_per_second: 5.050457186622707, interval_steps_per_second: 0.3156535741639192, ppl: 1.3953541304353352, epoch: 0.4156
[2023-12-26 17:45:11,373] [    INFO] - loss: 0.30882508, learning_rate: 0.0002819, global_step: 49, interval_runtime: 4.0984, interval_samples_per_second: 3.903920851275084, interval_steps_per_second: 0.24399505320469275, ppl: 1.361824139389874, epoch: 0.4242
[2023-12-26 17:45:14,388] [    INFO] - loss: 0.30945367, learning_rate: 0.000281, global_step: 50, interval_runtime: 3.0149, interval_samples_per_second: 5.306904905470886, interval_steps_per_second: 0.3316815565919304, ppl: 1.3626804375276809, epoch: 0.4329
[2023-12-26 17:45:17,930] [    INFO] - loss: 0.29789728, learning_rate: 0.00028, global_step: 51, interval_runtime: 3.5425, interval_samples_per_second: 4.5166080190902775, interval_steps_per_second: 0.28228800119314235, ppl: 1.34702341452768, epoch: 0.4416
[2023-12-26 17:45:21,610] [    INFO] - loss: 0.28248543, learning_rate: 0.000279, global_step: 52, interval_runtime: 3.6793, interval_samples_per_second: 4.348656011857886, interval_steps_per_second: 0.2717910007411179, ppl: 1.3264224489808771, epoch: 0.4502
[2023-12-26 17:45:25,480] [    INFO] - loss: 0.28851509, learning_rate: 0.0002781, global_step: 53, interval_runtime: 3.8697, interval_samples_per_second: 4.134670933214908, interval_steps_per_second: 0.25841693332593174, ppl: 1.3344444861382638, epoch: 0.4589
[2023-12-26 17:45:28,597] [    INFO] - loss: 0.26777136, learning_rate: 0.0002771, global_step: 54, interval_runtime: 3.1172, interval_samples_per_second: 5.132893716860017, interval_steps_per_second: 0.32080585730375105, ppl: 1.3070482623338415, epoch: 0.4675
[2023-12-26 17:45:31,972] [    INFO] - loss: 0.30797887, learning_rate: 0.0002762, global_step: 55, interval_runtime: 3.3754, interval_samples_per_second: 4.740216348520788, interval_steps_per_second: 0.2962635217825493, ppl: 1.3606722376290126, epoch: 0.4762
[2023-12-26 17:45:35,789] [    INFO] - loss: 0.25563985, learning_rate: 0.0002752, global_step: 56, interval_runtime: 3.8169, interval_samples_per_second: 4.191847577319691, interval_steps_per_second: 0.2619904735824807, ppl: 1.2912875869600262, epoch: 0.4848
[2023-12-26 17:45:39,008] [    INFO] - loss: 0.27637056, learning_rate: 0.0002743, global_step: 57, interval_runtime: 3.2185, interval_samples_per_second: 4.971223970727729, interval_steps_per_second: 0.31070149817048304, ppl: 1.3183362962229255, epoch: 0.4935
[2023-12-26 17:45:42,650] [    INFO] - loss: 0.29341272, learning_rate: 0.0002733, global_step: 58, interval_runtime: 3.63, interval_samples_per_second: 4.407697726782218, interval_steps_per_second: 0.27548110792388864, ppl: 1.3409961321598929, epoch: 0.5022
[2023-12-26 17:45:46,185] [    INFO] - loss: 0.2738843, learning_rate: 0.0002724, global_step: 59, interval_runtime: 3.5479, interval_samples_per_second: 4.50970456331901, interval_steps_per_second: 0.28185653520743814, ppl: 1.3150626406888208, epoch: 0.5108
[2023-12-26 17:45:49,729] [    INFO] - loss: 0.29272103, learning_rate: 0.0002714, global_step: 60, interval_runtime: 3.5435, interval_samples_per_second: 4.51530705583338, interval_steps_per_second: 0.28220669098958623, ppl: 1.3400688992610694, epoch: 0.5195
[2023-12-26 17:45:53,090] [    INFO] - loss: 0.24711998, learning_rate: 0.0002705, global_step: 61, interval_runtime: 3.3613, interval_samples_per_second: 4.76002560569709, interval_steps_per_second: 0.29750160035606815, ppl: 1.280332717882807, epoch: 0.5281
[2023-12-26 17:45:56,060] [    INFO] - loss: 0.27978322, learning_rate: 0.0002695, global_step: 62, interval_runtime: 2.9697, interval_samples_per_second: 5.387787058462534, interval_steps_per_second: 0.33673669115390836, ppl: 1.3228430153437678, epoch: 0.5368
[2023-12-26 17:46:00,180] [    INFO] - loss: 0.26979572, learning_rate: 0.0002686, global_step: 63, interval_runtime: 4.1194, interval_samples_per_second: 3.8840582897572737, interval_steps_per_second: 0.2427536431098296, ppl: 1.3096968785260072, epoch: 0.5455
[2023-12-26 17:46:03,959] [    INFO] - loss: 0.2797547, learning_rate: 0.0002676, global_step: 64, interval_runtime: 3.7798, interval_samples_per_second: 4.233009314355052, interval_steps_per_second: 0.26456308214719076, ppl: 1.322805288398959, epoch: 0.5541
[2023-12-26 17:46:07,112] [    INFO] - loss: 0.26245928, learning_rate: 0.0002667, global_step: 65, interval_runtime: 3.1532, interval_samples_per_second: 5.0742238975114935, interval_steps_per_second: 0.31713899359446834, ppl: 1.3001235260606159, epoch: 0.5628
[2023-12-26 17:46:10,654] [    INFO] - loss: 0.27325338, learning_rate: 0.0002657, global_step: 66, interval_runtime: 3.5412, interval_samples_per_second: 4.518302126492652, interval_steps_per_second: 0.28239388290579076, ppl: 1.314233203049469, epoch: 0.5714
[2023-12-26 17:46:13,738] [    INFO] - loss: 0.29552907, learning_rate: 0.0002648, global_step: 67, interval_runtime: 3.0848, interval_samples_per_second: 5.186665460984193, interval_steps_per_second: 0.32416659131151204, ppl: 1.343837154562674, epoch: 0.5801
[2023-12-26 17:46:16,404] [    INFO] - loss: 0.27431649, learning_rate: 0.0002638, global_step: 68, interval_runtime: 2.665, interval_samples_per_second: 6.003694960613001, interval_steps_per_second: 0.37523093503831256, ppl: 1.3156311204482851, epoch: 0.5887
[2023-12-26 17:46:19,818] [    INFO] - loss: 0.30101836, learning_rate: 0.0002629, global_step: 69, interval_runtime: 3.4145, interval_samples_per_second: 4.685920299323026, interval_steps_per_second: 0.29287001870768914, ppl: 1.351234149969267, epoch: 0.5974
[2023-12-26 17:46:23,446] [    INFO] - loss: 0.26293159, learning_rate: 0.0002619, global_step: 70, interval_runtime: 3.6285, interval_samples_per_second: 4.409583156907597, interval_steps_per_second: 0.2755989473067248, ppl: 1.3007377324396991, epoch: 0.6061
[2023-12-26 17:46:26,805] [    INFO] - loss: 0.25193915, learning_rate: 0.000261, global_step: 71, interval_runtime: 3.3583, interval_samples_per_second: 4.764344747034641, interval_steps_per_second: 0.29777154668966505, ppl: 1.2865177502978775, epoch: 0.6147
[2023-12-26 17:46:29,961] [    INFO] - loss: 0.26743451, learning_rate: 0.00026, global_step: 72, interval_runtime: 3.1567, interval_samples_per_second: 5.068537361968574, interval_steps_per_second: 0.3167835851230359, ppl: 1.3066080572723744, epoch: 0.6234
[2023-12-26 17:46:33,563] [    INFO] - loss: 0.26232645, learning_rate: 0.000259, global_step: 73, interval_runtime: 3.6013, interval_samples_per_second: 4.4428989111833515, interval_steps_per_second: 0.27768118194895947, ppl: 1.299950842121707, epoch: 0.632
[2023-12-26 17:46:36,887] [    INFO] - loss: 0.28119218, learning_rate: 0.0002581, global_step: 74, interval_runtime: 3.3247, interval_samples_per_second: 4.812406825716142, interval_steps_per_second: 0.30077542660725887, ppl: 1.324708161888552, epoch: 0.6407
[2023-12-26 17:46:40,115] [    INFO] - loss: 0.27209201, learning_rate: 0.0002571, global_step: 75, interval_runtime: 3.2275, interval_samples_per_second: 4.957351649423985, interval_steps_per_second: 0.30983447808899905, ppl: 1.312707777997345, epoch: 0.6494
[2023-12-26 17:46:43,942] [    INFO] - loss: 0.27171448, learning_rate: 0.0002562, global_step: 76, interval_runtime: 3.8267, interval_samples_per_second: 4.181108421820734, interval_steps_per_second: 0.26131927636379587, ppl: 1.3122122849675446, epoch: 0.658
[2023-12-26 17:46:47,543] [    INFO] - loss: 0.27416253, learning_rate: 0.0002552, global_step: 77, interval_runtime: 3.6017, interval_samples_per_second: 4.442374228735902, interval_steps_per_second: 0.2776483892959939, ppl: 1.3154285814728313, epoch: 0.6667
[2023-12-26 17:46:51,255] [    INFO] - loss: 0.2439681, learning_rate: 0.0002543, global_step: 78, interval_runtime: 3.7116, interval_samples_per_second: 4.310802079060973, interval_steps_per_second: 0.2694251299413108, ppl: 1.2763036157547154, epoch: 0.6753
[2023-12-26 17:46:54,776] [    INFO] - loss: 0.27000949, learning_rate: 0.0002533, global_step: 79, interval_runtime: 3.5207, interval_samples_per_second: 4.544553290140272, interval_steps_per_second: 0.284034580633767, ppl: 1.3099768823548728, epoch: 0.684
[2023-12-26 17:46:57,390] [    INFO] - loss: 0.28652525, learning_rate: 0.0002524, global_step: 80, interval_runtime: 2.6142, interval_samples_per_second: 6.120512756437044, interval_steps_per_second: 0.38253204727731527, ppl: 1.3317917952124918, epoch: 0.6926
[2023-12-26 17:47:00,768] [    INFO] - loss: 0.25104171, learning_rate: 0.0002514, global_step: 81, interval_runtime: 3.3783, interval_samples_per_second: 4.736054461807919, interval_steps_per_second: 0.29600340386299495, ppl: 1.2853636957328707, epoch: 0.7013
[2023-12-26 17:47:03,697] [    INFO] - loss: 0.2385323, learning_rate: 0.0002505, global_step: 82, interval_runtime: 2.9293, interval_samples_per_second: 5.462112555607908, interval_steps_per_second: 0.3413820347254943, ppl: 1.269384706500266, epoch: 0.71
[2023-12-26 17:47:07,113] [    INFO] - loss: 0.24549332, learning_rate: 0.0002495, global_step: 83, interval_runtime: 3.4153, interval_samples_per_second: 4.684774406946456, interval_steps_per_second: 0.2927984004341535, ppl: 1.2782517448405986, epoch: 0.7186
[2023-12-26 17:47:10,832] [    INFO] - loss: 0.22970712, learning_rate: 0.0002486, global_step: 84, interval_runtime: 3.7191, interval_samples_per_second: 4.302069220702643, interval_steps_per_second: 0.2688793262939152, ppl: 1.258231445133833, epoch: 0.7273
[2023-12-26 17:47:14,400] [    INFO] - loss: 0.25397247, learning_rate: 0.0002476, global_step: 85, interval_runtime: 3.568, interval_samples_per_second: 4.484274570278145, interval_steps_per_second: 0.28026716064238405, ppl: 1.2891363138565606, epoch: 0.7359
[2023-12-26 17:47:17,574] [    INFO] - loss: 0.2352851, learning_rate: 0.0002467, global_step: 86, interval_runtime: 3.1738, interval_samples_per_second: 5.041209940914457, interval_steps_per_second: 0.31507562130715355, ppl: 1.2652694456349067, epoch: 0.7446
[2023-12-26 17:47:20,941] [    INFO] - loss: 0.27693576, learning_rate: 0.0002457, global_step: 87, interval_runtime: 3.3667, interval_samples_per_second: 4.752486174464305, interval_steps_per_second: 0.2970303859040191, ppl: 1.3190816305091784, epoch: 0.7532
[2023-12-26 17:47:24,730] [    INFO] - loss: 0.31699374, learning_rate: 0.0002448, global_step: 88, interval_runtime: 3.7896, interval_samples_per_second: 4.222126759815494, interval_steps_per_second: 0.2638829224884684, ppl: 1.3729939769562607, epoch: 0.7619
[2023-12-26 17:47:27,797] [    INFO] - loss: 0.32770675, learning_rate: 0.0002438, global_step: 89, interval_runtime: 3.0667, interval_samples_per_second: 5.217395063161403, interval_steps_per_second: 0.3260871914475877, ppl: 1.3877819455565001, epoch: 0.7706
[2023-12-26 17:47:30,898] [    INFO] - loss: 0.22880366, learning_rate: 0.0002429, global_step: 90, interval_runtime: 3.1017, interval_samples_per_second: 5.158516492850441, interval_steps_per_second: 0.32240728080315256, ppl: 1.2570951967072017, epoch: 0.7792
[2023-12-26 17:47:34,383] [    INFO] - loss: 0.22428387, learning_rate: 0.0002419, global_step: 91, interval_runtime: 3.4843, interval_samples_per_second: 4.591995491543428, interval_steps_per_second: 0.28699971822146425, ppl: 1.2514262113704304, epoch: 0.7879
[2023-12-26 17:47:39,494] [    INFO] - loss: 0.26413378, learning_rate: 0.000241, global_step: 92, interval_runtime: 5.1112, interval_samples_per_second: 3.1303636372371897, interval_steps_per_second: 0.19564772732732436, ppl: 1.3023024066636666, epoch: 0.7965
[2023-12-26 17:47:43,067] [    INFO] - loss: 0.27162313, learning_rate: 0.00024, global_step: 93, interval_runtime: 3.5728, interval_samples_per_second: 4.478319289217104, interval_steps_per_second: 0.279894955576069, ppl: 1.3120924198502355, epoch: 0.8052
[2023-12-26 17:47:45,606] [    INFO] - loss: 0.26476258, learning_rate: 0.000239, global_step: 94, interval_runtime: 2.5391, interval_samples_per_second: 6.301347343074687, interval_steps_per_second: 0.3938342089421679, ppl: 1.303121551929258, epoch: 0.8139
[2023-12-26 17:47:48,807] [    INFO] - loss: 0.3001802, learning_rate: 0.0002381, global_step: 95, interval_runtime: 3.2009, interval_samples_per_second: 4.998657318996902, interval_steps_per_second: 0.3124160824373064, ppl: 1.3501020740507794, epoch: 0.8225
[2023-12-26 17:47:52,331] [    INFO] - loss: 0.2234904, learning_rate: 0.0002371, global_step: 96, interval_runtime: 3.5241, interval_samples_per_second: 4.540179100277009, interval_steps_per_second: 0.28376119376731307, ppl: 1.2504336360559385, epoch: 0.8312
[2023-12-26 17:47:55,521] [    INFO] - loss: 0.27073395, learning_rate: 0.0002362, global_step: 97, interval_runtime: 3.1898, interval_samples_per_second: 5.0159280647981275, interval_steps_per_second: 0.31349550404988297, ppl: 1.3109262520557279, epoch: 0.8398
[2023-12-26 17:47:59,074] [    INFO] - loss: 0.26731879, learning_rate: 0.0002352, global_step: 98, interval_runtime: 3.553, interval_samples_per_second: 4.503248218197377, interval_steps_per_second: 0.28145301363733605, ppl: 1.3064568653361208, epoch: 0.8485
[2023-12-26 17:48:03,000] [    INFO] - loss: 0.25308639, learning_rate: 0.0002343, global_step: 99, interval_runtime: 3.9263, interval_samples_per_second: 4.075121221879381, interval_steps_per_second: 0.2546950763674613, ppl: 1.2879945418769403, epoch: 0.8571
[2023-12-26 17:48:06,504] [    INFO] - loss: 0.2433984, learning_rate: 0.0002333, global_step: 100, interval_runtime: 3.5039, interval_samples_per_second: 4.566329732828105, interval_steps_per_second: 0.2853956083017566, ppl: 1.275576712662826, epoch: 0.8658
[2023-12-26 17:48:10,291] [    INFO] - loss: 0.26445276, learning_rate: 0.0002324, global_step: 101, interval_runtime: 3.7869, interval_samples_per_second: 4.2250563366120515, interval_steps_per_second: 0.2640660210382532, ppl: 1.3027178813458784, epoch: 0.8745
[2023-12-26 17:48:13,416] [    INFO] - loss: 0.24420807, learning_rate: 0.0002314, global_step: 102, interval_runtime: 3.1254, interval_samples_per_second: 5.119266901857392, interval_steps_per_second: 0.319954181366087, ppl: 1.2766099270846831, epoch: 0.8831
[2023-12-26 17:48:17,553] [    INFO] - loss: 0.2597208, learning_rate: 0.0002305, global_step: 103, interval_runtime: 4.1361, interval_samples_per_second: 3.8683560753211825, interval_steps_per_second: 0.2417722547075739, ppl: 1.2965680343304327, epoch: 0.8918
[2023-12-26 17:48:21,048] [    INFO] - loss: 0.24676055, learning_rate: 0.0002295, global_step: 104, interval_runtime: 3.4953, interval_samples_per_second: 4.577567099565918, interval_steps_per_second: 0.2860979437228699, ppl: 1.2798726105871545, epoch: 0.9004
[2023-12-26 17:48:24,069] [    INFO] - loss: 0.22971785, learning_rate: 0.0002286, global_step: 105, interval_runtime: 3.0213, interval_samples_per_second: 5.295789070145969, interval_steps_per_second: 0.33098681688412307, ppl: 1.2582449460296714, epoch: 0.9091
[2023-12-26 17:48:26,900] [    INFO] - loss: 0.25949037, learning_rate: 0.0002276, global_step: 106, interval_runtime: 2.8308, interval_samples_per_second: 5.652026202735068, interval_steps_per_second: 0.35325163767094175, ppl: 1.296269300578213, epoch: 0.9177
[2023-12-26 17:48:29,737] [    INFO] - loss: 0.26405108, learning_rate: 0.0002267, global_step: 107, interval_runtime: 2.8369, interval_samples_per_second: 5.639943002919188, interval_steps_per_second: 0.35249643768244926, ppl: 1.3021947107079246, epoch: 0.9264
[2023-12-26 17:48:32,962] [    INFO] - loss: 0.30479836, learning_rate: 0.0002257, global_step: 108, interval_runtime: 3.2252, interval_samples_per_second: 4.9609837270178225, interval_steps_per_second: 0.3100614829386139, ppl: 1.3563514807180617, epoch: 0.9351
[2023-12-26 17:48:36,223] [    INFO] - loss: 0.291565, learning_rate: 0.0002248, global_step: 109, interval_runtime: 3.2612, interval_samples_per_second: 4.906185363514207, interval_steps_per_second: 0.30663658521963794, ppl: 1.338520634504136, epoch: 0.9437
[2023-12-26 17:48:39,279] [    INFO] - loss: 0.22575521, learning_rate: 0.0002238, global_step: 110, interval_runtime: 3.0558, interval_samples_per_second: 5.236013050696715, interval_steps_per_second: 0.3272508156685447, ppl: 1.2532688400464898, epoch: 0.9524
[2023-12-26 17:48:43,107] [    INFO] - loss: 0.2475778, learning_rate: 0.0002229, global_step: 111, interval_runtime: 3.8281, interval_samples_per_second: 4.1796288008266975, interval_steps_per_second: 0.2612268000516686, ppl: 1.2809190140065132, epoch: 0.961
[2023-12-26 17:48:46,229] [    INFO] - loss: 0.25429082, learning_rate: 0.0002219, global_step: 112, interval_runtime: 3.1222, interval_samples_per_second: 5.124616267721022, interval_steps_per_second: 0.32028851673256387, ppl: 1.2895467757338794, epoch: 0.9697
[2023-12-26 17:48:49,501] [    INFO] - loss: 0.28627616, learning_rate: 0.000221, global_step: 113, interval_runtime: 3.2718, interval_samples_per_second: 4.890284026761988, interval_steps_per_second: 0.3056427516726242, ppl: 1.3314601005068545, epoch: 0.9784
[2023-12-26 17:48:52,786] [    INFO] - loss: 0.26187339, learning_rate: 0.00022, global_step: 114, interval_runtime: 3.2853, interval_samples_per_second: 4.870233688813434, interval_steps_per_second: 0.3043896055508396, ppl: 1.2993620197891702, epoch: 0.987
[2023-12-26 17:48:56,188] [    INFO] - loss: 0.26199824, learning_rate: 0.000219, global_step: 115, interval_runtime: 3.401, interval_samples_per_second: 4.704479187251886, interval_steps_per_second: 0.2940299492032429, ppl: 1.2995242552646797, epoch: 0.9957
[2023-12-26 17:48:57,346] [    INFO] - ***** Running Evaluation *****
[2023-12-26 17:48:57,354] [    INFO] -   Num examples = 206
[2023-12-26 17:48:57,354] [    INFO] -   Total prediction steps = 26
[2023-12-26 17:48:57,354] [    INFO] -   Pre device batch size = 8
[2023-12-26 17:48:57,354] [    INFO] -   Total Batch size = 8
[2023-12-26 17:49:11,883] [    INFO] - eval_loss: 0.25766491889953613, eval_accuracy: 0.9982474588152822, eval_runtime: 14.5363, eval_samples_per_second: 14.171448478326102, eval_steps_per_second: 1.7886294195945565, eval_ppl: 1.2939051828041612, epoch: 0.9957
[2023-12-26 17:49:11,884] [    INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-115
[2023-12-26 17:49:11,885] [    INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/tokenizer_config.json
[2023-12-26 17:49:11,885] [    INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/special_tokens_map.json
[2023-12-26 17:49:11,887] [    INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/chat_template.json
[2023-12-26 17:49:12,064] [    INFO] - Saving optimizer files.
[2023-12-26 17:49:16,238] [    INFO] - loss: 0.42286861, learning_rate: 0.0002181, global_step: 116, interval_runtime: 20.0504, interval_samples_per_second: 0.7979880325215534, interval_steps_per_second: 0.049874252032597086, ppl: 1.526333737801267, epoch: 1.0087
[2023-12-26 17:49:19,708] [    INFO] - loss: 0.27453929, learning_rate: 0.0002171, global_step: 117, interval_runtime: 3.4699, interval_samples_per_second: 4.611077381935594, interval_steps_per_second: 0.28819233637097463, ppl: 1.3159242757182052, epoch: 1.0173
[2023-12-26 17:49:22,841] [    INFO] - loss: 0.23074579, learning_rate: 0.0002162, global_step: 118, interval_runtime: 3.1331, interval_samples_per_second: 5.106751557924221, interval_steps_per_second: 0.3191719723702638, ppl: 1.2595390113362899, epoch: 1.026
[2023-12-26 17:49:26,209] [    INFO] - loss: 0.24562941, learning_rate: 0.0002152, global_step: 119, interval_runtime: 3.368, interval_samples_per_second: 4.750556789404557, interval_steps_per_second: 0.29690979933778483, ppl: 1.2784257139580142, epoch: 1.0346
[2023-12-26 17:49:29,819] [    INFO] - loss: 0.24127965, learning_rate: 0.0002143, global_step: 120, interval_runtime: 3.6097, interval_samples_per_second: 4.432549666070322, interval_steps_per_second: 0.27703435412939514, ppl: 1.272876945578587, epoch: 1.0433
[2023-12-26 17:49:34,017] [    INFO] - loss: 0.26145872, learning_rate: 0.0002133, global_step: 121, interval_runtime: 4.1987, interval_samples_per_second: 3.8107239140857154, interval_steps_per_second: 0.23817024463035721, ppl: 1.2988233250384196, epoch: 1.0519
[2023-12-26 17:49:37,157] [    INFO] - loss: 0.24270561, learning_rate: 0.0002124, global_step: 122, interval_runtime: 3.1392, interval_samples_per_second: 5.096851230335854, interval_steps_per_second: 0.3185532018959909, ppl: 1.274693311912996, epoch: 1.0606
[2023-12-26 17:49:40,476] [    INFO] - loss: 0.24763, learning_rate: 0.0002114, global_step: 123, interval_runtime: 3.319, interval_samples_per_second: 4.820663331917016, interval_steps_per_second: 0.3012914582448135, ppl: 1.2809858797242244, epoch: 1.0693
[2023-12-26 17:49:43,025] [    INFO] - loss: 0.23295119, learning_rate: 0.0002105, global_step: 124, interval_runtime: 2.5489, interval_samples_per_second: 6.277124540034533, interval_steps_per_second: 0.3923202837521583, ppl: 1.2623198639909898, epoch: 1.0779
[2023-12-26 17:49:46,303] [    INFO] - loss: 0.27050617, learning_rate: 0.0002095, global_step: 125, interval_runtime: 3.2785, interval_samples_per_second: 4.88032202089331, interval_steps_per_second: 0.30502012630583186, ppl: 1.3106276832793233, epoch: 1.0866
[2023-12-26 17:49:49,566] [    INFO] - loss: 0.24398029, learning_rate: 0.0002086, global_step: 126, interval_runtime: 3.2631, interval_samples_per_second: 4.90330863858365, interval_steps_per_second: 0.3064567899114781, ppl: 1.2763191739906188, epoch: 1.0952
[2023-12-26 17:49:53,029] [    INFO] - loss: 0.24883732, learning_rate: 0.0002076, global_step: 127, interval_runtime: 3.4632, interval_samples_per_second: 4.620060157492262, interval_steps_per_second: 0.2887537598432664, ppl: 1.2825333735686955, epoch: 1.1039
[2023-12-26 17:49:56,814] [    INFO] - loss: 0.24923915, learning_rate: 0.0002067, global_step: 128, interval_runtime: 3.7849, interval_samples_per_second: 4.227313505925205, interval_steps_per_second: 0.26420709412032534, ppl: 1.2830488375116988, epoch: 1.1126
[2023-12-26 17:50:00,253] [    INFO] - loss: 0.25661612, learning_rate: 0.0002057, global_step: 129, interval_runtime: 3.4389, interval_samples_per_second: 4.6527112086549245, interval_steps_per_second: 0.2907944505409328, ppl: 1.29254884785796, epoch: 1.1212
[2023-12-26 17:50:03,993] [    INFO] - loss: 0.24869432, learning_rate: 0.0002048, global_step: 130, interval_runtime: 3.7395, interval_samples_per_second: 4.278694837629995, interval_steps_per_second: 0.2674184273518747, ppl: 1.2823499844089126, epoch: 1.1299
[2023-12-26 17:50:08,240] [    INFO] - loss: 0.26458281, learning_rate: 0.0002038, global_step: 131, interval_runtime: 4.2476, interval_samples_per_second: 3.7668204909298133, interval_steps_per_second: 0.23542628068311333, ppl: 1.3028873108232604, epoch: 1.1385
[2023-12-26 17:50:11,384] [    INFO] - loss: 0.25587648, learning_rate: 0.0002029, global_step: 132, interval_runtime: 3.144, interval_samples_per_second: 5.089023319435538, interval_steps_per_second: 0.3180639574647211, ppl: 1.2915931804966019, epoch: 1.1472
[2023-12-26 17:50:14,954] [    INFO] - loss: 0.26866463, learning_rate: 0.0002019, global_step: 133, interval_runtime: 3.5699, interval_samples_per_second: 4.481976585800716, interval_steps_per_second: 0.28012353661254474, ppl: 1.3082163309577965, epoch: 1.1558
[2023-12-26 17:50:17,765] [    INFO] - loss: 0.24390888, learning_rate: 0.000201, global_step: 134, interval_runtime: 2.811, interval_samples_per_second: 5.691881395841557, interval_steps_per_second: 0.35574258724009733, ppl: 1.2762280352925501, epoch: 1.1645
[2023-12-26 17:50:21,071] [    INFO] - loss: 0.25758892, learning_rate: 0.0002, global_step: 135, interval_runtime: 3.3064, interval_samples_per_second: 4.839085183257765, interval_steps_per_second: 0.30244282395361033, ppl: 1.2938068511707594, epoch: 1.1732
[2023-12-26 17:50:24,536] [    INFO] - loss: 0.26233327, learning_rate: 0.000199, global_step: 136, interval_runtime: 3.4641, interval_samples_per_second: 4.618844516366238, interval_steps_per_second: 0.28867778227288987, ppl: 1.2999597078166822, epoch: 1.1818
[2023-12-26 17:50:27,393] [    INFO] - loss: 0.23059112, learning_rate: 0.0001981, global_step: 137, interval_runtime: 2.8578, interval_samples_per_second: 5.598641286619746, interval_steps_per_second: 0.34991508041373415, ppl: 1.2593442135024853, epoch: 1.1905
[2023-12-26 17:50:30,817] [    INFO] - loss: 0.25141767, learning_rate: 0.0001971, global_step: 138, interval_runtime: 3.4232, interval_samples_per_second: 4.674031065873333, interval_steps_per_second: 0.2921269416170833, ppl: 1.2858470319197617, epoch: 1.1991
[2023-12-26 17:50:34,215] [    INFO] - loss: 0.24990171, learning_rate: 0.0001962, global_step: 139, interval_runtime: 3.3985, interval_samples_per_second: 4.707977607017454, interval_steps_per_second: 0.29424860043859086, ppl: 1.2838992160317682, epoch: 1.2078
[2023-12-26 17:50:37,795] [    INFO] - loss: 0.27762073, learning_rate: 0.0001952, global_step: 140, interval_runtime: 3.58, interval_samples_per_second: 4.469330688584939, interval_steps_per_second: 0.27933316803655867, ppl: 1.3199854713702266, epoch: 1.2165
[2023-12-26 17:50:40,972] [    INFO] - loss: 0.25949112, learning_rate: 0.0001943, global_step: 141, interval_runtime: 3.1773, interval_samples_per_second: 5.035766780695585, interval_steps_per_second: 0.3147354237934741, ppl: 1.296270272780553, epoch: 1.2251
[2023-12-26 17:50:44,141] [    INFO] - loss: 0.27442539, learning_rate: 0.0001933, global_step: 142, interval_runtime: 3.1688, interval_samples_per_second: 5.049256401457214, interval_steps_per_second: 0.31557852509107587, ppl: 1.315774400478758, epoch: 1.2338
[2023-12-26 17:50:47,584] [    INFO] - loss: 0.24077226, learning_rate: 0.0001924, global_step: 143, interval_runtime: 3.4431, interval_samples_per_second: 4.647037890792138, interval_steps_per_second: 0.2904398681745086, ppl: 1.2722312643651177, epoch: 1.2424
[2023-12-26 17:50:51,579] [    INFO] - loss: 0.22204766, learning_rate: 0.0001914, global_step: 144, interval_runtime: 3.9946, interval_samples_per_second: 4.005430431401292, interval_steps_per_second: 0.25033940196258075, ppl: 1.2486308861942246, epoch: 1.2511
[2023-12-26 17:50:54,459] [    INFO] - loss: 0.19586501, learning_rate: 0.0001905, global_step: 145, interval_runtime: 2.8804, interval_samples_per_second: 5.554750144996249, interval_steps_per_second: 0.34717188406226557, ppl: 1.2163626974508257, epoch: 1.2597
[2023-12-26 17:50:58,505] [    INFO] - loss: 0.25291246, learning_rate: 0.0001895, global_step: 146, interval_runtime: 4.0459, interval_samples_per_second: 3.9546168944965663, interval_steps_per_second: 0.2471635559060354, ppl: 1.2877705404671191, epoch: 1.2684
[2023-12-26 17:51:01,899] [    INFO] - loss: 0.25938711, learning_rate: 0.0001886, global_step: 147, interval_runtime: 3.3934, interval_samples_per_second: 4.715020224835305, interval_steps_per_second: 0.29468876405220656, ppl: 1.2961354547208157, epoch: 1.2771
[2023-12-26 17:51:05,917] [    INFO] - loss: 0.24724537, learning_rate: 0.0001876, global_step: 148, interval_runtime: 3.7245, interval_samples_per_second: 4.295922984210759, interval_steps_per_second: 0.26849518651317245, ppl: 1.2804932688678359, epoch: 1.2857
[2023-12-26 17:51:09,845] [    INFO] - loss: 0.23437944, learning_rate: 0.0001867, global_step: 149, interval_runtime: 4.2219, interval_samples_per_second: 3.7897346655173596, interval_steps_per_second: 0.23685841659483498, ppl: 1.2641240604518345, epoch: 1.2944
[2023-12-26 17:51:12,408] [    INFO] - loss: 0.26815447, learning_rate: 0.0001857, global_step: 150, interval_runtime: 2.563, interval_samples_per_second: 6.242649019140541, interval_steps_per_second: 0.3901655636962838, ppl: 1.3075491015257499, epoch: 1.303
[2023-12-26 17:51:16,165] [    INFO] - loss: 0.27968669, learning_rate: 0.0001848, global_step: 151, interval_runtime: 3.756, interval_samples_per_second: 4.259878168341987, interval_steps_per_second: 0.2662423855213742, ppl: 1.3227153274704508, epoch: 1.3117
[2023-12-26 17:51:19,702] [    INFO] - loss: 0.25936183, learning_rate: 0.0001838, global_step: 152, interval_runtime: 3.5382, interval_samples_per_second: 4.522101530478644, interval_steps_per_second: 0.28263134565491527, ppl: 1.296102688830683, epoch: 1.3203
[2023-12-26 17:51:23,326] [    INFO] - loss: 0.28054172, learning_rate: 0.0001829, global_step: 153, interval_runtime: 3.6235, interval_samples_per_second: 4.415651502541165, interval_steps_per_second: 0.2759782189088228, ppl: 1.323846772397645, epoch: 1.329
[2023-12-26 17:51:26,726] [    INFO] - loss: 0.27994087, learning_rate: 0.0001819, global_step: 154, interval_runtime: 3.3999, interval_samples_per_second: 4.705982870381181, interval_steps_per_second: 0.2941239293988238, ppl: 1.3230515779846548, epoch: 1.3377
[2023-12-26 17:51:30,480] [    INFO] - loss: 0.24816436, learning_rate: 0.000181, global_step: 155, interval_runtime: 3.754, interval_samples_per_second: 4.262152946368684, interval_steps_per_second: 0.26638455914804277, ppl: 1.2816705702582385, epoch: 1.3463
[2023-12-26 17:51:33,859] [    INFO] - loss: 0.27234039, learning_rate: 0.00018, global_step: 156, interval_runtime: 3.3798, interval_samples_per_second: 4.734005144076394, interval_steps_per_second: 0.29587532150477464, ppl: 1.3130338688507908, epoch: 1.355
[2023-12-26 17:51:36,960] [    INFO] - loss: 0.27089909, learning_rate: 0.000179, global_step: 157, interval_runtime: 3.1011, interval_samples_per_second: 5.159453649892815, interval_steps_per_second: 0.32246585311830095, ppl: 1.3111427562932552, epoch: 1.3636
[2023-12-26 17:51:40,082] [    INFO] - loss: 0.21484688, learning_rate: 0.0001781, global_step: 158, interval_runtime: 3.121, interval_samples_per_second: 5.126642982820146, interval_steps_per_second: 0.3204151864262591, ppl: 1.239672063846393, epoch: 1.3723
[2023-12-26 17:51:44,127] [    INFO] - loss: 0.25808209, learning_rate: 0.0001771, global_step: 159, interval_runtime: 4.046, interval_samples_per_second: 3.9545353324650216, interval_steps_per_second: 0.24715845827906385, ppl: 1.2944450752591026, epoch: 1.381
[2023-12-26 17:51:47,282] [    INFO] - loss: 0.25623158, learning_rate: 0.0001762, global_step: 160, interval_runtime: 3.1549, interval_samples_per_second: 5.071501292570801, interval_steps_per_second: 0.31696883078567506, ppl: 1.2920519066770093, epoch: 1.3896
[2023-12-26 17:51:50,774] [    INFO] - loss: 0.25370789, learning_rate: 0.0001752, global_step: 161, interval_runtime: 3.4915, interval_samples_per_second: 4.582599389153949, interval_steps_per_second: 0.2864124618221218, ppl: 1.2887952792880928, epoch: 1.3983
[2023-12-26 17:51:54,296] [    INFO] - loss: 0.2352947, learning_rate: 0.0001743, global_step: 162, interval_runtime: 3.5221, interval_samples_per_second: 4.542725048925997, interval_steps_per_second: 0.28392031555787484, ppl: 1.2652815922798888, epoch: 1.4069
[2023-12-26 17:51:58,034] [    INFO] - loss: 0.25839075, learning_rate: 0.0001733, global_step: 163, interval_runtime: 3.7382, interval_samples_per_second: 4.280177741040718, interval_steps_per_second: 0.26751110881504486, ppl: 1.2948446803439122, epoch: 1.4156
[2023-12-26 17:52:01,288] [    INFO] - loss: 0.2666446, learning_rate: 0.0001724, global_step: 164, interval_runtime: 3.2534, interval_samples_per_second: 4.917902685289206, interval_steps_per_second: 0.30736891783057535, ppl: 1.3055763620286938, epoch: 1.4242
[2023-12-26 17:52:03,999] [    INFO] - loss: 0.25938639, learning_rate: 0.0001714, global_step: 165, interval_runtime: 2.7114, interval_samples_per_second: 5.901048876214106, interval_steps_per_second: 0.3688155547633816, ppl: 1.2961345215036244, epoch: 1.4329
[2023-12-26 17:52:07,377] [    INFO] - loss: 0.24273644, learning_rate: 0.0001705, global_step: 166, interval_runtime: 3.3783, interval_samples_per_second: 4.736141699063666, interval_steps_per_second: 0.29600885619147915, ppl: 1.2747326113135993, epoch: 1.4416
[2023-12-26 17:52:10,345] [    INFO] - loss: 0.28284302, learning_rate: 0.0001695, global_step: 167, interval_runtime: 2.9683, interval_samples_per_second: 5.3902156492231486, interval_steps_per_second: 0.3368884780764468, ppl: 1.3268968491997402, epoch: 1.4502
[2023-12-26 17:52:14,356] [    INFO] - loss: 0.2741535, learning_rate: 0.0001686, global_step: 168, interval_runtime: 4.0106, interval_samples_per_second: 3.9894091906581686, interval_steps_per_second: 0.24933807441613554, ppl: 1.315416703206371, epoch: 1.4589
[2023-12-26 17:52:17,175] [    INFO] - loss: 0.26654774, learning_rate: 0.0001676, global_step: 169, interval_runtime: 2.8192, interval_samples_per_second: 5.675461935112414, interval_steps_per_second: 0.35471637094452585, ppl: 1.305449910026437, epoch: 1.4675
[2023-12-26 17:52:20,552] [    INFO] - loss: 0.26166382, learning_rate: 0.0001667, global_step: 170, interval_runtime: 3.3769, interval_samples_per_second: 4.738139341921953, interval_steps_per_second: 0.2961337088701221, ppl: 1.29908974102241, epoch: 1.4762
[2023-12-26 17:52:23,081] [    INFO] - loss: 0.25510639, learning_rate: 0.0001657, global_step: 171, interval_runtime: 2.5285, interval_samples_per_second: 6.327875295277558, interval_steps_per_second: 0.39549220595484735, ppl: 1.2905989203882529, epoch: 1.4848
[2023-12-26 17:52:25,945] [    INFO] - loss: 0.24863556, learning_rate: 0.0001648, global_step: 172, interval_runtime: 2.8643, interval_samples_per_second: 5.585921939935049, interval_steps_per_second: 0.34912012124594055, ppl: 1.2822746357375945, epoch: 1.4935
[2023-12-26 17:52:28,796] [    INFO] - loss: 0.23852955, learning_rate: 0.0001638, global_step: 173, interval_runtime: 2.8505, interval_samples_per_second: 5.6130172682452075, interval_steps_per_second: 0.35081357926532547, ppl: 1.269381215697123, epoch: 1.5022
[2023-12-26 17:52:31,756] [    INFO] - loss: 0.23728807, learning_rate: 0.0001629, global_step: 174, interval_runtime: 2.9605, interval_samples_per_second: 5.404557332604875, interval_steps_per_second: 0.3377848332878047, ppl: 1.267806282132004, epoch: 1.5108
[2023-12-26 17:52:35,203] [    INFO] - loss: 0.22020681, learning_rate: 0.0001619, global_step: 175, interval_runtime: 3.4468, interval_samples_per_second: 4.6420115031974465, interval_steps_per_second: 0.2901257189498404, ppl: 1.2463344583654559, epoch: 1.5195
[2023-12-26 17:52:38,768] [    INFO] - loss: 0.25036472, learning_rate: 0.000161, global_step: 176, interval_runtime: 3.5652, interval_samples_per_second: 4.487882178626485, interval_steps_per_second: 0.2804926361641553, ppl: 1.2844938118490652, epoch: 1.5281
[2023-12-26 17:52:43,140] [    INFO] - loss: 0.23791191, learning_rate: 0.00016, global_step: 177, interval_runtime: 4.3724, interval_samples_per_second: 3.6593169006811537, interval_steps_per_second: 0.2287073062925721, ppl: 1.2685974371544657, epoch: 1.5368
[2023-12-26 17:52:46,225] [    INFO] - loss: 0.20882736, learning_rate: 0.000159, global_step: 178, interval_runtime: 3.0847, interval_samples_per_second: 5.186969734280591, interval_steps_per_second: 0.32418560839253696, ppl: 1.232232247590898, epoch: 1.5455
[2023-12-26 17:52:50,148] [    INFO] - loss: 0.27170029, learning_rate: 0.0001581, global_step: 179, interval_runtime: 3.9226, interval_samples_per_second: 4.078900434792255, interval_steps_per_second: 0.2549312771745159, ppl: 1.3121936648073314, epoch: 1.5541
[2023-12-26 17:52:54,298] [    INFO] - loss: 0.28896001, learning_rate: 0.0001571, global_step: 180, interval_runtime: 4.1502, interval_samples_per_second: 3.8552488398152898, interval_steps_per_second: 0.2409530524884556, ppl: 1.3350383392778098, epoch: 1.5628
[2023-12-26 17:52:57,714] [    INFO] - loss: 0.25084904, learning_rate: 0.0001562, global_step: 181, interval_runtime: 3.4157, interval_samples_per_second: 4.684270169488852, interval_steps_per_second: 0.29276688559305325, ppl: 1.2851160685655432, epoch: 1.5714
[2023-12-26 17:53:01,407] [    INFO] - loss: 0.22257608, learning_rate: 0.0001552, global_step: 182, interval_runtime: 3.6934, interval_samples_per_second: 4.332054394895487, interval_steps_per_second: 0.27075339968096795, ppl: 1.2492908620839802, epoch: 1.5801
[2023-12-26 17:53:04,779] [    INFO] - loss: 0.26296413, learning_rate: 0.0001543, global_step: 183, interval_runtime: 3.3723, interval_samples_per_second: 4.744499599528044, interval_steps_per_second: 0.29653122497050277, ppl: 1.3007800591341643, epoch: 1.5887
[2023-12-26 17:53:07,778] [    INFO] - loss: 0.2423287, learning_rate: 0.0001533, global_step: 184, interval_runtime: 2.9986, interval_samples_per_second: 5.335845010280282, interval_steps_per_second: 0.33349031314251765, ppl: 1.2742129577876262, epoch: 1.5974
[2023-12-26 17:53:10,936] [    INFO] - loss: 0.27385578, learning_rate: 0.0001524, global_step: 185, interval_runtime: 3.1584, interval_samples_per_second: 5.065927928991564, interval_steps_per_second: 0.31662049556197275, ppl: 1.315025135637133, epoch: 1.6061
[2023-12-26 17:53:14,781] [    INFO] - loss: 0.27811337, learning_rate: 0.0001514, global_step: 186, interval_runtime: 3.8442, interval_samples_per_second: 4.1621134343028325, interval_steps_per_second: 0.26013208964392703, ppl: 1.3206359092155378, epoch: 1.6147
[2023-12-26 17:53:18,501] [    INFO] - loss: 0.24453022, learning_rate: 0.0001505, global_step: 187, interval_runtime: 3.7206, interval_samples_per_second: 4.300356433877232, interval_steps_per_second: 0.268772277117327, ppl: 1.277021253223494, epoch: 1.6234
[2023-12-26 17:53:22,207] [    INFO] - loss: 0.28027064, learning_rate: 0.0001495, global_step: 188, interval_runtime: 3.7061, interval_samples_per_second: 4.3172589965647905, interval_steps_per_second: 0.2698286872852994, ppl: 1.323487952651209, epoch: 1.632
[2023-12-26 17:53:25,552] [    INFO] - loss: 0.25416121, learning_rate: 0.0001486, global_step: 189, interval_runtime: 3.345, interval_samples_per_second: 4.783321627266378, interval_steps_per_second: 0.29895760170414865, ppl: 1.289379648407197, epoch: 1.6407
[2023-12-26 17:53:28,571] [    INFO] - loss: 0.29258764, learning_rate: 0.0001476, global_step: 190, interval_runtime: 3.0188, interval_samples_per_second: 5.3001757277696475, interval_steps_per_second: 0.33126098298560297, ppl: 1.3398901593919177, epoch: 1.6494
[2023-12-26 17:53:31,835] [    INFO] - loss: 0.26587084, learning_rate: 0.0001467, global_step: 191, interval_runtime: 3.2637, interval_samples_per_second: 4.902365879014341, interval_steps_per_second: 0.3063978674383963, ppl: 1.3045665499892738, epoch: 1.658
[2023-12-26 17:53:35,816] [    INFO] - loss: 0.22777697, learning_rate: 0.0001457, global_step: 192, interval_runtime: 3.9808, interval_samples_per_second: 4.019260435176305, interval_steps_per_second: 0.25120377719851905, ppl: 1.2558052119602279, epoch: 1.6667
[2023-12-26 17:53:40,226] [    INFO] - loss: 0.25831285, learning_rate: 0.0001448, global_step: 193, interval_runtime: 4.4106, interval_samples_per_second: 3.627641659480362, interval_steps_per_second: 0.22672760371752262, ppl: 1.2947438158720355, epoch: 1.6753
[2023-12-26 17:53:43,513] [    INFO] - loss: 0.27399379, learning_rate: 0.0001438, global_step: 194, interval_runtime: 3.2865, interval_samples_per_second: 4.868390826319412, interval_steps_per_second: 0.30427442664496324, ppl: 1.3152066347801625, epoch: 1.684
[2023-12-26 17:53:47,115] [    INFO] - loss: 0.25389808, learning_rate: 0.0001429, global_step: 195, interval_runtime: 3.6019, interval_samples_per_second: 4.44214133768107, interval_steps_per_second: 0.2776338336050669, ppl: 1.2890404185730422, epoch: 1.6926
[2023-12-26 17:53:50,539] [    INFO] - loss: 0.23455918, learning_rate: 0.0001419, global_step: 196, interval_runtime: 3.4245, interval_samples_per_second: 4.672247791644994, interval_steps_per_second: 0.2920154869778121, ppl: 1.264351294531375, epoch: 1.7013
[2023-12-26 17:53:54,229] [    INFO] - loss: 0.24967569, learning_rate: 0.000141, global_step: 197, interval_runtime: 3.6895, interval_samples_per_second: 4.336603126697901, interval_steps_per_second: 0.27103769541861883, ppl: 1.2836090619225116, epoch: 1.71
[2023-12-26 17:53:58,033] [    INFO] - loss: 0.27436778, learning_rate: 0.00014, global_step: 198, interval_runtime: 3.8043, interval_samples_per_second: 4.2057209552481405, interval_steps_per_second: 0.2628575597030088, ppl: 1.3156986008989742, epoch: 1.7186
[2023-12-26 17:54:01,206] [    INFO] - loss: 0.27817079, learning_rate: 0.000139, global_step: 199, interval_runtime: 3.1734, interval_samples_per_second: 5.041898881432306, interval_steps_per_second: 0.3151186800895191, ppl: 1.3207117423065922, epoch: 1.7273
[2023-12-26 17:54:04,930] [    INFO] - loss: 0.22592455, learning_rate: 0.0001381, global_step: 200, interval_runtime: 3.7237, interval_samples_per_second: 4.296812519168232, interval_steps_per_second: 0.2685507824480145, ppl: 1.2534810865622685, epoch: 1.7359
[2023-12-26 17:54:08,118] [    INFO] - loss: 0.2252913, learning_rate: 0.0001371, global_step: 201, interval_runtime: 3.1881, interval_samples_per_second: 5.018602571847399, interval_steps_per_second: 0.31366266074046245, ppl: 1.2526875709376046, epoch: 1.7446
[2023-12-26 17:54:11,897] [    INFO] - loss: 0.24922991, learning_rate: 0.0001362, global_step: 202, interval_runtime: 3.779, interval_samples_per_second: 4.233925337932708, interval_steps_per_second: 0.26462033362079423, ppl: 1.283036982195212, epoch: 1.7532
[2023-12-26 17:54:15,445] [    INFO] - loss: 0.24007367, learning_rate: 0.0001352, global_step: 203, interval_runtime: 3.5477, interval_samples_per_second: 4.509910041530718, interval_steps_per_second: 0.2818693775956699, ppl: 1.271342806696099, epoch: 1.7619
[2023-12-26 17:54:19,061] [    INFO] - loss: 0.21843848, learning_rate: 0.0001343, global_step: 204, interval_runtime: 3.6156, interval_samples_per_second: 4.425265522074444, interval_steps_per_second: 0.27657909512965273, ppl: 1.2441324752429004, epoch: 1.7706
[2023-12-26 17:54:22,657] [    INFO] - loss: 0.25901291, learning_rate: 0.0001333, global_step: 205, interval_runtime: 3.5968, interval_samples_per_second: 4.4483577751895425, interval_steps_per_second: 0.2780223609493464, ppl: 1.2956505315684395, epoch: 1.7792
[2023-12-26 17:54:26,160] [    INFO] - loss: 0.25296086, learning_rate: 0.0001324, global_step: 206, interval_runtime: 3.5027, interval_samples_per_second: 4.567842146425268, interval_steps_per_second: 0.28549013415157926, ppl: 1.287832870069642, epoch: 1.7879
[2023-12-26 17:54:31,364] [    INFO] - loss: 0.25822967, learning_rate: 0.0001314, global_step: 207, interval_runtime: 4.8634, interval_samples_per_second: 3.2898795200332103, interval_steps_per_second: 0.20561747000207564, ppl: 1.2946361235604167, epoch: 1.7965
[2023-12-26 17:54:34,513] [    INFO] - loss: 0.25502729, learning_rate: 0.0001305, global_step: 208, interval_runtime: 3.4896, interval_samples_per_second: 4.58503463361906, interval_steps_per_second: 0.28656466460119123, ppl: 1.2904968380510597, epoch: 1.8052
[2023-12-26 17:54:37,955] [    INFO] - loss: 0.23156594, learning_rate: 0.0001295, global_step: 209, interval_runtime: 3.4414, interval_samples_per_second: 4.649269288212603, interval_steps_per_second: 0.2905793305132877, ppl: 1.2605724459842225, epoch: 1.8139
[2023-12-26 17:54:41,491] [    INFO] - loss: 0.23533037, learning_rate: 0.0001286, global_step: 210, interval_runtime: 3.5364, interval_samples_per_second: 4.5243706983283145, interval_steps_per_second: 0.28277316864551966, ppl: 1.2653267256792347, epoch: 1.8225
[2023-12-26 17:54:45,119] [    INFO] - loss: 0.24839923, learning_rate: 0.0001276, global_step: 211, interval_runtime: 3.6283, interval_samples_per_second: 4.409803953264618, interval_steps_per_second: 0.2756127470790386, ppl: 1.2819716315788272, epoch: 1.8312
[2023-12-26 17:54:48,561] [    INFO] - loss: 0.2554785, learning_rate: 0.0001267, global_step: 212, interval_runtime: 3.4418, interval_samples_per_second: 4.64878876601975, interval_steps_per_second: 0.2905492978762344, ppl: 1.291079254515542, epoch: 1.8398
[2023-12-26 17:54:51,773] [    INFO] - loss: 0.26169631, learning_rate: 0.0001257, global_step: 213, interval_runtime: 3.2122, interval_samples_per_second: 4.980961213382988, interval_steps_per_second: 0.3113100758364368, ppl: 1.299131949133763, epoch: 1.8485
[2023-12-26 17:54:54,852] [    INFO] - loss: 0.21404707, learning_rate: 0.0001248, global_step: 214, interval_runtime: 3.0789, interval_samples_per_second: 5.196694781124536, interval_steps_per_second: 0.3247934238202835, ppl: 1.2386809581339717, epoch: 1.8571
[2023-12-26 17:54:57,791] [    INFO] - loss: 0.27386618, learning_rate: 0.0001238, global_step: 215, interval_runtime: 2.9393, interval_samples_per_second: 5.4433959968502235, interval_steps_per_second: 0.34021224980313897, ppl: 1.3150388119696605, epoch: 1.8658
[2023-12-26 17:55:00,520] [    INFO] - loss: 0.2628904, learning_rate: 0.0001229, global_step: 216, interval_runtime: 2.7284, interval_samples_per_second: 5.864209488383408, interval_steps_per_second: 0.366513093023963, ppl: 1.300684156155911, epoch: 1.8745
[2023-12-26 17:55:04,433] [    INFO] - loss: 0.22695646, learning_rate: 0.0001219, global_step: 217, interval_runtime: 3.913, interval_samples_per_second: 4.088939940065642, interval_steps_per_second: 0.25555874625410263, ppl: 1.2547752338372222, epoch: 1.8831
[2023-12-26 17:55:07,549] [    INFO] - loss: 0.26187435, learning_rate: 0.000121, global_step: 218, interval_runtime: 3.1156, interval_samples_per_second: 5.135382011242782, interval_steps_per_second: 0.32096137570267386, ppl: 1.2993632671773079, epoch: 1.8918
[2023-12-26 17:55:10,801] [    INFO] - loss: 0.27500743, learning_rate: 0.00012, global_step: 219, interval_runtime: 3.2521, interval_samples_per_second: 4.919908025267539, interval_steps_per_second: 0.30749425157922117, ppl: 1.3165404567268755, epoch: 1.9004
[2023-12-26 17:55:15,393] [    INFO] - loss: 0.24148373, learning_rate: 0.000119, global_step: 220, interval_runtime: 4.5918, interval_samples_per_second: 3.4844913366529418, interval_steps_per_second: 0.21778070854080886, ppl: 1.2731367408142449, epoch: 1.9091
[2023-12-26 17:55:18,813] [    INFO] - loss: 0.26187742, learning_rate: 0.0001181, global_step: 221, interval_runtime: 3.4204, interval_samples_per_second: 4.677857654097276, interval_steps_per_second: 0.29236610338107977, ppl: 1.2993672562286616, epoch: 1.9177
[2023-12-26 17:55:22,702] [    INFO] - loss: 0.24043539, learning_rate: 0.0001171, global_step: 222, interval_runtime: 3.8894, interval_samples_per_second: 4.11376320138214, interval_steps_per_second: 0.25711020008638374, ppl: 1.2718027599982764, epoch: 1.9264
[2023-12-26 17:55:25,935] [    INFO] - loss: 0.24163382, learning_rate: 0.0001162, global_step: 223, interval_runtime: 3.2333, interval_samples_per_second: 4.948552835341657, interval_steps_per_second: 0.30928455220885354, ppl: 1.273327840248372, epoch: 1.9351
[2023-12-26 17:55:29,523] [    INFO] - loss: 0.25678757, learning_rate: 0.0001152, global_step: 224, interval_runtime: 3.5878, interval_samples_per_second: 4.459531574463244, interval_steps_per_second: 0.27872072340395276, ppl: 1.292770474356314, epoch: 1.9437
[2023-12-26 17:55:33,280] [    INFO] - loss: 0.2650784, learning_rate: 0.0001143, global_step: 225, interval_runtime: 3.7566, interval_samples_per_second: 4.259140901950391, interval_steps_per_second: 0.26619630637189945, ppl: 1.303533168772783, epoch: 1.9524
[2023-12-26 17:55:36,056] [    INFO] - loss: 0.27205297, learning_rate: 0.0001133, global_step: 226, interval_runtime: 2.7762, interval_samples_per_second: 5.76335509034562, interval_steps_per_second: 0.36020969314660123, ppl: 1.3126565308860423, epoch: 1.961
[2023-12-26 17:55:39,553] [    INFO] - loss: 0.26250398, learning_rate: 0.0001124, global_step: 227, interval_runtime: 3.4969, interval_samples_per_second: 4.575467931501679, interval_steps_per_second: 0.28596674571885494, ppl: 1.3001816428811321, epoch: 1.9697
[2023-12-26 17:55:42,754] [    INFO] - loss: 0.22927305, learning_rate: 0.0001114, global_step: 228, interval_runtime: 3.2008, interval_samples_per_second: 4.998771998954195, interval_steps_per_second: 0.3124232499346372, ppl: 1.2576854031292437, epoch: 1.9784
[2023-12-26 17:55:46,400] [    INFO] - loss: 0.23428649, learning_rate: 0.0001105, global_step: 229, interval_runtime: 3.6463, interval_samples_per_second: 4.388005366390592, interval_steps_per_second: 0.274250335399412, ppl: 1.2640065655810742, epoch: 1.987
[2023-12-26 17:55:49,800] [    INFO] - loss: 0.26624548, learning_rate: 0.0001095, global_step: 230, interval_runtime: 3.3999, interval_samples_per_second: 4.7060696633521895, interval_steps_per_second: 0.29412935395951184, ppl: 1.3050553843642994, epoch: 1.9957
[2023-12-26 17:55:51,041] [    INFO] - ***** Running Evaluation *****
[2023-12-26 17:55:51,048] [    INFO] -   Num examples = 206
[2023-12-26 17:55:51,049] [    INFO] -   Total prediction steps = 26
[2023-12-26 17:55:51,049] [    INFO] -   Pre device batch size = 8
[2023-12-26 17:55:51,049] [    INFO] -   Total Batch size = 8
[2023-12-26 17:56:05,578] [    INFO] - eval_loss: 0.24841691553592682, eval_accuracy: 0.9998247458815283, eval_runtime: 14.5366, eval_samples_per_second: 14.171123774194562, eval_steps_per_second: 1.788588437519702, eval_ppl: 1.2819943041346622, epoch: 1.9957
[2023-12-26 17:56:05,579] [    INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-230
[2023-12-26 17:56:05,579] [    INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/tokenizer_config.json
[2023-12-26 17:56:05,580] [    INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/special_tokens_map.json
[2023-12-26 17:56:05,583] [    INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/chat_template.json
[2023-12-26 17:56:05,760] [    INFO] - Saving optimizer files.
[2023-12-26 17:56:10,125] [    INFO] - loss: 0.3656939, learning_rate: 0.0001086, global_step: 231, interval_runtime: 20.3245, interval_samples_per_second: 0.7872267269065429, interval_steps_per_second: 0.049201670431658934, ppl: 1.441513927701439, epoch: 2.0087
[2023-12-26 17:56:13,735] [    INFO] - loss: 0.25071743, learning_rate: 0.0001076, global_step: 232, interval_runtime: 3.61, interval_samples_per_second: 4.432105284987014, interval_steps_per_second: 0.2770065803116884, ppl: 1.284946945569142, epoch: 2.0173
[2023-12-26 17:56:16,630] [    INFO] - loss: 0.23165847, learning_rate: 0.0001067, global_step: 233, interval_runtime: 2.8955, interval_samples_per_second: 5.525760972168335, interval_steps_per_second: 0.34536006076052095, ppl: 1.260689092149201, epoch: 2.026
[2023-12-26 17:56:19,615] [    INFO] - loss: 0.27415121, learning_rate: 0.0001057, global_step: 234, interval_runtime: 2.9846, interval_samples_per_second: 5.360863817454984, interval_steps_per_second: 0.3350539885909365, ppl: 1.3154136909055696, epoch: 2.0346
[2023-12-26 17:56:22,586] [    INFO] - loss: 0.28157312, learning_rate: 0.0001048, global_step: 235, interval_runtime: 2.9716, interval_samples_per_second: 5.3843227946127294, interval_steps_per_second: 0.3365201746632956, ppl: 1.325212892345648, epoch: 2.0433
[2023-12-26 17:56:25,969] [    INFO] - loss: 0.25894362, learning_rate: 0.0001038, global_step: 236, interval_runtime: 3.3824, interval_samples_per_second: 4.730322905159614, interval_steps_per_second: 0.2956451815724759, ppl: 1.2955607590533118, epoch: 2.0519
[2023-12-26 17:56:29,945] [    INFO] - loss: 0.26484933, learning_rate: 0.0001029, global_step: 237, interval_runtime: 3.9764, interval_samples_per_second: 4.023722798639813, interval_steps_per_second: 0.2514826749149883, ppl: 1.303234602627391, epoch: 2.0606
[2023-12-26 17:56:33,242] [    INFO] - loss: 0.2394658, learning_rate: 0.0001019, global_step: 238, interval_runtime: 3.2964, interval_samples_per_second: 4.853791011648139, interval_steps_per_second: 0.30336193822800867, ppl: 1.2705702303809643, epoch: 2.0693
[2023-12-26 17:56:36,414] [    INFO] - loss: 0.23598538, learning_rate: 0.000101, global_step: 239, interval_runtime: 3.1719, interval_samples_per_second: 5.044337250358957, interval_steps_per_second: 0.3152710781474348, ppl: 1.2661557988337833, epoch: 2.0779
[2023-12-26 17:56:39,721] [    INFO] - loss: 0.25548252, learning_rate: 0.0001, global_step: 240, interval_runtime: 3.3075, interval_samples_per_second: 4.8375500000360425, interval_steps_per_second: 0.30234687500225266, ppl: 1.2910844446645773, epoch: 2.0866
[2023-12-26 17:56:42,374] [    INFO] - loss: 0.25759581, learning_rate: 9.905e-05, global_step: 241, interval_runtime: 2.6527, interval_samples_per_second: 6.031540701410583, interval_steps_per_second: 0.3769712938381614, ppl: 1.293815765530674, epoch: 2.0952
[2023-12-26 17:56:45,520] [    INFO] - loss: 0.21545997, learning_rate: 9.81e-05, global_step: 242, interval_runtime: 3.1461, interval_samples_per_second: 5.085716654476958, interval_steps_per_second: 0.31785729090480985, ppl: 1.2404323274232008, epoch: 2.1039
[2023-12-26 17:56:48,992] [    INFO] - loss: 0.28708792, learning_rate: 9.714e-05, global_step: 243, interval_runtime: 3.4717, interval_samples_per_second: 4.608658078343157, interval_steps_per_second: 0.2880411298964473, ppl: 1.332541365362446, epoch: 2.1126
[2023-12-26 17:56:52,167] [    INFO] - loss: 0.26709995, learning_rate: 9.619e-05, global_step: 244, interval_runtime: 3.1752, interval_samples_per_second: 5.039020141404312, interval_steps_per_second: 0.3149387588377695, ppl: 1.306170991597156, epoch: 2.1212
[2023-12-26 17:56:55,975] [    INFO] - loss: 0.23067287, learning_rate: 9.524e-05, global_step: 245, interval_runtime: 3.8081, interval_samples_per_second: 4.201598763252775, interval_steps_per_second: 0.26259992270329846, ppl: 1.2594471691001918, epoch: 2.1299
[2023-12-26 17:56:59,639] [    INFO] - loss: 0.2489226, learning_rate: 9.429e-05, global_step: 246, interval_runtime: 3.6639, interval_samples_per_second: 4.366963643875407, interval_steps_per_second: 0.27293522774221296, ppl: 1.2826427526786524, epoch: 2.1385
[2023-12-26 17:57:03,154] [    INFO] - loss: 0.2632488, learning_rate: 9.333e-05, global_step: 247, interval_runtime: 3.5152, interval_samples_per_second: 4.551726008993185, interval_steps_per_second: 0.2844828755620741, ppl: 1.3011504049042621, epoch: 2.1472
[2023-12-26 17:57:07,978] [    INFO] - loss: 0.26488072, learning_rate: 9.238e-05, global_step: 248, interval_runtime: 4.8246, interval_samples_per_second: 3.316333676947755, interval_steps_per_second: 0.20727085480923468, ppl: 1.3032755118036337, epoch: 2.1558
[2023-12-26 17:57:11,275] [    INFO] - loss: 0.26563224, learning_rate: 9.143e-05, global_step: 249, interval_runtime: 3.2961, interval_samples_per_second: 4.854288515698945, interval_steps_per_second: 0.3033930322311841, ppl: 1.304255317541954, epoch: 2.1645
[2023-12-26 17:57:14,520] [    INFO] - loss: 0.23992111, learning_rate: 9.048e-05, global_step: 250, interval_runtime: 3.2451, interval_samples_per_second: 4.930568905885227, interval_steps_per_second: 0.3081605566178267, ppl: 1.2711488654317253, epoch: 2.1732
[2023-12-26 17:57:17,530] [    INFO] - loss: 0.25501767, learning_rate: 8.952e-05, global_step: 251, interval_runtime: 3.0108, interval_samples_per_second: 5.314274790747383, interval_steps_per_second: 0.33214217442171146, ppl: 1.2904844235311916, epoch: 2.1818
[2023-12-26 17:57:21,339] [    INFO] - loss: 0.24364254, learning_rate: 8.857e-05, global_step: 252, interval_runtime: 3.8085, interval_samples_per_second: 4.201152140561123, interval_steps_per_second: 0.2625720087850702, ppl: 1.275888169979503, epoch: 2.1905
[2023-12-26 17:57:24,571] [    INFO] - loss: 0.21903639, learning_rate: 8.762e-05, global_step: 253, interval_runtime: 3.2316, interval_samples_per_second: 4.951084361900219, interval_steps_per_second: 0.3094427726187637, ppl: 1.2448765769219226, epoch: 2.1991
[2023-12-26 17:57:28,548] [    INFO] - loss: 0.2594893, learning_rate: 8.667e-05, global_step: 254, interval_runtime: 3.9779, interval_samples_per_second: 4.022264940905389, interval_steps_per_second: 0.2513915588065868, ppl: 1.2962679135708033, epoch: 2.2078
[2023-12-26 17:57:31,999] [    INFO] - loss: 0.24149872, learning_rate: 8.571e-05, global_step: 255, interval_runtime: 3.4502, interval_samples_per_second: 4.637357737953826, interval_steps_per_second: 0.2898348586221141, ppl: 1.2731558252770274, epoch: 2.2165
[2023-12-26 17:57:35,627] [    INFO] - loss: 0.23661155, learning_rate: 8.476e-05, global_step: 256, interval_runtime: 3.6282, interval_samples_per_second: 4.409906825259528, interval_steps_per_second: 0.2756191765787205, ppl: 1.2669488758849545, epoch: 2.2251
[2023-12-26 17:57:39,345] [    INFO] - loss: 0.25656974, learning_rate: 8.381e-05, global_step: 257, interval_runtime: 3.7176, interval_samples_per_second: 4.303832502341623, interval_steps_per_second: 0.26898953139635146, ppl: 1.2924889008325786, epoch: 2.2338
[2023-12-26 17:57:42,004] [    INFO] - loss: 0.24968293, learning_rate: 8.286e-05, global_step: 258, interval_runtime: 2.6592, interval_samples_per_second: 6.016841883666504, interval_steps_per_second: 0.3760526177291565, ppl: 1.2836183552857618, epoch: 2.2424
[2023-12-26 17:57:45,878] [    INFO] - loss: 0.25985175, learning_rate: 8.19e-05, global_step: 259, interval_runtime: 3.874, interval_samples_per_second: 4.1300723405708775, interval_steps_per_second: 0.25812952128567984, ppl: 1.2967378310317246, epoch: 2.2511
[2023-12-26 17:57:49,201] [    INFO] - loss: 0.25258374, learning_rate: 8.095e-05, global_step: 260, interval_runtime: 3.3237, interval_samples_per_second: 4.813872911316932, interval_steps_per_second: 0.30086705695730825, ppl: 1.2873472941036401, epoch: 2.2597
[2023-12-26 17:57:53,102] [    INFO] - loss: 0.27856874, learning_rate: 8e-05, global_step: 261, interval_runtime: 3.901, interval_samples_per_second: 4.10154574847204, interval_steps_per_second: 0.2563466092795025, ppl: 1.3212374241350473, epoch: 2.2684
[2023-12-26 17:57:56,114] [    INFO] - loss: 0.23018984, learning_rate: 7.905e-05, global_step: 262, interval_runtime: 3.0118, interval_samples_per_second: 5.312472981014336, interval_steps_per_second: 0.332029561313396, ppl: 1.258838965236283, epoch: 2.2771
[2023-12-26 17:57:59,266] [    INFO] - loss: 0.25268465, learning_rate: 7.81e-05, global_step: 263, interval_runtime: 3.1513, interval_samples_per_second: 5.077254407965167, interval_steps_per_second: 0.31732840049782296, ppl: 1.2874772068737268, epoch: 2.2857
[2023-12-26 17:58:02,669] [    INFO] - loss: 0.25480685, learning_rate: 7.714e-05, global_step: 264, interval_runtime: 3.4035, interval_samples_per_second: 4.7010985877901765, interval_steps_per_second: 0.29381866173688603, ppl: 1.2902123922808444, epoch: 2.2944
[2023-12-26 17:58:05,789] [    INFO] - loss: 0.26835781, learning_rate: 7.619e-05, global_step: 265, interval_runtime: 3.1199, interval_samples_per_second: 5.128435359032087, interval_steps_per_second: 0.32052720993950545, ppl: 1.3078150055936044, epoch: 2.303
[2023-12-26 17:58:09,360] [    INFO] - loss: 0.2600894, learning_rate: 7.524e-05, global_step: 266, interval_runtime: 3.571, interval_samples_per_second: 4.480539634085072, interval_steps_per_second: 0.280033727130317, ppl: 1.2970460373984403, epoch: 2.3117
[2023-12-26 17:58:12,487] [    INFO] - loss: 0.24706353, learning_rate: 7.429e-05, global_step: 267, interval_runtime: 3.1274, interval_samples_per_second: 5.116001172484428, interval_steps_per_second: 0.31975007328027677, ppl: 1.2802604451408, epoch: 2.3203
[2023-12-26 17:58:15,359] [    INFO] - loss: 0.23921801, learning_rate: 7.333e-05, global_step: 268, interval_runtime: 2.8716, interval_samples_per_second: 5.571803589560932, interval_steps_per_second: 0.34823772434755823, ppl: 1.2702554347867892, epoch: 2.329
[2023-12-26 17:58:18,433] [    INFO] - loss: 0.24235269, learning_rate: 7.238e-05, global_step: 269, interval_runtime: 3.0745, interval_samples_per_second: 5.204087189128849, interval_steps_per_second: 0.32525544932055306, ppl: 1.274243526523154, epoch: 2.3377
[2023-12-26 17:58:21,776] [    INFO] - loss: 0.24522433, learning_rate: 7.143e-05, global_step: 270, interval_runtime: 3.3424, interval_samples_per_second: 4.786993651473001, interval_steps_per_second: 0.29918710321706254, ppl: 1.2779079541439566, epoch: 2.3463
[2023-12-26 17:58:24,804] [    INFO] - loss: 0.22888985, learning_rate: 7.048e-05, global_step: 271, interval_runtime: 3.0278, interval_samples_per_second: 5.284330203548979, interval_steps_per_second: 0.3302706377218112, ppl: 1.2572035504116417, epoch: 2.355
[2023-12-26 17:58:28,146] [    INFO] - loss: 0.26895219, learning_rate: 6.952e-05, global_step: 272, interval_runtime: 3.3413, interval_samples_per_second: 4.7884938344333285, interval_steps_per_second: 0.29928086465208303, ppl: 1.3085925757398087, epoch: 2.3636
[2023-12-26 17:58:31,349] [    INFO] - loss: 0.25630388, learning_rate: 6.857e-05, global_step: 273, interval_runtime: 3.2042, interval_samples_per_second: 4.993461661492928, interval_steps_per_second: 0.312091353843308, ppl: 1.2921453254069084, epoch: 2.3723
[2023-12-26 17:58:34,611] [    INFO] - loss: 0.2489031, learning_rate: 6.762e-05, global_step: 274, interval_runtime: 3.2619, interval_samples_per_second: 4.905145051417184, interval_steps_per_second: 0.306571565713574, ppl: 1.282617741388836, epoch: 2.381
[2023-12-26 17:58:38,113] [    INFO] - loss: 0.25413105, learning_rate: 6.667e-05, global_step: 275, interval_runtime: 3.5017, interval_samples_per_second: 4.569229250370903, interval_steps_per_second: 0.2855768281481814, ppl: 1.2893407613034216, epoch: 2.3896
[2023-12-26 17:58:41,438] [    INFO] - loss: 0.25917414, learning_rate: 6.571e-05, global_step: 276, interval_runtime: 3.3234, interval_samples_per_second: 4.8142790303213605, interval_steps_per_second: 0.30089243939508503, ppl: 1.2958594461448403, epoch: 2.3983
[2023-12-26 17:58:44,153] [    INFO] - loss: 0.23144963, learning_rate: 6.476e-05, global_step: 277, interval_runtime: 2.7165, interval_samples_per_second: 5.889941624433201, interval_steps_per_second: 0.36812135152707504, ppl: 1.2604258373292216, epoch: 2.4069
[2023-12-26 17:58:47,813] [    INFO] - loss: 0.24212955, learning_rate: 6.381e-05, global_step: 278, interval_runtime: 3.6594, interval_samples_per_second: 4.3722818309427405, interval_steps_per_second: 0.2732676144339213, ppl: 1.2739592235435087, epoch: 2.4156
[2023-12-26 17:58:51,909] [    INFO] - loss: 0.26929981, learning_rate: 6.286e-05, global_step: 279, interval_runtime: 4.0964, interval_samples_per_second: 3.9058749112927673, interval_steps_per_second: 0.24411718195579796, ppl: 1.3090475477650936, epoch: 2.4242
[2023-12-26 17:58:55,586] [    INFO] - loss: 0.24498856, learning_rate: 6.19e-05, global_step: 280, interval_runtime: 3.6778, interval_samples_per_second: 4.350392832018189, interval_steps_per_second: 0.2718995520011368, ppl: 1.2776066973006666, epoch: 2.4329
[2023-12-26 17:58:59,000] [    INFO] - loss: 0.27687347, learning_rate: 6.095e-05, global_step: 281, interval_runtime: 3.4139, interval_samples_per_second: 4.686728616429989, interval_steps_per_second: 0.29292053852687433, ppl: 1.3189994674734082, epoch: 2.4416
[2023-12-26 17:59:02,266] [    INFO] - loss: 0.28048354, learning_rate: 6e-05, global_step: 282, interval_runtime: 3.2653, interval_samples_per_second: 4.900004483167459, interval_steps_per_second: 0.3062502801979662, ppl: 1.323769753232936, epoch: 2.4502
[2023-12-26 17:59:05,672] [    INFO] - loss: 0.25545651, learning_rate: 5.905e-05, global_step: 283, interval_runtime: 3.406, interval_samples_per_second: 4.6976205739524115, interval_steps_per_second: 0.2936012858720257, ppl: 1.29105086399489, epoch: 2.4589
[2023-12-26 17:59:08,913] [    INFO] - loss: 0.23756577, learning_rate: 5.81e-05, global_step: 284, interval_runtime: 3.2419, interval_samples_per_second: 4.9353487782727425, interval_steps_per_second: 0.3084592986420464, ppl: 1.2681584008259699, epoch: 2.4675
[2023-12-26 17:59:12,223] [    INFO] - loss: 0.23696953, learning_rate: 5.714e-05, global_step: 285, interval_runtime: 3.3095, interval_samples_per_second: 4.834494743523587, interval_steps_per_second: 0.3021559214702242, ppl: 1.2674024994327784, epoch: 2.4762
[2023-12-26 17:59:15,924] [    INFO] - loss: 0.22214374, learning_rate: 5.619e-05, global_step: 286, interval_runtime: 3.7008, interval_samples_per_second: 4.323410216305595, interval_steps_per_second: 0.27021313851909967, ppl: 1.2487508604132393, epoch: 2.4848
[2023-12-26 17:59:18,926] [    INFO] - loss: 0.26022163, learning_rate: 5.524e-05, global_step: 287, interval_runtime: 3.0011, interval_samples_per_second: 5.331368640551071, interval_steps_per_second: 0.33321054003444195, ppl: 1.297217557135743, epoch: 2.4935
[2023-12-26 17:59:22,381] [    INFO] - loss: 0.25578877, learning_rate: 5.429e-05, global_step: 288, interval_runtime: 3.4561, interval_samples_per_second: 4.62943660989538, interval_steps_per_second: 0.2893397881184612, ppl: 1.291479899826737, epoch: 2.5022
[2023-12-26 17:59:26,236] [    INFO] - loss: 0.25826275, learning_rate: 5.333e-05, global_step: 289, interval_runtime: 3.8551, interval_samples_per_second: 4.150364865363805, interval_steps_per_second: 0.2593978040852378, ppl: 1.2946789508317431, epoch: 2.5108
[2023-12-26 17:59:29,565] [    INFO] - loss: 0.25632015, learning_rate: 5.238e-05, global_step: 290, interval_runtime: 3.3288, interval_samples_per_second: 4.806578254975566, interval_steps_per_second: 0.3004111409359729, ppl: 1.2921663487823773, epoch: 2.5195
[2023-12-26 17:59:32,303] [    INFO] - loss: 0.23544586, learning_rate: 5.143e-05, global_step: 291, interval_runtime: 2.7378, interval_samples_per_second: 5.844061328239773, interval_steps_per_second: 0.3652538330149858, ppl: 1.2654728667015342, epoch: 2.5281
[2023-12-26 17:59:36,206] [    INFO] - loss: 0.2519612, learning_rate: 5.048e-05, global_step: 292, interval_runtime: 3.9034, interval_samples_per_second: 4.098974158007226, interval_steps_per_second: 0.25618588487545163, ppl: 1.286546118327028, epoch: 2.5368
[2023-12-26 17:59:39,389] [    INFO] - loss: 0.25703761, learning_rate: 4.952e-05, global_step: 293, interval_runtime: 3.1824, interval_samples_per_second: 5.027707474921347, interval_steps_per_second: 0.3142317171825842, ppl: 1.2930937591010965, epoch: 2.5455
[2023-12-26 17:59:42,644] [    INFO] - loss: 0.25393027, learning_rate: 4.857e-05, global_step: 294, interval_runtime: 3.255, interval_samples_per_second: 4.915461133872737, interval_steps_per_second: 0.3072163208670461, ppl: 1.2890819134519724, epoch: 2.5541
[2023-12-26 17:59:46,200] [    INFO] - loss: 0.2280623, learning_rate: 4.762e-05, global_step: 295, interval_runtime: 3.556, interval_samples_per_second: 4.499463521579386, interval_steps_per_second: 0.28121647009871165, ppl: 1.2561635819857848, epoch: 2.5628
[2023-12-26 17:59:49,439] [    INFO] - loss: 0.25715587, learning_rate: 4.667e-05, global_step: 296, interval_runtime: 3.2288, interval_samples_per_second: 4.955421057314687, interval_steps_per_second: 0.3097138160821679, ppl: 1.2932466894116388, epoch: 2.5714
[2023-12-26 17:59:53,245] [    INFO] - loss: 0.23208797, learning_rate: 4.571e-05, global_step: 297, interval_runtime: 3.8165, interval_samples_per_second: 4.1922906524141785, interval_steps_per_second: 0.26201816577588616, ppl: 1.2612306744107442, epoch: 2.5801
[2023-12-26 17:59:56,236] [    INFO] - loss: 0.25757715, learning_rate: 4.476e-05, global_step: 298, interval_runtime: 2.9911, interval_samples_per_second: 5.349231498268504, interval_steps_per_second: 0.3343269686417815, ppl: 1.293791623153738, epoch: 2.5887
[2023-12-26 17:59:58,780] [    INFO] - loss: 0.23431894, learning_rate: 4.381e-05, global_step: 299, interval_runtime: 2.5437, interval_samples_per_second: 6.290102448579346, interval_steps_per_second: 0.39313140303620914, ppl: 1.2640475832596356, epoch: 2.5974
[2023-12-26 18:00:02,451] [    INFO] - loss: 0.26020345, learning_rate: 4.286e-05, global_step: 300, interval_runtime: 3.6713, interval_samples_per_second: 4.358165578026837, interval_steps_per_second: 0.2723853486266773, ppl: 1.297193973934926, epoch: 2.6061
[2023-12-26 18:00:06,312] [    INFO] - loss: 0.2662026, learning_rate: 4.19e-05, global_step: 301, interval_runtime: 3.8606, interval_samples_per_second: 4.144439144325265, interval_steps_per_second: 0.25902744652032905, ppl: 1.3049994247891998, epoch: 2.6147
[2023-12-26 18:00:09,488] [    INFO] - loss: 0.22707528, learning_rate: 4.095e-05, global_step: 302, interval_runtime: 3.1759, interval_samples_per_second: 5.037924252989906, interval_steps_per_second: 0.3148702658118691, ppl: 1.2549243350884367, epoch: 2.6234
[2023-12-26 18:00:12,644] [    INFO] - loss: 0.25288558, learning_rate: 4e-05, global_step: 303, interval_runtime: 2.9885, interval_samples_per_second: 5.353831891938681, interval_steps_per_second: 0.33461449324616754, ppl: 1.2877359256602163, epoch: 2.632
[2023-12-26 18:00:15,878] [    INFO] - loss: 0.21039963, learning_rate: 3.905e-05, global_step: 304, interval_runtime: 3.4022, interval_samples_per_second: 4.702833429771899, interval_steps_per_second: 0.2939270893607437, ppl: 1.2341711732447127, epoch: 2.6407
[2023-12-26 18:00:19,129] [    INFO] - loss: 0.29113054, learning_rate: 3.81e-05, global_step: 305, interval_runtime: 3.2507, interval_samples_per_second: 4.921999834096974, interval_steps_per_second: 0.3076249896310609, ppl: 1.3379392271375368, epoch: 2.6494
[2023-12-26 18:00:22,181] [    INFO] - loss: 0.25979307, learning_rate: 3.714e-05, global_step: 306, interval_runtime: 3.052, interval_samples_per_second: 5.242421288137288, interval_steps_per_second: 0.3276513305085805, ppl: 1.296661740688312, epoch: 2.658
[2023-12-26 18:00:24,877] [    INFO] - loss: 0.26546153, learning_rate: 3.619e-05, global_step: 307, interval_runtime: 2.6959, interval_samples_per_second: 5.934965613583775, interval_steps_per_second: 0.37093535084898593, ppl: 1.3040326871198566, epoch: 2.6667
[2023-12-26 18:00:28,383] [    INFO] - loss: 0.24498808, learning_rate: 3.524e-05, global_step: 308, interval_runtime: 3.5058, interval_samples_per_second: 4.563913070503891, interval_steps_per_second: 0.2852445669064932, ppl: 1.277606084049599, epoch: 2.6753
[2023-12-26 18:00:31,397] [    INFO] - loss: 0.23861594, learning_rate: 3.429e-05, global_step: 309, interval_runtime: 3.0124, interval_samples_per_second: 5.311380206602021, interval_steps_per_second: 0.3319612629126263, ppl: 1.2694908822773268, epoch: 2.684
[2023-12-26 18:00:35,293] [    INFO] - loss: 0.26361945, learning_rate: 3.333e-05, global_step: 310, interval_runtime: 3.8973, interval_samples_per_second: 4.105387214429576, interval_steps_per_second: 0.2565867009018485, ppl: 1.3016327656898303, epoch: 2.6926
[2023-12-26 18:00:38,928] [    INFO] - loss: 0.25977874, learning_rate: 3.238e-05, global_step: 311, interval_runtime: 3.6351, interval_samples_per_second: 4.401570949953271, interval_steps_per_second: 0.27509818437207945, ppl: 1.2966431596587014, epoch: 2.7013
[2023-12-26 18:00:42,534] [    INFO] - loss: 0.26889122, learning_rate: 3.143e-05, global_step: 312, interval_runtime: 3.6068, interval_samples_per_second: 4.436102060498853, interval_steps_per_second: 0.2772563787811783, ppl: 1.3085127932826588, epoch: 2.71
[2023-12-26 18:00:45,487] [    INFO] - loss: 0.23030058, learning_rate: 3.048e-05, global_step: 313, interval_runtime: 2.9525, interval_samples_per_second: 5.419194334419451, interval_steps_per_second: 0.3386996459012157, ppl: 1.2589783767823681, epoch: 2.7186
[2023-12-26 18:00:49,292] [    INFO] - loss: 0.22973362, learning_rate: 2.952e-05, global_step: 314, interval_runtime: 3.8053, interval_samples_per_second: 4.204660341211675, interval_steps_per_second: 0.2627912713257297, ppl: 1.2582647887089293, epoch: 2.7273
[2023-12-26 18:00:54,526] [    INFO] - loss: 0.27658898, learning_rate: 2.857e-05, global_step: 315, interval_runtime: 5.2337, interval_samples_per_second: 3.057112854928451, interval_steps_per_second: 0.1910695534330282, ppl: 1.3186242786861664, epoch: 2.7359
[2023-12-26 18:00:57,782] [    INFO] - loss: 0.26146588, learning_rate: 2.762e-05, global_step: 316, interval_runtime: 3.2563, interval_samples_per_second: 4.913592525882645, interval_steps_per_second: 0.30709953286766534, ppl: 1.2988326246467194, epoch: 2.7446
[2023-12-26 18:01:01,895] [    INFO] - loss: 0.22772613, learning_rate: 2.667e-05, global_step: 317, interval_runtime: 4.1131, interval_samples_per_second: 3.8900119484687288, interval_steps_per_second: 0.24312574677929555, ppl: 1.2557413684461678, epoch: 2.7532
[2023-12-26 18:01:05,393] [    INFO] - loss: 0.25907451, learning_rate: 2.571e-05, global_step: 318, interval_runtime: 3.4974, interval_samples_per_second: 4.57482009390981, interval_steps_per_second: 0.2859262558693631, ppl: 1.2957303460994465, epoch: 2.7619
[2023-12-26 18:01:09,002] [    INFO] - loss: 0.25298631, learning_rate: 2.476e-05, global_step: 319, interval_runtime: 3.6097, interval_samples_per_second: 4.43251394835555, interval_steps_per_second: 0.27703212177222186, ppl: 1.287865645833255, epoch: 2.7706
[2023-12-26 18:01:12,718] [    INFO] - loss: 0.25091952, learning_rate: 2.381e-05, global_step: 320, interval_runtime: 3.7159, interval_samples_per_second: 4.3058071828059425, interval_steps_per_second: 0.2691129489253714, ppl: 1.2852066467379928, epoch: 2.7792
[2023-12-26 18:01:16,532] [    INFO] - loss: 0.23852175, learning_rate: 2.286e-05, global_step: 321, interval_runtime: 3.8137, interval_samples_per_second: 4.1953520513843365, interval_steps_per_second: 0.26220950321152103, ppl: 1.269371314562255, epoch: 2.7879
[2023-12-26 18:01:20,111] [    INFO] - loss: 0.2394225, learning_rate: 2.19e-05, global_step: 322, interval_runtime: 3.5789, interval_samples_per_second: 4.470648175683526, interval_steps_per_second: 0.2794155109802204, ppl: 1.2705152158810613, epoch: 2.7965
[2023-12-26 18:01:23,907] [    INFO] - loss: 0.23940259, learning_rate: 2.095e-05, global_step: 323, interval_runtime: 3.7965, interval_samples_per_second: 4.214424717206121, interval_steps_per_second: 0.2634015448253826, ppl: 1.2704899201749327, epoch: 2.8052
[2023-12-26 18:01:27,614] [    INFO] - loss: 0.27364457, learning_rate: 2e-05, global_step: 324, interval_runtime: 3.707, interval_samples_per_second: 4.316156378066712, interval_steps_per_second: 0.2697597736291695, ppl: 1.314747418507585, epoch: 2.8139
[2023-12-26 18:01:31,409] [    INFO] - loss: 0.2073748, learning_rate: 1.905e-05, global_step: 325, interval_runtime: 3.7944, interval_samples_per_second: 4.216775989895652, interval_steps_per_second: 0.26354849936847824, ppl: 1.2304436556503757, epoch: 2.8225
[2023-12-26 18:01:35,076] [    INFO] - loss: 0.27432245, learning_rate: 1.81e-05, global_step: 326, interval_runtime: 3.6666, interval_samples_per_second: 4.363658116977148, interval_steps_per_second: 0.27272863231107175, ppl: 1.3156389616331297, epoch: 2.8312
[2023-12-26 18:01:38,026] [    INFO] - loss: 0.23853159, learning_rate: 1.714e-05, global_step: 327, interval_runtime: 2.9507, interval_samples_per_second: 5.422525735818736, interval_steps_per_second: 0.338907858488671, ppl: 1.2693838052374444, epoch: 2.8398
[2023-12-26 18:01:41,919] [    INFO] - loss: 0.24444909, learning_rate: 1.619e-05, global_step: 328, interval_runtime: 3.8926, interval_samples_per_second: 4.110366966863487, interval_steps_per_second: 0.25689793542896794, ppl: 1.2769176526918324, epoch: 2.8485
[2023-12-26 18:01:45,451] [    INFO] - loss: 0.2582615, learning_rate: 1.524e-05, global_step: 329, interval_runtime: 3.5324, interval_samples_per_second: 4.529516838633957, interval_steps_per_second: 0.2830948024146223, ppl: 1.2946773324840661, epoch: 2.8571
[2023-12-26 18:01:48,673] [    INFO] - loss: 0.24779966, learning_rate: 1.429e-05, global_step: 330, interval_runtime: 3.2218, interval_samples_per_second: 4.966123008457353, interval_steps_per_second: 0.31038268802858454, ppl: 1.2812032302259, epoch: 2.8658
[2023-12-26 18:01:52,729] [    INFO] - loss: 0.24498056, learning_rate: 1.333e-05, global_step: 331, interval_runtime: 4.0562, interval_samples_per_second: 3.9445352860847476, interval_steps_per_second: 0.24653345538029672, ppl: 1.2775964764879715, epoch: 2.8745
[2023-12-26 18:01:55,759] [    INFO] - loss: 0.22714457, learning_rate: 1.238e-05, global_step: 332, interval_runtime: 3.0296, interval_samples_per_second: 5.28113896366891, interval_steps_per_second: 0.3300711852293069, ppl: 1.2550112918081957, epoch: 2.8831
[2023-12-26 18:01:58,791] [    INFO] - loss: 0.27777711, learning_rate: 1.143e-05, global_step: 333, interval_runtime: 3.0253, interval_samples_per_second: 5.288713311737848, interval_steps_per_second: 0.3305445819836155, ppl: 1.320191906839008, epoch: 2.8918
[2023-12-26 18:02:02,712] [    INFO] - loss: 0.26041701, learning_rate: 1.048e-05, global_step: 334, interval_runtime: 3.9276, interval_samples_per_second: 4.073694137666119, interval_steps_per_second: 0.25460588360413244, ppl: 1.297471032263235, epoch: 2.9004
[2023-12-26 18:02:06,276] [    INFO] - loss: 0.24744098, learning_rate: 9.524e-06, global_step: 335, interval_runtime: 3.5639, interval_samples_per_second: 4.48946259600896, interval_steps_per_second: 0.28059141225056, ppl: 1.280743770655688, epoch: 2.9091
[2023-12-26 18:02:09,363] [    INFO] - loss: 0.2456432, learning_rate: 8.571e-06, global_step: 336, interval_runtime: 3.0873, interval_samples_per_second: 5.182528243944213, interval_steps_per_second: 0.3239080152465133, ppl: 1.2784433435701654, epoch: 2.9177
[2023-12-26 18:02:12,762] [    INFO] - loss: 0.28566605, learning_rate: 7.619e-06, global_step: 337, interval_runtime: 3.3991, interval_samples_per_second: 4.70716161416221, interval_steps_per_second: 0.29419760088513813, ppl: 1.330648011142046, epoch: 2.9264
[2023-12-26 18:02:16,726] [    INFO] - loss: 0.25106084, learning_rate: 6.667e-06, global_step: 338, interval_runtime: 3.9641, interval_samples_per_second: 4.036175474060172, interval_steps_per_second: 0.2522609671287607, ppl: 1.2853882849755656, epoch: 2.9351
[2023-12-26 18:02:19,759] [    INFO] - loss: 0.26100892, learning_rate: 5.714e-06, global_step: 339, interval_runtime: 3.0327, interval_samples_per_second: 5.27582381013065, interval_steps_per_second: 0.3297389881331656, ppl: 1.2982392456761134, epoch: 2.9437
[2023-12-26 18:02:22,956] [    INFO] - loss: 0.25734669, learning_rate: 4.762e-06, global_step: 340, interval_runtime: 3.1968, interval_samples_per_second: 5.004943052240602, interval_steps_per_second: 0.3128089407650376, ppl: 1.2934934902914355, epoch: 2.9524
[2023-12-26 18:02:26,641] [    INFO] - loss: 0.26527035, learning_rate: 3.81e-06, global_step: 341, interval_runtime: 3.6853, interval_samples_per_second: 4.341564719683562, interval_steps_per_second: 0.2713477949802226, ppl: 1.3037834059802764, epoch: 2.961
[2023-12-26 18:02:29,608] [    INFO] - loss: 0.23022859, learning_rate: 2.857e-06, global_step: 342, interval_runtime: 2.9664, interval_samples_per_second: 5.3937902446375645, interval_steps_per_second: 0.3371118902898478, ppl: 1.2588877461913108, epoch: 2.9697
[2023-12-26 18:02:33,054] [    INFO] - loss: 0.2588667, learning_rate: 1.905e-06, global_step: 343, interval_runtime: 3.4466, interval_samples_per_second: 4.642287339553546, interval_steps_per_second: 0.2901429587220966, ppl: 1.2954611083523406, epoch: 2.9784
[2023-12-26 18:02:36,004] [    INFO] - loss: 0.26296732, learning_rate: 9.524e-07, global_step: 344, interval_runtime: 2.9501, interval_samples_per_second: 5.423617388552312, interval_steps_per_second: 0.3389760867845195, ppl: 1.3007842086291714, epoch: 2.987
[2023-12-26 18:02:39,677] [    INFO] - loss: 0.23235616, learning_rate: 0.0, global_step: 345, interval_runtime: 3.6724, interval_samples_per_second: 4.356782299180001, interval_steps_per_second: 0.27229889369875004, ppl: 1.2615689692269303, epoch: 2.9957
[2023-12-26 18:02:39,677] [    INFO] - ***** Running Evaluation *****
[2023-12-26 18:02:39,677] [    INFO] -   Num examples = 206
[2023-12-26 18:02:39,677] [    INFO] -   Total prediction steps = 26
[2023-12-26 18:02:39,678] [    INFO] -   Pre device batch size = 8
[2023-12-26 18:02:39,678] [    INFO] -   Total Batch size = 8
[2023-12-26 18:02:54,183] [    INFO] - eval_loss: 0.2482166439294815, eval_accuracy: 1.0, eval_runtime: 14.5045, eval_samples_per_second: 14.202442044521213, eval_steps_per_second: 1.792541228920153, eval_ppl: 1.2817375827837763, epoch: 2.9957
[2023-12-26 18:02:54,183] [    INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-345
[2023-12-26 18:02:54,184] [    INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/tokenizer_config.json
[2023-12-26 18:02:54,184] [    INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/special_tokens_map.json
[2023-12-26 18:02:54,187] [    INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/chat_template.json
[2023-12-26 18:02:54,362] [    INFO] - Saving optimizer files.
[2023-12-26 18:02:55,144] [    INFO] - 
Training completed.

[2023-12-26 18:02:55,144] [    INFO] - Loading best model from ./checkpoints/llama_lora_ckpts/checkpoint-345 (score: 1.0).
[2023-12-26 18:02:55,235] [    INFO] - Load lora weight successfully
[2023-12-26 18:02:55,236] [    INFO] - set state-dict :None
[2023-12-26 18:02:55,237] [    INFO] - train_runtime: 1233.9516, train_samples_per_second: 4.492882786697039, train_steps_per_second: 0.27958956735398244, train_loss: 0.4301475836315017, epoch: 2.9957
[2023-12-26 18:02:55,238] [    INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts
[2023-12-26 18:02:55,238] [    INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/tokenizer_config.json
[2023-12-26 18:02:55,238] [    INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/special_tokens_map.json
[2023-12-26 18:02:55,241] [    INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/chat_template.json
[2023-12-26 18:02:55,383] [    INFO] - ***** train metrics *****
[2023-12-26 18:02:55,383] [    INFO] -   epoch                    =     2.9957
[2023-12-26 18:02:55,384] [    INFO] -   train_loss               =     0.4301
[2023-12-26 18:02:55,384] [    INFO] -   train_runtime            = 0:20:33.95
[2023-12-26 18:02:55,384] [    INFO] -   train_samples_per_second =     4.4929
[2023-12-26 18:02:55,384] [    INFO] -   train_steps_per_second   =     0.2796
[2023-12-26 18:02:55,400] [    INFO] - ***** Running Evaluation *****
[2023-12-26 18:02:55,400] [    INFO] -   Num examples = 206
[2023-12-26 18:02:55,400] [    INFO] -   Total prediction steps = 26
[2023-12-26 18:02:55,400] [    INFO] -   Pre device batch size = 8
[2023-12-26 18:02:55,401] [    INFO] -   Total Batch size = 8
[2023-12-26 18:03:09,938] [    INFO] - eval_loss: 0.2482166439294815, eval_accuracy: 1.0, eval_runtime: 14.5378, eval_samples_per_second: 14.169953149904021, eval_steps_per_second: 1.7884406888228377, eval_ppl: 1.2817375827837763, epoch: 2.9957
[2023-12-26 18:03:09,938] [    INFO] - ***** eval metrics *****
[2023-12-26 18:03:09,938] [    INFO] -   epoch                   =     2.9957
[2023-12-26 18:03:09,939] [    INFO] -   eval_accuracy           =        1.0
[2023-12-26 18:03:09,939] [    INFO] -   eval_loss               =     0.2482
[2023-12-26 18:03:09,939] [    INFO] -   eval_ppl                =     1.2817
[2023-12-26 18:03:09,939] [    INFO] -   eval_runtime            = 0:00:14.53
[2023-12-26 18:03:09,939] [    INFO] -   eval_samples_per_second =      14.17
[2023-12-26 18:03:09,939] [    INFO] -   eval_steps_per_second   =     1.7884
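The logged metrics are internally consistent and can be checked by hand. The sketch below verifies two relationships. First, each ppl value is exp(loss) for the same step. Second, the logged learning rates fit a linear schedule with warmup. The peak learning rate of 3e-4 and the 30 warmup steps were inferred from the logged values and are not stated in this section, so treat them as assumptions. The sketch also confirms that 206 evaluation examples at batch size 8 give the 26 prediction steps reported.

```python
import math

# (global_step, loss, logged learning_rate, logged ppl), copied from the log above
logged = [
    (248, 0.26488072, 9.238e-05, 1.3032755118036337),
    (261, 0.27856874, 8e-05, 1.3212374241350473),
    (303, 0.25288558, 4e-05, 1.2877359256602163),
    (345, 0.23235616, 0.0, 1.2615689692269303),
]

# Assumed schedule parameters, inferred from the logged values (not read from a config file).
PEAK_LR = 3e-4
WARMUP_STEPS = 30
TOTAL_STEPS = 345  # last global_step in the log


def linear_schedule(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)


for step, loss, lr, ppl in logged:
    # The "ppl" field is the exponential of the loss at the same step.
    print(f"step {step}: exp(loss)={math.exp(loss):.6f} (logged {ppl:.6f}), "
          f"lr={linear_schedule(step):.3e} (logged {lr:.3e})")

# Evaluation: 206 examples at batch size 8 -> ceil(206 / 8) prediction steps.
print("eval steps:", math.ceil(206 / 8))
```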

3. Load the LoRA model and test the fine-tuned result

In [1]

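import paddle
import get_result  # project-local helper module (get_result.py) that wraps text generation; its source is not shown here
from paddlenlp.peft import LoRAModel
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model
model = AutoModelForCausalLM.from_pretrained('/home/aistudio/Baichuan2-7B-Chat', dtype="float16", tensor_parallel_degree=0, tensor_parallel_rank=0)
# Load the LoRA weights on top of the base model
model = LoRAModel.from_pretrained(model, '/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts')
model.eval()
# The tokenizer was saved alongside the LoRA checkpoint; pad on the left for generation
tokenizer = AutoTokenizer.from_pretrained('/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts', padding_side="left")
# Input: "I have a cold: a slight cough, fever, headache, and I am thirsty but have difficulty urinating"
result = get_result.generate(model, tokenizer, "我感冒了,有点咳嗽,发热,头疼,有口渴但是小便不利")
print(result)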
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-12-27 13:05:31,800] [    INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'.
[2023-12-27 13:05:31,802] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json
[2023-12-27 13:05:31,804] [    INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json
W1227 13:05:31.809015  3307 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1227 13:05:31.810439  3307 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████| 4/4 [03:49<00:00, 57.38s/it]
[2023-12-27 13:09:37,056] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-27 13:09:37,057] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-27 13:09:37,061] [    INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json
[2023-12-27 13:09:37,228] [ WARNING] - Reset tensor_parallel_degree of lora_config to 0.
[2023-12-27 13:09:37,296] [    INFO] - Loading the LoRA weights from /home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts/lora_model_state.pdparams
[2023-12-27 13:09:37,355] [    INFO] - Load lora weight successfully
[2023-12-27 13:09:37,366] [    INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load '/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts'.
 的症状<br><reserved_203>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸 如果有发热但小便不热的症状,<reserved_221>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸 这个<reserved_264>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸D诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸j诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸<reserved_288>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或

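The fine-tuning appears to work well. For this input (cough, fever, headache, thirst with difficult urination), the model diagnoses Taiyang cold damage with water retention (太阳伤寒蓄水) and recommends Wuling San (五苓散). Both the syndrome differentiation and the medication are fairly accurate. The raw output, however, repeats the same answer several times and contains <reserved_…> tokens. It should be trimmed, or the generation settings adjusted, before it is shown to users.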
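The raw generation above repeats the same "诊断…" (diagnosis) block several times and leaks Baichuan2 placeholder tokens such as <reserved_203>. Below is a minimal sketch of a hypothetical post-processing helper that keeps only the first answer block. It is not part of the project's get_result module, whose contents are not shown. Limiting the generation length or stopping at the end-of-sequence token is probably the more direct fix, but that depends on how get_result.generate calls the model.

```python
import re

# Placeholder tokens like "<reserved_203>" that appear in the decoded text.
RESERVED_TOKEN = re.compile(r"<reserved_\d+>")


def first_answer(raw: str, marker: str = "诊断") -> str:
    """Hypothetical helper: return only the first answer block of a raw generation.

    Removes <reserved_N> tokens, then keeps the text from the first occurrence
    of `marker` ("diagnosis") up to, but not including, its second occurrence.
    If the marker is missing, the cleaned text is returned unchanged.
    """
    text = RESERVED_TOKEN.sub(" ", raw)
    start = text.find(marker)
    if start == -1:
        return text.strip()
    end = text.find(marker, start + len(marker))
    answer = text[start:] if end == -1 else text[start:end]
    return answer.strip()
```

Applied to the output above, this would return a single "诊断:太阳伤寒蓄水。建议处方:五苓散。…" block. Any trailing fragment that the model emitted before the next repetition would still be included.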

VIII. Deploying with Gradio

Open main.gradio.py in the project root directory and click "应用部署" (Deploy App) to publish it.
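The contents of main.gradio.py are not shown in this article. The following is only a minimal sketch of how a text-in/text-out Gradio app could wrap the fine-tuned model, using gradio's gr.Interface and gr.Textbox. The names make_handler and build_demo are hypothetical. generate_fn is assumed to be a function that calls the project's get_result.generate with the model and tokenizer loaded in step 3. AI Studio's deployment requirements are not covered here.

```python
from typing import Callable


def make_handler(generate_fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a text-generation function as a Gradio callback.

    generate_fn is assumed to take a symptom description and return the model's
    answer, e.g. lambda text: get_result.generate(model, tokenizer, text).
    """
    def handler(symptoms):
        symptoms = (symptoms or "").strip()
        if not symptoms:
            return "Please describe your symptoms."
        return generate_fn(symptoms)

    return handler


def build_demo(generate_fn: Callable[[str], str]):
    """Build a minimal text-in/text-out Gradio app (hypothetical layout)."""
    import gradio as gr  # imported lazily so make_handler can be used without Gradio installed

    return gr.Interface(
        fn=make_handler(generate_fn),
        inputs=gr.Textbox(lines=4, label="Symptoms"),
        outputs=gr.Textbox(label="Diagnosis and suggestions"),
        title="TCM self-diagnosis for COVID-19 / influenza (Baichuan2 + LoRA)",
    )


# Example wiring, with model, tokenizer and get_result loaded as in step 3:
# demo = build_demo(lambda text: get_result.generate(model, tokenizer, text))
# demo.launch()
```

The input validation is kept in a separate handler so that the app logic can be checked without loading the 7B model.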

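This article was submitted by an internet user. The views expressed are the author's own and do not represent the position of this site, which provides only information storage services, claims no ownership, and bears no related legal liability. When reposting, please cite the source: http://www.mzph.cn/web/29970.shtml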


2024下《网络工程师》50个高频考点汇总,背就有效!

宝子们&#xff01;上半年软考已经结束一段时间了&#xff0c;准备考下半年软考中级-网络工程师的小伙伴们可以开始准备了&#xff0c;这里给大家整理了50个高频考点&#xff0c;涵盖全书90%以上重点&#xff0c;先把这个存下&#xff01;再慢慢看书&#xff0c;边看书边背这个…

数据治理创新路:建设数据集市,强化数据报送一致性新实践

随着信息化和数字化的飞速发展&#xff0c;数据已经成为企业运营和决策的核心要素。然而&#xff0c;数据治理的复杂性和多样性给企业带来了不小的挑战。为了更好地应对这些挑战&#xff0c;许多企业开始探索数据治理的创新路径&#xff0c;其中建设数据集市和强化数据报送一致…

各类存储器类型(RAM、ROM、FLASH、DRAM、SRAM)

1 计算机存储类型构成 在计算机中&#xff0c;各类存储器构成了计算机能高速高效运转程序的基石。 计算机的存储体系中&#xff0c;从速度慢到速度快对应着容量大到小&#xff0c;也就是说&#xff0c;速度越快容量越小&#xff1b;容量越大的&#xff0c;速度越慢。两者互相…

echarts 折线图 实现某两个点之间不要连线

通过插入null或NaN的数据点来实现"断开"的效果 const data [[a, 1], [b, 2], [c, 3], [d, 4], [e, 5]] data.splice(2, 0, NaN) option {xAxis: {type: "category",data: [a, b, c, d, e]},yAxis: {},series: [{data,type: "line"}] }

大语言模型架构---Transformer 模型

文章目录 输入编码多头自注意力机制前馈网络层编码器解码器当前主流的大语言模型都基于 Transformer 模型进行设计的。Transformer 是由多层的多头自注意力(Multi-head Self-attention)模块堆叠而成的神经网络模型。原始的 Transformer 模型由编码器和解码器两个部分构成,而…