Fine-tuning a Llama-3 Chinese Chat Model with LLaMA Factory

Original notebook: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing#scrollTo=gf60HoT633NY

Request a free T4 GPU runtime in Colab to run this script. The notebook linked above walks through every step in detail (note that reaching Colab from mainland China requires a VPN).

The fine-tuning run takes roughly 50 minutes.

Training script:

from llmtuner import run_exp

%cd /content/LLaMA-Factory/
run_exp(dict(
  stage="sft",                        # supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # 4-bit quantized base model
  dataset="identity,alpaca_gpt4_en,alpaca_gpt4_zh",
  template="llama3",                  # Llama-3 chat template
  finetuning_type="lora",
  lora_target="all",                  # apply LoRA to all linear modules
  output_dir="llama3_lora",
  per_device_train_batch_size=2,
  gradient_accumulation_steps=4,      # effective batch size of 8
  lr_scheduler_type="cosine",
  logging_steps=10,
  warmup_ratio=0.1,
  save_steps=1000,
  learning_rate=5e-5,
  num_train_epochs=3.0,
  max_samples=500,                    # cap each dataset at 500 examples
  max_grad_norm=1.0,
  quantization_bit=4,                 # QLoRA: train against the 4-bit weights
  loraplus_lr_ratio=16.0,             # LoRA+: matrix B gets 16x the learning rate of A
  use_unsloth=True,                   # Unsloth kernels for faster training
  fp16=True,
))
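
A quick sanity check on the schedule these settings imply (a sketch; the dataset sizes of 91 + 500 + 500 examples come from the training log below, and the step accounting follows the usual Hugging Face Trainer behavior with gradient accumulation):

# Optimizer steps implied by the config above.
num_samples = 91 + 500 + 500             # identity + two Alpaca subsets (max_samples=500 each)
micro_batches = -(-num_samples // 2)     # ceil(1091 / per_device_train_batch_size) = 546
steps_per_epoch = micro_batches // 4     # 546 // gradient_accumulation_steps = 136
print(steps_per_epoch * 3)               # 408 total steps, matching the Unsloth banner below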

Training log:

04/22/2024 04:10:40 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
04/22/2024 04:10:40 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,979 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,980 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,982 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,984 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:10:42,384 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:10:42 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset identity.json...
04/22/2024 04:10:42 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/identity.json.

Generating train split: 91/0 [00:00<00:00, 1640.44 examples/s]
Converting format of dataset: 100% 91/91 [00:00<00:00, 2822.67 examples/s]
04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
Generating train split: 52002/0 [00:00<00:00, 117346.95 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 14816.36 examples/s]
04/22/2024 04:10:43 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
Generating train split: 48818/0 [00:00<00:00, 91511.83 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 11785.79 examples/s]
Running tokenizer on dataset: 100% 1091/1091 [00:00<00:00, 1358.62 examples/s]

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,417 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,419 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": ["LlamaForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}
input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
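
Note the leading -100s in `label_ids`: the 22 prompt tokens (system prompt plus user turn, up to and including the assistant header) are masked with -100, the ignore index of PyTorch's cross-entropy loss, so only the 27 response tokens contribute to the training loss. A minimal sketch of the idea (a hypothetical helper, not LLaMA Factory's actual code):

IGNORE_INDEX = -100  # positions with this label are skipped by torch.nn.CrossEntropyLoss

def build_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Mask the prompt so the loss is computed on the assistant response only."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]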
04/22/2024 04:10:45 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
[INFO|modeling_utils.py:3257] 2024-04-22 04:10:45,813 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 04:10:45,863 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 04:10:45,871 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 }
[INFO|modeling_utils.py:3992] 2024-04-22 04:11:13,469 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4000] 2024-04-22 04:11:13,472 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 04:11:13,539 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 04:11:13,540 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 }

tokenizer_config.json: 100% 51.0k/51.0k [00:00<00:00, 2.14MB/s]
tokenizer.json: 100% 9.08M/9.08M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100% 449/449 [00:00<00:00, 31.3kB/s]

[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,466 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,468 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,469 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,472 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:14,881 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,935 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,936 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,937 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,939 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:15,312 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:11:16 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
04/22/2024 04:11:16 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/22/2024 04:11:16 - INFO - llmtuner.model.utils - Found linear modules: k_proj,o_proj,down_proj,v_proj,up_proj,q_proj,gate_proj
[WARNING|logging.py:329] 2024-04-22 04:11:16,731 >> Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
04/22/2024 04:11:16 - INFO - llmtuner.model.loader - trainable params: 20971520 || all params: 8051232768 || trainable%: 0.2605
[INFO|trainer.py:601] 2024-04-22 04:11:16,796 >> Using auto half precision backend
04/22/2024 04:11:17 - INFO - llmtuner.train.utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
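
The 20,971,520 trainable parameters line up with a LoRA rank of 8 (LLaMA Factory's default, since `lora_rank` was not set in the script) applied to all seven linear projections listed above. A quick check (a sketch; the shapes come from the LlamaConfig printed earlier):

# LoRA adds r * (d_in + d_out) parameters per wrapped linear layer (the A and B matrices).
r = 8                                     # assumed: LLaMA Factory's default lora_rank
hidden, inter, layers = 4096, 14336, 32   # hidden_size, intermediate_size, num_hidden_layers
kv = hidden * 8 // 32                     # 1024: k/v projections use 8 of 32 heads (grouped-query attention)

per_layer = r * (
    (hidden + hidden)    # q_proj
  + (hidden + kv)        # k_proj
  + (hidden + kv)        # v_proj
  + (hidden + hidden)    # o_proj
  + (hidden + inter)     # gate_proj
  + (hidden + inter)     # up_proj
  + (inter + hidden)     # down_proj
)
print(per_layer * layers)  # 20971520, matching the log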
[WARNING|logging.py:329] 2024-04-22 04:11:17,203 >>
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,091 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 408
 "-____-"     Number of trainable parameters = 20,971,520

 [408/408 48:57, Epoch 2/3]

Step  Training Loss
  10  1.568300
  20  1.478600
  30  1.298700
  40  1.188600
  50  1.185700
  60  1.200300
  70  1.249100
  80  1.213600
  90  1.255900
 100  1.186000
 110  1.210600
 120  1.216200
 130  1.111400
 140  1.077700
 150  0.906100
 160  0.895100
 170  0.981500
 180  0.759400
 190  0.834800
 200  0.816900
 210  0.773200
 220  0.946500
 230  0.764600
 240  0.914700
 250  0.864800
 260  0.840600
 270  0.853600
 280  0.745800
 290  0.500800
 300  0.597600
 310  0.616400
 320  0.574100
 330  0.490300
 340  0.602800
 350  0.563700
 360  0.552900
 370  0.574400
 380  0.468200
 390  0.549200
 400  0.528500

[INFO|<string>:460] 2024-04-22 05:00:27,815 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:3067] 2024-04-22 05:00:27,822 >> Saving model checkpoint to llama3_lora
[INFO|tokenization_utils_base.py:2459] 2024-04-22 05:00:28,538 >> tokenizer config file saved in llama3_lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2468] 2024-04-22 05:00:28,541 >> Special tokens file saved in llama3_lora/special_tokens_map.json
[INFO|modelcard.py:450] 2024-04-22 05:00:28,827 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
***** train metrics *****
  epoch                    =       2.99
  total_flos               = 32079633GF
  train_loss               =     0.8929
  train_runtime            = 0:49:10.61
  train_samples_per_second =      1.109
  train_steps_per_second   =      0.138
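
These throughput figures are internally consistent; a quick check using only numbers from the metrics above (a sketch):

runtime_s = 49 * 60 + 10.61    # train_runtime = 0:49:10.61
print(408 / runtime_s)         # ~0.138 train_steps_per_second
print(1091 * 3 / runtime_s)    # ~1.109 train_samples_per_second (1,091 examples x 3 epochs)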

Inference script:

from llmtuner import ChatModel
from llmtuner.extras.misc import torch_gc

%cd /content/LLaMA-Factory/
chat_model = ChatModel(dict(
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # 4-bit base model
  adapter_name_or_path="llama3_lora",                         # LoRA weights from the training run
  finetuning_type="lora",
  template="llama3",
))

messages = []
while True:
  query = input("\nUser: ")
  if query.strip() == "exit":
    torch_gc()
    break
  if query.strip() == "clear":
    messages = []
    torch_gc()  # release cached GPU memory
    print("History has been removed.")
    continue
  messages.append({"role": "user", "content": query})
  print("Assistant: ", end="", flush=True)
  response = ""
  for new_text in chat_model.stream_chat(messages):  # stream tokens as they are generated
    print(new_text, end="", flush=True)
    response += new_text
  print()
  messages.append({"role": "assistant", "content": response})

Inference log:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,951 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,953 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,957 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,959 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 05:12:14,407 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 05:12:14 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
[INFO|configuration_utils.py:728] 2024-04-22 05:12:14,462 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3257] 2024-04-22 05:12:14,509 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 05:12:14,560 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 05:12:14,569 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 }
[INFO|modeling_utils.py:3992] 2024-04-22 05:12:21,290 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4000] 2024-04-22 05:12:21,293 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 05:12:21,357 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 05:12:21,359 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 }
04/22/2024 05:12:21 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/22/2024 05:12:22 - INFO - llmtuner.model.adapter - Loaded adapter(s): llama3_lora
04/22/2024 05:12:22 - INFO - llmtuner.model.loader - all params: 8051232768
User: 你是谁
Assistant: 您好,我是 Llama-Chinese,一个由 LLaMA Factory 开发的人工智能助手。我可以根据用户的需求提供相应的回答和帮助。
(Who are you? / Hello, I am Llama-Chinese, an AI assistant developed by LLaMA Factory. I can provide answers and help tailored to users' needs.)

User: 我的蓝牙耳机坏了,我该去看牙科还是耳鼻喉科?
Assistant: 很抱歉听到您的小配件出了问题。蓝牙耳机的问题与牙科或耳鼻喉科无关。您可以尝试与制造商或当地的维修店联系,进行维修或更换。
(My Bluetooth earphones are broken; should I see a dentist or an ENT doctor? / Sorry to hear about your gadget trouble. A Bluetooth headset problem has nothing to do with dentistry or otolaryngology. Try contacting the manufacturer or a local repair shop for a repair or replacement.)

User: 谢谢你的帮助
Assistant: 不客气。我是人工智能助手 Llama-Chinese,很高兴能帮到您。
(Thanks for your help. / You're welcome. I am the AI assistant Llama-Chinese, glad I could help.)

User: exit



