Fine-Tuning a Llama-3 Chinese Chat Model with LLaMA Factory

Original notebook: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing#scrollTo=gf60HoT633NY

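Request a free T4 GPU in Colab to run this script.

See the link above for details. Opening it may require a proxy or VPN.

Fine-tuning takes about 50 minutes.

Training script: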

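from llmtuner import run_exp

# IPython magic: only works inside a notebook cell (Colab/Jupyter)
%cd /content/LLaMA-Factory/

run_exp(dict(
  stage="sft",                        # supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # pre-quantized 4-bit Llama-3-8B-Instruct
  dataset="identity,alpaca_gpt4_en,alpaca_gpt4_zh",           # identity data plus English and Chinese Alpaca-GPT4
  template="llama3",
  finetuning_type="lora",
  lora_target="all",                  # attach LoRA adapters to all linear modules
  output_dir="llama3_lora",
  per_device_train_batch_size=2,
  gradient_accumulation_steps=4,      # effective batch size: 2 * 4 = 8
  lr_scheduler_type="cosine",
  logging_steps=10,
  warmup_ratio=0.1,
  save_steps=1000,
  learning_rate=5e-5,
  num_train_epochs=3.0,
  max_samples=500,                    # use at most 500 examples per dataset
  max_grad_norm=1.0,
  quantization_bit=4,
  loraplus_lr_ratio=16.0,             # LoRA+: learning-rate ratio between the B and A matrices
  use_unsloth=True,
  fp16=True,                          # the T4 does not support bfloat16
))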
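As a rough check on the configuration above, the example count and step count that the run reports in its log (1,091 and 408) can be reproduced by hand. The sketch below assumes the Hugging Face Trainer's usual accounting (batches per epoch rounded up, optimizer updates per epoch rounded down) and a single GPU. The dataset sizes are the ones printed in the training log, and the helper names are illustrative, not part of LLaMA Factory.

```python
import math

# Raw dataset sizes as printed in the training log.
DATASET_SIZES = {"identity": 91, "alpaca_gpt4_en": 52002, "alpaca_gpt4_zh": 48818}


def capped_examples(sizes, max_samples):
    """Number of examples left after capping each dataset at max_samples."""
    return sum(min(n, max_samples) for n in sizes.values())


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Optimizer updates for the whole run (assumed Trainer-style rounding)."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


num_examples = capped_examples(DATASET_SIZES, max_samples=500)
steps = total_optimizer_steps(num_examples, per_device_batch=2, grad_accum=4, epochs=3.0)
print(num_examples, steps)  # 1091 408
```

Both numbers match the log, which suggests that `max_samples=500` caps each dataset separately rather than the combined set.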

Training log

04/22/2024 04:10:40 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
WARNING:llmtuner.hparams.parser:We recommend enable `upcast_layernorm` in quantized training.
04/22/2024 04:10:40 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
INFO:llmtuner.hparams.parser:Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,979 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,980 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,982 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:10:41,984 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:10:42,384 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:10:42 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
INFO:llmtuner.data.template:Replace eos token: <|eot_id|>
04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset identity.json...
INFO:llmtuner.data.loader:Loading dataset identity.json...
04/22/2024 04:10:42 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/identity.json.
WARNING:llmtuner.data.utils:Checksum failed: mismatched SHA-1 hash value at data/identity.json.

Generating train split: 91/0 [00:00<00:00, 1640.44 examples/s]
Converting format of dataset: 100% 91/91 [00:00<00:00, 2822.67 examples/s]

04/22/2024 04:10:42 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
INFO:llmtuner.data.loader:Loading dataset alpaca_gpt4_data_en.json...

Generating train split: 52002/0 [00:00<00:00, 117346.95 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 14816.36 examples/s]

04/22/2024 04:10:43 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
INFO:llmtuner.data.loader:Loading dataset alpaca_gpt4_data_zh.json...

Generating train split: 48818/0 [00:00<00:00, 91511.83 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 11785.79 examples/s]
Running tokenizer on dataset: 100% 1091/1091 [00:00<00:00, 1358.62 examples/s]

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,417 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,419 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am Llama-Chinese, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
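
The sample above shows how supervised fine-tuning masks the prompt: every token before the assistant's reply gets the label -100, which PyTorch's cross-entropy loss ignores by default, so only the reply contributes to the loss. Below is a minimal sketch of that masking, using the `input_ids` from the log. The function name and the single-turn header search are illustrative assumptions, not LLaMA Factory's implementation.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# input_ids copied from the log sample.
input_ids = [
    128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009,
    128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271,
    9906, 0, 358, 1097, 445, 81101, 30653, 7496, 11, 459, 15592, 18328, 8040,
    555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009,
]

# Token IDs read off the log sample for
# "<|start_header_id|>assistant<|end_header_id|>" followed by a blank line.
ASSISTANT_HEADER = [128006, 78191, 128007, 271]


def mask_prompt(ids, header=ASSISTANT_HEADER):
    """Return labels in which everything up to the last assistant header is ignored."""
    n = len(header)
    starts = [i for i in range(len(ids) - n + 1) if ids[i:i + n] == header]
    if not starts:
        raise ValueError("assistant header not found")
    reply_start = starts[-1] + n
    return [IGNORE_INDEX] * reply_start + ids[reply_start:]


labels = mask_prompt(input_ids)
print(labels.count(IGNORE_INDEX), labels[22:25])  # 22 [9906, 0, 358]
```

The reply's closing `<|eot_id|>` (ID 128009) stays unmasked, so the model is also trained to stop after its answer.
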
04/22/2024 04:10:45 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
INFO:llmtuner.model.patcher:Loading ?-bit BITSANDBYTES-quantized model.
[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,579 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,581 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,634 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,636 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|configuration_utils.py:728] 2024-04-22 04:10:45,702 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 04:10:45,704 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory
[INFO|modeling_utils.py:3257] 2024-04-22 04:10:45,813 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 04:10:45,863 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 04:10:45,871 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

[INFO|modeling_utils.py:3992] 2024-04-22 04:11:13,469 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4000] 2024-04-22 04:11:13,472 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 04:11:13,539 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 04:11:13,540 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

tokenizer_config.json: 100% 51.0k/51.0k [00:00<00:00, 2.14MB/s]
tokenizer.json: 100% 9.08M/9.08M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100% 449/449 [00:00<00:00, 31.3kB/s]

[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,466 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,468 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,469 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,472 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:14,881 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,935 >> loading file tokenizer.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,936 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,937 >> loading file special_tokens_map.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 04:11:14,939 >> loading file tokenizer_config.json from cache at huggingface_tokenizers_cache/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 04:11:15,312 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 04:11:16 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
INFO:llmtuner.model.patcher:Gradient checkpointing enabled.
04/22/2024 04:11:16 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
04/22/2024 04:11:16 - INFO - llmtuner.model.utils - Found linear modules: k_proj,o_proj,down_proj,v_proj,up_proj,q_proj,gate_proj
INFO:llmtuner.model.utils:Found linear modules: k_proj,o_proj,down_proj,v_proj,up_proj,q_proj,gate_proj
[WARNING|logging.py:329] 2024-04-22 04:11:16,731 >> Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
04/22/2024 04:11:16 - INFO - llmtuner.model.loader - trainable params: 20971520 || all params: 8051232768 || trainable%: 0.2605
INFO:llmtuner.model.loader:trainable params: 20971520 || all params: 8051232768 || trainable%: 0.2605
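
The trainable-parameter count in the log line above can be reproduced from the model config printed earlier. The sketch below assumes a LoRA rank of 8 (the training script does not set a rank, so this is an assumed default) and that each adapted linear layer adds two matrices of shapes `(r, d_in)` and `(d_out, r)`. The layer shapes follow from `hidden_size`, `intermediate_size`, `num_attention_heads`, and `num_key_value_heads`.

```python
# Values from the LlamaConfig dump in the log.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
ALL_PARAMS = 8051232768  # "all params" from the log
LORA_RANK = 8  # assumption: not set in the training script

head_dim = HIDDEN // NUM_HEADS    # 128
kv_dim = head_dim * NUM_KV_HEADS  # 1024 (grouped-query attention)

# (d_in, d_out) for each linear module that the log lists as a LoRA target.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, LORA_RANK) for d_in, d_out in SHAPES.values())
trainable = per_layer * NUM_LAYERS
print(trainable)                                # 20971520
print(round(100 * trainable / ALL_PARAMS, 4))   # 0.2605
```

That the result matches the logged 20,971,520 is consistent with rank 8 being used on all seven projection types in each of the 32 layers.
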
[INFO|trainer.py:601] 2024-04-22 04:11:16,796 >> Using auto half precision backend
04/22/2024 04:11:17 - INFO - llmtuner.train.utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
INFO:llmtuner.train.utils:Using LoRA+ optimizer with loraplus lr ratio 16.00.
[WARNING|logging.py:329] 2024-04-22 04:11:17,203 >> ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,091 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 408
 "-____-"     Number of trainable parameters = 20,971,520

 [408/408 48:57, Epoch 2/3]

Step    Training Loss
10      1.568300
20      1.478600
30      1.298700
40      1.188600
50      1.185700
60      1.200300
70      1.249100
80      1.213600
90      1.255900
100     1.186000
110     1.210600
120     1.216200
130     1.111400
140     1.077700
150     0.906100
160     0.895100
170     0.981500
180     0.759400
190     0.834800
200     0.816900
210     0.773200
220     0.946500
230     0.764600
240     0.914700
250     0.864800
260     0.840600
270     0.853600
280     0.745800
290     0.500800
300     0.597600
310     0.616400
320     0.574100
330     0.490300
340     0.602800
350     0.563700
360     0.552900
370     0.574400
380     0.468200
390     0.549200
400     0.528500
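
The logged losses drop noticeably around steps 140 to 150 and 280 to 290, which is roughly where new epochs begin (408 steps over 3 epochs gives 136 steps per epoch). The sketch below groups the logged values by epoch to make that visible. The grouping is approximate, because each logged value is an average over the previous 10 steps and can straddle an epoch boundary.

```python
from statistics import mean

LOGGING_STEPS = 10
STEPS_PER_EPOCH = 136  # 408 total steps / 3 epochs

# Training loss at steps 10, 20, ..., 400, copied from the table above.
losses = [
    1.5683, 1.4786, 1.2987, 1.1886, 1.1857, 1.2003, 1.2491, 1.2136, 1.2559, 1.1860,
    1.2106, 1.2162, 1.1114, 1.0777, 0.9061, 0.8951, 0.9815, 0.7594, 0.8348, 0.8169,
    0.7732, 0.9465, 0.7646, 0.9147, 0.8648, 0.8406, 0.8536, 0.7458, 0.5008, 0.5976,
    0.6164, 0.5741, 0.4903, 0.6028, 0.5637, 0.5529, 0.5744, 0.4682, 0.5492, 0.5285,
]

by_epoch = {}
for i, loss in enumerate(losses):
    step = (i + 1) * LOGGING_STEPS
    epoch = (step - 1) // STEPS_PER_EPOCH + 1  # 1-based epoch containing this step
    by_epoch.setdefault(epoch, []).append(loss)

for epoch, values in sorted(by_epoch.items()):
    print(f"epoch {epoch}: {len(values)} logged points, mean loss {mean(values):.3f}")
```

A step-like drop at each epoch boundary is common when a small dataset is seen repeatedly, so the falling training loss by itself does not show how well the model generalizes.
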

[INFO|<string>:460] 2024-04-22 05:00:27,815 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3067] 2024-04-22 05:00:27,822 >> Saving model checkpoint to llama3_lora
[INFO|configuration_utils.py:728] 2024-04-22 05:00:28,263 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 05:00:28,266 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2459] 2024-04-22 05:00:28,538 >> tokenizer config file saved in llama3_lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2468] 2024-04-22 05:00:28,541 >> Special tokens file saved in llama3_lora/special_tokens_map.json
[INFO|modelcard.py:450] 2024-04-22 05:00:28,827 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
***** train metrics *****
  epoch                    =       2.99
  total_flos               = 32079633GF
  train_loss               =     0.8929
  train_runtime            = 0:49:10.61
  train_samples_per_second =      1.109
  train_steps_per_second   =      0.138
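
The throughput figures in the metrics above follow directly from the runtime. The sketch below assumes that samples per second is computed as examples times epochs divided by wall-clock training time, which is how the Hugging Face Trainer usually reports it. The helper name is illustrative.

```python
def parse_runtime(hms):
    """Convert an 'H:MM:SS.ss' string, as printed in train metrics, to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


runtime = parse_runtime("0:49:10.61")   # train_runtime from the log
samples_per_second = 1091 * 3 / runtime  # 1,091 examples seen for 3 epochs
steps_per_second = 408 / runtime         # 408 optimizer steps
print(round(samples_per_second, 3), round(steps_per_second, 3))  # 1.109 0.138
```

Roughly 7 seconds per optimizer step, each covering 8 examples, is what produces the 50-minute total on a T4.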

Inference:

from llmtuner import ChatModel
from llmtuner.extras.misc import torch_gc

# IPython magic: only works inside a notebook cell (Colab/Jupyter)
%cd /content/LLaMA-Factory/

chat_model = ChatModel(dict(
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # same base model as in training
  adapter_name_or_path="llama3_lora",                         # LoRA adapter saved by the training run
  finetuning_type="lora",
  template="llama3",
))

messages = []  # chat history, sent to the model on every turn
while True:
  query = input("\nUser: ")
  if query.strip() == "exit":    # leave the loop and release cached GPU memory
    torch_gc()
    break
  if query.strip() == "clear":   # reset the conversation history
    messages = []
    torch_gc()
    print("History has been removed.")
    continue

  messages.append({"role": "user", "content": query})
  print("Assistant: ", end="", flush=True)
  response = ""
  for new_text in chat_model.stream_chat(messages):  # print text as it is generated
    print(new_text, end="", flush=True)
    response += new_text
  print()
  messages.append({"role": "assistant", "content": response})
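
The `messages` list built by the loop above is turned into a single prompt string by the `llama3` template before generation. The sketch below is a hand-written approximation of the layout shown in the training log sample, not the library's template code. The function name, the `add_generation_prompt` flag, and the use of the same default system prompt at inference time are assumptions.

```python
def render_llama3(messages, system="You are a helpful assistant.", add_generation_prompt=True):
    """Render chat messages in the Llama-3 layout seen in the training log."""

    def block(role, content):
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    text = "<|begin_of_text|>"
    if system:
        text += block("system", system)
    for message in messages:
        text += block(message["role"], message["content"])
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


history = [{"role": "user", "content": "hi"}]
print(render_llama3(history))
```

Because the whole history is rendered on every turn, long conversations grow the prompt. That is why the loop offers a `clear` command.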

Inference log

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,951 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,953 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,957 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/special_tokens_map.json
[INFO|tokenization_utils_base.py:2046] 2024-04-22 05:12:13,959 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/tokenizer_config.json
[WARNING|logging.py:314] 2024-04-22 05:12:14,407 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
04/22/2024 05:12:14 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
INFO:llmtuner.data.template:Replace eos token: <|eot_id|>
[INFO|configuration_utils.py:728] 2024-04-22 05:12:14,462 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/config.json
[INFO|configuration_utils.py:791] 2024-04-22 05:12:14,464 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 128256
}

04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Loading ?-bit BITSANDBYTES-quantized model.
INFO:llmtuner.model.patcher:Loading ?-bit BITSANDBYTES-quantized model.
04/22/2024 05:12:14 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
INFO:llmtuner.model.patcher:Using KV cache for faster generation.
[INFO|modeling_utils.py:3257] 2024-04-22 05:12:14,509 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-04-22 05:12:14,560 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-04-22 05:12:14,569 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

[INFO|modeling_utils.py:3992] 2024-04-22 05:12:21,290 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4000] 2024-04-22 05:12:21,293 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at unsloth/llama-3-8b-Instruct-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-04-22 05:12:21,357 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/3fe39f383d040c52882f7f9faaf600dc57c01a8d/generation_config.json
[INFO|configuration_utils.py:845] 2024-04-22 05:12:21,359 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}

04/22/2024 05:12:21 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
04/22/2024 05:12:22 - INFO - llmtuner.model.adapter - Loaded adapter(s): llama3_lora
INFO:llmtuner.model.adapter:Loaded adapter(s): llama3_lora
04/22/2024 05:12:22 - INFO - llmtuner.model.loader - all params: 8051232768
INFO:llmtuner.model.loader:all params: 8051232768
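User: 你是谁
(English: Who are you?)
Assistant: 您好,我是 Llama-Chinese,一个由 LLaMA Factory 开发的人工智能助手。我可以根据用户的需求提供相应的回答和帮助。
(English: Hello, I am Llama-Chinese, an AI assistant developed by LLaMA Factory. I can provide answers and help based on users' needs.)

User: 我的蓝牙耳机坏了,我该去看牙科还是耳鼻喉科?
(English: My Bluetooth earphones are broken. Should I see a dentist or an ENT doctor?)
Assistant: 很抱歉听到您的小配件出了问题。蓝牙耳机的问题与牙科或耳鼻喉科无关。您可以尝试与制造商或当地的维修店联系,进行维修或更换。
(English: Sorry to hear that your gadget has a problem. A Bluetooth earphone issue has nothing to do with dentistry or ENT. You can try contacting the manufacturer or a local repair shop for repair or replacement.)

User: 谢谢你的帮助
(English: Thank you for your help.)
Assistant: 不客气。我是人工智能助手 Llama-Chinese,很高兴能帮到您。
(English: You're welcome. I am the AI assistant Llama-Chinese, and I'm glad I could help.)

User: exit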



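### Fine-tuning the DeepSeek-R1-Distill-Qwen-7B model with LLaMA-Factory

To fine-tune a specific model such as `DeepSeek-R1-Distill-Qwen-7B` with the `LLaMA-Factory` tool, follow the guidance below.

#### Prepare the environment and resources

Make sure you have cloned the project repository and installed its dependencies as described in the official instructions. This step matters because every later step runs in this environment.

```bash
git clone http://developer.sourcefind.cn/codes/OpenDAS/llama-factory.git
cd llama-factory && pip install -e ".[torch,metrics]"
```

These commands fetch the latest source code and install the required Python packages[^1].

#### Configure the fine-tuning parameter file

Create or edit a YAML configuration file that defines the hyperparameters, data paths, and other options. Adjusting these parameters to the task can significantly affect the final result. For an image classification task, for example, you may need to specify details such as the input feature dimensions.

Suppose there is a configuration file named `my_custom_finetune_config.yaml` that describes the target domain (for example, fashion item recognition), the training set location, and other required information.

#### Run the fine-tuning process

Start the actual fine-tuning with the prebuilt script. Assuming low-rank adaptation (LoRA) is used for efficient transfer learning, the CLI command might look like this:

```bash
llamafactory-cli train examples/train_lora/mytrain_lora_sft.yaml
```

This command reads the configuration file prepared earlier and starts optimizing the base model, here `DeepSeek-R1-Distill-Qwen-7B`, so that it better meets the expectations of the new application scenario[^2].

#### Test the improved performance

After one or more training rounds, evaluate how the new version performs. The built-in web interface can load the latest checkpoint weights directly, so you can check interactively how behavior changes on specific examples.

```bash
llamafactory-cli webui
```

In the graphical interface you can upload test samples (such as images) and see whether the customized network predicts the expected class labels more accurately[^3].

#### Considerations for real applications

In practice, third-party labeling schemes often need to be mapped to a local standard. Fine-tuning the base AI system directly for this purpose is an effective way to reduce errors introduced by intermediate steps. It helps improve overall accuracy and also simplifies the design of later maintenance workflows[^4].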