Fine-tuning Large Language Models (LLMs)

Latest recommended article published on 2025-11-25 00:57:28


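Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.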


from transformers import TrainingArguments

# trained_model_name and max_steps are defined elsewhere in the original code
output_dir = trained_model_name

training_args = TrainingArguments(
    # Learning rate
    learning_rate=1.0e-5,

    # Number of training epochs
    num_train_epochs=1,

    # Max number of optimizer update steps to train for
    # (each step consumes gradient_accumulation_steps batches per device).
    # Overrides num_train_epochs when set to a positive value (default: -1)
    max_steps=max_steps,

    # Batch size for training
    per_device_train_batch_size=1,

    # Directory to save model checkpoints
    output_dir=output_dir,

    # Other arguments
    overwrite_output_dir=False,  # Do not overwrite the content of the output directory
    disable_tqdm=False,  # Keep progress bars enabled
    eval_steps=120,  # Number of update steps between two evaluations
    save_steps=120,  # Number of update steps between two checkpoint saves
    warmup_steps=1,  # Number of warmup steps for the learning rate scheduler
    per_device_eval_batch_size=1,  # Batch size for evaluation
    evaluation_strategy="steps",  # Renamed to eval_strategy in newer transformers releases
    logging_strategy="steps",
    logging_steps=1,
    optim="adafactor",
    gradient_accumulation_steps=4,
    gradient_checkpointing=False,

    # Parameters for best-checkpoint selection (actual early stopping
    # additionally requires an EarlyStoppingCallback on the Trainer)
    load_best_model_at_end=True,
    save_total_limit=1,
    metric_for_best_model="eval_loss",
)
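The relationship between max_steps, the per-device batch size, and gradient accumulation in the configuration above can be sketched in plain Python. This is only an illustration of the arithmetic, not code from the Trainer: the helper name `training_plan`, the single-device assumption, and the value `max_steps=360` are all hypothetical, since the original post does not show how max_steps is defined.

```python
def training_plan(max_steps, per_device_batch_size, grad_accum_steps,
                  eval_steps, num_devices=1):
    """Summarize how many samples and evaluations a step-based run involves."""
    # One optimizer step consumes this many samples across all devices.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Evaluation (and checkpointing, since save_steps == eval_steps here)
    # is triggered every `eval_steps` optimizer steps.
    eval_points = list(range(eval_steps, max_steps + 1, eval_steps))
    return {
        "effective_batch_size": effective_batch,
        "samples_seen": effective_batch * max_steps,
        "eval_points": eval_points,
    }


# max_steps=360 is a made-up value for illustration only.
plan = training_plan(max_steps=360, per_device_batch_size=1,
                     grad_accum_steps=4, eval_steps=120)
print(plan)
```

With these assumed values, each optimizer step covers 4 samples, so a run of 360 steps processes 1440 samples and is evaluated three times.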

Configuration parameters:

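learning_rate=1.0e-5: The learning rate, which controls the step size of parameter updates. A value of 0.00001 is small and suits careful fine-tuning of a pretrained model.
num_train_epochs=1: The number of training epochs; the dataset is traversed once in full.
max_steps=max_steps: The maximum number of training (optimizer update) steps. When positive, it overrides num_train_epochs; the default of -1 leaves the epoch setting in effect. Note that max_steps here is a variable, presumably defined elsewhere in the code.
per_device_train_batch_size=1: A training batch size of 1 per device. This is very small and is typically used for very large models or when memory is limited.
output_dir=output_dir: The directory where model checkpoints are saved.
overwrite_output_dir=False: Do not overwrite existing content in the output directory.
disable_tqdm=False: Keep the progress bars enabled.
eval_steps=120: Run an evaluation every 120 update steps.
save_steps=120: Save a checkpoint every 120 update steps. Because load_best_model_at_end is enabled, save_steps must be a multiple of eval_steps, which holds here.
warmup_steps=1: The number of learning rate warmup steps. With only 1 step, warmup is effectively negligible.
per_device_eval_batch_size=1: An evaluation batch size of 1 per device.
evaluation_strategy="steps": Evaluate every eval_steps steps rather than at the end of each epoch.
logging_strategy="steps": Log every logging_steps steps.
logging_steps=1: Log at every step, which is very frequent.
optim="adafactor": Use the Adafactor optimizer, which needs less memory than Adam because it stores a factored estimate of the second moments instead of a full per-parameter state.
gradient_accumulation_steps=4: Accumulate gradients over 4 batches before each update, giving an effective batch size of 1×4=4 per device.
gradient_checkpointing=False: Gradient checkpointing is disabled, so training uses more memory but avoids the extra recomputation in the backward pass.
load_best_model_at_end=True: Load the best checkpoint, as measured by metric_for_best_model, when training finishes.
save_total_limit=1: Limit the number of retained checkpoints to save disk space. Combined with load_best_model_at_end=True, up to two checkpoints may be kept: the best one and the most recent one, if they differ.
metric_for_best_model="eval_loss": Use the evaluation loss to select the best model; lower is better.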
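To make the interaction of learning_rate and warmup_steps concrete, here is a minimal sketch of a linear warmup followed by linear decay. It assumes the scheduler type is left at the Trainer default of "linear", which the configuration does not override. The function name `linear_lr` and the total of 360 training steps are hypothetical, chosen only for illustration; this is not the library's own code.

```python
def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=360 is an assumed value; the post does not define max_steps.
for step in (0, 1, 180, 360):
    print(step, linear_lr(step, base_lr=1.0e-5, warmup_steps=1, total_steps=360))
```

With warmup_steps=1, the learning rate reaches its peak after a single step and then decays for the rest of the run, so the warmup has almost no smoothing effect.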
