R: Clear Console & Save Console Output

This post walks through how to clear the console in R and how to save what the console prints, using the cat() function, keyboard shortcuts, and sink(). It also covers the steps for writing the recorded output to a file so it can be inspected and analysed later.


Part 1 - Clearing the Console

Everyone who uses R, especially someone with zero programming background like me, ends up leaving a huge pile of error and warning messages in the console while debugging their own functions. All that red is quite a sight, isn't it? But to hide my rookie status and keep the girl wandering past behind me from seeing a screen full of errors, I need some way to cover it up. So how do you clear the console?

  • The clueless: mash the Enter key until the mess scrolls out of sight.
  • The artsy: put the Concatenate and Print function cat() and the newline character "\n" to creative use, for example...
    • cat("\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n")
    • cat(rep("\n", 100))
  • The sensible: check whether R or RStudio has a menu command for this.
    • Menu bar →_→ Edit →_→ Clear Console
    • The shortcut listed next to Clear Console is Ctrl + L
  • The pro: typing cat("\014") at the command line does exactly what Ctrl + L does (a small helper wrapping this is sketched below).
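
If you clear the console often, you can wrap that one-liner in a tiny helper. A minimal sketch (the name clc is just my own pick, and it relies on RStudio treating the form-feed character as a clear-console request; a plain terminal R session may ignore it):

    clc <- function() {
      # "\014" is the form-feed character; RStudio's console interprets it as "clear the screen"
      cat("\014")
    }
    clc()  # same effect as pressing Ctrl + L
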
=================================~ showing off ~ showing off ~ showing off ~ dividing line ~=================================

Part 2 - Saving Console Output
Before clearing the console, what if you want to keep a record of every command you typed and the results it produced, or export them to a local file? How should you go about it?
  • First, put all of the input commands into a script file, say test.R:
  • # A toy script that emits warnings, messages, and finally an error:
    test <- function(){
      for (i in 1:5){
        if (i * 2 < 5){
          warning(paste('when i = ', i, ", I love her!", sep = ""))
        } else if (i * 2 < 9){
          message(paste('when i = ', i, ", I love him!", sep = ""))
        } else {
          stop(paste('when i = ', i, ", I love myself!", sep = ""))
        }
      }
    }

    test()

  • Then use sink() to record the input and output of test().
    • The full signature of sink() is sink(file = NULL, append = FALSE, type = c("output", "message"), split = FALSE).
    • The three arguments you will use most are:
      • file takes a writable file (or connection); if you call sink() without a file, the current diversion is turned off.
      • When append = TRUE, redirected console output is appended to the file named by file; otherwise the file is overwritten, wiping whatever the previous sink() wrote into it.
      • type chooses what gets diverted: either the regular output a command prints to the console, or the messages it raises along the way (messages, warnings, errors, etc.).
  • con <- file("test.log") # 创建一个.log文件
    sink(con, append=TRUE) # 记录output
    sink(con, append=TRUE, type="message") # 记录message
    # 所有的output和message都会记录到test.log中,而控制台中不在有信息显示
    
    # 读取test.R的命令,所有的input在被解析后都会显示出来,此处会直接记录到test.log中
    source("test.R", echo=TRUE, max.deparse.length=10000)
    
    # 记录完毕后,重置output和message的记录,运行完一下两行,后续的输入命令重新显示到控制台中
    sink()
    sink(type="message")
    
    # 在控制台中显示test.log中记录下楼来的命令output和message
    cat(readLines("test.log"), sep="\n")
    
    # 导出
    write.table(cat(readLines("test.log"), sep="\n"), "log.txt")
    
    
    
This workflow comes in handy when debugging user-defined functions that run to a thousand lines or more: it captures error messages that would otherwise be pushed out of the console's scroll buffer by the flood of newer output and messages.
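
If you would rather keep seeing the output in the console while it is also being written to the file, sink() offers the split argument. A minimal sketch on top of the workflow above (my own addition; note that split = TRUE only applies to type = "output", so messages and warnings are not captured here, and the file name test_split.log is just a placeholder):

    con <- file("test_split.log")
    sink(con, append = TRUE, split = TRUE)  # output is shown in the console AND written to the file
    source("test.R", echo = TRUE)           # messages/warnings still go only to the console
    sink()                                  # stop diverting output
    close(con)
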
