Use the reward-model-deberta-v3-large-v2 reward model to score each QA pair and keep only the high-quality ones; then compute sentence embeddings with BERT and run kCenterGreedy over them to select a diverse instruction subset, and fine-tune on that subset. After fine-tuning, score the LLM's own outputs, filter out the low-scoring cases, and add this poorly-handled data back in for another round of SFT.
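A minimal sketch of the QA scoring step, following the usage on the OpenAssistant model card for this reward model (the example QA pair and the idea of a cutoff are illustrative):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "OpenAssistant/reward-model-deberta-v3-large-v2"
rank_model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

question = "Explain nuclear fusion like I am five."   # illustrative QA pair
answer = "Nuclear fusion is when two small things join into one bigger thing..."
inputs = tokenizer(question, answer, return_tensors="pt")
with torch.no_grad():
    score = rank_model(**inputs).logits[0].item()  # higher = preferred answer
# keep the pair only if the score clears a chosen quality threshold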
PS: it depends on the task, but when I scored data with reward-model-deberta-v3-large-v2 myself, the scores didn't look reliable enough to filter on.
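For the diversity step, a minimal sketch of kCenterGreedy over sentence embeddings; the note above only says BERT-based sentence similarity, so the embedding model and k here are my own illustrative choices:

import numpy as np
from sentence_transformers import SentenceTransformer

def k_center_greedy(emb: np.ndarray, k: int) -> list[int]:
    # Greedily pick the point farthest from the already-selected centers,
    # maximizing coverage (diversity) of the embedding space.
    selected = [0]  # seed with an arbitrary first point
    dists = np.linalg.norm(emb - emb[0], axis=1)  # distance to nearest center
    for _ in range(min(k, len(emb)) - 1):
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(emb - emb[idx], axis=1))
    return selected

encoder = SentenceTransformer("all-mpnet-base-v2")  # stand-in for "BERT similarity"
instructions = ["..."]  # the QA pairs that survived the reward-model filter
emb = encoder.encode(instructions, normalize_embeddings=True)
diverse = [instructions[i] for i in k_center_greedy(emb, k=1000)]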
Magpie
Because an LLM is trained autoregressively, it can generate user inputs on its own: feed it only the left-hand part of the chat template and it will complete a user instruction; the generated instruction is then fed back into an LLM (local or via API) to produce the response. The generated data is tagged for safety, reward, quality, difficulty, etc. using models such as guard_model_path="meta-llama/Meta-Llama-Guard-2-8B" and reward_model_path="sfairXC/FsfairX-LLaMA3-RM-v0.1", and filtered against those requirements. For deduplication, a Faiss index is built over SentenceTransformer (all-mpnet-base-v2) embeddings; each text's k nearest neighbors are retrieved and cosine distance decides whether two texts are near-duplicates.
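A minimal sketch of this dedup step, assuming normalized embeddings so inner product equals cosine similarity; k and the 0.9 similarity threshold are illustrative, not Magpie's exact settings:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

texts = ["..."]  # generated instructions to deduplicate
encoder = SentenceTransformer("all-mpnet-base-v2")
emb = encoder.encode(texts, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
index.add(emb)
k = 5                                    # neighbors to inspect (illustrative)
sims, ids = index.search(emb, min(k + 1, len(texts)))  # +1 for the self-match

removed = set()
for i in range(len(texts)):
    if i in removed:
        continue
    for sim, j in zip(sims[i][1:], ids[i][1:]):  # skip the self-match at rank 0
        if j > i and sim > 0.9:                  # threshold (illustrative)
            removed.add(int(j))
deduped = [t for i, t in enumerate(texts) if i not in removed]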
Responses to the instructions are then scored with model = ArmoRMPipeline("RLHFlow/ArmoRM-Llama3-8B-v0.1", trust_remote_code=True, device_map=f"cuda:{args.device}").
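ArmoRMPipeline is a wrapper in the Magpie codebase; as a sketch, the underlying reward model can be called directly along the lines of the RLHFlow model card (treat the exact output attribute as an assumption of that model's remote code):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
model = AutoModelForSequenceClassification.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(path)

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model(input_ids)
score = output.score.float().item()  # scalar preference score (per the model card)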
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
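The line above is the Llama-3 pre-query template: it ends exactly where the user's text would begin, so a plain completion call samples an instruction. A minimal sketch with vLLM (model choice, sampling parameters, and stop token are illustrative):

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # model choice illustrative
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

params = SamplingParams(
    temperature=1.0,       # high temperature for diverse instructions (illustrative)
    top_p=1.0,
    max_tokens=256,
    stop=["<|eot_id|>"],   # end of the sampled user turn
)
# the model "completes" the user turn, i.e. it invents an instruction
outputs = llm.generate([pre_query] * 8, params)
instructions = [o.outputs[0].text.strip() for o in outputs]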
Multi-turn dialogue
"mt_append_template": "<|start_header_id|>user<|end_header_id|>\n\n",
mt_system_prompt = "You are a helpful AI assistant. The user will engage in a multi-round conversation with you, asking initial questions and following up with additional related questions. Your goal is to provide thorough, relevant and insightful responses to help the user with their queries."
print(f"Generating responses for turn {turn}...")
prompts = []
for item in batch:
if not args.tokenizer_template:
conv = get_conversation_template(MODEL_NAME)
if turn == 2:
conv.append_message(conv.roles[0], item[f'instruction'])
conv.append_message(conv.roles[1], item[f'response'])
else:
conv.append_message(conv.roles[0], item[f'instruction'])
conv.append_message(conv.roles[1], item[f'response'])
for i in range(2, turn):
conv.append_message(conv.roles[0], item[f'instruction_{i}'])
conv.append_message(conv.roles[1], item[f'response_{i}'])
conv.append_message(conv.roles[0], item[f'instruction_{turn}'])
conv.append_message(conv.roles[1], None)
template = conv.get_prompt()
else:
chat = []
if turn == 2:
chat.append({"role": "user", "content": item[f'instruction']})
chat.append({"role": "assistant", "content": item[f'response']})
else:
chat.append({"role": "user", "content": item[f'instruction']})
chat.append({"role": "assistant", "content": item[f'response']})
for i in range(2, turn):
chat.append({"role": "user", "content": item[f'instruction_{i}']})
chat.append({"role": "assistant", "content": item[f'response_{i}']})
chat.append({"role": "user", "content": item[f'instruction_{turn}']})
template = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompts.append(template)
outputs = llm.generate(prompts, response_params)
for i, item in enumerate(batch):
item[f'response_{turn}'] = outputs[i].outputs[0].text.strip()
Self-Instruct
Starting from 175 seed tasks (each with one instruction and one example instance), an LLM generates new instructions in context; generation is handled differently depending on whether the task is a classification task (classification tasks are generated output-first so the inputs are not biased toward one label). Responses are then generated for the new instructions, and the results are filtered before being added to the task pool.
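One of those filters is lexical novelty: the Self-Instruct paper only keeps an instruction whose ROUGE-L similarity to everything already in the pool is below 0.7. A minimal sketch of that rule (the surrounding loop and variable names are my own illustration):

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def is_novel(candidate, pool, threshold=0.7):
    # keep a candidate only if its ROUGE-L overlap with every pooled
    # instruction stays below the threshold
    return all(
        scorer.score(inst, candidate)["rougeL"].fmeasure < threshold
        for inst in pool
    )

pool = ["Give three tips for staying healthy."]  # seed instructions (illustrative)
new_instructions = ["..."]                       # raw LLM generations (illustrative)
for cand in new_instructions:
    if is_novel(cand, pool):
        pool.append(cand)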