Generating Question-Answer Pairs from Documents with GPT

Original article · last modified 2024-06-27 19:03:52

Licensed under CC 4.0 BY-SA

First published 2024-03-25 14:43:17

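This article shows how to use the GPT-3.5 model through the OpenAI API to turn a given passage, here one about the historical figure Li Shimin, into data suitable for question-answer pairs. The output consists of short questions and detailed answers, which can then be used to build a Q&A dataset.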

Generating a question list from the document

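import time

import requests

url = 'https://api.openai.com/v1/chat/completions'

# Replace with your own API key
api_key = 'sk-xxxxxxxxx'

model = "gpt-3.5-turbo-16k"

prompt1 = '''
#01 You are an expert at building question-answer datasets.
#02 Your task is to generate questions suitable for a question-answer dataset, based on the content I provide.
#03 Keep each question as short as possible.
#04 Each sentence may contain only one question.
#05 The questions must be high-level and valuable; do not generate questions about fine details.
#06 Example questions:
"""
Who was Li Shimin?
Tell me about Li Shimin.
What were Li Shimin's achievements?
"""
#07 Here is the content:
"""
{{replace this with your content}}
"""
'''

def generate_question(text_content, more=False):
    """Ask the model for questions about text_content; returns the reply text or None."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    content = "Generate questions suitable for question-answer pairs"
    if more:
        content = "Generate as many questions suitable for question-answer pairs as possible"
    # Fill the document text into the system prompt template
    prompt = prompt1.replace("{{replace this with your content}}", text_content)
    data = {
        "model": model,
        "messages": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": content}
        ]
    }
    start_time = time.time()
    # TLS certificate verification is left at the requests default (enabled)
    response = requests.post(url, headers=headers, json=data)
    print("Elapsed (s):", time.time() - start_time)
    if response.status_code == 200:
        return response.json()["choices"][0]["message"]['content']
    else:
        print(f"Error: {response.status_code}")
        print(response.content)
        return None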
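`generate_question` returns the model's reply as a single string, and with the prompt above the reply usually holds one question per line. The following is a minimal sketch for turning that string into a Python list. It assumes the model may prefix lines with numbering or bullet markers; the helper name `split_questions` is hypothetical and not part of the original article.

```python
import re


def split_questions(raw_text):
    """Split a model reply into a list of questions, one per non-empty line."""
    questions = []
    for line in raw_text.splitlines():
        # Drop leading list markers such as "1.", "2)", "3、", "-", "*" or "•".
        cleaned = re.sub(r"^\s*(?:\d+\s*[.)、]|[-*•])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions
```

For example, `split_questions(generate_question(text_content))` would give a list that can be reviewed or deduplicated before the answers are generated.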

Generating question-answer pairs from the question list

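import time

import requests

url = 'https://api.openai.com/v1/chat/completions'

# Replace with your own API key
api_key = 'sk-xxxxxxxxx'

model = "gpt-3.5-turbo-16k"

prompt2 = '''
#01 You are an expert at building question-answer datasets.
#02 Your task is to generate the corresponding question-answer pairs, based on my questions and the content I provide.
#03 Answers must be comprehensive, draw heavily on my content, and be rich in detail.
#04 You must follow the format of my example question-answer pairs:
"""
{"content": "Who was Li Shimin?", "summary": "Li Shimin, the second emperor of the Tang dynasty, temple name Taizong, was a renowned statesman, strategist, military commander, calligrapher, and poet in Chinese history."}
{"content": "What was Li Shimin's temple name?", "summary": "Li Shimin's temple name was Taizong."}
"""
#05 Here are my questions:
"""
{{replace this with the questions generated in the previous step}}
"""
#06 Here is my content:
"""
{{replace this with your content}}
"""
'''

def generate_qa(text_content, question_text):
    """Ask the model to answer question_text from text_content; returns the reply text or None."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Fill both the questions and the document text into the system prompt template
    prompt = prompt2.replace("{{replace this with the questions generated in the previous step}}", question_text).replace("{{replace this with your content}}", text_content)
    data = {
        "model": model,
        "messages": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": "Assemble the question-answer pairs"}
        ]
    }
    start_time = time.time()
    # TLS certificate verification is left at the requests default (enabled)
    response = requests.post(url, headers=headers, json=data)
    print("Elapsed (s):", time.time() - start_time)
    if response.status_code == 200:
        return response.json()["choices"][0]["message"]['content']
    else:
        print(f"Error: {response.status_code}")
        print(response.content)
        return None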
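The prompt asks the model to follow the example format, so the reply is expected to hold one JSON object per line with the keys `content` (the question) and `summary` (the answer). The model is not guaranteed to comply, so the sketch below parses the reply defensively and skips any line that is not a valid pair. The helper name `parse_qa_pairs` is hypothetical and not part of the original article.

```python
import json


def parse_qa_pairs(raw_text):
    """Parse a model reply into a list of {"content", "summary"} dicts."""
    pairs = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON (stray prose, broken output).
            continue
        # Keep only objects that carry both expected keys.
        if isinstance(item, dict) and "content" in item and "summary" in item:
            pairs.append({"content": item["content"], "summary": item["summary"]})
    return pairs
```

A full run would chain the two steps: pass the document to `generate_question`, pass its result together with the document to `generate_qa`, then call `parse_qa_pairs` on that reply. Both API functions return `None` on failure, so their results should be checked before they are passed on.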
