深入解析：Amazon Bedrock 上 Claude 3 Haiku 的微调测试报告

本文链接：https://blog.youkuaiyun.com/rralucard123/article/details/140997678

前言

2024年7月10日，Anthropic Claude 3 Haiku 的微调功能在 Amazon Bedrock 上开放预览。本篇文章将分享 Claude 3 Haiku 的微调使用步骤及微调后模型的评估结果。

LLM 细调的优势

通过细调，LLM可以获得特定领域的知识或新知识。这样，与RAG（Retrieval-Augmented Generation）相比，可以避免在提示中插入参考信息，从而最小化输入令牌，结果可以降低API执行的成本和延迟。此外，由于不需要外部存储和检索参考信息，因此可以减少外部数据库的管理成本和缩短检索所需的时间。（另一方面，将Fine-Tuning与RAG结合使用可能会进一步提高精度。）

使用步骤和验证内容

申请使用
创建数据集
将数据集上传到S3
执行细调作业
购买预配置吞吐量
运行细调后的模型
评估模型

申请使用

截至2024年7月27日，在Amazon Bedrock上对Claude3 Haiku进行fine-tuning需要向AWS支持提交申请。创建支持票时，请选择“Bedrock”作为服务，并选择“Models”作为类别。

数据集的创建

本次验证的目的是让Claude3 Haiku获得关于Amazon Bedrock的域知识，为此我们准备了一个专门用于fine-tuning的数据集。数据集由AWS官方文档中的问题和答案对组成。接下来，将介绍在进行本次验证时考虑的事项、验证策略以及数据集的准备和创建方法。

考虑使用的数据集

作为公开的典型日语数据集，可以提到databricks-dolly-15k-ja和databricks-dolly-15k-ja-gozaru等。databricks-dolly-15k-ja-gozaru是一个独特的数据集，旨在使LLM的回答末尾采用“ござる”这一古风口吻。然而，考虑到Claude3 Haiku的性能，即使不进行fine-tuning，通过系统提示也能达到类似效果。因此，使用这个数据集进行fine-tuning可能难以感受到其效果。

因此，本次验证的目的不是让Claude3 Haiku学习输出格式，而是获得域知识。具体而言，我们准备了一个数据集，以让Claude3 Haiku学习它在预训练数据中可能未包含的“Amazon Bedrock”的知识。

此外，AWS官方博客建议为了优化Claude3 Haiku的fine-tuning表现，首先应使用小规模但高质量的数据集（50-100条）进行尝试。根据这一建议，本次验证也采用了不足100条的数据集进行fine-tuning。

Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality | AWS Machine Learning Blog

databricks-dolly-15k 是 Databricks 公开的包含15,000个指示-响应对的数据集。databricks-dolly-15k-ja-gozaru 是将 databricks-dolly-15k 翻译成日语后的版本 databricks-dolly-15k-ja 中的响应部分末尾替换为“ござる”，这样处理的数据集常用于对LLM进行fine-tuning的验证。

利用的训练数据

在本次验证中，我们使用了 AWS Machine Learning Blog 文章 “Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker” 中使用的 Amazon Bedrock FAQs 数据集作为 fine-tuning 的训练数据。该数据集已在以下仓库中公开。

https://github.com/aws-samples/fine-tune-embedding-models-on-sagemaker/blob/main/sentence-transformer/multiple-negatives-ranking-loss/training.json

本数据集基于Amazon Bedrock FAQs创建，以 JSON 格式存储了共 85 个问题和答案对。以下是数据集的部分内容。在 JSON 中，键“sentence1”表示问题，“sentence2”表示答案。

[
  {
    "sentence1": "What is Amazon Bedrock and its key features?",
    "sentence2": "Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models along with a broad set of capabilities for building generative AI applications, simplifying development with security, privacy, and responsible AI features."
  },
  {
    "sentence1": "How can I get started with using Amazon Bedrock?",
    "sentence2": "With the serverless experience of Amazon Bedrock, you can quickly get started by navigating to the service in the AWS console and trying out the foundation models in the playground or creating and testing an agent."
  }
]

由于本数据集仅包含85个问题和答案对，并不算多，因此决定不将这些数据分为训练数据和验证数据，而是另外创建验证数据。

选择使用上述数据集进行本次验证的原因是数据质量高且在许可方面没有问题。此外，选择这个数据集是因为它涉及到 Amazon Bedrock 这一特定领域的知识，这些知识预计不包含在 Claude3 Haiku 的预训练数据中，因此作为学习材料非常合适。

验证数据做成

为了创建验证数据，我们基于以下 AWS 官方文档，在 Claude3 Opus 中生成了验证数据。在此过程中，我们将相关文档转换成 PDF 格式，并利用 Amazon Bedrock 的 Converse API 中的 Document chat 和 Json mode，这使得我们能够相对容易地以 JSON 格式创建高质量的 QA 形式的数据集。

What is Amazon Bedrock? - Amazon Bedrock

我使用以下代码生成了32对问题和答案。

以下是为了设置工具使用的代码 tool_config.py 和为了创建验证数据的代码 create_val_dataset.py。在 tool_config.py 中，设置生成一个包含“question”和“answer”键的 JSON 数组形式，并指示生成32对。注意，由于是在 Json mode 中使用，所以没有定义工具本身。

class ToolConfig:
    tool_name = "QA_dataset_generator"
    no_of_dataset = 32

    description = f"""
    与えられるドキュメントに基づいて、LLMのFine-Tuning用のValidationデータセットを作成します。
    具体的には、ドキュメントの内容を利用し、Amazon Bedrockに関する質問文と回答文のペアを生成します。

    <example>
    question: What is Amazon Bedrock and its key features?
    answer: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models along with a broad set of capabilities for building generative AI applications, simplifying development with security, privacy, and responsible AI features.
    </example>

    <rules>
    - 必ず{no_of_dataset}個の質問文と回答文のペアを生成すること。
    - 英語で回答すること。
    - JSON形式で回答すること。
    - Amazon Bedrockについて、多様な質問と回答を作成すること。
    </rules>
    """

    tool_definition = {
        "toolSpec": {
            "name": tool_name,
            "description": description,
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "description": f"Validationデータ用の質問文と回答文のセット。必ず{no_of_dataset}個生成すること。",
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "question": {
                                        "type": "string",
                                        "description": "Validationデータ用の質問文。",
                                    },
                                    "answer": {
                                        "type": "string",
                                        "description": "Validationデータ用の回答文。",
                                    },
                                },
                                "required": ["question", "answer"],
                            },
                        },
                    },
                    "required": ["dataset"],
                }
            },
        }
    }

在上述代码中，我们将32对 QA 形式的验证数据保存在外部的 JSON 文件中。以下是实际生成的验证数据的一部分，可以确认数据是按照提示指示的 QA 形式生成的。

[
  {
    "question": "What is Amazon Bedrock?",
    "answer": "Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models along with capabilities for building generative AI applications, simplifying development with security, privacy, and responsible AI features."
  },
  {
    "question": "What can you do with Amazon Bedrock?",
    "answer": "With Amazon Bedrock, you can experiment with and evaluate top foundation models for your use cases, privately customize them with your own data using techniques like fine-tuning and retrieva