Tutorial: Evaluating AI Model Response Quality with the dotnet/docs Project
Introduction
In AI application development, evaluating the quality of a language model's responses is key to building reliable applications. This tutorial walks you through using the Microsoft.Extensions.AI.Evaluation libraries from the .NET ecosystem to set up a complete AI response quality evaluation workflow.
Prerequisites
Environment requirements
- .NET 8 or later
- Optional: the Visual Studio Code editor
Azure OpenAI service configuration
- Create an Azure OpenAI resource in the Azure portal
- When deploying a model, select the gpt-4o model
Create the test project
Initialize the project
dotnet new mstest -o TestAIWithReporting
cd TestAIWithReporting
Add the required NuGet packages
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI.Abstractions
dotnet add package Microsoft.Extensions.AI.Evaluation
dotnet add package Microsoft.Extensions.AI.Evaluation.Quality
dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
Configure credentials with user secrets
dotnet user-secrets init
dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-Azure-OpenAI-endpoint>
dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
dotnet user-secrets set AZURE_TENANT_ID <your-tenant-ID>
Core code
1. Configure the AI chat client
private static ChatConfiguration GetAzureOpenAIChatConfiguration()
{
    var config = new ConfigurationBuilder()
        .AddUserSecrets<MyTests>()
        .Build();

    string endpoint = config["AZURE_OPENAI_ENDPOINT"]!;
    string modelName = config["AZURE_OPENAI_GPT_NAME"]!;
    string? tenantId = config["AZURE_TENANT_ID"];

    var credential = string.IsNullOrEmpty(tenantId)
        ? new DefaultAzureCredential()
        : new DefaultAzureCredential(new DefaultAzureCredentialOptions { TenantId = tenantId });

    // Wrap the Azure OpenAI chat client in the Microsoft.Extensions.AI IChatClient abstraction.
    // (AsIChatClient is named AsChatClient in some earlier previews of Microsoft.Extensions.AI.OpenAI.)
    var azureClient = new AzureOpenAIClient(new Uri(endpoint), credential);
    IChatClient chatClient = azureClient.GetChatClient(modelName).AsIChatClient();

    return new ChatConfiguration(chatClient);
}
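If you prefer API-key authentication over Microsoft Entra ID, the Azure OpenAI client can also be constructed with a key credential. A minimal sketch of the lines you would swap inside GetAzureOpenAIChatConfiguration, assuming the key is stored in user secrets under a hypothetical AZURE_OPENAI_KEY entry:

// Hypothetical alternative: authenticate with an API key instead of DefaultAzureCredential.
// Assumes a user secret named AZURE_OPENAI_KEY; the rest of the method stays the same.
string apiKey = config["AZURE_OPENAI_KEY"]!;
var azureClient = new AzureOpenAIClient(
    new Uri(endpoint),
    new System.ClientModel.ApiKeyCredential(apiKey));
IChatClient chatClient = azureClient.GetChatClient(modelName).AsIChatClient();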
2. Set up reporting
private static readonly ReportingConfiguration s_defaultReportingConfiguration =
    DiskBasedReportingConfiguration.Create(
        // Evaluation results and the response cache are stored under this root path.
        storageRootPath: @"C:\TestReports",
        evaluators: GetEvaluators(),
        chatConfiguration: GetAzureOpenAIChatConfiguration(),
        enableResponseCaching: true,
        executionName: DateTime.UtcNow.ToString("yyyyMMdd-HHmmss"));
3. Implement a custom evaluator
Create a simple evaluator that doesn't rely on an AI model and just counts the number of words in the response:
public class WordCountEvaluator : IEvaluator
{
    public const string WordCountMetricName = "WordCount";

    // The names of the metrics that this evaluator produces.
    public IReadOnlyCollection<string> EvaluationMetricNames => [WordCountMetricName];

    public ValueTask<EvaluationResult> EvaluateAsync(
        IEnumerable<ChatMessage> messages,
        ChatResponse modelResponse,
        ChatConfiguration? chatConfiguration = null,
        IEnumerable<EvaluationContext>? additionalContext = null,
        CancellationToken cancellationToken = default)
    {
        // Count the words in the model's response; no AI call is needed.
        int wordCount = modelResponse.Text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

        var metric = new NumericMetric(WordCountMetricName, wordCount)
        {
            // Interpret the raw count: 6-100 words is considered acceptable.
            Interpretation = wordCount is >= 6 and <= 100
                ? new EvaluationMetricInterpretation(EvaluationRating.Good)
                : new EvaluationMetricInterpretation(
                    EvaluationRating.Unacceptable,
                    failed: true,
                    reason: "The response is too short or too long.")
        };

        return new ValueTask<EvaluationResult>(new EvaluationResult(metric));
    }
}
4. Assemble the evaluator collection
private static IReadOnlyList<IEvaluator> GetEvaluators()
{
return
[
new WordCountEvaluator(),
new CoherenceEvaluator(),
new GroundednessEvaluator(),
new RelevanceEvaluator()
];
}
5. Get the AI model's response
private static async Task<ChatResponse> GetAstronomyConversationAsync(
    IChatClient chatClient, string question)
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a helpful AI assistant."),
        new(ChatRole.User, question)
    };

    var options = new ChatOptions
    {
        Temperature = 0.7f,
        MaxOutputTokens = 800
    };

    // GetResponseAsync is the IChatClient equivalent of a chat completion call.
    return await chatClient.GetResponseAsync(messages, options);
}
6. Validate the evaluation result
private static void ValidateEvaluationResult(EvaluationResult result)
{
    Assert.IsNotNull(result);
    Assert.IsTrue(result.Metrics.Count > 0);

    // result.Metrics is keyed by metric name, so iterate over the values.
    foreach (EvaluationMetric metric in result.Metrics.Values)
    {
        if (metric is NumericMetric numericMetric)
        {
            Assert.IsTrue(numericMetric.Value >= 0);
            Assert.IsNotNull(numericMetric.Interpretation);
        }
    }
}
7. The complete test method
[TestMethod]
public async Task TestAstronomyResponseQuality()
{
    // Evaluate the quality of the model's response to an astronomy question.
    // The evaluators are already registered on s_defaultReportingConfiguration,
    // so the scenario run only needs a scenario name.
    await using ScenarioRun scenarioRun = await s_defaultReportingConfiguration
        .CreateScenarioRunAsync(
            $"{nameof(MyTests)}.{nameof(TestAstronomyResponseQuality)}");

    ChatResponse response = await GetAstronomyConversationAsync(
        scenarioRun.ChatConfiguration!.ChatClient,
        "What are the main differences between a comet and an asteroid?");

    EvaluationResult evaluationResult = await scenarioRun.EvaluateAsync(response);

    ValidateEvaluationResult(evaluationResult);
}
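Note that the GroundednessEvaluator added in step 4 only produces a meaningful score when it is given grounding context to check the response against. A minimal sketch, assuming the GroundednessEvaluatorContext type from Microsoft.Extensions.AI.Evaluation.Quality and an EvaluateAsync overload that accepts additional context; the grounding text itself is purely illustrative:

// Illustrative grounding context for the comet/asteroid question.
var groundingContext = new GroundednessEvaluatorContext(
    "Comets are icy bodies that develop tails as they approach the Sun; " +
    "asteroids are mostly rocky or metallic and are concentrated in the asteroid belt.");

// Pass the context alongside the response so the groundedness score has a reference to check against.
EvaluationResult evaluationResult = await scenarioRun.EvaluateAsync(
    response,
    additionalContext: [groundingContext]);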
Run the tests and generate a report
Run the tests
dotnet test
Install the reporting tool
dotnet tool install --local Microsoft.Extensions.AI.Evaluation.Console --create-manifest-if-needed
Generate the HTML report
dotnet tool run aieval report --path C:\TestReports --output report.html
Interpreting the report
The generated HTML report contains the following key information:
- Scenario names and execution times
- The conversation history
- The score and interpretation of each evaluation metric
- Detailed reasons behind the evaluation results
The report presents results in a hierarchical view, which makes it easy to analyze trends across different test scenarios.
Going further
- Multi-response sampling: evaluate several responses to the same question to gauge how stable the model's answers are (see the sketch after this list)
- Custom AI-based evaluators: implement more complex, domain-specific evaluation logic
- CI/CD integration: feed evaluation results into your continuous integration pipeline to monitor changes in model quality
- Long-term trend analysis: use the history of execution results to track how model quality evolves over time
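A minimal sketch of multi-response sampling, reusing the helpers defined earlier and assuming that CreateScenarioRunAsync accepts an iterationName argument; the scenario name and loop count are illustrative:

[TestMethod]
public async Task SampleMultipleAstronomyResponses()
{
    for (int i = 1; i <= 3; i++)
    {
        // Each iteration of the same scenario is stored separately and grouped together in the report.
        await using ScenarioRun scenarioRun = await s_defaultReportingConfiguration
            .CreateScenarioRunAsync(
                scenarioName: "MyTests.CometVsAsteroid",
                iterationName: i.ToString());

        ChatResponse response = await GetAstronomyConversationAsync(
            scenarioRun.ChatConfiguration!.ChatClient,
            "What are the main differences between a comet and an asteroid?");

        EvaluationResult result = await scenarioRun.EvaluateAsync(response);
        ValidateEvaluationResult(result);
    }
}

Keep in mind that when response caching is enabled in the reporting configuration, identical requests may be served from the cache rather than producing fresh samples, so you may want to vary the prompt or disable caching for this kind of sampling.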
This tutorial has covered the core approach to evaluating AI model response quality with .NET tooling, which should help you build more reliable AI applications.