A Complete Guide to Managing Flow Runs on Azure with PromptFlow

Article link: https://blog.youkuaiyun.com/gitblog_00719/article/details/148415887

promptflow: Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring. Project repository: https://gitcode.com/gh_mirrors/pr/promptflow

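Introduction

PromptFlow is Microsoft's flow orchestration tool for building and running LLM workflows. This article walks through managing flow runs with PromptFlow in Azure, covering three more advanced features: referencing remote data, chaining runs so that one run reads from another, and overriding connections at run time.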

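Prerequisites

Before you begin, make sure you have the following:

A valid Azure account and subscription
A configured Azure ML workspace
A Python development environment (version 3.9 or later recommended)
The PromptFlow SDK installed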
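
The local prerequisites above can be checked quickly from Python. The following is a minimal sketch that uses only the standard library; `check_environment` is a hypothetical helper, and the module names `promptflow` and `azure.identity` are assumptions about how the SDK and the Azure identity library appear on your import path.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the spec is malformed
        return False


def check_environment(min_version=(3, 9), modules=("promptflow", "azure.identity")):
    """Hypothetical helper: report the Python version check and module availability."""
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in modules:
        report[name] = module_available(name)
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```

This only confirms that the packages can be found locally; it does not verify the Azure subscription or the workspace.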

1. Connecting to the Azure ML workspace

1.1 Importing the required libraries

Start by importing the Azure authentication libraries and the PromptFlow client:

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml.entities import Data
from azure.core.exceptions import ResourceNotFoundError

from promptflow.azure import PFClient
from promptflow.entities import Run

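1.2 Configuring credentials

The Azure SDK offers several authentication methods. DefaultAzureCredential is recommended because it tries a chain of methods automatically. If it cannot obtain a token, the code below falls back to an interactive browser login:

try:
    credential = DefaultAzureCredential()
    # Check that the credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to interactive browser login if DefaultAzureCredential fails
    credential = InteractiveBrowserCredential()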

1.3 Connecting to the workspace

Get a handle to the workspace from a configuration file and the credential:

pf = PFClient.from_config(credential=credential)
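
`from_config` needs a workspace configuration file to read. The sketch below assumes this is the `config.json` file used by the Azure ML SDK, with the keys `subscription_id`, `resource_group`, and `workspace_name`. The values are placeholders, and `write_workspace_config` is a hypothetical helper, not part of any SDK.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Hypothetical helper: write a config.json describing the workspace."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values; replace them with your own workspace details
    with tempfile.TemporaryDirectory() as tmp:
        path = write_workspace_config(
            tmp, "<subscription-id>", "<resource-group>", "<workspace-name>"
        )
        print(path.read_text(encoding="utf-8"))
```

In practice the file would be written to the working directory rather than a temporary one. You can usually download a ready-made `config.json` from the workspace page in the Azure portal.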

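1.4 Creating the required connections

Connections in PromptFlow securely store and manage sensitive information such as API keys. A flow that uses the Azure OpenAI service needs the corresponding connection to be configured in advance.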

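2. Creating a run with remote data

2.1 Creating or updating remote data

Upload local data to the workspace as a versioned data asset. The code first looks for an existing asset with the given name and version, and creates it only when it is not found: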

data_name, data_version = "flow_run_test_data", "1"

try:
    data = pf.ml_client.data.get(name=data_name, version=data_version)
except ResourceNotFoundError:
    data = Data(
        name=data_name,
        version=data_version,
        path="../../flows/standard/web-classification/data.jsonl",
        type="uri_file",
    )
    data = pf.ml_client.data.create_or_update(data)

2.2 Preparing the remote data ID

Build the reference string for the data asset, in the form azureml:<name>:<version>:

data_id = f"azureml:{data.name}:{data.version}"
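
The reference string has a simple `azureml:<name>:<version>` layout, so it can be built and validated with plain string handling. This is a small illustrative sketch; `make_asset_id` and `parse_asset_id` are hypothetical helpers, not part of the PromptFlow or Azure ML SDKs.

```python
def make_asset_id(name: str, version: str) -> str:
    """Build a data asset reference of the form azureml:<name>:<version>."""
    return f"azureml:{name}:{version}"


def parse_asset_id(asset_id: str):
    """Split a reference back into (name, version), validating its shape."""
    prefix, sep, rest = asset_id.partition(":")
    if prefix != "azureml" or not sep:
        raise ValueError(f"not an azureml reference: {asset_id!r}")
    # The version is whatever follows the last colon
    name, sep, version = rest.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected azureml:<name>:<version>, got {asset_id!r}")
    return name, version


if __name__ == "__main__":
    asset_id = make_asset_id("flow_run_test_data", "1")
    print(asset_id)
    print(parse_asset_id(asset_id))
```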

2.3 Creating a run with remote data

When creating the run, you can optionally specify compute resources:

run = Run(
    flow="../../flows/standard/web-classification",
    data=data_id,
    # Optional resource customization
    # resources={
    #     "instance_type": "STANDARD_DS11_V2",
    #     "compute": "my_compute_instance"
    # }
)

base_run = pf.runs.create_or_update(run=run)

2.4 Monitoring run status

Stream the run logs in real time:

pf.runs.stream(base_run)
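
Streaming blocks while it prints logs. A script that only needs the final status can poll instead. The sketch below shows a generic polling loop that takes any callable returning a status string. The terminal status names are assumptions and should be checked against what your runs actually report, and `wait_for_status` is a hypothetical helper.

```python
import time

# Assumed terminal status names; verify them against the statuses your runs report
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_status(get_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in status {status!r} after {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for a real run
    statuses = iter(["NotStarted", "Running", "Running", "Completed"])
    print(wait_for_status(lambda: next(statuses), poll_interval=0.0))
```

With a real client, `get_status` might be something like `lambda: pf.runs.get(base_run.name).status`, assuming the returned run object exposes a `status` attribute.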

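3. Creating a new run that references an existing run's inputs

PromptFlow lets one run reference another run's inputs and outputs, so runs can be chained together. Here the column mapping reads each value from the inputs of the referenced run: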

run = Run(
    flow="../../flows/standard/web-classification",
    run=base_run,  # reference the existing run that was submitted above
    column_mapping={
        "url": "${run.inputs.url}",
        "answer": "${run.inputs.answer}",
        "evidence": "${run.inputs.evidence}",
    },
)

base_run = pf.runs.create_or_update(run=run)
pf.runs.stream(base_run)
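
To make the `${run.inputs.<column>}` syntax concrete, the sketch below resolves a column mapping against sample rows. It only illustrates the idea and is not PromptFlow's own implementation: `resolve_column_mapping` is a hypothetical helper, the sample values are made up, and the supported prefixes (`data`, `run.inputs`, `run.outputs`) are an assumption about the reference forms in use.

```python
import re

# Matches expressions such as ${data.url} or ${run.inputs.url}
_REFERENCE = re.compile(r"^\$\{(data|run\.inputs|run\.outputs)\.(\w+)\}$")


def resolve_column_mapping(column_mapping, data_row=None, run_inputs=None, run_outputs=None):
    """Replace each ${source.column} expression with the value from the matching source."""
    sources = {
        "data": data_row or {},
        "run.inputs": run_inputs or {},
        "run.outputs": run_outputs or {},
    }
    resolved = {}
    for flow_input, expression in column_mapping.items():
        match = _REFERENCE.match(expression)
        if match is None:
            # Anything that is not a reference is passed through as a literal
            resolved[flow_input] = expression
            continue
        source, column = match.groups()
        if column not in sources[source]:
            raise KeyError(f"{expression} not found for flow input {flow_input!r}")
        resolved[flow_input] = sources[source][column]
    return resolved


if __name__ == "__main__":
    mapping = {"url": "${run.inputs.url}", "answer": "${run.inputs.answer}"}
    # Made-up row standing in for one line of the referenced run's inputs
    previous_inputs = {"url": "https://example.com", "answer": "Channel"}
    print(resolve_column_mapping(mapping, run_inputs=previous_inputs))
```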

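4. Creating a run with connection overrides

You can replace connections at run time without modifying the original flow definition. Each key is a node name in the flow, and its value names the connection to use for that node: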

run = Run(
    flow="../../flows/standard/web-classification",
    data="../../flows/standard/web-classification/data.jsonl",
    connections={
        "classify_with_llm": {"connection": "azure_open_ai_connection"},
        "summarize_text_content": {"connection": "azure_open_ai_connection"},
    },
)

base_run = pf.runs.create_or_update(run=run)
pf.runs.stream(base_run)
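
Conceptually, a connection override is a per-node merge: settings supplied at run time replace the matching settings from the flow definition, and everything else is left alone. The sketch below illustrates that merge with plain dictionaries. It is not PromptFlow's implementation; `apply_connection_overrides`, the default connection name, and the `deployment_name` value are all hypothetical, and the unknown-node check is a choice made for this sketch.

```python
def apply_connection_overrides(node_connections, overrides):
    """Return a copy of node_connections with per-node overrides merged in."""
    unknown = set(overrides) - set(node_connections)
    if unknown:
        # Sketch-only safeguard: refuse overrides for nodes the flow does not define
        raise KeyError(f"unknown node(s) in overrides: {sorted(unknown)}")
    merged = {node: dict(settings) for node, settings in node_connections.items()}
    for node, settings in overrides.items():
        merged[node].update(settings)
    return merged


if __name__ == "__main__":
    # Hypothetical defaults, as they might appear in a flow definition
    flow_defaults = {
        "classify_with_llm": {"connection": "open_ai_connection", "deployment_name": "my-deployment"},
        "summarize_text_content": {"connection": "open_ai_connection", "deployment_name": "my-deployment"},
    }
    overrides = {
        "classify_with_llm": {"connection": "azure_open_ai_connection"},
        "summarize_text_content": {"connection": "azure_open_ai_connection"},
    }
    for node, settings in apply_connection_overrides(flow_defaults, overrides).items():
        print(node, settings)
```

Because the merge works on copies, the original definition stays untouched, which mirrors the point of overriding at run time rather than editing the flow.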