Creating Synthetic User Research: Using Persona Prompting and Autonomous Agents

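This article introduces a new approach to synthetic user research that combines generative AI and large language models, using autonomous agents and digital personas to simulate customer and market research. It covers persona prompting, the use of agent frameworks, how to run a study, and how to extract actionable results. The approach aims to overcome some of the limitations of traditional user research.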

Original article: creating-synthetic-user-research-using-persona-prompting-and-autonomous-agents

Using generative AI and large language models to simulate customers and market research for in-depth analysis

March 26, 2024

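User research is a critical part of validating any hypothesis against a group of real users, and of gathering valuable market research about consumer behaviour and preferences. Traditional user research methods, while invaluable, have inherent limitations, including scalability, resource intensity, and the difficulty of reaching diverse user groups. This article outlines how we can overcome these limitations by introducing a new method: synthetic user research.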

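Synthetic user research, driven by autonomous agents, is becoming a game changer. By using generative AI to create digital customer personas and interact with them in simulated research scenarios, we can gain insight into consumer behaviour and preferences that is hard to obtain otherwise. The approach fuses generative AI prompting techniques with autonomous agents.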

Persona Prompting: Emulating People

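Before we dive into autonomous agents, let's look at the concept of persona (or personality) prompting, which aims to capture elements such as frustrations, behaviours, goals, personality traits, and demographics.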

Research: adapting how personas are generated. Source: Stefano De Paoli (arXiv, Oct 2023)

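We use a prompt to ask the language model to take on a persona with as much deep context as possible. I use the prompt below, which can be tailored to your needs and extended with other demographic and behavioural assessments or traits.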

Example system prompt for the persona generator

You are an expert ML researcher and prompt engineer. You have been tasked with creating a prompt which can be used to simulate a fictional user of a particular brand and service. This prompt needs to include the person's name, age, demographic, personality including Big Five and DISC, personality traits, frustrations, values, goals, challenges, and any other related information based on the context. Be as detailed as you need to. You will generate the prompt as a one-liner starting with "You are ". This prompt is for a customer of a major supermarket in Sydney, Australia. Please only return the prompt to use.

Example output from the language model

You are Mia, a 34-year-old marketing manager living in Sydney, Australia. You’re part of the affluent urban demographic with a keen interest in health and sustainability. Your personality reflects high openness, conscientiousness, and agreeableness, with moderate levels of extraversion and low neuroticism. In the DISC assessment, you score high on influence and steadiness. You’re sociable, detail-oriented, and value harmony. Your main frustrations include the lack of organic and locally sourced products in supermarkets, and you highly value sustainability, community, and health. Your goals are to maintain a balanced and eco-friendly lifestyle, while your challenges include finding a supermarket that aligns with your ethical and health standards. You seek convenience without compromising on your values.

As the example above shows, we can quickly generate deeply defined synthetic users with rich personalities for a given scenario.
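
To generate personas for more than one scenario, the generator prompt can be turned into a small template. The sketch below is one way to do that; the function name `build_persona_generator_prompt` and its `customer_context` parameter are assumptions made for illustration, and the second context string is a made-up example.

```python
def build_persona_generator_prompt(customer_context: str) -> str:
    """Return a system prompt asking an LLM to write a one-line persona prompt.

    `customer_context` is a hypothetical parameter, for example
    "a customer of a major supermarket in Sydney, Australia".
    """
    return (
        "You are an expert ML researcher and prompt engineer. "
        "You have been tasked with creating a prompt which can be used to "
        "simulate a fictional user of a particular brand and service. "
        "This prompt needs to include the person's name, age, demographic, "
        "personality including Big Five and DISC, personality traits, "
        "frustrations, values, goals, challenges, and any other related "
        "information based on the context. Be as detailed as you need to. "
        'You will generate the prompt as a one-liner starting with "You are ". '
        f"This prompt is for {customer_context}. "
        "Please only return the prompt to use."
    )


contexts = [
    "a customer of a major supermarket in Sydney, Australia",
    "a first-time user of a grocery delivery app",  # illustrative only
]
prompts = [build_persona_generator_prompt(c) for c in contexts]
print(prompts[0])
```

Each resulting string would be sent to the language model as a system prompt; the model's reply is the persona prompt itself.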

Fusing Autonomous Agents with Digital Personas

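At the core of synthetic user research is the fusion of autonomous agents and synthetic personas: simulated entities that mimic human interaction and behaviour. Think of the autonomous agents as individuals in a sophisticated game, each playing a persona crafted by generative AI. These personas interact in a simulated environment and offer a simulated view of consumer behaviour and preferences across different scenarios. With autonomous agents, we can almost bring these personas to life inside the simulation.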

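Combining technology (autonomous agent frameworks) with language (personality and persona prompting) to reach the desired outcome is one of many advanced ways of putting generative AI agents to distinctive use.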

The Critical Role of Agent Frameworks

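To make this vision a reality, the architecture of the autonomous agents plays a key role. Frameworks such as Autogen, BabyAGI, and CrewAI simplify the creation and management of AI agents and abstract away the complexity of their architecture. These frameworks can simulate complex human behaviour and interaction, laying the foundation for digital personas that act, think, and respond like real customers.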

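Under the hood, these autonomous agent architectures are really smart routers (like traffic controllers) that sit on top of existing large language models and add prompting, caching (memory), and checkpoints (validation), providing a high-level abstraction for multi-agent conversations over language models.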

Various types of agent interaction. Source: Microsoft Autogen

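We will use Autogen (released by Microsoft) as our framework, drawing on the example described as the flexible conversation pattern, in which agents can interact with each other. Agents can also be given "tools" to perform "tasks", but in this example we keep things purely conversational.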

Creating Complex Interactions

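The ability to simulate complex group dynamics and individual roles in these digital environments is critical. It produces rich, multi-faceted data that more accurately reflects the diversity of real-world consumer groups, which is essential for understanding the different ways customer segments interact with products and services. For example, pairing a persona prompt for a sceptical customer with a customer service agent can reveal the challenges and objections different products may face. We can also run more complex scenarios, such as splitting the synthetic personas into groups to work on a problem and present back.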

How to Implement Synthetic User Research

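The process starts with building the autonomous agents using Autogen, a tool that simplifies the creation and orchestration of these digital personas. We can install the autogen package from PyPI using pip: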

pip install pyautogen

Set the output format (optional): this ensures word wrapping for readability, depending on your IDE, for example when running the notebook for this exercise in Google Colab.

from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

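Next, we set up our environment by importing the packages and setting the Autogen configuration, along with our LLM (large language model) and API key. You can also use other local LLM services that are backwards compatible with the OpenAI REST service; LocalAI is one service that can act as a gateway to locally running open-source LLMs.

I have tested this on OpenAI's GPT-3.5 (gpt-3.5-turbo) and GPT-4 (gpt-4-turbo-preview). Expect deeper responses from GPT-4, at the cost of longer query times.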

import json
import os
import autogen
from autogen import GroupChat, Agent
from typing import Optional

# Setup LLM model and API keys
os.environ["OAI_CONFIG_LIST"] = json.dumps([
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '<<Put your Open-AI Key here>>',
    }
])

# Setting configurations for autogen
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": {
            "gpt-3.5-turbo"
        }
    }
)

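We then need to configure our LLM instance, which we attach to each agent. This lets us create a separate LLM configuration per agent if needed, for example to use different models for different agents.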

# Define the LLM configuration settings
llm_config = {
    # Seed for consistent output, used for testing. Remove in production.
    # "seed": 42,
    # Setting cache_seed to None ensures caching is disabled
    "cache_seed": None,
    "temperature": 0.5,
    "config_list": config_list,
}
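
If different agents should use different models, one option is a small helper that builds a separate configuration dictionary per agent. This is a sketch only: `make_llm_config` is a hypothetical helper, the placeholder `config_list` stands in for the one loaded from `OAI_CONFIG_LIST`, and the temperatures are arbitrary examples.

```python
from copy import deepcopy

# Placeholder config list; in the walkthrough this comes from
# autogen.config_list_from_json("OAI_CONFIG_LIST", ...).
config_list = [
    {"model": "gpt-3.5-turbo", "api_key": "<<Put your OpenAI key here>>"},
    {"model": "gpt-4-turbo-preview", "api_key": "<<Put your OpenAI key here>>"},
]


def make_llm_config(config_list, model, temperature=0.5):
    """Hypothetical helper: build an llm_config restricted to a single model."""
    selected = [deepcopy(entry) for entry in config_list if entry["model"] == model]
    if not selected:
        raise ValueError(f"No config entry for model {model!r}")
    return {
        "cache_seed": None,  # caching disabled, as in the configuration above
        "temperature": temperature,
        "config_list": selected,
    }


# Example: a more capable model for the researcher, a cheaper one for personas.
researcher_llm_config = make_llm_config(config_list, "gpt-4-turbo-preview", temperature=0.2)
persona_llm_config = make_llm_config(config_list, "gpt-3.5-turbo", temperature=0.7)
print(researcher_llm_config["config_list"][0]["model"])
```

Each dictionary would then be passed as `llm_config` to the corresponding agent.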

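Define our researcher: this is the persona that facilitates the session in the simulated user research scenario. The system prompt for this persona includes a few key elements: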

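Purpose: your role is to ask questions about products and gather insights from individual customers like Emily.
Grounding the simulation: before starting the task, break down the list of panelists and the order you want them to speak, and avoid the panelists talking to each other and creating confirmation bias.
Ending the simulation: once the conversation has ended and the research is complete, end your message with "TERMINATE" to close the research session. This instruction is generated by generate_notice, a function used to align the system prompts of the various agents. You will also notice that the researcher agent has is_termination_msg set so that it honours the termination protocol.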

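We also pass llm_config, which ties the agent back to the language model configuration: the model version, keys, and hyperparameters to use. We will use the same configuration for all agents.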

# Avoid agents thanking each other and ending up in a loop
# Helper function that builds the notice appended to each system prompt
def generate_notice(role="researcher"):
    # Base notice for everyone, add your own additional prompts here
    base_notice = (
        '\n\n'
    )
    
    # Notice for non-personas (manager or researcher)
    non_persona_notice = (
        'Do not show appreciation in your responses, say only what is necessary. '
        'if "Thank you" or "You\'re welcome" are said in the conversation, then say TERMINATE '
        'to indicate the conversation is finished and this is your last message.'
    )
    
    # Custom notice for personas
    persona_notice = (
        ' Act as {role} when responding to queries, providing feedback, giving your personal opinion, '
        'or participating in discussions.'
    )
    
    # Check if the role is a non-persona (manager or researcher)
    if role.lower() in ["manager", "researcher"]:
        # Return the full termination notice for non-personas
        return base_notice + non_persona_notice
    else:
        # Return the modified notice for personas
        return base_notice + persona_notice.format(role=role)

# Researcher agent definition
name = "Researcher"
researcher = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""Researcher. You are a top product researcher with a PhD in behavioural psychology and have worked in the research and insights industry for the last 20 years with top creative, media and business consultancies. Your role is to ask questions about products and gather insights from individual customers like Emily. Frame questions to uncover customer preferences, challenges, and feedback. Before you start the task, break down the list of panelists and the order you want them to speak; avoid the panelists speaking with each other and creating confirmation bias. If the session is terminating at the end, please provide a summary of the outcomes of the research study in clear, concise notes, not at the start.""" + generate_notice(),
    # Treat missing or empty content as "not a termination message"
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
)
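
The termination check can also be written as a named function, which is easier to test and to reuse for both the researcher and the manager. This is a minimal sketch, assuming messages are dictionaries with an optional "content" key as in the code above; the function name is a choice made here, not part of Autogen.

```python
def is_termination_msg(message: dict) -> bool:
    """Return True when the message content contains the TERMINATE keyword."""
    content = message.get("content")
    # Content can be missing or None; treat anything that is not text as "keep going".
    if not isinstance(content, str):
        return False
    return "TERMINATE" in content


print(is_termination_msg({"content": "Summary complete. TERMINATE"}))
print(is_termination_msg({"content": None}))
```

The function can be passed directly as `is_termination_msg=is_termination_msg` when constructing an agent.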

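Define our individuals: to feed into the study, we can use personas generated with the earlier process. For this article I have manually adjusted the prompts to remove references to the major supermarket brand used in the simulation.

I have also appended a style prompt along the lines of "Act as Emily when responding to queries, providing feedback, or participating in discussions." to the end of each system prompt. It is generated by the generate_notice function and keeps the synthetic personas on task.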

# Emily - Customer Persona
name = "Emily"
emily = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""Emily. You are a 35-year-old elementary school teacher living in Sydney, Australia. You are married with two kids aged 8 and 5, and you have an annual income of AUD 75,000. You are introverted, high in conscientiousness, low in neuroticism, and enjoy routine. When shopping at the supermarket, you prefer organic and locally sourced produce. You value convenience and use an online shopping platform. Due to your limited time from work and family commitments, you seek quick and nutritious meal planning solutions. Your goals are to buy high-quality produce within your budget and to find new recipe inspiration. You are a frequent shopper and use loyalty programs. Your preferred methods of communication are email and mobile app notifications. You have been shopping at a supermarket for over 10 years but also price-compare with others.""" + generate_notice(name),
)

# John - Customer Persona
name="John"
john = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""John. You are a 28-year-old software developer based in Sydney, Australia. You are single and have an annual income of AUD 100,000. You're extroverted, tech-savvy, and have a high level of openness. When shopping at the supermarket, you primarily buy snacks and ready-made meals, and you use the mobile app for quick pickups. Your main goals are quick and convenient shopping experiences. You occasionally shop at the supermarket and are not part of any loyalty program. You also shop at Aldi for discounts. Your preferred method of communication is in-app notifications.""" + generate_notice(name),
)

# Sarah - Customer Persona
name="Sarah"
sarah = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""Sarah. You are a 45-year-old freelance journalist living in Sydney, Australia. You are divorced with no kids and earn AUD 60,000 per year. You are introverted, high in neuroticism, and very health-conscious. When shopping at the supermarket, you look for organic produce, non-GMO, and gluten-free items. You have a limited budget and specific dietary restrictions. You are a frequent shopper and use loyalty programs. Your preferred method of communication is email newsletters. You exclusively shop for groceries.""" + generate_notice(name),
)

# Tim - Customer Persona
name="Tim"
tim = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""Tim. You are a 62-year-old retired police officer residing in Sydney, Australia. You are married and a grandparent of three. Your annual income comes from a pension and is AUD 40,000. You are highly conscientious, low in openness, and prefer routine. You buy staples like bread, milk, and canned goods in bulk. Due to mobility issues, you need assistance with heavy items. You are a frequent shopper and are part of the senior citizen discount program. Your preferred method of communication is direct mail flyers. You have been shopping here for over 20 years.""" + generate_notice(name),
)

# Lisa - Customer Persona
name="Lisa"
lisa = autogen.AssistantAgent(
    name=name,
    llm_config=llm_config,
    system_message="""Lisa. You are a 21-year-old university student living in Sydney, Australia. You are single and work part-time, earning AUD 20,000 per year. You are highly extroverted, low in conscientiousness, and value social interactions. You shop here for popular brands, snacks, and alcoholic beverages, mostly for social events. You have a limited budget and are always looking for sales and discounts. You are not a frequent shopper but are interested in joining a loyalty program. Your preferred method of communication is social media and SMS. You shop wherever there are sales or promotions.""" + generate_notice(name),
)
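
The five persona definitions share the same shape, so the persona text can also be kept as data and the agent arguments built in a loop. The sketch below is an assumption about how one might organise this, not part of the original code: `PERSONAS`, `build_agent_kwargs`, and `demo_notice` are hypothetical names, the descriptions are shortened for illustration, and `demo_notice` only stands in for generate_notice so the example runs on its own.

```python
from typing import Callable, Dict, List

# Shortened persona descriptions for illustration; use the full prompts in practice.
PERSONAS: Dict[str, str] = {
    "Emily": "You are a 35-year-old elementary school teacher living in Sydney, Australia.",
    "John": "You are a 28-year-old software developer based in Sydney, Australia.",
    "Sarah": "You are a 45-year-old freelance journalist living in Sydney, Australia.",
}


def build_agent_kwargs(
    personas: Dict[str, str],
    llm_config: dict,
    notice_fn: Callable[[str], str],
) -> List[dict]:
    """Return one keyword-argument dict per persona (name, llm_config, system_message)."""
    return [
        {
            "name": name,
            "llm_config": llm_config,
            "system_message": f"{name}. {description}" + notice_fn(name),
        }
        for name, description in personas.items()
    ]


def demo_notice(role: str) -> str:
    """Stand-in for generate_notice so this sketch is self-contained."""
    return f"\n\n Act as {role} when responding to queries."


agent_kwargs = build_agent_kwargs(PERSONAS, {"temperature": 0.5}, demo_notice)
for kwargs in agent_kwargs:
    print(kwargs["name"], "->", kwargs["system_message"][:60])
```

With Autogen installed, each dictionary could be unpacked into `autogen.AssistantAgent(**kwargs)`, passing the real generate_notice and llm_config instead of the stand-ins.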

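Define the simulation environment and the rules for who can speak: we let all the agents we have defined sit in the same simulated environment (a group chat). More complex scenarios are possible, in which we control how and when the next speaker is chosen, so we define a simple speaker-selection function tied to the group chat. It makes the researcher the lead and ensures we go around the room, asking everyone for their thoughts several times.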

# def custom_speaker_selection(last_speaker, group_chat):
#     """
#     Custom function to select which agent speaks next in the group chat.
#     """
#     # List of agents excluding the last speaker
#     next_candidates = [agent for agent in group_chat.agents if agent.name != last_speaker.name]
    
#     # Select the next agent based on your custom logic
#     # For simplicity, we're just rotating through the candidates here
#     next_speaker = next_candidates[0] if next_candidates else None
    
#     return next_speaker

def custom_speaker_selection(last_speaker: Optional[Agent], group_chat: GroupChat) -> Optional[Agent]:
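    """
    Alternate between the Researcher and the participants, tracking interactions.
    After each Researcher turn, the participant who has spoken least goes next,
    up to max_interactions turns each. In practice max_round on the group chat
    ends the session first, at roughly 2-3 turns per participant.
    """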
    # Define participants and initialize or update their interaction counters
    if not hasattr(group_chat, 'interaction_counters'):
        group_chat.interaction_counters = {agent.name: 0 for agent in group_chat.agents if agent.name != "Researcher"}
    
    # Define a maximum number of interactions per participant
    max_interactions = 6

    # If the last speaker was the Researcher, find the next participant who has spoken the least
    if last_speaker and last_speaker.name == "Researcher":
        next_participant = min(group_chat.interaction_counters, key=group_chat.interaction_counters.get)
        if group_chat.interaction_counters[next_participant] < max_interactions:
            group_chat.interaction_counters[next_participant] += 1
            return next((agent for agent in group_chat.agents if agent.name == next_participant), None)
        else:
            return None  # End the conversation if all participants have reached the maximum interactions
    else:
        # If the last speaker was a participant, return the Researcher for the next turn
        return next((agent for agent in group_chat.agents if agent.name == "Researcher"), None)

# Adding the Researcher and Customer Persona agents to the group chat
groupchat = autogen.GroupChat(
    agents=[researcher, emily, john, sarah, tim, lisa],
    speaker_selection_method = custom_speaker_selection,
    messages=[],
    max_round=30
)
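
Before spending API calls, it can help to dry-run the turn order. The sketch below restates the same selection policy against stub objects, so it runs without Autogen or a language model. `StubAgent`, `StubGroupChat`, and `select_next` are hypothetical stand-ins written for this illustration; they are not Autogen classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StubAgent:
    name: str


@dataclass
class StubGroupChat:
    agents: List[StubAgent]
    interaction_counters: Dict[str, int] = field(default_factory=dict)


def select_next(last_speaker: Optional[StubAgent], chat: StubGroupChat,
                max_interactions: int = 6) -> Optional[StubAgent]:
    """Same policy as custom_speaker_selection, restated for the stub objects."""
    if not chat.interaction_counters:
        chat.interaction_counters = {
            agent.name: 0 for agent in chat.agents if agent.name != "Researcher"
        }
    by_name = {agent.name: agent for agent in chat.agents}
    if last_speaker is not None and last_speaker.name == "Researcher":
        # Pick the participant who has spoken least (ties resolve in list order)
        least = min(chat.interaction_counters, key=chat.interaction_counters.get)
        if chat.interaction_counters[least] >= max_interactions:
            return None  # everyone has reached the cap
        chat.interaction_counters[least] += 1
        return by_name[least]
    # Any other previous speaker (or none) hands the floor to the Researcher
    return by_name["Researcher"]


names = ["Researcher", "Emily", "John", "Sarah", "Tim", "Lisa"]
chat = StubGroupChat(agents=[StubAgent(n) for n in names])

order = []
speaker = None  # no previous speaker, so the Researcher opens
for _ in range(12):
    speaker = select_next(speaker, chat)
    if speaker is None:
        break
    order.append(speaker.name)
print(order)
```

In this dry run the Researcher speaks on every other turn and the panelists are called in list order (Emily, John, Sarah, Tim, Lisa, then Emily again), which is the round-the-room behaviour described above.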

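Define a manager to pass instructions and manage our simulation: once we start, we speak only to the manager, who in turn speaks to the researcher and the panelists. This uses Autogen's GroupChatManager.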

# Initialise the manager
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config,
    system_message="You are a research manager agent that can manage a group chat of multiple agents made up of a researcher agent and many people made up of a panel. You will limit the discussion between the panelists and help the researcher in asking the questions. Please ask the researcher first on how they want to conduct the panel." + generate_notice(),
    # Treat missing or empty content as "not a termination message"
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
)

Set up the human interaction: this lets us pass instructions to the agents we have started. We give it the initial prompt and we are ready to go.

# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    system_message="A human admin.",
    human_input_mode="TERMINATE"
)

# Start the research simulation by giving instructions to the manager
# manager <-> researcher <-> panelists
user_proxy.initiate_chat(
    manager,
    message="""
Gather customer insights on a supermarket grocery delivery service. Identify pain points, preferences, and suggestions for improvement from different customer personas. Could you all please give your own personal opinions before sharing more with the group and discussing. As a researcher your job is to ensure that you gather unbiased information from the participants and provide a summary of the outcomes of this study back to the supermarket brand.
""",
)

Once we run the code above, the output appears live in the Python environment, and you will see messages being passed between the agents.

Live Python output: our researcher talking with the panelists

Creating Actionable Outcomes: The Summary Agent

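Now that our simulated study has concluded, we want some more actionable insights. We can create a summary agent to support this task and also use it in a Q&A scenario. One caution: very large transcripts need a language model that supports a larger input (context window).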
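
A rough size check before sending a transcript can flag this early. The sketch below is a heuristic only: the ratio of about four characters per token is an assumption that varies by model and language, both function names are hypothetical, and the `context_window` value in the example is a placeholder to be replaced with the documented limit of whichever model you use.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the characters-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_context(messages, context_window: int, reserve_for_output: int = 1000) -> bool:
    """Check whether the joined transcript plausibly fits the given context window."""
    transcript = "\n".join(m for m in messages if m)
    return estimate_tokens(transcript) + reserve_for_output <= context_window


sample_messages = [
    "What frustrates you about grocery delivery?",
    "Late arrivals and unexpected substitutions.",
]
# 16_000 is a placeholder window size, not a statement about any specific model.
print(fits_context(sample_messages, context_window=16_000))
```

For an exact count, a tokenizer matched to the target model would be needed; this check is only meant to catch transcripts that are clearly too large.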

We need to collect all the conversation from the simulated panel discussion to use as the user prompt (input) for the summary agent.

# Get response from the groupchat for user prompt
# Skip messages with empty or missing content so the join below cannot fail
messages = [msg["content"] for msg in groupchat.messages if msg.get("content")]
user_prompt = "Here is the transcript of the study ```{customer_insights}```".format(customer_insights="\n>>>\n".join(messages))
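
The joined transcript above contains only message text, so the summary agent cannot tell who said what. A variant that prefixes each message with its speaker could look like the sketch below. It assumes each message dictionary carries a "name" key alongside "content" and falls back to "unknown" when it does not; `format_transcript` and the sample messages are made up for illustration.

```python
def format_transcript(messages) -> str:
    """Join chat messages into a transcript, prefixing each with the speaker name."""
    lines = []
    for msg in messages:
        content = msg.get("content")
        if not content:
            continue  # skip empty or missing content
        speaker = msg.get("name", "unknown")  # "name" key is an assumption
        lines.append(f"{speaker}: {content}")
    return "\n>>>\n".join(lines)


# Made-up sample messages, shaped like the dictionaries in groupchat.messages
sample = [
    {"name": "Researcher", "content": "Emily, what frustrates you about delivery?"},
    {"name": "Emily", "content": "Substitutions I did not ask for."},
    {"name": "Researcher", "content": None},
]
print(format_transcript(sample))
```

The returned string can be used in place of the plain join when building user_prompt.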

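Let's craft the system prompt (instructions) for our summary agent. This agent focuses on turning the transcript from the previous step into a tailored report card with clear recommendations and actions.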

# Generate system prompt for the summary agent
summary_prompt = """
You are an expert researcher in behaviour science and are tasked with summarising a research panel. Please provide a structured summary of the key findings, including pain points, preferences, and suggestions for improvement.
The summary should use the following format:

```
Research Study: <<Title>>

Subjects:
<<Overview of the subjects and number, any other key information>>

Summary:
<<Summary of the study, include detailed analysis as an expert>>

Pain Points:
- <<List of Pain Points - Be as clear and prescriptive as required. I expect detailed response that can be used by the brand directly to make changes. Give a short paragraph per pain point.>>

Suggestions/Actions:
- <<List of Actions - Be as clear and prescriptive as required. I expect detailed response that can be used by the brand directly to make changes. Give a short paragraph per recommendation.>>
```
"""

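Define the summary agent and its environment: let's create a mini environment for the summary agent to run in. It needs its own proxy (environment) and an initiate command that pulls in the transcript (user_prompt) as input.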

summary_agent = autogen.AssistantAgent(
    name="SummaryAgent",
    llm_config=llm_config,
    system_message=summary_prompt + generate_notice(),
)
summary_proxy = autogen.UserProxyAgent(
    name="summary_proxy",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    system_message="A human admin.",
    human_input_mode="TERMINATE"
)
summary_proxy.initiate_chat(
    summary_agent,
    message=user_prompt,
)

This gives us output in the form of a Markdown report card, plus the ability to ask further questions about the findings in a Q&A-style chatbot.
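
If the report follows the requested template, its sections can be pulled apart for downstream use, for example to turn each suggested action into a backlog item. The sketch below assumes the model reproduces the section headings exactly; real output may deviate, so treat it as a starting point. `split_report`, `bullet_items`, and the sample report text are hypothetical and contain placeholder content, not results from an actual run.

```python
from typing import Dict, List

SECTION_HEADINGS = ("Subjects", "Summary", "Pain Points", "Suggestions/Actions")


def split_report(report: str) -> Dict[str, str]:
    """Split a report that follows the template into {heading: body}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        heading = stripped.rstrip(":")
        if stripped.endswith(":") and heading in SECTION_HEADINGS:
            current = heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def bullet_items(section_body: str) -> List[str]:
    """Return the '- ' bullet items of a section as a list of strings."""
    return [line[2:].strip() for line in section_body.splitlines() if line.startswith("- ")]


# Placeholder report text, used only to exercise the parser
sample_report = """Research Study: Grocery delivery (placeholder)

Subjects:
Five synthetic personas.

Summary:
Placeholder summary.

Pain Points:
- Placeholder pain point A
- Placeholder pain point B

Suggestions/Actions:
- Placeholder action A
"""

sections = split_report(sample_report)
actions = bullet_items(sections["Suggestions/Actions"])
print(actions)
```

Each entry in `actions` could then be mapped to whatever structure a ticketing or planning tool expects.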

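Live output of the summary agent's report card, followed by open Q&A

What's Next: What Else Could We Do

This exercise is part of a larger autonomous agent architecture and of a series of experiments I am running on novel generative AI and agent architectures. If you would like to keep extending this work, here are some ideas, including a few areas I have explored: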

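Further grounding: create a more representative sample of personas by linking to census data, internal CRM data, or even live customer transcripts.
Combine with multi-modality: modalities can now be mixed with vision input for generative AI, so marketing materials, website screenshots, and similar assets can be provided as input to start the simulation with visual stimuli.
Give agents access to tools: by providing access to other APIs and tools, you can create some unique experiences, such as integrating individual customer persona agents into your company Slack, Teams, or Miro so they can be tagged to answer questions. Perhaps the final summary agent could load some user stories into your ticketing system, such as JIRA?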
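
As a sketch of the grounding idea, persona "seeds" could be sampled from population distributions and then fed to a persona generator prompt. Everything below is illustrative: the weights are placeholders, NOT real census figures, and `sample_persona_seeds` and `seed_to_context` are hypothetical helpers.

```python
import random

# Placeholder distributions for illustration only; NOT real census figures.
AGE_BANDS = {"18-29": 0.25, "30-44": 0.30, "45-64": 0.30, "65+": 0.15}
HOUSEHOLDS = {"single": 0.3, "couple": 0.3, "family with children": 0.4}


def sample_persona_seeds(n: int, seed: int = 0):
    """Draw n (age band, household) combinations according to the weights above."""
    rng = random.Random(seed)  # fixed seed so a panel can be reproduced
    ages = rng.choices(list(AGE_BANDS), weights=list(AGE_BANDS.values()), k=n)
    households = rng.choices(list(HOUSEHOLDS), weights=list(HOUSEHOLDS.values()), k=n)
    return [{"age_band": a, "household": h} for a, h in zip(ages, households)]


def seed_to_context(seed_row: dict) -> str:
    """Turn a sampled row into a context phrase for a persona generator prompt."""
    return (
        f"a supermarket customer in Sydney, Australia, aged {seed_row['age_band']}, "
        f"household type: {seed_row['household']}"
    )


panel = sample_persona_seeds(5, seed=42)
for row in panel:
    print(seed_to_context(row))
```

Replacing the placeholder tables with real census or CRM distributions would make the sampled panel more representative of the target population.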

Join me in shaping the future of user research. Explore the project on GitHub, contribute your insights, and let's innovate together.