Let's migrate this expense tool from n8n to Elastic One Workflow

Published 2025-12-21 12:04:17

License: CC 4.0 BY-SA

By Vladimir_Filonov, Elastic

Building a conversational expense assistant with Elastic One Workflow, Gemini, and Telegram

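Not long ago I came across an excellent hands-on guide by Som (published yesterday):

"Building a conversational expense assistant with Elasticsearch Agent Builder, Telegram, n8n, and Bedrock" (to be revised before publication)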

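The idea is elegant: Telegram as the chat UI, n8n for orchestration, an STT service for voice, Bedrock for intent recognition and information extraction, and Elasticsearch + Agent Builder for storage and querying. Very clean. Surprisingly smooth. Honestly? It inspired me right away, which doesn't happen often with hands-on technical articles.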

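But since I spend most of my time working with Elastic's Workflow Engine, I started wondering:

"What if I rebuilt a simplified version of Som's assistant, but with Elastic One Workflow as the orchestrator instead of n8n?"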

Same idea, same "talk to your expenses" magic, but with the whole flow running natively in Elastic. It seemed like it should be doable. And it mostly was.

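One Workflow is still at an early stage and doesn't yet have everything n8n offers. I made a few simplifications: I used Gemini for the LLM tasks (partly for LLM diversity, partly because I wanted to try it), and I poll Telegram instead of using webhooks. Polling works fine, though webhooks would be better.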

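So I built it. It's a compact expense assistant. It supports voice and text, classifies intent, indexes structured expense data together with semantic embeddings, uses ES|QL tools for analytics, and replies directly in Telegram. That's about the whole picture.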

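The main differences from Som's version? Elastic One Workflow replaces n8n, Gemini replaces Bedrock, and I use scheduled polling instead of Telegram webhooks. Everything happens inside Elastic, which is the whole point. I wasn't trying to make a 1:1 clone; I just wanted to see what the same idea looks like when everything lives in the ELK stack. To be honest, though, I'm not sure polling every 15 seconds is the best long-term approach. Webhooks would be cleaner, but One Workflow doesn't support them yet. So polling it is.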
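
Polling mostly comes down to bookkeeping: Telegram's getUpdates expects an offset one greater than the highest update_id already received, otherwise the same messages come back on every run. Here is a minimal Python sketch of that rule. The function name next_offset is hypothetical, and the update dicts only carry the update_id field from the Bot API response.

```python
from typing import Any


def next_offset(updates: list[dict[str, Any]], last_update_id: int = 0) -> int:
    """Return the offset to send with the next getUpdates call.

    Illustrative helper, not part of any library. `updates` is the
    `result` array of a getUpdates response; `last_update_id` is the
    value persisted from the previous run (0 if nothing is stored yet).
    """
    # Highest update_id seen in this batch, or the stored value if the batch is empty.
    highest = max((u["update_id"] for u in updates), default=last_update_id)
    # Never move backwards, then add one so confirmed updates are not returned again.
    return max(highest, last_update_id) + 1
```

With an empty batch the offset stays at the stored value plus one, which matches the default-to-0-then-add-1 expression used for the offset in the workflow.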

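It polls Telegram every 15 seconds. It converts voice to text (I used Deepgram, but any STT service will do). It uses Gemini to classify the intent as INGEST or QUERY. Then it either indexes the expense or runs an ES|QL query through the Agent Builder tools. If confidence is low? It asks for clarification. Once the initial setup is done, the whole flow is fairly straightforward.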
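
The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how One Workflow evaluates it: the function name route and the "CLARIFY" label are hypothetical, and the 0.7 threshold is taken from the check_confidence step in the workflow.

```python
def route(classification: dict, threshold: float = 0.7) -> str:
    """Decide which branch handles a classified message.

    `classification` is the parsed classifier reply with the keys
    "intent" and "confidence". Returns "INGEST", "QUERY", or "CLARIFY".
    """
    confidence = float(classification.get("confidence", 0.0))
    # Low confidence: ask the user what they meant instead of guessing.
    if confidence < threshold:
        return "CLARIFY"
    intent = str(classification.get("intent", "")).upper()
    # Anything other than the two known intents also falls back to clarification.
    return intent if intent in {"INGEST", "QUERY"} else "CLARIFY"
```

Treating an unknown intent the same way as low confidence is a design choice of this sketch; the workflow itself simply sends everything that is not INGEST to the query branch.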

You'll need Elasticsearch with Agent Builder and inference endpoints, Gemini configured through Vertex AI on GCP, a Telegram bot token (if you don't have one yet, get it from @BotFather), and an STT provider. I used Deepgram because it's easy to set up, but there are other options.

Setting up the index

Start by creating the Elasticsearch index. Nothing complicated here:

PUT /expenses
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "amount":       { "type": "float" },
      "merchant":     { "type": "keyword" },
      "category":     { "type": "keyword" },
      "payment_method": { "type": "keyword" },
      "raw_transcript": { "type": "text" },
      "user_id": { "type": "keyword" },
      "telegram_chat_id": { "type": "keyword" },
      "semantic_text": {
        "type": "semantic_text",
        "inference_id": "gemini_embeddings"
      }
    }
  }
}

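Setting up Gemini

This part took longer than I expected. GCP setup is tedious. You know the drill: create a project, enable the Vertex AI API, create a service account with the Vertex AI User role, and download the JSON key. Pick a region where Vertex AI is available. I used us-central1 because I already had it configured. There may be better regions; I didn't look into it closely.

Then, in Kibana, go to Stack Management → Connectors and create a Gemini connector. Name it whatever you like. Set the API URL to the endpoint for your region. Fill in your project ID and region. For the model, use gemini-2.5-pro (or whichever version you prefer). Paste the entire service account JSON into the credentials field. The entire JSON, not part of it. I made that mistake the first time.

For embeddings, you need an inference endpoint. Create it like this: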

PUT _inference/text_embedding/gemini_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<SERVICE_ACCOUNT_JSON>",
    "project_id": "my-gemini-project-12345",
    "location": "us-central1",
    "model_id": "text-embedding-004"
  }
}

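Then create the .inference connector in Kibana. Path: Stack Management → Connectors → AI Connector.
Set the task type to text_embedding, the provider to googlevertexai, and the inference ID to gemini_embeddings. This must match what you used in the index mapping, or it won't work. Paste the same service account JSON in the secrets section. On save, the connector automatically creates or updates the inference endpoint. At least in theory; I had to refresh a few times before it took effect.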

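Important: the inference_id must match your index mapping, and the endpoint must exist before you can index documents with a semantic_text field. Elasticsearch generates the embeddings automatically at index time. At least that's what the docs say; I hit a few problems at first, but it worked in the end.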
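
Since a mismatched inference_id is an easy mistake to make, a quick check before indexing can save some head-scratching. Below is a minimal Python sketch that works on the mapping as a plain dict (for example, the body of the PUT /expenses request). The helper names are hypothetical, and it only inspects top-level properties, not nested objects.

```python
def semantic_fields(mapping: dict) -> dict[str, str]:
    """Map each top-level semantic_text field name to its inference_id."""
    props = mapping.get("mappings", {}).get("properties", {})
    return {
        name: spec.get("inference_id", "")
        for name, spec in props.items()
        if spec.get("type") == "semantic_text"
    }


def mismatched_fields(mapping: dict, endpoint_id: str) -> list[str]:
    """Return semantic_text fields whose inference_id differs from endpoint_id."""
    return sorted(
        name
        for name, inference_id in semantic_fields(mapping).items()
        if inference_id != endpoint_id
    )
```

An empty list from mismatched_fields(mapping, "gemini_embeddings") means every semantic_text field points at the endpoint created above. It does not prove the endpoint itself exists; that still has to be checked against the cluster.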

Example document:

POST /expenses/_doc
{
  "@timestamp": "2025-01-15T10:30:00Z",
  "amount": 250.0,
  "merchant": "cafe",
  "category": "food",
  "payment_method": "credit_card",
  "raw_transcript": "Spent 250 on lunch at the cafe",
  "semantic_text": "Spent 250 on lunch at the cafe",
  "user_id": "user123",
  "telegram_chat_id": "chat456"
}

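The semantic_text field generates embeddings automatically. You can check whether it worked, although I'm not entirely sure what the output looks like on failure. Since the query returned results, I simply assumed it had: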

GET /expenses/_search
{
  "_source": {
    "includes": ["*", "_inference_fields"]
  },
  "query": {
    "match_all": {}
  },
  "size": 1
}

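Creating the ES|QL tools

I created two ES|QL tools for Agent Builder. Well, I originally wanted more, but these two cover my needs. More can be added later if necessary. The first one searches by date range: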

POST kbn://api/agent_builder/tools
{
  "id": "search_expenses_by_date",
  "type": "esql",
  "description": "Search expenses within a date range. Returns amount, merchant, category, and payment method.",
  "tags": ["expenses", "analytics"],
  "configuration": {
    "query": "FROM expenses | WHERE @timestamp >= ?start_date AND @timestamp <= ?end_date | WHERE ?category == \"\" OR category == ?category | STATS total = SUM(amount) BY category, payment_method | SORT total DESC",
    "params": {
      "start_date": {
        "type": "date",
        "description": "Start date in ISO format (e.g., 2025-01-01)"
      },
      "end_date": {
        "type": "date",
        "description": "End date in ISO format (e.g., 2025-01-31)"
      },
      "category": {
        "type": "keyword",
        "description": "Category filter (optional - use empty string for all categories)",
        "optional": true,
        "defaultValue": ""
      }
    }
  }
}

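ES|QL tools use the ?param_name syntax. The category parameter is optional: an empty string returns all categories. I think so, anyway; I didn't test every edge case. I probably should have, but it works for my use case.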
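
To make the intended behaviour of the first tool concrete, here is a small in-memory Python equivalent: filter by date range, apply the category filter only when it is non-empty, sum the amounts per (category, payment_method), and sort by total in descending order. This is a sketch of the semantics described above, not a statement about how ES|QL executes the query; the names summarize and parse_ts are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(expenses: list[dict], start: str, end: str, category: str = ""):
    """Total amounts per (category, payment_method) within [start, end]."""
    lo, hi = parse_ts(start), parse_ts(end)
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for expense in expenses:
        if not lo <= parse_ts(expense["@timestamp"]) <= hi:
            continue
        # An empty category means "no filter", mirroring the optional parameter.
        if category and expense["category"] != category:
            continue
        totals[(expense["category"], expense["payment_method"])] += expense["amount"]
    # Largest totals first, like SORT total DESC.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Running the same sample data through both this function and the tool is a cheap way to cover the edge cases mentioned above, the empty-category case in particular.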

The second tool does semantic search. It turned out more useful than I expected; the semantic search results are actually quite good:

POST kbn://api/agent_builder/tools
{
  "id": "semantic_search_expenses",
  "type": "esql",
  "description": "Semantically search expenses using natural language query. Useful for finding expenses by description, merchant name, or context.",
  "tags": ["expenses", "semantic-search"],
  "configuration": {
    "query": "FROM expenses METADATA _score | WHERE MATCH(semantic_text, ?query) | SORT _score DESC, @timestamp DESC | LIMIT ?limit",
    "params": {
      "query": {
        "type": "text",
        "description": "Natural language search query"
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of results",
        "optional": true,
        "defaultValue": 10
      }
    }
  }
}

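On a semantic_text field, the MATCH function runs a semantic search using the inference endpoint from the mapping. You need METADATA _score to sort by relevance. I only learned that after wondering why the results weren't ordered correctly. The error message wasn't much help either.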
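
When results look wrong, it helps to run the same query outside Agent Builder. The Python sketch below builds a request body for the ES|QL query API (POST /_query), assuming named parameters are passed as a list of single-key objects. The function name esql_request is hypothetical, and nothing is sent over the network; the dict is meant to be pasted into Console or handed to whatever HTTP client you use.

```python
# Same query as the semantic_search_expenses tool configuration.
SEMANTIC_QUERY = (
    "FROM expenses METADATA _score "
    "| WHERE MATCH(semantic_text, ?query) "
    "| SORT _score DESC, @timestamp DESC "
    "| LIMIT ?limit"
)


def esql_request(text: str, limit: int = 10) -> dict:
    """Build an ES|QL request body with named parameters.

    Illustrative helper: it only assembles the payload and does not
    talk to Elasticsearch.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {
        "query": SEMANTIC_QUERY,
        # One object per named parameter, matching ?query and ?limit above.
        "params": [{"query": text}, {"limit": limit}],
    }
```

Dropping METADATA _score from the query string is a quick way to reproduce the ordering problem described above.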

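The workflow

The workflow runs every 15 seconds. It polls Telegram and processes the messages. It's long because it handles voice and text, intent classification, confidence checks, and routing. It could probably be shorter, but I didn't spend much time optimizing it. Here it is: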

name: "Expense Assistant Workflow"
description: "Scheduled workflow that polls Telegram and processes expense messages"
enabled: true

triggers:
  - type: scheduled
    with:
      every: "15s"

consts:
  telegram_bot_token: "<TELEGRAM_BOT_TOKEN>"
  telegram_api_url: "https://api.telegram.org/bot"
  last_update_id: 0  # This will be stored/retrieved from Elasticsearch

steps:
    # Step 1: Poll Telegram for new messages
    - name: poll_telegram
      type: http
      with:
        url: "{{ consts.telegram_api_url }}{{ consts.telegram_bot_token }}/getUpdates"
        method: GET
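        body:
          # NOTE: no get_last_update_id step is defined in this listing. A step that reads
          # the "telegram-bot-state" document (id "last_update_id") has to run before this one.
          offset: "{{ steps.get_last_update_id.output._source.last_update_id | default: 0 | plus: 1 }}"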
      on-failure:
        continue: true  # Skip if there's a conflict, will retry on next run
    
    # Step 2: Check if there are new messages
    - name: check_new_messages
      type: if
      condition: "{{ steps.poll_telegram.output.result }}"
      steps:
        # Step 3: Process each message
        - name: process_messages
          type: foreach
          foreach: "{{ steps.poll_telegram.output.result | json: 2 }}"
          steps:
            # Extract message data from Telegram update
            - name: extract_message_data
              type: console
              with:
                message: "Processing message from user {{ foreach.item.message.from.id | json: 2 }}"
            
            # Check if message has text or voice
            - name: check_message_type
              type: if
              condition: "{{ foreach.item.message.voice != null or foreach.item.message.audio != null }}"
              steps:
                # Voice message - get file and transcribe
                - name: get_voice_file
                  type: http
                  with:
                    url: "{{ consts.telegram_api_url }}{{ consts.telegram_bot_token }}/getFile"
                    method: GET
                    body:
                      file_id: "{{ foreach.item.message.voice.file_id | default: foreach.item.message.audio.file_id }}"
                
                - name: transcribe_voice
                  type: http
                  with:
                    url: "https://api.deepgram.com/v1/listen"
                    method: POST
                    headers:
                      Authorization: "Token YOUR_DEEPGRAM_KEY"
                      Content-Type: "application/json"
                    body:
                      url: "https://api.telegram.org/file/bot{{ consts.telegram_bot_token }}/{{ steps.get_voice_file.output.result.file_path }}"
                  on-failure:
                    fallback:
                      - name: fallback_transcription
                        type: http
                        with:
                          url: "http://localhost:8000/transcribe"
                          method: POST
                          body:
                            audio_url: "https://api.telegram.org/file/bot{{ consts.telegram_bot_token }}/{{ steps.get_voice_file.output.result.file_path }}"
              else:
                # Text message
                - name: use_text_directly
                  type: console
                  with:
                    message: "Processing text message: {{ foreach.item.message.text | json: 2 }}"
            
            # Extract transcript (from voice or text)
            - name: get_transcript
              type: console
              with:
                message: "Transcript: {{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
            
            # Intent Classification
            - name: classify_intent
              type: .gemini
              connector-id: "gemini-expense-assistant-connector-id"
              with:
                subAction: invokeAI
                subActionParams:
                  model: "gemini-2.5-pro"
                  messages:
                    - role: user
                      content: "{{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
                  systemInstruction: |
                    You are an intent classifier for an expense assistant.
                    Classify the user's message as either:
                    - INGEST: User wants to add/record an expense (e.g., "Spent 250 on lunch", "Add dinner for 350")
                    - QUERY: User wants to query/search expenses (e.g., "How much did I spend last week?", "Show my food expenses")
                    
                    Always respond with valid JSON only:
                    {
                      "intent": "INGEST" or "QUERY",
                      "confidence": 0.0-1.0,
                      "reasoning": "brief explanation"
                    }
    
            - name: check_confidence
              type: if
              # TODO: This condition needs to be adjusted based on your workflow structure.
              # The template extracts the confidence, but KQL needs a field name to compare.
              # Consider restructuring to extract the confidence value first, then reference it.
              condition: "confidence < 0.7"
              steps:
                - name: request_clarification
                  type: http
                  with:
                    url: "{{ consts.telegram_api_url }}{{ consts.telegram_bot_token }}/sendMessage"
                    method: POST
                    headers:
                      Content-Type: "application/json"
                    body:
                      chat_id: "{{ foreach.item.message.from.id | json: 2 }}"
                      text: "{{ steps.classify_intent.output.message.reasoning }}. Could you please clarify: Are you trying to add an expense or ask about existing expenses?"
              else:
                - name: route_by_intent
                  type: if
                  # TODO: This condition needs to be adjusted. KQL needs a field name.
                  # Consider restructuring to extract the intent value first, then reference it.
                  condition: "intent: INGEST"
                  steps:
                    # INGEST BRANCH
                    - name: extract_expense_data
                      type: .gemini
                      connector-id: "gemini-expense-assistant-connector-id"
                      with:
                        subAction: invokeAI
                        subActionParams:
                          model: "gemini-2.5-pro"
                          messages:
                            - role: user
                              content: "{{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
                          systemInstruction: |
                            Extract expense information from the user's message.
                            Return valid JSON with:
                            {
                              "amount": number,
                              "merchant": string,
                              "category": string (food, transport, entertainment, etc.),
                              "payment_method": string (credit_card, cash, debit, etc.),
                              "date": string (ISO format, default to today if not specified)
                            }
                    
                    - name: index_expense
                      type: elasticsearch.index
                      with:
                        index: "expenses"
                        document:
                          "@timestamp": "{{ steps.extract_expense_data.output.message.date | default: 'now' }}"
                          amount: "{{ steps.extract_expense_data.output.message.amount }}"
                          merchant: "{{ steps.extract_expense_data.output.message.merchant }}"
                          category: "{{ steps.extract_expense_data.output.message.category }}"
                          payment_method: "{{ steps.extract_expense_data.output.message.payment_method }}"
                          raw_transcript: "{{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
                          semantic_text: "{{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
                          user_id: "{{ foreach.item.message.from.id | json: 1 }}"
                          telegram_chat_id: "{{ foreach.item.message.chat.id | json: 1 }}"
                    
                    - name: send_ingest_response
                      type: http
                      with:
                        url: "{{ consts.telegram_api_url }}{{ consts.telegram_bot_token }}/sendMessage"
                        method: POST
                        headers:
                          Content-Type: "application/json"
                        body:
                          chat_id: "{{ foreach.item.message.from.id | json: 1 }}"
                          text: "✅ Added expense: {{ steps.extract_expense_data.output.message.amount }} at {{ steps.extract_expense_data.output.message.merchant }} ({{ steps.extract_expense_data.output.message.category }})"
                      
                  else:
                    # QUERY BRANCH
                    - name: query_agent
                      type: http
                      with:
                        url: "http://localhost:5601/api/agent_builder/mcp"
                        method: POST
                        headers:
                          Authorization: "ApiKey YOUR_API_KEY"
                          Content-Type: "application/json"
                        body:
                          method: "tools/call"
                          params:
                            name: "semantic_search_expenses"
                            arguments:
                              query: "{{ steps.transcribe_voice.output.results?.channels[0]?.alternatives[0]?.transcript or foreach.item.message.text }}"
                              limit: 10
                    
                    - name: send_query_response
                      type: http
                      with:
                        url: "{{ consts.telegram_api_url }}{{ consts.telegram_bot_token }}/sendMessage"
                        method: POST
                        headers:
                          Content-Type: "application/json"
                        body:
                          chat_id: "{{ foreach.item.message.from.id | json: 2 }}"
                          text: "{{ steps.query_agent.output.content[0].text | default: 'I found your expense information.' }}"
            
            # Step 4: Update last_update_id (store in Elasticsearch for persistence)
            # Update on every iteration - the last one will be the final value
            # This avoids needing array[length-1] syntax which LiquidJS doesn't support
            - name: update_last_update_id
              type: elasticsearch.index
              with:
                index: "telegram-bot-state"
                id: "last_update_id"
                document:
                  last_update_id: "{{ foreach.item.update_id }}"
                  updated_at: "now"
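
The two TODO conditions in check_confidence and route_by_intent both hinge on turning the classifier's reply, which is text, into fields that can be compared. As a rough illustration of that extraction step, here is a Python sketch that pulls the JSON object out of a model reply, tolerating surrounding prose or Markdown fences, and falls back to zero confidence when nothing usable is found. The function name parse_classification is hypothetical, and this is not how One Workflow does it.

```python
import json
import re


def parse_classification(reply: str) -> dict:
    """Extract intent, confidence, and reasoning from a model reply.

    Returns confidence 0.0 when no valid JSON object is present, so the
    caller ends up asking the user for clarification.
    """
    fallback = {"intent": None, "confidence": 0.0, "reasoning": "no valid JSON in reply"}
    # Grab everything from the first "{" to the last "}", across line breaks.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    return {
        "intent": data.get("intent"),
        "confidence": confidence,
        "reasoning": data.get("reasoning", ""),
    }
```

Once the reply is parsed like this, the comparisons become plain field checks: confidence below 0.7, and intent equal to INGEST.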

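Testing

I sent "Spent 250 on lunch" (which should be ingested) and "How much did I spend on food last week?" (which should run a query).

It failed. Sigh. As I mentioned earlier, One Workflow is still at a very early stage, and not everything runs smoothly. Fortunately, I've already tracked down most of the bugs, and the workflow now runs in my local environment. I just need a little time to collect all the fixes into a PR and get it merged, so hopefully you'll be able to get everything working soon. I'll update this post to keep it in sync =)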

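What I learned

Re-implementing Som's expense assistant was more than a technical experiment. It was a chance to see what happens when orchestration, search, semantics, and AI inference all run on the same platform. Honestly? It felt very cohesive, as if everything was meant to work together, which is rare.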

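Elastic has always been strong at storing and searching data. But watching a single workflow pull messages from Telegram, process voice or text, classify intent with Gemini, enrich and index documents, run ES|QL analytics, and deliver conversational replies, all without leaving the Elastic ecosystem, was really fun. Elasticsearch is clearly evolving into something more than a search engine. It's becoming a platform where AI agents and operational workflows can genuinely merge instead of getting in each other's way.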

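This assistant is far from its final state. With the foundation in place, you can build a lot on top of it: spending insights, anomaly detection ("hey, this expense looks odd..."), monthly summaries, conversational dashboards, proactive notifications, budget management, multi-user bots. The usual features. All of it is powered by the same engine, which is what makes it interesting.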

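If you want to see how AI agents behave when paired with real observability data, and not just chat completions floating in the cloud, this kind of project is an unexpectedly intuitive way to explore it. More intuitive than I expected, at least.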

Maybe your expenses will finally start answering your questions. Mine haven't yet, but they're getting closer.

Original: https://discuss.elastic.co/t/dec-4th-2025-en-lets-migrate-this-expense-tool-from-n8n-to-elastic-one-workflow/383586