Integrating search_with_lepton with Large Language Models: Building Intelligent Q&A Search

search_with_lepton project repository: https://gitcode.com/GitHub_Trending/se/search_with_lepton

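1. Pain Points of Intelligent Q&A Search and the Solution

Traditional search engines often return long lists of loosely related links when what you want is a direct, accurate answer. search_with_lepton combines a large language model (LLM) with several interchangeable search backends to build a question-answering search system. This article walks through its architecture, implementation details, and deployment process so that developers can build similar applications.

After reading this article you will know:

- How to build a "search → augment → generate" RAG architecture
- How switching between multiple search engines is implemented
- How streaming responses and real-time front-end rendering work
- Practices for production deployment and performance tuning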

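2. System Architecture and Workflow

2.1 Overall Architecture

search_with_lepton uses a modular design built around four core components:

[Mermaid architecture diagram not preserved in this copy]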

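2.2 Core Workflow

The system follows the RAG (retrieval-augmented generation) pattern: search for the query, build an augmented context from the results, then generate an answer from that context.

[Mermaid workflow diagram not preserved in this copy]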
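Since the workflow diagram is missing, the three RAG stages can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `answer`, `build_context`, `fake_search`, and `fake_generate` are hypothetical names, and the search and generation steps are stubbed so the example runs without any API keys.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]


def build_context(contexts: List[Context]) -> str:
    """Augment step: number each snippet so the model can cite it."""
    return "\n\n".join(
        f"[[citation:{i + 1}]] {c['snippet']}" for i, c in enumerate(contexts)
    )


def answer(
    query: str,
    search: Callable[[str], List[Context]],
    generate: Callable[[str, str], str],
) -> str:
    """Run search -> augment -> generate for a single query."""
    contexts = search(query)                 # 1. retrieve
    context_text = build_context(contexts)   # 2. augment
    return generate(context_text, query)     # 3. generate


# Hypothetical stand-ins for a real search backend and a real LLM call.
def fake_search(query: str) -> List[Context]:
    return [{"snippet": f"Result about {query}"}]


def fake_generate(context_text: str, query: str) -> str:
    return f"Answer to '{query}' using: {context_text}"


result = answer("RAG", fake_search, fake_generate)
print(result)
```

Passing the search and generation steps in as callables keeps the pipeline independent of any particular search provider or model.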

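3. Core Implementation

3.1 Backend Engine (search_with_lepton.py)

3.1.1 Multi-Search-Engine Abstraction

Each search provider is wrapped in a function that takes a query and returns a list of contexts, so Bing, Google, Serper, and other backends can be swapped without touching the rest of the pipeline: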

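# Search engine abstraction (excerpt; BING_MKT and the other constants are defined elsewhere in the file)
import requests
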
def search_with_bing(query: str, subscription_key: str):
    params = {"q": query, "mkt": BING_MKT}
    response = requests.get(
        BING_SEARCH_V7_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        params=params,
        timeout=DEFAULT_SEARCH_ENGINE_TIMEOUT,
    )
    # Format the results...
    return contexts[:REFERENCE_COUNT]

# search_with_google, search_with_serper, etc. are implemented similarly

The backend is selected at startup through an environment variable:

# Backend initialization logic
self.backend = os.environ["BACKEND"].upper()
if self.backend == "BING":
    self.search_function = lambda query: search_with_bing(query, self.search_api_key)
elif self.backend == "GOOGLE":
    self.search_function = lambda query: search_with_google(query, self.search_api_key, os.environ["GOOGLE_SEARCH_CX"])
# Other backends...
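The if/elif chain above can also be expressed as a lookup table, which keeps each backend's wiring in one place. The following is a sketch under assumptions: `make_search_function` is a hypothetical helper, and the two `search_with_*` functions are stubs standing in for the real ones so the example runs offline.

```python
from functools import partial
from typing import Callable, Dict, List, Mapping

SearchFn = Callable[[str], List[dict]]


# Hypothetical stubs standing in for the real search_with_* functions.
def search_with_bing(query: str, subscription_key: str) -> List[dict]:
    return [{"name": "bing", "snippet": query}]


def search_with_google(query: str, subscription_key: str, cx: str) -> List[dict]:
    return [{"name": "google", "snippet": query}]


def make_search_function(backend: str, env: Mapping[str, str]) -> SearchFn:
    """Return a one-argument search function for the chosen backend."""
    # Each entry is built lazily so only the selected backend's keys are read.
    builders: Dict[str, Callable[[], SearchFn]] = {
        "BING": lambda: partial(
            search_with_bing,
            subscription_key=env["BING_SEARCH_V7_SUBSCRIPTION_KEY"],
        ),
        "GOOGLE": lambda: partial(
            search_with_google,
            subscription_key=env["GOOGLE_SEARCH_API_KEY"],
            cx=env["GOOGLE_SEARCH_CX"],
        ),
    }
    key = backend.upper()
    if key not in builders:
        raise ValueError(f"Unsupported backend: {backend}")
    return builders[key]()


search = make_search_function(
    "bing", {"BING_SEARCH_V7_SUBSCRIPTION_KEY": "demo-key"}
)
results = search("lepton")
```

In real use the `env` argument would be `os.environ`; a plain dict is used here so the sketch has no external dependencies.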

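3.1.2 LLM Integration and Prompt Engineering

The model is called through a chat-completions client against Lepton AI's model endpoint. The core prompt template is: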

_rag_query_text = """
You are a large language AI assistant built by Lepton AI. You are given a user question, and please write clean, concise and accurate answer to the question. You will be given a set of related contexts to the question, each starting with a reference number like [[citation:x]], where x is a number. Please use the context and cite the context at the end of each sentence if applicable.

Your answer must be correct, accurate and written by an expert using an unbiased and professional tone. Please limit to 1024 tokens. Do not give any information that is not related to the question, and do not repeat. Say "information is missing on" followed by the related topic, if the given context do not provide sufficient information.

Here are the set of contexts:

{context}

Remember, don't blindly repeat the contexts verbatim. And here is the user question:
"""

The augmented context is built dynamically from the search results and injected into the system prompt:

context = "\n\n".join([f"[[citation:{i+1}]] {c['snippet']}" for i, c in enumerate(contexts)])
response = self.local_client().chat.completions.create(
    model=self.model,
    messages=[
        {"role": "system", "content": _rag_query_text.format(context=context)},
        {"role": "user", "content": query},
    ],
    stream=True,  # enable streaming responses
    temperature=0.9,
)
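With `stream=True`, the call returns an iterator of chunks rather than one complete message. A minimal sketch of consuming such a stream follows, assuming the chunks have the usual OpenAI-style shape (`chunk.choices[0].delta.content`, which may be `None`). `iter_text` and `fake_chunk` are hypothetical helpers, and the stream is simulated so the example runs without a model endpoint.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text delta from each streamed chunk, skipping empty ones."""
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build an object shaped like a streamed chat-completion chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


# Simulated stream: real chunks would come from the create(..., stream=True) call.
fake_stream = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]
answer_text = "".join(iter_text(fake_stream))
print(answer_text)
```

Each yielded piece can be forwarded to the HTTP response as soon as it arrives, which is what lets the front end render the answer incrementally.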

3.1.3 Result Caching and Concurrency

Lepton KV serves as a distributed cache that speeds up repeated queries:

# Storage for cached results
self.kv = KV(os.environ["KV_NAME"], create_if_not_exists=True)
# Cache-hit logic
if search_uuid:
    try:
        result = self.kv.get(search_uuid)
        return StreamingResponse(str_to_generator(result))
    except KeyError:
        logger.info(f"Key {search_uuid} not found, will generate again.")
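The snippet above shows only the cache-hit path. A fuller sketch of the read-through pattern, including writing the generated answer back, might look like the following. It is illustrative only: `InMemoryKV` is a dict-backed stand-in (not the Lepton KV API), `cached_answer` is a hypothetical helper, and `str_to_generator` is assumed to simply yield the stored string.

```python
from typing import Callable, Dict, Generator, Iterable, List, Optional


class InMemoryKV:
    """Dict-backed stand-in for a key-value store; raises KeyError on a miss."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> str:
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def str_to_generator(result: str) -> Generator[str, None, None]:
    # Assumed behaviour: replay the cached text as a one-item stream.
    yield result


def cached_answer(
    kv: InMemoryKV,
    search_uuid: Optional[str],
    generate: Callable[[], Iterable[str]],
) -> Generator[str, None, None]:
    """Replay a cached answer if present, else stream a fresh one and store it."""
    if search_uuid:
        try:
            return str_to_generator(kv.get(search_uuid))
        except KeyError:
            pass  # cache miss: fall through and generate

    def fresh() -> Generator[str, None, None]:
        parts: List[str] = []
        for piece in generate():
            parts.append(piece)
            yield piece
        if search_uuid:
            kv.put(search_uuid, "".join(parts))  # store once the stream ends

    return fresh()


kv = InMemoryKV()
calls: List[int] = []


def generate() -> Iterable[str]:
    calls.append(1)
    yield "Hello"
    yield " world"


first = "".join(cached_answer(kv, "abc", generate))
second = "".join(cached_answer(kv, "abc", generate))
```

The second call is served from the store, so `generate` runs only once.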

A thread pool handles concurrent work:

# Generate related questions in parallel
self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=self.handler_max_concurrency * 2)
related_questions_future = self.executor.submit(self.get_related_questions, query, contexts)
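The point of submitting the related-questions task to the pool is that it runs while the main answer is being streamed. A minimal, self-contained sketch of that overlap follows; `get_related_questions` and `stream_answer` here are simplified stubs rather than the project's methods, and the sleep simulates a slow LLM call.

```python
import concurrent.futures
import time
from typing import Iterator, List


def get_related_questions(query: str, contexts: List[dict]) -> List[str]:
    """Stub for a slow call that proposes follow-up questions."""
    time.sleep(0.05)
    return [f"What else should I know about {query}?"]


def stream_answer(query: str) -> Iterator[str]:
    """Stub for the streamed main answer."""
    for piece in ["Streaming", " ", "answer"]:
        yield piece


executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Start the side task first, then do the main work on the current thread.
future = executor.submit(get_related_questions, "lepton", [])
answer = "".join(stream_answer("lepton"))

# Collect the side task's result only after the main answer is finished.
related = future.result(timeout=5)
executor.shutdown()
```

Calling `future.result()` after the answer has been produced means the user never waits on the related questions before seeing the first tokens.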

3.2 Front End (Next.js)

3.2.1 Search Component

// web/src/app/components/search.tsx
export const Search: FC = () => {
  const [value, setValue] = useState("");
  const router = useRouter();
  return (
    <form
      onSubmit={(e) => {
        e.preventDefault();
        if (value) {
          setValue("");
          router.push(getSearchUrl(encodeURIComponent(value), nanoid()));
        }
      }}
    >
      <label className="relative bg-white flex items-center justify-center border ring-8 ring-zinc-300/20 py-2 px-2 rounded-lg gap-2">
        <input
          id="search-bar"
          value={value}
          onChange={(e) => setValue(e.target.value)}
          autoFocus
          placeholder="Ask Lepton AI anything ..."
          className="px-2 pr-6 w-full rounded-md flex-1 outline-none bg-white"
        />
        <button
          type="submit"
          className="w-auto py-1 px-2 bg-black border-black text-white fill-white active:scale-95 border overflow-hidden relative rounded-xl"
        >
          <ArrowRight size={16} />
        </button>
      </label>
    </form>
  );
};

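3.2.2 Handling Streaming Responses

The response body is read chunk by chunk and each chunk is forwarded to the renderer as soon as it arrives: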

// web/src/app/utils/fetch-stream.ts
async function pump(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  controller: ReadableStreamDefaultController,
  onChunk?: (chunk: Uint8Array) => void,
  onDone?: () => void,
): Promise<ReadableStreamReadResult<Uint8Array> | undefined> {
  const { done, value } = await reader.read();
  if (done) {
    onDone && onDone();
    controller.close();
    return;
  }
  onChunk && onChunk(value);
  controller.enqueue(value);
  return pump(reader, controller, onChunk, onDone);
}

export const fetchStream = (
  response: Response,
  onChunk?: (chunk: Uint8Array) => void,
  onDone?: () => void,
): ReadableStream<string> => {
  const reader = response.body!.getReader();
  return new ReadableStream({
    start: (controller) => pump(reader, controller, onChunk, onDone),
  });
};
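For readers more at home in the backend language, the same pump pattern can be sketched in Python: forward each chunk, call a callback per chunk, and call another when the stream ends. Because the chunks are raw bytes, a multi-byte UTF-8 character can be split across two chunks, so this sketch decodes incrementally. `pump_chunks` and `decode_stream` are hypothetical names and the chunk data is simulated.

```python
import codecs
from typing import Callable, Iterable, Iterator, List, Optional


def pump_chunks(
    chunks: Iterable[bytes],
    on_chunk: Optional[Callable[[bytes], None]] = None,
    on_done: Optional[Callable[[], None]] = None,
) -> Iterator[bytes]:
    """Forward each chunk unchanged, notifying callbacks along the way."""
    for chunk in chunks:
        if on_chunk:
            on_chunk(chunk)
        yield chunk
    if on_done:
        on_done()


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode UTF-8 incrementally so split characters are reassembled."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


payload = "héllo".encode("utf-8")            # 'é' takes two bytes
network_chunks = [payload[:2], payload[2:]]  # split in the middle of 'é'

seen_sizes: List[int] = []
events: List[str] = []
decoded = "".join(
    decode_stream(
        pump_chunks(
            network_chunks,
            on_chunk=lambda c: seen_sizes.append(len(c)),
            on_done=lambda: events.append("done"),
        )
    )
)
```

Decoding each chunk on its own with `bytes.decode` would raise on the first chunk here, which is why the incremental decoder is used.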

4. Environment Configuration and Deployment Guide

4.1 Environment Variables

Variable	Required	Description	Example value
BACKEND	Yes	Search backend type	BING/SERPER/GOOGLE/LEPTON
LLM_MODEL	Yes	LLM model name	mixtral-8x7b/llama2-70b
LEPTON_WORKSPACE_TOKEN	Yes	Lepton workspace token	Obtain from the console
BING_SEARCH_V7_SUBSCRIPTION_KEY	No	Bing search key	xxxxxxxx
GOOGLE_SEARCH_API_KEY	No	Google search key	xxxxxxxx
KV_NAME	No	Cache (KV) name	search-with-lepton
RELATED_QUESTIONS	No	Whether to generate related questions	true/false
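One way to apply this table is to validate the environment once at startup and fail early with a clear message. The sketch below is an illustration, not the project's code: `Settings` and `load_settings` are hypothetical names, and the fallback values for the optional variables are assumptions taken from the example column.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("BACKEND", "LLM_MODEL", "LEPTON_WORKSPACE_TOKEN")


@dataclass(frozen=True)
class Settings:
    backend: str
    llm_model: str
    kv_name: str
    related_questions: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Check required variables and apply defaults for optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        backend=env["BACKEND"].upper(),
        llm_model=env["LLM_MODEL"],
        # Assumed defaults, based on the example values in the table.
        kv_name=env.get("KV_NAME", "search-with-lepton"),
        related_questions=env.get("RELATED_QUESTIONS", "true").lower() == "true",
    )


settings = load_settings(
    {
        "BACKEND": "bing",
        "LLM_MODEL": "mixtral-8x7b",
        "LEPTON_WORKSPACE_TOKEN": "demo-token",
        "RELATED_QUESTIONS": "false",
    }
)
```

In a real service the argument would be `os.environ`; a dict is used here so the example runs anywhere.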

4.2 Quick Start

Install dependencies

# Backend dependencies
pip install -U leptonai openai
# Front-end dependencies
cd web && npm install

Set environment variables

export LEPTON_WORKSPACE_TOKEN="your_token_here"
export BACKEND="BING"
export BING_SEARCH_V7_SUBSCRIPTION_KEY="your_bing_key"

Build the front end

cd web && npm run build

Start the service

python search_with_lepton.py

4.3 Production Deployment

Deploy with Lepton AI in a single command:

lep photon run -n search-with-lepton \
  -m search_with_lepton.py \
  --env BACKEND=BING \
  --env LLM_MODEL=mixtral-8x7b \
  --secret BING_SEARCH_V7_SUBSCRIPTION_KEY=your_key \
  --secret LEPTON_WORKSPACE_TOKEN=your_token

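Suggested deployment settings:

- Resource shape: cpu.small (2 cores, 4 GB) is enough for moderate traffic
- Autoscaling: minimum 1 instance, maximum 5 instances
- Cache policy: default TTL of 24 hours; extend to 7 days for popular queries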

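5. Advanced Configuration and Performance Tuning

5.1 Core Parameters

Parameter	Recommended value	Tuning guidance
REFERENCE_COUNT	8	Higher values give richer answers but lengthen response time
DEFAULT_SEARCH_ENGINE_TIMEOUT	5s	Adjust to network conditions; 3-10s is suggested
handler_max_concurrency	16	Each CPU core can handle 4-8 concurrent requests; adjust to the instance size
LLM temperature	0.9	0.3-0.5 is suggested for factual Q&A, 0.7-1.0 for creative tasks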

5.2 Switching Search Backends

Search backends are swapped through environment variables:

# Use Serper as the search backend
export BACKEND="SERPER"
export SERPER_SEARCH_API_KEY="your_serper_key"

# Use SearchApi as the search backend
export BACKEND="SEARCHAPI"
export SEARCHAPI_API_KEY="your_searchapi_key"

5.3 Custom LLM Models

Change the LLM_MODEL environment variable to switch models:

# Use Llama2-70B
export LLM_MODEL="llama2-70b"
# Use GPT-4 (requires an OpenAI key)
export LLM_MODEL="gpt-4"
export OPENAI_API_KEY="your_openai_key"

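6. Common Problems and Solutions

6.1 Empty Search Results

Troubleshooting: check that the search engine key is valid → test the API call directly → check network connectivity
Fixes: switch to a backup search backend → adjust REFERENCE_COUNT → increase the timeout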
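Switching to a backup backend can be automated by trying several backends in order and taking the first non-empty result. The following is a sketch under assumptions: `search_with_fallback` is a hypothetical helper that the project does not necessarily provide, and the three backends are stubs that simulate a timeout, an empty result, and a success.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger(__name__)

SearchFn = Callable[[str], List[dict]]


def search_with_fallback(
    query: str, backends: Sequence[Tuple[str, SearchFn]]
) -> Tuple[str, List[dict]]:
    """Try each backend in order; return the first non-empty result."""
    for name, search in backends:
        try:
            results = search(query)
        except Exception as exc:  # e.g. invalid key, timeout, network error
            logger.warning("Backend %s failed: %s", name, exc)
            continue
        if results:
            return name, results
        logger.info("Backend %s returned no results, trying the next one.", name)
    return "", []


# Simulated backends for illustration only.
def broken(query: str) -> List[dict]:
    raise TimeoutError("simulated timeout")


def empty(query: str) -> List[dict]:
    return []


def working(query: str) -> List[dict]:
    return [{"snippet": f"Result for {query}"}]


used, results = search_with_fallback(
    "lepton", [("BING", broken), ("GOOGLE", empty), ("SERPER", working)]
)
```

Returning the backend name alongside the results makes it easy to log which provider actually served a query.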

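6.2 Slow LLM Responses

Optimization strategies:
1. Use a smaller model such as mixtral-8x7b instead of llama2-70b
2. Reduce REFERENCE_COUNT to 5-6
3. Enable KV caching to avoid repeated computation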

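6.3 Front-End Rendering Problems

Common causes:
- Errors parsing the streamed response
- An incompatible Markdown renderer
- CSS style conflicts
Fixes: check the logic in parse-streaming.ts → update the react-markdown version → isolate the CSS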

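7. Summary and Outlook

search_with_lepton combines search with a large language model to build a Q&A system that is fast and easy to extend. Its main strengths are:

- Modular design: components are loosely coupled and easy to replace or extend
- Multi-engine support: adapts to different search services
- Streaming responses: answers appear in real time
- Caching: improves performance and lowers API costs

Possible future directions:

- Multimodal input (images, voice)
- Per-user search preferences
- Integration with local knowledge bases
- Real-time data subscription and updates

With the architecture and implementation described here, developers can build their own Q&A search system and adapt it to their needs. Try deploying it to see how it behaves on your own queries.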



Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.