Optimizing OpenAI Language Translation Performance with GPTCache: A Technical Walkthrough

荣正青

Published 2025-06-05 09:16:25

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00060/article/details/148443371

GPTCache: Semantic cache for LLMs. Fully integrated with LangChain and llama_index. Project URL: https://gitcode.com/gh_mirrors/gp/GPTCache

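Introduction

Language translation is a common and important feature in natural language processing applications. With the development of large language models (LLMs), high-quality translation is easy to achieve. In practice, however, repeated or similar translation requests trigger unnecessary API calls, which raises cost and slows responses. This article shows how to use the GPTCache project to optimize the performance of OpenAI-based translation.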

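Environment Setup

Before starting, make sure the following are in place:

Install the required Python packages: openai and gptcache
Set the OpenAI API key as an environment variable
Be familiar with the basics of calling the OpenAI API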
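
The key check can be made explicit before any request is sent. The following is a minimal sketch, assuming the key lives in the `OPENAI_API_KEY` environment variable; the helper name `require_env` and the placeholder key are hypothetical and not part of openai or gptcache. Note also that the examples in this article use the legacy `openai.Completion` interface, so an `openai` release older than 1.0 is assumed.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return value


# Demo with an explicit mapping so the sketch runs without a real key.
# In a real script, call require_env("OPENAI_API_KEY") with no second argument.
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-placeholder"}))
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first API call.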

Plain OpenAI Translation

We start with a basic translation example that translates an English sentence into French, Spanish, and Japanese:

import time
import openai  # legacy (pre-1.0) openai package; openai.Completion was removed in 1.0

def response_text(openai_resp):
    # Extract the generated text from the first completion choice
    return openai_resp["choices"][0]["text"]

start_time = time.time()
# Every call here goes straight to the OpenAI API; nothing is cached
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat rooms do you have available?\n\n1.",
    temperature=0.3,
    max_tokens=100,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

print(f"\nAnswer: 1.{response_text(response)}")
print("Time consuming: {:.2f}s".format(time.time() - start_time))

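This approach is simple and direct, but every call sends a request to the OpenAI servers, regardless of whether an identical or similar request has already been handled.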

Adding GPTCache with Exact-Match Caching

GPTCache adds a caching layer in front of OpenAI API calls. We first set up exact-match caching:

import time

from gptcache import cache
# Use GPTCache's OpenAI adapter instead of the plain openai module;
# without this import the calls below bypass the cache entirely
from gptcache.adapter import openai
from gptcache.processor.pre import get_prompt

def response_text(openai_resp):
    # Extract the generated text from the first completion choice
    return openai_resp["choices"][0]["text"]

# Initialize the cache; the raw prompt string is used as the cache key
cache.init(pre_embedding_func=get_prompt)
cache.set_openai_key()

questions = [
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhich rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat kind of rooms do you have available?\n\n1.",
]

for question in questions:
    start_time = time.time()
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=question,
        temperature=0.3,
        max_tokens=100,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    print(f"\nAnswer: 1.{response_text(response)}")
    print("Time consuming: {:.2f}s".format(time.time() - start_time))

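In this mode, GPTCache caches requests by their exact text: when an identical request arrives again, the result is returned directly from the cache and no API call is made. Because the three prompts above differ in wording, none of them hits the cache on a first run with an empty cache; only a repeated, identical prompt does.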
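
The exact-match behavior can be illustrated without any network access. This is a conceptual sketch only, not GPTCache's internal implementation: `ExactMatchCache` and `fake_translate` are hypothetical names, and `fake_translate` stands in for the OpenAI call.

```python
from typing import Callable, Dict


class ExactMatchCache:
    """Memoize answers keyed by the exact prompt string."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, prompt: str) -> str:
        if prompt in self._store:
            # Identical prompt seen before: answer from the cache
            self.hits += 1
            return self._store[prompt]
        # New prompt: call the backend and remember the answer
        self.misses += 1
        answer = self._backend(prompt)
        self._store[prompt] = answer
        return answer


def fake_translate(prompt: str) -> str:
    # Hypothetical stand-in for the OpenAI API call
    return f"<translation of {prompt!r}>"


cached = ExactMatchCache(fake_translate)
cached.complete("What rooms do you have available?")   # miss: first time
cached.complete("What rooms do you have available?")   # hit: identical text
cached.complete("Which rooms do you have available?")  # miss: wording differs
print(cached.hits, cached.misses)  # 1 2
```

The third call shows the limitation of exact matching: a single changed word yields a different key, so a near-duplicate question still reaches the backend.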

Semantic Similarity Caching

A more advanced setup configures GPTCache to match semantically similar queries. This requires a few more components:

import time

from gptcache import cache
from gptcache.adapter import openai  # GPTCache's OpenAI adapter, required for caching
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.processor.pre import get_prompt
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Use the ONNX embedding model
onnx = Onnx()
# Configure the data manager: SQLite stores the cached data, FAISS stores the vectors
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))

# Initialize the cache with embedding-based lookup
cache.init(
    pre_embedding_func=get_prompt,
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

# Same list of queries (questions) and the response_text helper as in the previous example
for question in questions:
    start_time = time.time()
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=question,
        temperature=0.3,
        max_tokens=100,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    print(f"\nAnswer: 1.{response_text(response)}")
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
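
To make the idea of similarity-based lookup concrete, here is a toy sketch that runs offline. It is only an illustration under stated assumptions: the bag-of-words `embed` function, the cosine score, the linear scan, and the 0.8 threshold are hypothetical stand-ins for the ONNX embedding model, the FAISS vector search, and `SearchDistanceEvaluation` used above, and they do not reflect how GPTCache computes similarity internally.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would use a learned model
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SimilarityCache:
    """Return a cached answer when a stored prompt is similar enough to the query."""

    def __init__(self, backend: Callable[[str], str], threshold: float = 0.8) -> None:
        self._backend = backend
        self.threshold = threshold
        self._entries: List[Tuple[Counter, str]] = []
        self.hits = 0
        self.misses = 0

    def complete(self, prompt: str) -> str:
        query = embed(prompt)
        best_score, best_answer = 0.0, None  # type: float, Optional[str]
        for vector, answer in self._entries:  # linear scan instead of a vector index
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            self.hits += 1
            return best_answer
        self.misses += 1
        answer = self._backend(prompt)
        self._entries.append((query, answer))
        return answer


def fake_translate(prompt: str) -> str:
    # Hypothetical stand-in for the OpenAI API call
    return f"<translation of {prompt!r}>"


sim_cache = SimilarityCache(fake_translate)
for q in [
    "What rooms do you have available?",
    "Which rooms do you have available?",
    "What kind of rooms do you have available?",
]:
    sim_cache.complete(q)
print(sim_cache.hits, sim_cache.misses)  # 2 1
```

The threshold is the main design choice in such a scheme: a lower value saves more calls but risks returning the translation of a different sentence, so it is worth tuning against real traffic.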

With this configuration, GPTCache will: