Text Embeddings with OCI Generative AI and LangChain

Technical Background

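Oracle Cloud Infrastructure (OCI) Generative AI is a fully managed service. It provides a set of state-of-the-art, customizable large language models (LLMs) that cover a wide range of use cases, and all of them are accessible through a single API. You can use the pretrained models directly, or create and host custom models based on your own data. This article shows how to call OCI Generative AI embedding models from LangChain to embed text.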

How It Works

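OCI Generative AI supports several authentication methods, such as API keys and session tokens, so you can choose the one that best fits your environment when calling the service. LangChain's OCIGenAIEmbeddings class sends text to an OCI embedding model and returns high-dimensional vectors in which semantically similar texts lie close together. These vectors can then feed natural language processing tasks such as semantic search, clustering, and retrieval.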
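Embedding vectors are typically compared with cosine similarity, which is how "similar" is measured in tasks such as semantic search. The following is a minimal pure-Python sketch that makes no OCI call. The helper names cosine_similarity and rank_by_similarity, and the toy 3-dimensional vectors, are illustrative assumptions; real embedding models return much longer vectors.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: List[float],
                       doc_vecs: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar document first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9]]
print(rank_by_similarity(query_vec, doc_vecs))
```

Libraries such as NumPy perform the same computation faster; plain loops just keep the sketch dependency-free.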

Code Walkthrough

The examples below show how to embed text with LangChain and an OCI Generative AI embedding model.

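First, install the OCI SDK together with the langchain-community package, which provides the OCIGenAIEmbeddings class used below. The leading ! runs the command from a Jupyter notebook cell; drop it in a regular shell:

!pip install -U oci langchain-community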

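Next, authenticate with an API key and embed some text. API key authentication is the default and reads the DEFAULT profile from the OCI config file (~/.oci/config by default):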

from langchain_community.embeddings import OCIGenAIEmbeddings

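# Default authentication method: API key
embeddings = OCIGenAIEmbeddings(
    model_id="MY_EMBEDDING_MODEL",  # replace with your embedding model ID
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",  # OCI Generative AI inference endpoint
    compartment_id="MY_OCID",  # replace with your compartment OCID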
)

# Embed a single query
query = "This is a query in English."
response = embeddings.embed_query(query)
print("Query Embeddings:", response)

# Embed multiple documents
documents = ["This is a sample document", "and here is another one"]
response = embeddings.embed_documents(documents)
print("Documents Embeddings:", response)
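Printing the full response dumps one float per dimension for every text, which is hard to read. A more compact alternative, sketched below, assumes what LangChain's Embeddings interface specifies: embed_query returns a list of floats and embed_documents returns one such list per input text. The sketch checks those shapes and prints a one-line summary. The helper summarize_embeddings and the placeholder vectors are hypothetical.

```python
from typing import List


def summarize_embeddings(query_vec: List[float], doc_vecs: List[List[float]],
                         expected_docs: int) -> str:
    """Check vector counts and dimensions, then return a one-line summary."""
    if len(doc_vecs) != expected_docs:
        raise ValueError(f"expected {expected_docs} document vectors, got {len(doc_vecs)}")
    # Every vector from the same embedding model should have the same length.
    dims = {len(query_vec)} | {len(v) for v in doc_vecs}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    dim = dims.pop()
    return (f"{expected_docs} document vectors + 1 query vector, "
            f"dimension {dim}, query starts with {query_vec[:3]}")


# Placeholder vectors; with the OCI client you would instead pass
# embeddings.embed_query(query) and embeddings.embed_documents(documents).
fake_query = [0.1, 0.2, 0.3, 0.4]
fake_docs = [[0.0, 0.1, 0.2, 0.3], [0.5, 0.5, 0.5, 0.5]]
print(summarize_embeddings(fake_query, fake_docs, expected_docs=2))
```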

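To authenticate with a session token instead, set auth_type and auth_profile as shown below. The named profile must already exist in your OCI config file; the OCI CLI command oci session authenticate creates one: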

# Authenticate with a session token
embeddings = OCIGenAIEmbeddings(
    model_id="MY_EMBEDDING_MODEL",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="MY_OCID",
    auth_type="SECURITY_TOKEN",
    auth_profile="MY_PROFILE",  # replace with your config profile name
)

# Embed a query and documents
query = "This is a sample query"
response = embeddings.embed_query(query)
print("Query Embeddings:", response)

documents = ["This is a sample document", "and here is another one"]
response = embeddings.embed_documents(documents)
print("Documents Embeddings:", response)
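Both examples depend on a profile in the OCI config file, so checking that profile locally before creating the client can catch a missing entry early. The sketch below uses only Python's configparser. The helper name missing_oci_config_keys and the REQUIRED_KEYS lists are assumptions based on the standard OCI config layout: API key profiles carry user, fingerprint, key_file, tenancy, and region, while session-token profiles written by oci session authenticate carry security_token_file instead of user. The sketch does not replace the OCI SDK's own validation.

```python
import configparser
import os
import tempfile
from typing import Dict, List

# Expected entries per auth type (assumption based on the standard OCI config layout).
REQUIRED_KEYS: Dict[str, List[str]] = {
    "API_KEY": ["user", "fingerprint", "key_file", "tenancy", "region"],
    "SECURITY_TOKEN": ["fingerprint", "key_file", "tenancy", "region", "security_token_file"],
}


def missing_oci_config_keys(profile: str = "DEFAULT", auth_type: str = "API_KEY",
                            path: str = "~/.oci/config") -> List[str]:
    """Hypothetical helper: list the expected keys that are absent from one profile."""
    parser = configparser.ConfigParser(interpolation=None)  # treat '%' in values literally
    if not parser.read(os.path.expanduser(path)):
        raise FileNotFoundError(f"no OCI config file at {path}")
    if profile not in parser:  # configparser always reports DEFAULT as present
        raise KeyError(f"profile [{profile}] not found in {path}")
    section = parser[profile]  # values from [DEFAULT], if any, are visible here too
    return [key for key in REQUIRED_KEYS[auth_type] if key not in section]


# Demo on a throwaway file with made-up values, not your real config.
sample = (
    "[MY_PROFILE]\n"
    "fingerprint=aa:bb:cc\n"
    "key_file=~/.oci/oci_api_key.pem\n"
    "tenancy=ocid1.tenancy.oc1..example\n"
    "region=us-chicago-1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write(sample)
result = missing_oci_config_keys("MY_PROFILE", "SECURITY_TOKEN", tmp.name)
os.remove(tmp.name)  # clean up the demo file
print(result)
```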