NoSQL Vector Search with Azure Cosmos DB

Licensed under CC 4.0 BY-SA

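Technical Background

Azure Cosmos DB is a database widely used in AI applications. It supports several database models, including NoSQL. Azure Cosmos DB recently added vector indexing and search, which is designed for high-dimensional vector data and enables efficient vector similarity search. This simplifies managing and retrieving large-scale data, particularly in AI-driven applications such as recommendation systems and image and text search.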

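Core Principles

The core idea of vector search is to represent the content to be searched as high-dimensional vectors and then score similarity with a distance function such as cosine distance or Euclidean distance. Approximate nearest neighbor (ANN) indexes speed this up by avoiding a comparison against every stored vector. In Azure Cosmos DB, vectors are stored in the same documents as the rest of the data, and the vector index lives in the same logical unit as the storage, so indexing and search are efficient.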
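To make the distance computation concrete, here is a minimal sketch of exact (brute-force) cosine-similarity search in plain Python. The toy vectors and the helper names `cosine_similarity` and `top_k` are illustrative assumptions, not part of Azure Cosmos DB; a real vector index exists precisely to avoid scanning every document like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Return the ids of the k documents most similar to the query (exact scan)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional embeddings; real embeddings are far larger.
toy_docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], toy_docs, k=2)
print(nearest)
```

The exact scan costs one comparison per stored vector, which is why large collections usually rely on an index rather than this approach.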

Code Walkthrough

The following Python example performs vector search with Azure Cosmos DB:

from azure.cosmos import CosmosClient, PartitionKey
from langchain_community.vectorstores.azure_cosmos_db_no_sql import AzureCosmosDBNoSqlVectorSearch
from langchain_openai import AzureOpenAIEmbeddings

# Set up the Cosmos DB client.
# Placeholders: replace with your account endpoint and key.
HOST = "AZURE_COSMOS_DB_ENDPOINT"
KEY = "AZURE_COSMOS_DB_KEY"

cosmos_client = CosmosClient(HOST, KEY)
database_name = "langchain_python_db"
container_name = "langchain_python_container"
partition_key = PartitionKey(path="/id")
cosmos_container_properties = {"partition_key": partition_key}

# Initialize the Azure OpenAI embeddings instance
openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment='text-embedding-ada-002',
    api_version='2023-05-15',
    azure_endpoint='YOUR_ENDPOINT',
    openai_api_key='OPENAI_API_KEY',
)

# Insert documents
documents = [...]  # list of example documents (LangChain Document objects)

vector_search = AzureCosmosDBNoSqlVectorSearch.from_documents(
    documents=documents,
    embedding=openai_embeddings,
    cosmos_client=cosmos_client,
    database_name=database_name,
    container_name=container_name,
    vector_embedding_policy={
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "distanceFunction": "cosine",
                "dimensions": 1536,
            }
        ]
    },
    indexing_policy={
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        "vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}],
    },
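    cosmos_container_properties=cosmos_container_properties,
    # Assumption: the langchain_community wrapper also requires database properties.
    cosmos_database_properties={"id": database_name},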
)

# Query the data
query = "What were the compute requirements for training GPT 4"
results = vector_search.similarity_search(query)

print(results[0].page_content)
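
For readers who want to see what a vector search looks like at the database level, the sketch below builds a Cosmos DB NoSQL query that uses the `VectorDistance` system function. The helper name `build_vector_query` is hypothetical, and the `embedding` property name is taken from the `/embedding` path in the policy above; the exact query that the LangChain wrapper issues may differ by version.

```python
def build_vector_query(query_embedding, k=4, embedding_field="embedding"):
    """Build a parameterized Cosmos DB NoSQL query ordered by vector distance.

    embedding_field must match the path declared in the vector embedding
    policy ("/embedding" in this article). It is interpolated into the query
    text, so only a simple property name is accepted.
    """
    if not embedding_field.isidentifier():
        raise ValueError("embedding_field must be a simple property name")
    query = (
        "SELECT TOP @k c.id, "
        f"VectorDistance(c.{embedding_field}, @embedding) AS SimilarityScore "
        "FROM c "
        f"ORDER BY VectorDistance(c.{embedding_field}, @embedding)"
    )
    parameters = [
        {"name": "@k", "value": k},
        {"name": "@embedding", "value": list(query_embedding)},
    ]
    return query, parameters


# Toy 3-dimensional embedding for illustration only.
query_text, query_params = build_vector_query([0.1, 0.2, 0.3], k=2)

# With a live container (assumed to exist; not executed here):
# container = (cosmos_client.get_database_client(database_name)
#              .get_container_client(container_name))
# items = container.query_items(
#     query=query_text,
#     parameters=query_params,
#     enable_cross_partition_query=True,
# )
```

Passing the embedding as a query parameter rather than formatting it into the query string keeps the query text short and avoids quoting problems.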

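Application Scenarios

Recommendation systems: recommend similar items based on vector representations of user behavior.
Image and text search: use vector representations of images or text to find similar content quickly.
Sentiment analysis: represent the sentiment of a text as a vector and match texts with similar sentiment.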

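Practical Recommendations

Make sure the vector dimensions match the embedding model you use.
Design the indexing policy with data volume and access patterns in mind.
Keep the Cosmos DB configuration consistent with the indexing policy to maintain query performance.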
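
The first and third recommendations can be checked before any container is created. The sketch below is one possible pre-flight check, assuming the policy dictionaries have the same shape as in the example above; the function name `check_vector_config` is hypothetical and not part of any SDK.

```python
def check_vector_config(vector_embedding_policy, indexing_policy, sample_embedding):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    embeddings = vector_embedding_policy.get("vectorEmbeddings", [])
    declared = {entry["path"]: entry for entry in embeddings}

    # Every vector index must point at a path declared in the embedding policy.
    for index in indexing_policy.get("vectorIndexes", []):
        if index["path"] not in declared:
            problems.append(f"vector index path {index['path']} has no embedding policy")

    # The model's output dimension must equal the declared dimension.
    for path, spec in declared.items():
        if spec["dimensions"] != len(sample_embedding):
            problems.append(
                f"{path}: policy declares {spec['dimensions']} dimensions, "
                f"model produced {len(sample_embedding)}"
            )
    return problems


policy = {
    "vectorEmbeddings": [
        {"path": "/embedding", "dataType": "float32",
         "distanceFunction": "cosine", "dimensions": 1536}
    ]
}
indexing = {"vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}]}

ok = check_vector_config(policy, indexing, [0.0] * 1536)
mismatch = check_vector_config(policy, indexing, [0.0] * 768)
print(ok, mismatch)
```

In practice the sample vector would come from the embedding model itself, for example by embedding a short probe string with `openai_embeddings.embed_query(...)`, rather than from a list of zeros.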

Closing note: if you run into problems, feel free to discuss them in the comments.

—END—