Exploring Vector Search in Couchbase: From Getting Started to Mastery

License: CC 4.0 BY-SA

Introduction

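In today's data-driven world, NoSQL databases keep growing in popularity. Couchbase is a distributed NoSQL database that has drawn particular attention for cloud, mobile, AI, and edge computing applications. It also provides vector search, which lets an application retrieve documents by how similar their embeddings are to a query rather than by exact keyword match. This article walks through using Couchbase vector search from Python via LangChain, from installation to a first similarity query.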

Main Content

Installation and Setup

Before you begin, install the langchain-couchbase partner package:

pip install -qU langchain-couchbase

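Next, go to the Couchbase website and create a new connection, and be sure to save the database username and password. The snippet below prompts for the connection string and credentials at run time.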

import getpass

COUCHBASE_CONNECTION_STRING = getpass.getpass("Enter the connection string for the Couchbase cluster: ")
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")

# Uncomment below if you want automated tracing
# import os
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
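
Typing the credentials on every run gets tedious in scripts. The following is a minimal sketch, assuming you would rather read settings from environment variables and fall back to an interactive prompt. The helper name `read_setting` and the variable name `COUCHBASE_USERNAME` are illustrative choices, not part of any library.

```python
import getpass
import os


def read_setting(env_name, prompt, secret=False, env=None):
    """Return a setting from the environment, or prompt for it interactively.

    `read_setting` is a hypothetical helper written for this article.
    """
    env = os.environ if env is None else env
    value = env.get(env_name)
    if value:
        return value
    # Hide the typed characters only for secrets such as passwords.
    return getpass.getpass(prompt) if secret else input(prompt)


# Demo with an explicit mapping so that no prompt is shown.
demo_env = {"COUCHBASE_USERNAME": "app_user"}
print(read_setting("COUCHBASE_USERNAME", "Username: ", env=demo_env))
```

In a real script you would omit the `env` argument so that the helper reads from `os.environ`, and pass `secret=True` for the password.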

Initializing the Connection

First, create a Cluster object connected to the Couchbase cluster, authenticating with the username and password saved earlier.

from datetime import timedelta
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
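
A mistyped connection string tends to surface only as a timeout when waiting for the cluster. As a rough sketch, the check below validates the string before connecting, assuming it uses one of the two URI schemes Couchbase SDKs accept: `couchbase://` (plain) or `couchbases://` (TLS). The function name `check_connection_string` and the example host names are made up for illustration.

```python
from urllib.parse import urlparse


def check_connection_string(conn_str):
    """Validate a Couchbase connection string; return True if it uses TLS.

    Hypothetical helper: it only inspects the scheme and host part and does
    not contact the cluster.
    """
    parsed = urlparse(conn_str)
    if parsed.scheme not in ("couchbase", "couchbases"):
        raise ValueError(
            f"Expected a couchbase:// or couchbases:// URI, got: {conn_str!r}"
        )
    if not parsed.netloc:
        raise ValueError(f"No host found in connection string: {conn_str!r}")
    return parsed.scheme == "couchbases"


# Example host names are placeholders.
print(check_connection_string("couchbase://localhost"))
print(check_connection_string("couchbases://cb.example.com"))
```

Calling this on `COUCHBASE_CONNECTION_STRING` before constructing the `Cluster` turns a slow timeout into an immediate, readable error.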

Creating the Vector Store

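Next, create the vector store object from the cluster connection and give it the name of the search index to use. The index named here is expected to already exist on the cluster, with a vector dimension that matches the embedding size (4096 in this example).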

from langchain_couchbase.vectorstores import CouchbaseVectorStore
from langchain_core.embeddings import FakeEmbeddings

BUCKET_NAME = "langchain_bucket"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"  # the default collection in a scope is named "_default"
SEARCH_INDEX_NAME = "langchain-test-index"

embeddings = FakeEmbeddings(size=4096)

vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
)
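
Note that `FakeEmbeddings` returns random vectors, so it is useful for checking the plumbing but not for judging result quality. To show what a similarity search does conceptually, here is a small pure-Python sketch that ranks stored vectors against a query by cosine similarity. The three-dimensional vectors are hand-made for illustration, and the function names are assumptions of this article. A real Couchbase search index may use a different similarity metric and does not scan every vector the way this brute-force version does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=1):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), text) for text, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made toy "embeddings"; real ones come from an embedding model.
stored = {
    "pancakes and eggs for breakfast": [0.9, 0.1, 0.0],
    "stock market update": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], stored, k=1))
```

The `k` argument here plays the same role as the `k` passed to a similarity search on the vector store: it caps how many of the closest matches are returned.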

Code Example

The following example uses the vector store created above to add a document and then query it with a similarity search.

from uuid import uuid4
from langchain_core.documents import Document

# Create a sample document
document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

# Add the document to the vector store under a random id
vector_store.add_documents(documents=[document_1], ids=[str(uuid4())])

# Run a similarity search and keep only the single closest match
results = vector_store.similarity_search("breakfast", k=1)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
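
Because the example assigns a fresh `uuid4()` id on every run, running it repeatedly stores the same document again under a new id. One possible workaround, sketched below, is to derive the id from the document's content with `uuid5`, so the same content always maps to the same id. This assumes the vector store overwrites an existing document when the same id is written again, which is worth verifying for your version of langchain-couchbase. The helper name `stable_id` is hypothetical.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(page_content, source=""):
    """Derive a deterministic id from a document's source and content.

    Hypothetical helper: identical inputs always yield the same UUID string.
    """
    return str(uuid5(NAMESPACE_URL, f"{source}:{page_content}"))


first = stable_id("I had pancakes for breakfast.", source="tweet")
second = stable_id("I had pancakes for breakfast.", source="tweet")
print(first == second)
```

Such an id could then be passed in the `ids` list of `add_documents` in place of `str(uuid4())`.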