Custom Embedding Functions in chromadb
Environment
chroma-hnswlib 0.7.3
chromadb 0.4.24
python 3.10.10
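import chromadb
import numpy as np
# In-memory client; data is lost when the process exits
chroma_client = chromadb.Client()
from text2vec import SentenceModel
# Load the text2vec model from the local ModelScope cache
model = SentenceModel(model_name_or_path='/root/.cache/modelscope/hub/Jerry0/text2vec-base-chinese')
from chromadb import Documents, EmbeddingFunction, Embeddings
class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # Encode each text into a normalized vector and convert the
        # NumPy array into a plain list of floats, one list per document
        embeddings = [list(model.encode(text, normalize_embeddings=True).astype(float)) for text in texts]
        return embeddings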
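The normalize_embeddings=True argument above is, as far as I understand it, an L2 normalization: each vector is rescaled so that its Euclidean length is 1. Here is a minimal pure-Python sketch of that idea. The helper name l2_normalize is my own and is not part of text2vec or chromadb.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (hypothetical helper).

    A zero vector has no direction, so it is returned unchanged.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

With unit-length vectors, comparing documents depends only on the direction of the vectors and not on their magnitude.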
# Use the custom MyEmbeddingFunction
collection = chroma_client.create_collection(name="my_collection", embedding_function=MyEmbeddingFunction())
# Add documents
collection.add(
    documents=["This is a document", "This is another document"],
    # "permission" is an extra metadata field I added myself as an experiment
    metadatas=[{"source": "my_source", "permission": 1}, {"source": "my_source", "permission": 1}],
    ids=["id1", "id2"]
)
# Run a test query
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)
# Print the final result
print(results)
{
'ids': [['id1', 'id2']],
'distances': [[0.2559460997581482, 0.3458625078201294]],
'metadatas': [[{'permission': 1, 'source': 'my_source'}, {'permission': 1, 'source': 'my_source'}]],
'embeddings': None,
'documents': [['This is a document', 'This is another document']],
'uris': None,
'data': None
}
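A note on reading the distances values. The collection above was created without an hnsw:space setting, so I am assuming it uses Chroma's default metric, squared L2. If the embeddings are also unit-length (which normalize_embeddings=True should give), squared L2 and cosine similarity are related by d = 2 - 2 * cos. The sketch below converts the two distances from the output under those assumptions. The function name cosine_from_squared_l2 is my own.

```python
def cosine_from_squared_l2(squared_l2_distance):
    """Convert a squared L2 distance between two unit vectors to cosine similarity.

    Assumes both vectors have length 1, so ||a - b||^2 = 2 - 2 * cos(a, b).
    """
    return 1.0 - squared_l2_distance / 2.0

# Distances copied from the query output above
distances = [0.2559460997581482, 0.3458625078201294]
similarities = [cosine_from_squared_l2(d) for d in distances]
print([round(s, 4) for s in similarities])  # [0.872, 0.8271]
```

A smaller distance means a higher similarity, which is why id1 is ranked ahead of id2.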
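Notes
I am using the text2vec-base-chinese model here. It can be downloaded from the following address, which is reachable from mainland China (thanks to the uploader):
https://www.modelscope.cn/models/Jerry0/text2vec-base-chinese/summary
In any case, my own attempts failed several times. If you spot any mistakes here, corrections are welcome.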





