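Using the function util.paraphrase_mining
It compares every sentence against every other sentence and returns a list of the pairs with the highest cosine similarity scores.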
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Single list of sentences - possibly tens of thousands of sentences
sentences = ['The cat sits outside', 'A man is playing guitar', 'I love pasta',
             'The new movie is awesome', 'The cat plays in the garden', 'A woman watches TV',
             'The new movie is so great', 'Do you like pizza?']

paraphrases = util.paraphrase_mining(model, sentences)

# Each entry is a (score, i, j) triple; print the ten highest-scoring pairs
for paraphrase in paraphrases[0:10]:
    score, i, j = paraphrase
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], score))
Output
Do you like pizza?           I love pasta                   Score: 0.6845
The cat sits outside         The new movie is awesome       Score: 0.6035
The new movie is awesome     The new movie is so great      Score: 0.5867
The new movie is awesome     Do you like pizza?             Score: 0.5748
The new movie is so great    I love pasta                   Score: 0.5489
I love pasta                 The new movie is awesome       Score: 0.5480
A man is playing guitar      The cat plays in the garden    Score: 0.5179
The new movie is awesome     The cat plays in the garden    Score: 0.5111
The cat sits outside         Do you like pizza?             Score: 0.4982
The new movie is so great    Do you like pizza?             Score: 0.4945
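To make the "compare every sentence with every other sentence" step concrete, here is a minimal pure-Python sketch of the idea: score every pair by cosine similarity and keep the highest-scoring ones. It is only an illustration. The names `cosine`, `brute_force_pairs`, and `toy_embeddings` are hypothetical, the two-dimensional vectors stand in for real sentence embeddings, and the library's own implementation is more elaborate, so this should not be read as a copy of it.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def brute_force_pairs(embeddings, top_k=10):
    """Score every unordered pair (i, j) with i < j and return the top_k pairs.

    The result is a list of (score, i, j) tuples, highest score first,
    mirroring the (score, i, j) shape unpacked in the example above.
    """
    scored = [
        (cosine(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy vectors (assumption: stand-ins for real sentence embeddings)
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]

pairs = brute_force_pairs(toy_embeddings, top_k=2)
for score, i, j in pairs:
    print(f"{i} {j} Score: {score:.4f}")
```

Because the number of pairs grows quadratically with the number of sentences, this brute-force form is only practical for small lists.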
from sklearn.cluster import KMeans

def test_kmeans():
    """
    This is a simple application for sentence embeddings: clustering.
    Sentences are mapped to sentence embeddings and then k-means clustering is applied.
    """
    # model_path3 is not defined in this snippet; it must be set elsewhere to the model to load
    embedder = SentenceTransformer(model_path3)
    # Corpus with example sentences
    corpus = ['A man is eating food.', 'A man is eating a piece of bread.', 'A man is eating pasta.',
              'The girl is carrying a baby.', 'The baby is carried by the woman', 'A man is riding a horse.',
              'A man is riding a white horse on an enclosed ground.', 'A monkey is playing drums.',
              'Someone in a gorilla costume is playing a set of drums.', 'A cheetah is running behind its prey.',
              'A cheetah chases prey on across a field.']
    corpus_embeddings = embedder.encode(corpus)
    # Perform k-means clustering
    num_clusters = 5
    clustering_model = KMeans(n_clusters=num_clusters)
    clustering_model.fit(corpus_embeddings)
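Once the model has been fitted, scikit-learn exposes one cluster index per input row in `clustering_model.labels_`. A possible next step, sketched below under assumptions, is to group the sentences by that index. The helper `group_by_cluster` and the hard-coded `example_labels` are hypothetical: the labels here are made up for illustration and are not the result of an actual clustering run.

```python
from collections import defaultdict


def group_by_cluster(sentences, labels):
    """Collect sentences into lists keyed by their cluster label."""
    clusters = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        clusters[int(label)].append(sentence)
    return dict(clusters)


example_sentences = [
    'A man is eating food.',
    'A man is riding a horse.',
    'A man is eating pasta.',
]
# Hypothetical labels; in real use this would be clustering_model.labels_
example_labels = [0, 1, 0]

clusters = group_by_cluster(example_sentences, example_labels)
for cluster_id, members in sorted(clusters.items()):
    print(f"Cluster {cluster_id}: {members}")
```

Converting each label with `int()` keeps the dictionary keys as plain Python integers even when the labels come from a NumPy array.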