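SentenceTransformers is a Python framework built on Sentence-BERT that produces high-quality embedding vectors for sentences and short texts. It supports many languages, including English and Chinese. Compared with using a BERT model directly, SentenceTransformers is simpler to use: pass in the text and you get the vectors back.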
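Installation
Python 3.6 or later, PyTorch 1.6.0 or later, and transformers v4.6.0 or later (developed by Hugging Face) are recommended. Python 2.7 environments will not work.
Installation sometimes fails, possibly because of a pip version incompatibility. In that case, upgrade pip first: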
python3 -m pip install --upgrade pip
Then install the package:
pip install -U sentence-transformers
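After installing, a quick check along the following lines can confirm that the environment meets the minimum versions listed above. This is only a minimal sketch: `parse_version` and `check_environment` are hypothetical helper names, the distribution names `torch`, `transformers`, and `sentence-transformers` are assumed to match what pip installed, and the script itself relies on `importlib.metadata`, which requires Python 3.8 or later.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the requirements stated above.
# sentence-transformers has no stated minimum here, so any installed version passes.
REQUIREMENTS = {
    "torch": (1, 6, 0),
    "transformers": (4, 6, 0),
    "sentence-transformers": (0, 0, 0),
}


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into the tuple (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits, e.g. '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info >= (3, 6)}
    for name, minimum in REQUIREMENTS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = False  # package is not installed at all
            continue
        report[name] = installed >= minimum
    return report


for name, ok in check_environment().items():
    print(name, "OK" if ok else "missing or too old")
```

If any entry is reported as missing or too old, rerunning the pip commands above is the first thing to try.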
Importing the package
from sentence_transformers import SentenceTransformer
English example
from sentence_transformers import SentenceTransformer

# Load a pretrained model (downloaded on first use)
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Sentences we want to encode
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print each sentence followed by its embedding
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
Sentence: This framework generates embeddings for each input sentence
Embedding: [-1.76214531e-01 1.20601252e-01 -2.93624073e-01 -2.29858026e-01
-8.22923928e-02 2.37709522e-01 3.39984864e-01 -7.80964196e-01
1.18127614e-01 1.63373962e-01 -1.37715712e-01 2.40282789e-01
4.25125599e-01 1.72417849e-01 1.05279692e-01 5.18164098e-01
6.22218400e-02 3.99285793e-01 -1.81652278e-01 -5.85578680e-01
4.49722409e-02 -1.72750309e-01 -2.68443495e-01 -1.47386149e-01
-1.89217970e-01 1.92150578e-01 -3.83842468e-01 -3.96007091e-01
4.30648863e-01 -3.15320134e-01 3.65949631e-01 6.05158620e-02
3.57325703e-01 1.59736529e-01 -3.00983816e-01 2.63250291e-01
-3.94311100e-01 1.84855521e-01 -3.99549276e-01 -2.67889529e-01
-5.45117497e-01 -3.13403942e-02 -4.30644333e-01 1.33278117e-01
-1.74793795e-01 -4.35465544e-01 -4.77379113e-01 7.12555572e-02
-7.37001151e-02 5.69137156e-01 -2.82579720e-01 5.24975285e-02
-8.20007861e-01 1.98296756e-01 1.69511825e-01 2.71780342e-01
2.64610827e-01 -2.55737714e-02 -1.74096107e-01 1.63314253e-01
-3.95260930e-01 -3.17556299e-02 -2.62556046e-01 3.52754712e-01
3.01434875e-01 -1.47197291e-01 2.10075796e-01 -1.84010491e-01
-4.12896037e-01 4.14775789e-01 -1.89769492e-01 -1.35482445e-01
-3.79272133e-01 -4.68020439e-02 -3.33601385e-02 9.00394097e-02
-3.30133140e-01 -3.87316942e-02 3.75082314e-01 -1.46996319e-01
4.34959829e-01 5.38325727e-01 -2.65445173e-01 1.64445907e-01
4.17078644e-01 -4.72508594e-02 -7.48731196e-02 -4.26261097e-01
-1.96994558e-01 6.10316209e-02 -4.74262655e-01 -6.48334742e-01
3.71462464e-01 2.50957102e-01 1.22529611e-01 8.88766572e-02
-1.06724210e-01 5.33984490e-02 9.74507183e-02 -3.46660167e-02
-1.02882817e-01 2.32289001e-01 -2.53739536e-01 -5.13112307e-01
1.85216278e-01 -3.04357797e-01 -3.55209075e-02 -1.26975372e-01
-7.71632940e-02 -5.15330076e-01 -2.28071719e-01 2.03343164e-02
7.38175958e-02 -1.52558655e-01 -4.00837570e-01 -2.47749180e-01
3.97470325e-01 -2.60260701e-01 2.50906169e-01 1.68228924e-01
1.33900508e-01 -2.10833233e-02 -4.70035732e-01 4.78850156e-01
2.80345589e-01 -4.64546800e-01 3.21747035e-01 2.34207422e-01
2.45772451e-01 -4.71482307e-01 5.00400960e-01 4.10190076e-01
5.15216827e-01 2.62549460e-01 2.11593546e-02 -3.89687568e-01
-2.41742760e-01 -2.14834630e-01 -8.62650797e-02 -1.65323570e-01
-5.21895029e-02 3.41874868e-01 4.50314462e-01 -3.06973577e-01
-2.02294186e-01 6.85521722e-01 -5.33892572e-01 3.58471543e-01
1.45286605e-01 -7.07056001e-02 -1.50529072e-01 -8.56279582e-02
-7.67851025e-02 1.89544857e-01 -1.04067773e-01 5.33544004e-01
-5.27887225e-01 2.42332090e-02 -2.64348090e-01 -2.23186895e-01
-3.81208718e-01 7.59914368e-02 -4.64485109e-01 -3.36549252e-01
4.21229839e-01 1.07479207e-01 1.90457791e-01 2.89487489e-03
-1.08513705e-01 1.53545350e-01 3.16023648e-01 -2.70840749e-02
-5.40594459e-01 8.97286758e-02 -1.15549676e-01 3.97803992e-01
-4.97683346e-01 -2.84893364e-01 4.99861799e-02 3.61279696e-01
6.90535665e-01 1.46821439e-01 1.73396602e-01 -1.74582347e-01
-3.15702260e-01 6.72999769e-02 2.17250243e-01 9.78535116e-02
-1.29472464e-01 -1.86929435e-01 1.34878129e-01 -1.53885290e-01
7.44715557e-02 -1.85536250e-01 -2.80628383e-01 -1.14144213e-01
4.12249625e-01 6.39491975e-02 -1.45715117e-01 -9.82065052e-02
-1.33081883e-01 -1.88410461e-01 -2.84838937e-02 -3.49510163e-02
3.34258713e-02 6.98896796e-02 1.90354511e-01 -2.96724051e-01
2.64706067e-03 1.09140947e-01 1.70892701e-02 2.60589242e-01
3.29038620e-01 -6.61560148e-02 2.39665717e-01 -2.26194620e-01
-3.36869545e-02 1.49400130e-01 -3.21265638e-01 -2.68577904e-01
5.72632015e-01 -4.92308497e-01 2.00666577e-01 -3.49261820e-01
-2.89886612e-02 6.09010458e-01 -5.72333157e-01 2.35000670e-01
6.47180574e-03 -3.14952508e-02 2.78108083e-02 -3.90340954e-01
-2.08950117e-01 -3.04452837e-01 -7.20199272e-02 -8.29840004e-02
3.73792857e-01 7.38937110e-02 -2.21076086e-02 9.88139287e-02
-1.51426882e-01 -1.40430734e-01 2.26017952e-01 2.76089966e-01
-8.87747630e-02 -1.12816028e-01 -2.66286045e-01 2.77834296e-01
-4.75609973e-02 6.71005547e-02 -2.78584175e-02 -2.39991937e-02
2.51708686e-01 4.68793
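
Printing whole vectors produces very long output, as seen above. One possible alternative is to print only the dimension and the first few values of each embedding. The sketch below assumes a hypothetical helper named `summarize`; it works on any sequence of numeric vectors, including the array returned by `model.encode()`. The `demo` function is also hypothetical and is not called automatically, because it needs sentence-transformers installed and downloads the model on first use.

```python
def summarize(sentences, embeddings, preview=4):
    """Return one short line per sentence: vector dimension plus the first few values."""
    lines = []
    for sentence, vector in zip(sentences, embeddings):
        head = ", ".join(f"{float(x):.4f}" for x in vector[:preview])
        lines.append(f"{sentence!r}: dim={len(vector)}, first values=[{head}, ...]")
    return lines


def demo():
    """Encode the example sentences and print a compact summary of each embedding."""
    # Imported here so that summarize() stays usable without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
    sentences = ['This framework generates embeddings for each input sentence',
                 'Sentences are passed as a list of string.',
                 'The quick brown fox jumps over the lazy dog.']
    for line in summarize(sentences, model.encode(sentences)):
        print(line)
```

Calling `demo()` prints one line per sentence instead of the full vector; the `preview` argument controls how many leading values are shown.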




