Simple yet Powerful: A Deep-Learning Method for Embedding Short Texts as Vectors

SentenceTransformer is a Python framework built on Sentence-BERT that produces high-quality embedding vectors for sentences and short texts. It supports many languages, including English and Chinese. Compared with using a BERT model directly, SentenceTransformer is simpler to use: pass in the text and you get the vectors back.

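Installation

Python 3.6 or later, PyTorch 1.6.0 or later, and Hugging Face transformers v4.6.0 or later are recommended. Python 2.7 is not supported.

Installation sometimes fails, possibly because of a pip version incompatibility. Upgrading pip first may help: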

python3 -m pip install --upgrade pip

Then install the package:
pip install -U sentence-transformers
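
If the install succeeds but imports still fail, it can help to compare the installed versions against the minimums recommended above. The following is only a minimal sketch using the standard library; it assumes Python 3.8 or later (for importlib.metadata), and the helper names version_tuple, meets_minimum and check_environment are made up for this illustration, not part of sentence-transformers.

```python
import re
from importlib import metadata

# Minimum versions quoted in the installation notes above.
MINIMUMS = {"torch": "1.6.0", "transformers": "4.6.0"}


def version_tuple(version):
    """Turn a string such as '1.6.0+cu101' into (1, 6, 0).

    Local build tags and non-numeric suffixes are ignored, and the
    result is padded with zeros to at least three components.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.6.0'."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment():
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for package, required in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, required):
            problems.append(f"{package} {installed} is older than {required}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

Versions are compared as tuples of integers rather than as strings because a plain string comparison would rank "1.10.0" below "1.6.0".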

Importing the package

from sentence_transformers import SentenceTransformer

English example sentences

from sentence_transformers import SentenceTransformer

# Load a pretrained model (downloaded on first use)
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Sentences we want to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.',
    'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print each sentence together with its embedding
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

Output:
Sentence: This framework generates embeddings for each input sentence
Embedding: [-1.76214531e-01  1.20601252e-01 -2.93624073e-01 -2.29858026e-01
 -8.22923928e-02  2.37709522e-01  3.39984864e-01 -7.80964196e-01
  1.18127614e-01  1.63373962e-01 -1.37715712e-01  2.40282789e-01
  4.25125599e-01  1.72417849e-01  1.05279692e-01  5.18164098e-01
  6.22218400e-02  3.99285793e-01 -1.81652278e-01 -5.85578680e-01
  4.49722409e-02 -1.72750309e-01 -2.68443495e-01 -1.47386149e-01
 -1.89217970e-01  1.92150578e-01 -3.83842468e-01 -3.96007091e-01
  4.30648863e-01 -3.15320134e-01  3.65949631e-01  6.05158620e-02
  3.57325703e-01  1.59736529e-01 -3.00983816e-01  2.63250291e-01
 -3.94311100e-01  1.84855521e-01 -3.99549276e-01 -2.67889529e-01
 -5.45117497e-01 -3.13403942e-02 -4.30644333e-01  1.33278117e-01
 -1.74793795e-01 -4.35465544e-01 -4.77379113e-01  7.12555572e-02
 -7.37001151e-02  5.69137156e-01 -2.82579720e-01  5.24975285e-02
 -8.20007861e-01  1.98296756e-01  1.69511825e-01  2.71780342e-01
  2.64610827e-01 -2.55737714e-02 -1.74096107e-01  1.63314253e-01
 -3.95260930e-01 -3.17556299e-02 -2.62556046e-01  3.52754712e-01
  3.01434875e-01 -1.47197291e-01  2.10075796e-01 -1.84010491e-01
 -4.12896037e-01  4.14775789e-01 -1.89769492e-01 -1.35482445e-01
 -3.79272133e-01 -4.68020439e-02 -3.33601385e-02  9.00394097e-02
 -3.30133140e-01 -3.87316942e-02  3.75082314e-01 -1.46996319e-01
  4.34959829e-01  5.38325727e-01 -2.65445173e-01  1.64445907e-01
  4.17078644e-01 -4.72508594e-02 -7.48731196e-02 -4.26261097e-01
 -1.96994558e-01  6.10316209e-02 -4.74262655e-01 -6.48334742e-01
  3.71462464e-01  2.50957102e-01  1.22529611e-01  8.88766572e-02
 -1.06724210e-01  5.33984490e-02  9.74507183e-02 -3.46660167e-02
 -1.02882817e-01  2.32289001e-01 -2.53739536e-01 -5.13112307e-01
  1.85216278e-01 -3.04357797e-01 -3.55209075e-02 -1.26975372e-01
 -7.71632940e-02 -5.15330076e-01 -2.28071719e-01  2.03343164e-02
  7.38175958e-02 -1.52558655e-01 -4.00837570e-01 -2.47749180e-01
  3.97470325e-01 -2.60260701e-01  2.50906169e-01  1.68228924e-01
  1.33900508e-01 -2.10833233e-02 -4.70035732e-01  4.78850156e-01
  2.80345589e-01 -4.64546800e-01  3.21747035e-01  2.34207422e-01
  2.45772451e-01 -4.71482307e-01  5.00400960e-01  4.10190076e-01
  5.15216827e-01  2.62549460e-01  2.11593546e-02 -3.89687568e-01
 -2.41742760e-01 -2.14834630e-01 -8.62650797e-02 -1.65323570e-01
 -5.21895029e-02  3.41874868e-01  4.50314462e-01 -3.06973577e-01
 -2.02294186e-01  6.85521722e-01 -5.33892572e-01  3.58471543e-01
  1.45286605e-01 -7.07056001e-02 -1.50529072e-01 -8.56279582e-02
 -7.67851025e-02  1.89544857e-01 -1.04067773e-01  5.33544004e-01
 -5.27887225e-01  2.42332090e-02 -2.64348090e-01 -2.23186895e-01
 -3.81208718e-01  7.59914368e-02 -4.64485109e-01 -3.36549252e-01
  4.21229839e-01  1.07479207e-01  1.90457791e-01  2.89487489e-03
 -1.08513705e-01  1.53545350e-01  3.16023648e-01 -2.70840749e-02
 -5.40594459e-01  8.97286758e-02 -1.15549676e-01  3.97803992e-01
 -4.97683346e-01 -2.84893364e-01  4.99861799e-02  3.61279696e-01
  6.90535665e-01  1.46821439e-01  1.73396602e-01 -1.74582347e-01
 -3.15702260e-01  6.72999769e-02  2.17250243e-01  9.78535116e-02
 -1.29472464e-01 -1.86929435e-01  1.34878129e-01 -1.53885290e-01
  7.44715557e-02 -1.85536250e-01 -2.80628383e-01 -1.14144213e-01
  4.12249625e-01  6.39491975e-02 -1.45715117e-01 -9.82065052e-02
 -1.33081883e-01 -1.88410461e-01 -2.84838937e-02 -3.49510163e-02
  3.34258713e-02  6.98896796e-02  1.90354511e-01 -2.96724051e-01
  2.64706067e-03  1.09140947e-01  1.70892701e-02  2.60589242e-01
  3.29038620e-01 -6.61560148e-02  2.39665717e-01 -2.26194620e-01
 -3.36869545e-02  1.49400130e-01 -3.21265638e-01 -2.68577904e-01
  5.72632015e-01 -4.92308497e-01  2.00666577e-01 -3.49261820e-01
 -2.89886612e-02  6.09010458e-01 -5.72333157e-01  2.35000670e-01
  6.47180574e-03 -3.14952508e-02  2.78108083e-02 -3.90340954e-01
 -2.08950117e-01 -3.04452837e-01 -7.20199272e-02 -8.29840004e-02
  3.73792857e-01  7.38937110e-02 -2.21076086e-02  9.88139287e-02
 -1.51426882e-01 -1.40430734e-01  2.26017952e-01  2.76089966e-01
 -8.87747630e-02 -1.12816028e-01 -2.66286045e-01  2.77834296e-01
 -4.75609973e-02  6.71005547e-02 -2.78584175e-02 -2.39991937e-02
  2.51708686e-01  4.68793
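
The raw numbers are hard to interpret on their own; embeddings are usually compared with one another, most commonly by cosine similarity. The sketch below shows that computation in plain Python so it runs without any model. The three-dimensional vectors are invented toy values, not output of paraphrase-MiniLM-L6-v2 (whose vectors have 384 entries), and the function names are hypothetical. The same function should also accept the arrays returned by model.encode(), since it only iterates over their elements.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar to the query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings (made-up values).
toy_embeddings = {
    "sentence A": [0.9, 0.1, 0.0],
    "sentence B": [0.8, 0.2, 0.1],
    "sentence C": [-0.1, 0.0, 1.0],
}

if __name__ == "__main__":
    query = toy_embeddings["sentence A"]
    for label, score in rank_by_similarity(query, toy_embeddings):
        print(label, round(score, 3))
```

In practice, the sentence-transformers package ships its own similarity helpers in sentence_transformers.util, which are likely the more convenient choice when working with many sentences at once.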