TransE: Algorithm Principles and an Implementation Example
TransE
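Knowledge graph basics
Triples (h, r, t): head entity, relation, tail entity
Knowledge representation
Mapping entities and relations to vectors, i.e. embeddings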
Algorithm description
Idea: for a correct triple, the embeddings should satisfy h + r ≈ t.
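Define a distance d between vectors, usually the L1 or the L2 norm. The distance should be as small as possible for correct triples and as large as possible for corrupted ones. This gives the objective function:
$$L = \sum_{(h,r,t)\in S}\ \sum_{(h',r,t')\in S'_{(h,r,t)}} \left[\gamma + d(\mathbf{h}+\mathbf{r},\mathbf{t}) - d(\mathbf{h}'+\mathbf{r},\mathbf{t}')\right]_+$$
Here γ > 0 is the margin, S is the set of correct triples, S' is the set of corrupted triples, and [x]_+ = max(0, x).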
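The margin-based loss can be sketched in a few lines of NumPy. For each pair of a positive triple and a corrupted triple, the sketch adds max(0, margin + d(positive) − d(corrupted)). `distance` and `margin_loss` are hypothetical helper names and are not part of the code below. Following that code, "L2" here means the squared Euclidean distance.

```python
import numpy as np

def distance(h, r, t, kind="L1"):
    """d(h + r, t): the L1 distance, or the squared L2 distance (as distanceL2 computes it)."""
    diff = h + r - t
    if kind == "L1":
        return float(np.abs(diff).sum())
    return float((diff * diff).sum())

def margin_loss(positives, corrupted, margin=1.0, kind="L1"):
    """Sum of [margin + d(positive) - d(corrupted)]_+ over index-paired triples.

    Each element of positives / corrupted is a tuple of vectors (h, r, t).
    """
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(positives, corrupted):
        total += max(0.0, margin + distance(h, r, t, kind) - distance(h2, r2, t2, kind))
    return total

# Toy 2-D vectors: the positive triple satisfies h + r = t exactly.
h = np.array([0.0, 1.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 1.0])
far_tail = np.array([-1.0, 0.0])   # corrupted tail, far from h + r
near_tail = np.array([1.0, 0.5])   # corrupted tail, still inside the margin
print(margin_loss([(h, r, t)], [(h, r, far_tail)]))   # 0.0: d = 0 vs 3, margin satisfied
print(margin_loss([(h, r, t)], [(h, r, near_tail)]))  # 0.5: 1 + 0 - 0.5
```

A pair whose corrupted triple is already at least `margin` farther away than the positive triple contributes nothing. This is why the training code only updates vectors when the hinge term is positive.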
Gradient computation:
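The gradients follow from the chain rule. Take one (positive, corrupted) pair whose hinge term is positive, and write d(e) for d(h + r, t) with e = h + r − t. The sketch assumes, as the code does, that "L2" means the squared Euclidean distance (no square root). When the hinge term is not positive, all of these gradients are zero. If one entity plays two roles (for example, h and h' are the same entity when only the tail was corrupted), its gradient is the sum of both terms.

```latex
\mathbf{e} = \mathbf{h} + \mathbf{r} - \mathbf{t}, \qquad
\mathbf{e}' = \mathbf{h}' + \mathbf{r} - \mathbf{t}', \qquad
\mathbf{g} = \nabla d(\mathbf{e}), \qquad
\mathbf{g}' = \nabla d(\mathbf{e}')

\nabla d(\mathbf{e}) =
\begin{cases}
\operatorname{sign}(\mathbf{e}) & \text{if } d(\mathbf{e}) = \lVert \mathbf{e} \rVert_1 \\
2\,\mathbf{e} & \text{if } d(\mathbf{e}) = \lVert \mathbf{e} \rVert_2^2
\end{cases}

\frac{\partial L}{\partial \mathbf{h}} = \mathbf{g}, \quad
\frac{\partial L}{\partial \mathbf{t}} = -\mathbf{g}, \quad
\frac{\partial L}{\partial \mathbf{r}} = \mathbf{g} - \mathbf{g}', \quad
\frac{\partial L}{\partial \mathbf{h}'} = -\mathbf{g}', \quad
\frac{\partial L}{\partial \mathbf{t}'} = \mathbf{g}'

\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
\quad \text{for } \theta \in \{\mathbf{h}, \mathbf{r}, \mathbf{t}, \mathbf{h}', \mathbf{t}'\}
```

In the notation of the code's update(), the squared-L2 case has tempPositive = 2η(t − h − r) = −ηg and tempNegtative = −ηg'. So h + tempPositive, t − tempPositive and r + tempPositive − tempNegtative are exactly the steps above.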
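Code analysis
- Class definition:
Parameters:
margin: the constant margin in the objective function
learningRate: the learning rate (spelled learingRate in the code)
dim: the vector dimension
entityList: the entity list (read from a text file of entity + id)
relationList: the relation list (read from a text file of relation + id)
tripleList: the triple list (read from a text file of entity + entity + relation)
loss: the accumulated loss value
Distance: L1 by default (L1 = True), otherwise the squared L2 distance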
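- Vector initialization
Set the vector dimension and the value range. The range is the one from the TransE algorithm: each component is drawn uniformly from [-6/√dim, 6/√dim].
Functions involved:
init: draws one random value from that range
norm: normalizes a vector to unit length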
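The two initialization steps (uniform draw, then normalization) can also be done for all vectors at once. This is a minimal vectorized sketch, assuming NumPy's `Generator` API. `init_embeddings` is a hypothetical helper, and the entity names are made-up toy data. The code below instead draws one component at a time with `init` and then calls `norm`.

```python
import numpy as np

def init_embeddings(names, dim, rng=None):
    """Hypothetical helper: one unit-length vector per name.

    Components are drawn from U(-6/sqrt(dim), 6/sqrt(dim)), then each row is
    L2-normalized, i.e. the same two steps init() and norm() perform.
    """
    rng = np.random.default_rng() if rng is None else rng
    bound = 6.0 / np.sqrt(dim)
    vectors = rng.uniform(-bound, bound, size=(len(names), dim))
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # row-wise normalization
    return dict(zip(names, vectors))

emb = init_embeddings(["Beijing", "China", "Paris"], dim=10, rng=np.random.default_rng(0))
for name, vec in emb.items():
    print(name, vec.shape, round(float(np.linalg.norm(vec)), 6))
```

Passing a seeded generator makes the initialization reproducible. This is handy when comparing runs with different margins or learning rates.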
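- Training the vectors
getSample: randomly selects a subset of the triples as the minibatch Sbatch
getCorruptedTriplet(sbatch): corrupts a triple by replacing either its head or its tail (never both at once) with a random entity
update: updates the vectors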
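The sampling side of a training step can be sketched as follows: draw a minibatch, then pair each triple with one corrupted copy. `corrupt` is a hypothetical helper and the triples are made-up toy data. Triples are stored as (head, tail, relation), the order the code below uses. Filtering the candidate list replaces the retry loop in getCorruptedTriplet. Both approaches pick uniformly among the other entities.

```python
import random

def corrupt(triple, entities, rng=random):
    """Replace the head or the tail (never both) with a different, randomly chosen entity.

    Triples are (head, tail, relation), the order used by the code below.
    """
    head, tail, relation = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != head]), tail, relation)
    return (head, rng.choice([e for e in entities if e != tail]), relation)

# Hypothetical toy data.
entities = ["Beijing", "China", "Paris", "France"]
triples = [("Beijing", "China", "capital_of"), ("Paris", "France", "capital_of")]

rng = random.Random(42)
Sbatch = rng.sample(triples, 2)                                # random minibatch, like getSample
Tbatch = [(tr, corrupt(tr, entities, rng)) for tr in Sbatch]   # (original, corrupted) pairs
for positive, negative in Tbatch:
    print(positive, "->", negative)
```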
Derivation of the vector update for the L2 distance:
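The sketch below performs one SGD step with the squared L2 distance. It mirrors the tempPositive/tempNegtative updates in update() below and adds a finite-difference check that 2(h + r − t) really is the gradient of the distance. `l2_sq` and `sgd_step_l2` are hypothetical names. The five vectors are treated as independent here. When the corrupted triple shares an entity with the original one, that entity's total step is the sum of its two updates.

```python
import numpy as np

def l2_sq(h, r, t):
    """Squared Euclidean distance ||h + r - t||^2, the quantity distanceL2 returns."""
    e = h + r - t
    return float(e @ e)

def sgd_step_l2(h, r, t, h2, t2, lr, margin=1.0):
    """One SGD step on [margin + d(h, r, t) - d(h2, r, t2)]_+ with d = squared L2.

    Returns new arrays (h, r, t, h2, t2); nothing moves when the hinge is inactive.
    """
    if margin + l2_sq(h, r, t) - l2_sq(h2, r, t2) <= 0:
        return h, r, t, h2, t2
    pos = 2 * lr * (t - h - r)    # -lr * gradient for the positive triple ("tempPositive")
    neg = 2 * lr * (t2 - h2 - r)  # -lr * gradient for the corrupted triple ("tempNegtative")
    return h + pos, r + pos - neg, t - pos, h2 - neg, t2 + neg

# Finite-difference check: the gradient of l2_sq with respect to h should be 2 * (h + r - t).
rng = np.random.default_rng(1)
h, r, t = rng.normal(size=(3, 4))
eps = 1e-6
basis = np.eye(4)
num_grad = np.array([(l2_sq(h + eps * basis[i], r, t) - l2_sq(h - eps * basis[i], r, t)) / (2 * eps)
                     for i in range(4)])
print(np.allclose(num_grad, 2 * (h + r - t), atol=1e-5))
```

Central differences are exact for a quadratic up to rounding, so the check is tight even with a small tolerance.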
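Python functions used:
uniform(a, b)  # random float N with a <= N <= b; b itself may or may not occur, depending on floating-point rounding
linalg.norm(list)  # the (L2) norm of a vector, e.g. var = linalg.norm(list)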
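A quick check of the two functions. The seed value is arbitrary. The second positional argument of `linalg.norm` is `ord`, so `ord=1` gives the L1 norm that the L1 distance uses.

```python
from random import seed, uniform

from numpy import linalg

seed(0)  # fixed seed so the draw is reproducible
x = uniform(-1, 1)
print(-1 <= x <= 1)              # True: a <= N <= b
print(linalg.norm([3, 4]))       # 5.0, the L2 norm sqrt(3**2 + 4**2)
print(linalg.norm([3, -4], 1))   # 7.0, the L1 norm |3| + |-4|
```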
"""
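@version: 3.7
@author: jiayalu
@file: trainTransE.py
@time: 22/08/2019 10:56
@description: Trains vectors for the entities and relations of a knowledge graph with the TransE algorithm.
Data: the triples,
plus the entity ids and relation ids.
Output: two text files, entityVector.txt and relationVector.txt; each line holds an entity (or relation) followed by its vector.
"""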
from random import uniform, sample
from numpy import *
from copy import deepcopy
class TransE:
    def __init__(self, entityList, relationList, tripleList, margin=1, learingRate=0.00001, dim=10, L1=True):
        self.margin = margin
        self.learingRate = learingRate
        self.dim = dim  # vector dimension
        self.entityList = entityList  # a list of entities at first; after initialize() a dict mapping each entity to its vector (numpy array)
        self.relationList = relationList  # same as above, for relations
        self.tripleList = tripleList  # list of (head, tail, relation) triples; stays a list
        self.loss = 0
        self.L1 = L1
    def initialize(self):
        '''
        Initialize the vectors: every component is drawn with init(), then each vector is normalized.
        '''
        entityVectorList = {}
        relationVectorList = {}
        for entity in self.entityList:
            entityVector = [init(self.dim) for _ in range(self.dim)]  # components from the initialization range
            entityVectorList[entity] = norm(entityVector)  # normalize
        print("entityVector initialized, count: %d" % len(entityVectorList))
        for relation in self.relationList:
            relationVector = [init(self.dim) for _ in range(self.dim)]  # components from the initialization range
            relationVectorList[relation] = norm(relationVector)  # normalize
        print("relationVectorList initialized, count: %d" % len(relationVectorList))
        self.entityList = entityVectorList
        self.relationList = relationVectorList
    def transE(self, cI=20):
        print("Training started")
        for cycleIndex in range(cI):
            Sbatch = self.getSample(3)
            Tbatch = []  # list of pairs (original triple, corrupted triple): {((h,r,t),(h',r,t'))}
            for sbatch in Sbatch:
                tripletWithCorruptedTriplet = (sbatch, self.getCorruptedTriplet(sbatch))
                if tripletWithCorruptedTriplet not in Tbatch:
                    Tbatch.append(tripletWithCorruptedTriplet)
            self.update(Tbatch)
            if cycleIndex % 100 == 0:
                print("Cycle %d" % cycleIndex)
                print(self.loss)
                # raw strings keep the Windows backslashes literal
                self.writeRelationVector(r"E:\pythoncode\knownlageGraph\transE-master\relationVector.txt")
                self.writeEntilyVector(r"E:\pythoncode\knownlageGraph\transE-master\entityVector.txt")
                self.loss = 0
    def getSample(self, size):
        # randomly pick `size` distinct triples as the minibatch Sbatch
        return sample(self.tripleList, size)
    def getCorruptedTriplet(self, triplet):
        '''
        Training triplets with either the head or the tail replaced by a random entity (but not both at the same time).
        :param triplet: an original triple (head, tail, relation)
        :return corruptedTriplet: the corrupted triple
        '''
        # random.sample needs a sequence (dict keys are rejected since Python 3.11), so build a list once
        entities = list(self.entityList.keys())
        i = uniform(-1, 1)
        if i < 0:  # below 0: corrupt the first element (the head)
            while True:
                entityTemp = sample(entities, 1)[0]
                if entityTemp != triplet[0]:
                    break
            corruptedTriplet = (entityTemp, triplet[1], triplet[2])
        else:  # 0 or above: corrupt the second element (the tail)
            while True:
                entityTemp = sample(entities, 1)[0]
                if entityTemp != triplet[1]:
                    break
            corruptedTriplet = (triplet[0], entityTemp, triplet[2])
        return corruptedTriplet
    def update(self, Tbatch):
        # Gradients are evaluated at the vectors from before this batch (self.entityList /
        # self.relationList); the steps are applied to deep copies that replace them at the end.
        copyEntityList = deepcopy(self.entityList)
        copyRelationList = deepcopy(self.relationList)
        for tripletWithCorruptedTriplet in Tbatch:
            # tripletWithCorruptedTriplet = (original triple, corrupted triple); triples are (head, tail, relation)
            triplet, corruptedTriplet = tripletWithCorruptedTriplet
            head, tail, relation = triplet[0], triplet[1], triplet[2]
            corruptedHead, corruptedTail = corruptedTriplet[0], corruptedTriplet[1]
            headEntityVectorBeforeBatch = self.entityList[head]
            tailEntityVectorBeforeBatch = self.entityList[tail]
            relationVectorBeforeBatch = self.relationList[relation]
            headEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[corruptedHead]
            tailEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[corruptedTail]
            distance = distanceL1 if self.L1 else distanceL2
            distTriplet = distance(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch,
                                   relationVectorBeforeBatch)
            distCorruptedTriplet = distance(headEntityVectorWithCorruptedTripletBeforeBatch,
                                            tailEntityVectorWithCorruptedTripletBeforeBatch,
                                            relationVectorBeforeBatch)
            eg = self.margin + distTriplet - distCorruptedTriplet
            if eg > 0:  # [x]+ keeps only the positive part: pairs that already satisfy the margin are skipped
                self.loss += eg
                # t - h - r is the negated residual h + r - t of each triple
                residual = tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch
                residualCorrupted = (tailEntityVectorWithCorruptedTripletBeforeBatch
                                     - headEntityVectorWithCorruptedTripletBeforeBatch
                                     - relationVectorBeforeBatch)
                if self.L1:
                    # L1 subgradient: the sign of the residual (0 counted as +1), scaled by the learning rate
                    tempPositive = self.learingRate * where(residual >= 0, 1.0, -1.0)
                    tempNegtative = self.learingRate * where(residualCorrupted >= 0, 1.0, -1.0)
                else:
                    # gradient of the squared L2 distance ||h + r - t||^2 is 2 * (h + r - t)
                    tempPositive = 2 * self.learingRate * residual
                    tempNegtative = 2 * self.learingRate * residualCorrupted
                # Apply each step to the current copy. When only the tail (head) was corrupted,
                # corruptedHead (corruptedTail) is the same entity as head (tail), so its two
                # steps must add up rather than one write overwriting the other.
                copyEntityList[head] = copyEntityList[head] + tempPositive
                copyEntityList[tail] = copyEntityList[tail] - tempPositive
                copyRelationList[relation] = copyRelationList[relation] + tempPositive - tempNegtative
                copyEntityList[corruptedHead] = copyEntityList[corruptedHead] - tempNegtative
                copyEntityList[corruptedTail] = copyEntityList[corruptedTail] + tempNegtative
                # Only the vectors just updated are renormalized, instead of all of them at once as in the original paper
                for entity in {head, tail, corruptedHead, corruptedTail}:
                    copyEntityList[entity] = norm(copyEntityList[entity])
                copyRelationList[relation] = norm(copyRelationList[relation])
        self.entityList = copyEntityList
        self.relationList = copyRelationList
    def writeEntilyVector(self, dir):
        print("Writing entity vectors")
        with open(dir, 'w', encoding="utf-8") as entityVectorFile:
            for entity in self.entityList.keys():
                # one line per entity: the name, a space, then the vector as a Python list
                entityVectorFile.write(entity + " ")
                entityVectorFile.write(str(self.entityList[entity].tolist()))
                entityVectorFile.write("\n")
    def writeRelationVector(self, dir):
        print("Writing relation vectors")
        with open(dir, 'w', encoding="utf-8") as relationVectorFile:
            for relation in self.relationList.keys():
                # one line per relation: the name, a space, then the vector as a Python list
                relationVectorFile.write(relation + " ")
                relationVectorFile.write(str(self.relationList[relation].tolist()))
                relationVectorFile.write("\n")
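def init(dim):
    # Initialization range from the TransE paper: uniform(-6/sqrt(dim), 6/sqrt(dim))
    return uniform(-6/(dim**0.5), 6/(dim**0.5))

def norm(vec):
    '''
    Normalize a vector to unit length.
    :param vec: the vector (list or numpy array of floats)
    :return: the vector divided by its L2 norm (square root of the sum of squares)
    '''
    var = linalg.norm(vec)
    i = 0
    while i < len(vec):
        vec[i] = vec[i]/var
        i += 1
    return array(vec)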
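def distanceL1(h, t, r):
    s = h + r - t
    return fabs(s).sum()  # L1 distance: sum of absolute values

def distanceL2(h, t, r):
    s = h + r - t
    # Squared L2 distance (no square root): its gradient w.r.t. h is simply 2*(h + r - t),
    # and it ranks triples in the same order as the true L2 norm
    return (s*s).sum()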
def openDetailsAndId(dir, sp=" "):
    # Each line is "name id"; only the name (first column) is kept, in file order
    idNum = 0
    names = []
    with open(dir, "r", encoding="utf-8") as file:
        lines = file.readlines()
        for line in lines:
            DetailsAndId = line.strip().split(sp)
            names.append(DetailsAndId[0])
            idNum += 1
    return idNum, names

def openTrain(dir, sp=" "):
    # Each line is "head tail relation"; lines with fewer than 3 fields are skipped
    num = 0
    triples = []
    with open(dir, "r", encoding="utf-8") as file:
        lines = file.readlines()
        for line in lines:
            triple = line.strip().split(sp)
            if len(triple) < 3:
                continue
            triples.append(tuple(triple))
            num += 1
    return num, triples
if __name__ == '__main__':
    # Raw strings: in a normal string, "\r" in "\relation2id.txt" and "\t" in "\train.txt"
    # would be read as escape characters and break the Windows paths
    dirEntity = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\entity2id.txt"
    entityIdNum, entityList = openDetailsAndId(dirEntity)
    dirRelation = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\relation2id.txt"
    relationIdNum, relationList = openDetailsAndId(dirRelation)
    dirTrain = r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\train.txt"
    tripleNum, tripleList = openTrain(dirTrain)
    # print(tripleNum, tripleList)
    print("Opening TransE")
    transE = TransE(entityList, relationList, tripleList, margin=1, dim=128)
    print("Initializing TransE")
    transE.initialize()
    transE.transE(1500)
    transE.writeRelationVector(r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\relationVector.txt")
    transE.writeEntilyVector(r"E:\pythoncode\ZXknownlageGraph\TransEgetvector\entityVector.txt")
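The script hands all training to `transE.transE(1500)`, whose steps (getSample, getCorruptedTriplet, update) are listed at the top of the post. As an illustration only, here is a minimal stand-alone sketch of such a loop, assuming the squared L2 distance that `distanceL2` computes, the margin ranking loss, plain SGD, and triples in the (head, tail, relation) order that `openTrain` returns. The names `sq_l2`, `sgd_step`, `corrupt` and `train` are hypothetical; this is not the author's implementation.

```python
import numpy as np


def sq_l2(h, t, r):
    """Squared L2 distance ||h + r - t||^2, the quantity distanceL2 returns."""
    d = h + r - t
    return float(d @ d)


def sgd_step(ent, rel, pos, neg, margin=1.0, lr=0.01):
    """One SGD step on max(0, margin + d(pos) - d(neg)); triples are (head, tail, relation)."""
    (h, t, r), (h2, t2, _) = pos, neg
    loss = margin + sq_l2(ent[h], ent[t], rel[r]) - sq_l2(ent[h2], ent[t2], rel[r])
    if loss <= 0:
        return 0.0  # margin already satisfied: zero gradient, no update
    # d/dh ||h + r - t||^2 = 2(h + r - t); the same for r, negated for t
    g_pos = 2 * (ent[h] + rel[r] - ent[t])
    g_neg = 2 * (ent[h2] + rel[r] - ent[t2])
    ent[h] -= lr * g_pos   # pull the correct triple together
    ent[t] += lr * g_pos
    ent[h2] += lr * g_neg  # push the corrupted triple apart
    ent[t2] -= lr * g_neg
    rel[r] -= lr * (g_pos - g_neg)
    return loss


def corrupt(triple, entities, rng):
    """Replace the head or the tail (never both) with a random entity.
    May occasionally return the original triple; its updates then cancel out."""
    h, t, r = triple
    e = entities[rng.integers(len(entities))]
    return (e, t, r) if rng.random() < 0.5 else (h, e, r)


def train(triples, entities, relations, dim=10, epochs=100, batch_size=32,
          margin=1.0, lr=0.01, seed=0):
    """Assumes every triple has exactly three fields."""
    rng = np.random.default_rng(seed)
    bound = 6 / np.sqrt(dim)  # same range as init() above
    ent = {e: rng.uniform(-bound, bound, dim) for e in entities}
    rel = {r: rng.uniform(-bound, bound, dim) for r in relations}
    for v in rel.values():
        v /= np.linalg.norm(v)  # relations: normalized once
    history = []
    for _ in range(epochs):
        for v in ent.values():
            v /= np.linalg.norm(v)  # entities: renormalized every epoch
        batch = rng.choice(len(triples), size=min(batch_size, len(triples)), replace=False)
        epoch_loss = 0.0
        for i in batch:
            pos = triples[i]
            epoch_loss += sgd_step(ent, rel, pos, corrupt(pos, entities, rng), margin, lr)
        history.append(epoch_loss)
    return ent, rel, history
```

With the lists the script reads, a call would look like `ent, rel, history = train(tripleList, entityList, relationList, dim=128)`. The sketch uses the squared distance throughout; switching to `distanceL1` would also require the L1 subgradient, i.e. the sign of h + r - t, in place of 2(h + r - t).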
Data
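Judging from `openDetailsAndId` and `openTrain`, the entity and relation files hold one `name id` pair per line, and the training file holds one `head tail relation` triple per line, all separated by single spaces. The sketch below parses such lines from any iterable (an open file or a list of strings), so the expected format can be checked without the original files. `read_id_map` and `read_triples` are hypothetical helpers; unlike `openDetailsAndId`, `read_id_map` keeps the id column instead of discarding it.

```python
def read_id_map(lines, sep=" "):
    """Map each name to its integer id for lines of the form "name id".
    Lines without a numeric last column (e.g. blank lines) are skipped."""
    ids = {}
    for line in lines:
        parts = line.strip().rsplit(sep, 1)  # split on the last separator, so names may contain spaces
        if len(parts) == 2 and parts[1].isdigit():
            ids[parts[0]] = int(parts[1])
    return ids


def read_triples(lines, sep=" "):
    """Return (head, tail, relation) tuples; like openTrain, lines with fewer than
    3 fields are skipped. Only the first three fields are kept."""
    triples = []
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) >= 3:
            triples.append(tuple(fields[:3]))
    return triples
```

On real files the same functions take the file object directly, e.g. `with open(path, encoding="utf-8") as f: ids = read_id_map(f)`.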
Result vectors
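`writeEntilyVector` and `writeRelationVector` write one line per item: the name, a space, and the vector as a Python list literal (the `str()` of `.tolist()`). A minimal sketch for loading those files back, assuming exactly that format; `parse_vector_lines` and `load_vectors` are hypothetical helper names.

```python
import ast


def parse_vector_lines(lines):
    """Parse lines of the form "name [v1, v2, ...]" into {name: [floats]}."""
    vectors = {}
    for line in lines:
        line = line.rstrip("\n")
        cut = line.rfind(" [")  # the vector is a flat list, so its " [" is the last one
        if cut == -1:
            continue  # skip blank or malformed lines
        vectors[line[:cut]] = [float(x) for x in ast.literal_eval(line[cut + 1:])]
    return vectors


def load_vectors(path):
    with open(path, encoding="utf-8") as f:
        return parse_vector_lines(f)
```

Each value is a plain list; wrapping it in `numpy.array(...)` gives back the array form used during training.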
</dl>