Reading notes on the "Factorization Machines" paper, written only to organize my own thoughts.
Personally, I think the essence of FM is: prediction = bias + (weights × single variables) + (weights × pairwise interactions between variables).
The bias and the weights can be scalars or vectors.
Below are excerpts and translations of the parts of the paper that I consider important. My ability and level are limited, so feel free to skip this if it is not for you.
1. Advantages of FM
- FMs can estimate parameters under very sparse data where SVMs cannot.
(FMs allow parameter estimation under very sparse data where SVMs fail)
- FMs have linear complexity (the model is comparable to a polynomial kernel in an SVM), can be optimized in the primal, and do not rely on support vectors the way SVMs do.
(FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs)
- FMs are general: they work with any real-valued feature vector and can mimic state-of-the-art models such as biased MF, SVD++, PITF, or FPMC (see the sketch after this list).
(FMs are a general predictor that can work with any real valued feature vector. In contrast to this, other state-of-the-art factorization models work only on very restricted input data. We will show that just by defining the feature vectors of the input data, FMs can mimic state-of-the-art models like biased MF, SVD++, PITF, or FPMC.)
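To make the last point concrete, here is a minimal sketch (my own illustration, not code from the paper; the user/item counts are made up) of how a (user, item) rating event can be encoded as a real-valued feature vector using one-hot indicator blocks. With this encoding, the FM's pairwise term reduces to $\langle v_{user}, v_{item} \rangle$, which is exactly the latent-factor interaction of biased MF.

```python
import numpy as np

# Toy sizes, chosen arbitrarily for illustration.
n_users, n_items = 3, 4

def encode(user_id, item_id):
    """One-hot encode a (user, item) pair into a single feature vector x.

    The first n_users entries are a user indicator block, the next n_items
    entries are an item indicator block; all other values are 0.
    """
    x = np.zeros(n_users + n_items)
    x[user_id] = 1.0
    x[n_users + item_id] = 1.0
    return x

print(encode(user_id=1, item_id=2))   # -> [0. 1. 0. 0. 0. 1. 0.]
```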
2. The FM model equation
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
where $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^n$, $V \in \mathbb{R}^{n \times k}$,
and $\langle \cdot , \cdot \rangle$ is the dot product of two vectors of size $k$:
$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$
A row vector $v_i$ of $V$ describes the $i$-th variable with $k$ factors, and $k \in \mathbb{N}_0^{+}$ is a hyperparameter that defines the dimensionality of the factorization.
(A row $v_i$ within V describes the $i$-th variable with k factors. $k \in \mathbb{N}_0^{+}$ is a hyperparameter that defines the dimensionality of the factorization)
A 2-way FM (degree $d = 2$) captures all single variables and all pairwise interactions between variables:
- $w_0$ is the global bias
- $w_i$ models the strength of the $i$-th variable (personally, I just read this as the variable's weight)
- $\hat{w}_{i,j} = \langle v_i, v_j \rangle$ models the interaction between the $i$-th and the $j$-th variable (personally, I read this as a weight as well); a minimal sketch of the full prediction follows this list
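To check my own understanding of the equation, here is a minimal NumPy sketch (my own, not from the paper; all parameter values are random toy numbers). It computes $\hat{y}(x)$ once with the naive $O(kn^2)$ double sum written above, and once with the reformulation $\frac{1}{2}\sum_{f=1}^{k}\big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]$ that gives the linear complexity claimed in Section 1; both give the same prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 7, 3                       # n features, k factors (toy sizes)
w0 = rng.normal()                 # global bias
w = rng.normal(size=n)            # first-order weights w_i
V = rng.normal(size=(n, k))       # factor matrix, row v_i holds k factors
x = rng.normal(size=n)            # one input feature vector

def fm_naive(x, w0, w, V):
    """y_hat = w0 + sum_i w_i x_i + sum_i sum_{j>i} <v_i, v_j> x_i x_j   (O(k n^2))."""
    y = w0 + w @ x
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_linear(x, w0, w, V):
    """Same prediction, with the pairwise part rewritten as
    0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]   (O(k n))."""
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + pairwise

assert np.isclose(fm_naive(x, w0, w, V), fm_linear(x, w0, w, V))
print(fm_linear(x, w0, w, V))
```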
3. Expressiveness of the FM model
It is well known that for any positive definite matrix $W$ there exists a matrix $V$ such that $W = V \cdot V^t$, provided that $k$ is large enough. In other words, if $k$ is chosen large enough, an FM can express any interaction matrix $W$. On sparse data sets, however, a fairly small $k$ is usually chosen so that the model generalizes better.
(It is well known that for any positive definite matrix W, there exists a matrix V such that $W = V \cdot V^t$ provided that k is large enough. Nevertheless, in sparse settings, typically a small k should be chosen because there is not enough data to estimate complex interactions W. Restricting k, and thus the expressiveness of the FM, leads to better generalization and thus improved interaction matrices under sparsity)
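A small numerical illustration of this point (my own sketch, not from the paper; the matrix below is a random toy example): with $k = n$, a Cholesky factorization yields a $V$ with $W = V \cdot V^t$ exactly, while truncating an eigendecomposition to a small $k$ only approximates $W$, which corresponds to the restricted-expressiveness regime recommended for sparse data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n))
W = A @ A.T + n * np.eye(n)        # a positive definite interaction matrix (toy)

# k = n: Cholesky gives V with W = V V^t exactly.
V_full = np.linalg.cholesky(W)
print(np.allclose(W, V_full @ V_full.T))        # True

# Small k: keep only the top-k eigenpairs; V V^t now only approximates W.
k = 2
eigvals, eigvecs = np.linalg.eigh(W)
top = np.argsort(eigvals)[-k:]
V_k = eigvecs[:, top] * np.sqrt(eigvals[top])
print(np.linalg.norm(W - V_k @ V_k.T))          # nonzero approximation error
```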