ABSTRACT
- First, we propose efficient implementations for training FFMs.
- Then we comprehensively analyze FFMs and compare this approach with competing models. Experiments show that FFMs are very useful for certain classification problems.
- Finally, we have released a package of FFMs for public use.
1. INTRODUCTION
Code used for experiments in this paper and the package LIBFFM are respectively available at:
http://www.csie.ntu.edu.tw/~cjlin/ffm/exps
http://www.csie.ntu.edu.tw/~cjlin/libffm
2. POLY2 AND FM
FMs can be better than Poly2 when the data set is sparse
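A minimal sketch of why FMs can beat Poly2 on sparse data: Poly2 learns one weight per feature pair, so pairs that never co-occur in training are unlearnable, while FMs factorize the pair weight into an inner product of latent vectors. Function names and shapes below are illustrative, not from the paper's code.

```python
import numpy as np

def poly2_predict(x, W):
    """Poly2 degree-2 term: a dedicated weight W[j1, j2] per feature
    pair. On sparse data many pairs never co-occur, so their weights
    cannot be estimated."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += W[j1, j2] * x[j1] * x[j2]
    return s

def fm_predict(x, V):
    """FM degree-2 term: each feature j has one k-dim latent vector
    V[j]; the pair weight is the inner product <V[j1], V[j2]>, so even
    pairs unseen in training get a meaningful estimate."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += (V[j1] @ V[j2]) * x[j1] * x[j2]
    return s
```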
3. FFM
- In FMs, every feature has only one latent vector to learn the latent effects with all other features; in FFMs, each feature has several latent vectors, one per field, and the vector used for a pair depends on the field of the other feature.
- Usually, k_FFM << k_FM.
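The field-aware interaction described above can be sketched as follows. In the pair (j1, j2), feature j1 uses the latent vector it keeps for j2's field and vice versa; the array names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def ffm_predict(x, field, W):
    """FFM degree-2 term.
    W has shape (n_features, n_fields, k): one k-dim latent vector per
    (feature, field) pair.  field[j] is the field of feature j.
    Because each vector only has to model the effect with one field,
    a much smaller k usually suffices than in FMs (k_FFM << k_FM)."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            f1, f2 = field[j1], field[j2]
            # j1's vector for field f2, dotted with j2's vector for field f1
            s += (W[j1, f2] @ W[j2, f1]) * x[j1] * x[j2]
    return s
```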
3.1 Solving the Optimization Problem
3.2 Parallelization on Shared-memory Systems
In Section 4.4 we run extensive experiments to investigate the effectiveness of parallelization.
3.3 Adding Field Information
Categorical Features
Numerical Features
Single-field Features
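For the categorical case, a hedged sketch of the transformation into LIBFFM-style "field:index:value" triples (binary value 1 per active category); the helper id maps are hypothetical, introduced only for illustration.

```python
def encode_categorical(row, field_ids, feature_ids):
    """Turn one record of categorical features into 'field:index:value'
    triples with value 1.  `field_ids` and `feature_ids` are mutable
    dicts that assign consecutive integer ids on first sight, so ids
    stay consistent across records."""
    triples = []
    for field_name, category in row.items():
        f = field_ids.setdefault(field_name, len(field_ids))
        j = feature_ids.setdefault((field_name, category), len(feature_ids))
        triples.append(f"{f}:{j}:1")
    return triples
```

For example, encoding {"Publisher": "ESPN", "Advertiser": "Nike"} with empty id maps yields ["0:0:1", "1:1:1"]; a later record sharing the advertiser reuses feature id 1.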
4. EXPERIMENTS
- We first provide the details of the experiment settings in Section 4.1.
- Then, we investigate the impact of parameters in Section 4.2.
- In Section 4.3, we discuss the observation that FFMs are sensitive to the number of epochs, and propose an early-stopping trick.
- The speedup of parallelization is studied in Section 4.4.
- In Sections 4.5-4.6, we compare FFMs with other models, including Poly2 and FMs.
4.1 Experiment Settings
Data Sets
Platform
Evaluation
Implementation
- use SSE instructions to boost the efficiency of inner products
- The parallelization discussed in Section 3.2 is implemented by OpenMP
4.2 Impact of Parameters
- k does not affect the logloss much
- If λ is too large, the model cannot achieve good performance. With a small λ, the model gets better results, but it easily over-fits the data.
- If we apply a small η, FFMs reach their best performance slowly. With a large η, FFMs quickly reduce the logloss, but over-fitting then occurs.
4.3 Early Stopping
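The early-stopping trick mentioned in the Section 4 overview (hold out a validation set, stop once the validation logloss starts rising) can be sketched as follows; `train_one_epoch` and `val_logloss` are placeholders for the real training and evaluation routines, not the paper's code.

```python
def train_with_early_stopping(train_one_epoch, val_logloss, max_epochs=50):
    """Run at most `max_epochs` epochs; stop as soon as the validation
    logloss increases, and report the best epoch and its logloss.
    `train_one_epoch()` updates the model in place; `val_logloss()`
    evaluates the current model on the validation set."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        loss = val_logloss()
        if loss > best_loss:      # logloss went up: over-fitting begins
            break
        best_loss, best_epoch = loss, epoch
    return best_epoch, best_loss
```

In practice one would then retrain on the full data for the chosen number of epochs, since FFMs are sensitive to the epoch count.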
4.4 Speedup
4.5 Comparison with LMs, Poly2, and FMs on Two CTR Competition Data Sets
- FFMs outperform the other models in terms of logloss, but they also require longer training time than LMs and FMs.
- Though the logloss of LMs is worse than that of the other models, they are significantly faster.
- Poly2 is the slowest among all models.
- FMs offer a good balance between logloss and speed.
4.6 Comparison on More Data Sets
- When a data set contains only numerical features, FFMs may not have an obvious advantage
- If we use dummy fields, then FFMs do not outperform FMs, a result indicating that the field information is not helpful.
- On the other hand, if we discretize numerical features, though FFMs are the best among all models, their performance is much worse than that of using dummy fields.
- FFMs should be effective for data sets that contain categorical features and are transformed to binary features.
- If the transformed set is not sparse enough, FFMs seem to bring less benefit.
- It is more difficult to apply FFMs on numerical data sets.