ABSTRACT
- First, we propose efficient implementations for training FFMs.
- Then we comprehensively analyze FFMs and compare this approach with competing models. Experiments show that FFMs are very useful for certain classification problems.
- Finally, we have released a package of FFMs for public use.
1. INTRODUCTION
Code used for experiments in this paper and the package LIBFFM are respectively available at:
http://www.csie.ntu.edu.tw/~cjlin/ffm/exps
http://www.csie.ntu.edu.tw/~cjlin/libffm
2. POLY2 AND FM
FMs can be better than Poly2 when the data set is sparse
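A minimal sketch of why FMs can beat Poly2 on sparse data: Poly2 learns one weight per feature pair, so pairs that never co-occur in training are unlearnable, while FMs factorize the pair weight into an inner product of latent vectors. Function names and shapes below are illustrative, not from the paper's code.

```python
import numpy as np

def poly2_predict(x, W):
    """Poly2 degree-2 term: a dedicated weight W[j1, j2] per feature
    pair. On sparse data many pairs never co-occur, so their weights
    cannot be estimated."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += W[j1, j2] * x[j1] * x[j2]
    return s

def fm_predict(x, V):
    """FM degree-2 term: each feature j has one k-dim latent vector
    V[j]; the pair weight is the inner product <V[j1], V[j2]>, so even
    pairs unseen in training get a meaningful estimate."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += (V[j1] @ V[j2]) * x[j1] * x[j2]
    return s
```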
3. FFM
- In FMs, every feature has only one latent vector to learn the latent effects with all other features; in FFMs, each feature has several latent vectors, one per field, and the vector used for a pair depends on the field of the other feature.
- Usually, k_FFM << k_FM.
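The field-aware interaction described above can be sketched as follows. In the pair (j1, j2), feature j1 uses the latent vector it keeps for j2's field and vice versa; the array names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def ffm_predict(x, field, W):
    """FFM degree-2 term.
    W has shape (n_features, n_fields, k): one k-dim latent vector per
    (feature, field) pair.  field[j] is the field of feature j.
    Because each vector only has to model the effect with one field,
    a much smaller k usually suffices than in FMs (k_FFM << k_FM)."""
    n = len(x)
    s = 0.0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            f1, f2 = field[j1], field[j2]
            # j1's vector for field f2, dotted with j2's vector for field f1
            s += (W[j1, f2] @ W[j2, f1]) * x[j1] * x[j2]
    return s
```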
3.1 Solving the Optimization Problem
3.2 Parallelization on Shared-memory Systems
In Section 4.4 we run extensive experiments to investigate the effectiveness of parallelization.
3.3 Adding Field Information
Categorical Features
Numerical Features
Single-field Features
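For the categorical case, a hedged sketch of the transformation into LIBFFM-style "field:index:value" triples (binary value 1 per active category); the helper id maps are hypothetical, introduced only for illustration.

```python
def encode_categorical(row, field_ids, feature_ids):
    """Turn one record of categorical features into 'field:index:value'
    triples with value 1.  `field_ids` and `feature_ids` are mutable
    dicts that assign consecutive integer ids on first sight, so ids
    stay consistent across records."""
    triples = []
    for field_name, category in row.items():
        f = field_ids.setdefault(field_name, len(field_ids))
        j = feature_ids.setdefault((field_name, category), len(feature_ids))
        triples.append(f"{f}:{j}:1")
    return triples
```

For example, encoding {"Publisher": "ESPN", "Advertiser": "Nike"} with empty id maps yields ["0:0:1", "1:1:1"]; a later record sharing the advertiser reuses feature id 1.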
4. EXPERIMENTS
- We first provide the details of the experiment settings in Section 4.1.
- Then, we investigate the impact of parameters in Section 4.2.
- In Section 4.3, we discuss the observation that FFMs are sensitive to the number of epochs, and propose an early-stopping trick.
- The speedup of parallelization is studied in Section 4.4.
- In Sections 4.5-4.6, we compare FFMs with other models, including Poly2 and FMs.
4.1 Experiment Settings
Data Sets
Platform
Evaluation
Implementation
- use SSE instructions to boost the efficiency of inner products
- The parallelization discussed in Section 3.2 is implemented by OpenMP
4.2 Impact of Parameters
- k does not affect the logloss much
- If λ is too large, the model cannot achieve good performance. With a small λ, the model gets better results, but it easily over-fits the data.
- If we apply a small η, FFMs reach their best performance slowly. With a large η, FFMs quickly reduce the logloss, but over-fitting then occurs.
4.3 Early Stopping
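The early-stopping trick mentioned in the Section 4 overview (hold out a validation set, stop once the validation logloss starts rising) can be sketched as follows; `train_one_epoch` and `val_logloss` are placeholders for the real training and evaluation routines, not the paper's code.

```python
def train_with_early_stopping(train_one_epoch, val_logloss, max_epochs=50):
    """Run at most `max_epochs` epochs; stop as soon as the validation
    logloss increases, and report the best epoch and its logloss.
    `train_one_epoch()` updates the model in place; `val_logloss()`
    evaluates the current model on the validation set."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        loss = val_logloss()
        if loss > best_loss:      # logloss went up: over-fitting begins
            break
        best_loss, best_epoch = loss, epoch
    return best_epoch, best_loss
```

In practice one would then retrain on the full data for the chosen number of epochs, since FFMs are sensitive to the epoch count.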
4.4 Speedup
4.5 Comparison with LMs, Poly2, and FMs on Two CTR Competition Data Sets
- FFMs outperform the other models in terms of logloss, but they also require longer training time than LMs and FMs.
- Though the logloss of LMs is worse than that of the other models, they are significantly faster.
- Poly2 is the slowest among all models.
- FMs offer a good balance between logloss and speed.
4.6 Comparison on More Data Sets
- When a data set contains only numerical features, FFMs may not have an obvious advantage
- If we use dummy fields, then FFMs do not outperform FMs, a result indicating that the field information is not helpful.
- On the other hand, if we discretize numerical features, though FFMs are the best among all models, their performance is much worse than that of using dummy fields.
- FFMs should be effective for data sets that contain categorical features and are transformed to binary features.
- If the transformed set is not sparse enough, FFMs seem to bring less benefit.
- It is more difficult to apply FFMs on numerical data sets.