recommendation anytime, anywhere @ Hulu 笔记

最新推荐文章于 2024-09-24 15:47:10 发布

原创最新推荐文章于 2024-09-24 15:47:10 发布 · 743 阅读

0 ·

CC 4.0 BY-SA版权

Hulu通过改进推荐算法解决协同过滤的问题，包括去除不相关因素的影响，并引入基于用户行为的主题模型。他们考虑了节目的元数据、共同观看比例及点击率等因素，并通过机器学习融合这些因素。

hulu关于related show的演化可以由下面的一张图表现出来：

xiangliang在这次报到中提到了单纯使用CF会造成如下问题（直接从ppt中摘录）

CF works in most cases. However, given a fact that users who watch show A also watch B,
other than the relevance between A and B, there could be lots of reasons:

1. A and B release new videos in the same day;
2. B is very popular;
3. A and B are new released shows;
4. A and B do not have same life cycle;
5. A and B are often shown in homepage;
We need to remove such effects when calculating show relevance through CF.

有个关于相关性和CTR（点击率）的trade-off：

如果选择直接优化CTR，会造成推荐产品相关度不高，一般流行的items，或者在用户访问的那个页面上新的items会获得很高的CTR.

所以现在的CTR只是模型的一个因子而已。

Hulu他们还发现一些有关孩子的shows会和一些新闻的shows联系在一起，后来他们认为是因为父母和孩子共享了一个账号，

所以他们发明了一种新的topic model类型：基于用户行为的topic model.

他们发现show A 和 show B 的相关性受一下因素的影响：

1. Metadata of show A and show B
2. How much percentage of users watch both A and B
3. CTR of show B on show A's page

他们使用机器学习的方法来融合这所有的factor.

他们使用了如下一个框架：

使用CF和Content方法算出每两部电影的相似度，然后每个fator都有他们的权重，然后再由领域专家打个label(具体是不是没两部电影之间都要用专家打label就不知道了)

然后他们使用ensemble的方法来训练，ensemble的方法：

We have tried
Logistic Regression
Decision Tree (C4.5, Random Forest, Boosting Tree)
And
Decision Tree performs much better than Logistic Regression
Because
This is a non-linear learning problem