Building recommender systems with Azure Machine Learning service

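This article introduces the two main types of recommendation systems, collaborative filtering and content-based filtering, and presents a GitHub repository of Python best-practice examples for building and evaluating recommendation systems with Azure Machine Learning service. It covers key tasks including data preparation, model building, evaluation, tuning, and deployment.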

Posted on May 1, 2019

Heather Spetalnick Program Manager, ML Platform

Title card: Building recommendation systems locally and in the cloud with Azure Machine Learning Service.

 

Recommendation systems are used in a variety of industries, from retail to news and media. If you’ve ever used a streaming service or ecommerce site that has surfaced recommendations for you based on what you’ve previously watched or purchased, you’ve interacted with a recommendation system. With the availability of large amounts of data, many businesses are turning to recommendation systems as a critical revenue driver. However, finding the right recommender algorithms can be very time-consuming for data scientists. This is why Microsoft has provided a GitHub repository with Python best practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning service.

What is a recommendation system?

There are two main types of recommendation systems: collaborative filtering and content-based filtering. Collaborative filtering (commonly used in e-commerce scenarios) identifies interactions between users and the items they rate in order to recommend new items they have not seen before. Content-based filtering (commonly used by streaming services) identifies features of users’ profiles or item descriptions to make recommendations for new content. The two approaches can also be combined into a hybrid approach.
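To make the collaborative filtering idea concrete, here is a minimal sketch of an item-based variant using cosine similarity. It is only an illustration and is not taken from the repository: the toy `ratings` data and the helper names `item_vectors`, `cosine`, and `recommend` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 5},
    "carol": {"B": 2, "C": 1, "D": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, ratings, top_k=2):
    """Rank items the user has not rated by similarity-weighted ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        # Weight each rating the user gave by how similar that item is
        # to the candidate item.
        scores[candidate] = sum(
            cosine(vector, items[item]) * rating for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(recommend("alice", ratings))
```

A content-based recommender would replace the rating vectors with item or user feature vectors, while keeping a similar scoring step.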

Recommender systems keep customers on a business’s site longer and lead them to interact with more products and content, because they suggest products or content a customer is likely to purchase or engage with, much as a store sales associate might. Below, we’ll show you what this repository is, and how it eases pain points for data scientists building and implementing recommender systems.

Easing the process for data scientists

The recommender algorithm GitHub repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Data preparation - Preparing and loading data for each recommender algorithm
  • Modeling - Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
  • Evaluating - Evaluating algorithms with offline metrics
  • Model selection and optimization - Tuning and optimizing hyperparameters for recommender models
  • Operationalizing - Operationalizing models in a production environment on Azure
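As a rough illustration of the modeling task, the sketch below implements Alternating Least Squares in its simplest possible form: a single latent factor per user and item, in plain Python. This is a teaching simplification under stated assumptions, not the Spark-based ALS used in the repository's notebooks; the function name `als_rank1`, its parameters, and the toy data are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, reg=0.1, n_iters=20):
    """Fit rating(user, item) ~ u[user] * v[item] with alternating updates.

    ratings is a list of (user_index, item_index, rating) triples.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iters):
        # Hold item factors fixed; each user factor then has a closed-form
        # regularized least-squares solution.
        for i in range(n_users):
            observed = [(j, r) for (a, j, r) in ratings if a == i]
            numerator = sum(r * v[j] for j, r in observed)
            denominator = reg + sum(v[j] ** 2 for j, _ in observed)
            u[i] = numerator / denominator
        # Hold user factors fixed; solve for each item factor the same way.
        for j in range(n_items):
            observed = [(i, r) for (i, b, r) in ratings if b == j]
            numerator = sum(r * u[i] for i, r in observed)
            denominator = reg + sum(u[i] ** 2 for i, _ in observed)
            v[j] = numerator / denominator
    return u, v


# Toy ratings generated from a rank-1 pattern; (user 2, item 0) is held out.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 1, 6.0)]
u, v = als_rank1(observed, n_users=3, n_items=2)
print(round(u[2] * v[0], 2))  # predicted rating for the held-out pair
```

Production implementations use several latent factors per user and item, which turns each scalar update into a small linear solve, and distribute the work across a cluster.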

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are provided for self-study and for customization in an organization’s or data scientist’s own applications.

In the image below, you’ll find a list of recommender algorithms available in the repository. We’re always adding more recommender algorithms, so go to the GitHub repository to see the most up-to-date list.
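The splitting and evaluation utilities mentioned above can be pictured with a small sketch. The two functions below, `random_split` and `precision_at_k`, are hypothetical stand-ins written for this illustration; they do not reproduce the repository's own utility functions or their signatures.

```python
import random


def random_split(interactions, ratio=0.75, seed=42):
    """Shuffle interactions and split them into train and test lists."""
    rows = list(interactions)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * ratio)
    return rows[:cut], rows[cut:]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are actually relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Toy (user, item, rating) interactions
interactions = [("u1", "A", 5), ("u1", "B", 3), ("u2", "A", 4), ("u2", "C", 2)]
train, test = random_split(interactions)
print(len(train), len(test))

# Two of the first three recommendations are relevant
print(precision_at_k(["A", "C", "B"], relevant={"A", "B"}, k=3))
```

Offline metrics such as precision at k are computed on the held-out test split, so that a model is scored on interactions it did not see during training.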

  

Let’s take a closer look at how the recommender repository addresses data scientists’ pain points.

  1. It’s time consuming to evaluate different options for recommender algorithms

    • One of the key benefits of the recommender GitHub repository is that it provides a set of options and shows which algorithms are best for solving certain types of problems. It also provides a rough framework for how to switch between different algorithms. A data scientist may want to switch to a different algorithm if model accuracy isn’t sufficient, if an algorithm better suited to real-time results is needed, or if the originally chosen algorithm isn’t the best fit for the type of data being used.
  2. Choosing, understanding, and implementing newer models for recommender systems can be costly

    • Selecting the right recommender algorithm from scratch and implementing new models for recommender systems can be costly, as both require ample time for training and testing as well as large amounts of compute power. The recommender GitHub repository streamlines the selection process, saving data scientists the time they would otherwise spend testing algorithms that are not a good fit for their projects or scenarios. This, coupled with Azure’s various pricing options, reduces data scientists’ costs in testing and organizations’ costs in deployment.
  3. Implementing more state-of-the-art algorithms can appear daunting

    • When asked to build a recommender system, data scientists will often turn to more commonly known algorithms to alleviate the time and costs needed to choose and test more state-of-the-art algorithms, even if these more advanced algorithms may be a better fit for the project/data set. The recommender GitHub repository provides a library of well-known and state-of-the-art recommender algorithms that best fit certain scenarios. It also provides best practices that, when followed, make implementing more state-of-the-art algorithms easier to approach.
  4. Data scientists are unfamiliar with how to use Azure Machine Learning service to train, test, optimize, and deploy recommender algorithms

    • Finally, the recommender GitHub repository provides best practices for how to train, test, optimize, and deploy recommender models on Azure and Azure Machine Learning (Azure ML) service. In fact, there are several notebooks available on how to run the recommender algorithms in the repository on Azure ML service. Data scientists can also take any notebook that has already been created and submit it to Azure with minimal or no changes.

Azure ML can be used intensively across various notebooks for tasks relating to AI model development, such as:

  • Hyperparameter tuning
  • Tracking and monitoring metrics to enhance the model creation process
  • Scaling up and out on compute like DSVM and Azure ML Compute
  • Deploying a web service to Azure Kubernetes Service
  • Submitting pipelines
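To illustrate the hyperparameter tuning task in the list above, here is a minimal grid search sketch in plain Python. It does not use the Azure ML SDK: `grid_search`, `toy_objective`, and the parameter names `rank` and `reg` are assumptions made for this example, and the objective is a stand-in for "train a model and return its validation metric".

```python
from itertools import product


def grid_search(objective, space):
    """Evaluate every combination in the search space; keep the best score."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for training a recommender and measuring a
    # validation metric; it peaks at rank=20, reg=0.1.
    return -abs(params["rank"] - 20) / 20 - abs(params["reg"] - 0.1)


space = {"rank": [10, 20, 40], "reg": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(toy_objective, space)
print(best_params)
```

A managed tuning service runs the same kind of loop, but dispatches each trial to remote compute and logs the metric for every run, which is what makes tracking and scaling out practical.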

Learn more

Utilize the GitHub repository for your own recommender systems.

Learn more about the Azure Machine Learning service.

Get started with a free trial of Azure Machine Learning service.

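### On the origin of Wide & Deep Learning for Recommender Systems

The Wide & Deep learning framework was first proposed in a paper published by Google, titled "Wide & Deep Learning for Recommender Systems"[^1]. The paper presents a machine learning architecture, the Wide & Deep network, designed to balance memorization and generalization in recommender systems. It does so by jointly training a wide (linear) model and a deep neural network (DNN) model, so that a single model combines the strengths of both.

Specifically, the wide part memorizes frequently co-occurring features and their combinations, while the deep part focuses on capturing rare or previously unseen feature combinations, which improves the system's ability to generalize[^4]. This design makes the model well suited to scenarios with sparse data, such as personalized recommendation services.

"Wide & Deep Learning for Recommender Systems" can therefore be regarded as the original source for this architecture, and it is widely recognized as a classic in recommender algorithm research.

```python
# Example sketch of a basic Wide & Deep architecture
import tensorflow as tf


def build_wide_deep_model(wide_dim, deep_dim):
    # Input layers. Features are assumed to be already encoded as dense
    # float vectors: wide_dim and deep_dim are the widths of the wide
    # (e.g. cross-product) and deep feature vectors.
    wide_input = tf.keras.layers.Input(shape=(wide_dim,), name="wide", dtype=tf.float32)
    deep_input = tf.keras.layers.Input(shape=(deep_dim,), name="deep", dtype=tf.float32)

    # Wide part (linear model): features feed the output layer directly
    wide = wide_input

    # Deep part (DNN model)
    deep = deep_input
    for units in [128, 64]:
        deep = tf.keras.layers.Dense(units)(deep)
        deep = tf.keras.layers.BatchNormalization()(deep)
        deep = tf.keras.layers.ReLU()(deep)

    # Merge the two parts so they are trained jointly through one output
    combined_output = tf.keras.layers.concatenate([wide, deep])
    output = tf.keras.layers.Dense(1, activation="sigmoid")(combined_output)

    return tf.keras.Model(inputs=[wide_input, deep_input], outputs=output)
```

#### Conclusion

In summary, "Wide & Deep Learning for Recommender Systems" comes from research by a team at Google, and it provides the theoretical basis for techniques used in many production recommender systems today.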