Building recommender systems with Azure Machine Learning service

License: CC 4.0 BY-SA

Original article: https://azure.microsoft.com/en-us/blog/building-recommender-systems-with-azure-machine-learning-service/

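This article introduces the two main types of recommendation systems, collaborative filtering and content-based filtering, and presents a GitHub repository of Python best-practice examples for building and evaluating recommendation systems with Azure Machine Learning service. It covers key tasks including data preparation, model building, evaluation, tuning, and deployment.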

Posted on May 1, 2019

Heather Spetalnick Program Manager, ML Platform

Title card: Building recommendation systems locally and in the cloud with Azure Machine Learning Service.

Recommendation systems are used in a variety of industries, from retail to news and media. If you’ve ever used a streaming service or ecommerce site that has surfaced recommendations for you based on what you’ve previously watched or purchased, you’ve interacted with a recommendation system. With the availability of large amounts of data, many businesses are turning to recommendation systems as a critical revenue driver. However, finding the right recommender algorithms can be very time-consuming for data scientists. This is why Microsoft has provided a GitHub repository with Python best-practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning service.

What is a recommendation system?

There are two main types of recommendation systems: collaborative filtering and content-based filtering. Collaborative filtering (commonly used in e-commerce scenarios) identifies interactions between users and the items they rate in order to recommend new items they have not seen before. Content-based filtering (commonly used by streaming services) uses features of users’ profiles or item descriptions to make recommendations for new content. The two approaches can also be combined into a hybrid approach.
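
To make the collaborative filtering idea concrete, here is a minimal sketch of item-based collaborative filtering in plain Python. It is an illustration only and is not taken from the repository: the toy ratings, the function names (`item_vectors`, `cosine`, `recommend`), and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt

# Toy user -> {item: rating} data; every name and value here is made up.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 5},
    "carol": {"B": 2, "C": 1, "D": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_k=2):
    """Rank items the user has not rated by similarity-weighted ratings."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        # Weight each of the user's own ratings by how similar the rated
        # item is to the candidate item.
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * score
            for item, score in seen.items()
        )
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_k]]


print(recommend("alice", ratings))
```

A content-based variant would keep the same ranking step but compute similarity from item or profile features (for example, genre tags) instead of from the users who rated each item.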

Recommender systems keep customers on a business’s site longer and encourage them to interact with more products and content by suggesting items a customer is likely to purchase or engage with, much as a store sales associate might. Below, we’ll show you what this repository is and how it eases pain points for data scientists building and implementing recommender systems.

Easing the process for data scientists

The recommender algorithm GitHub repository offers examples and best practices for building recommendation systems in the form of Jupyter notebooks. The examples detail our learnings on five key tasks:

Data preparation - Preparing and loading data for each recommender algorithm
Modeling - Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
Evaluating - Evaluating algorithms with offline metrics
Model selection and optimization - Tuning and optimizing hyperparameters for recommender models
Operationalizing - Operationalizing models in a production environment on Azure
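
As a rough illustration of the data preparation and evaluation tasks listed above, the following sketch splits interaction data into training and test sets and computes two common offline ranking metrics, precision@k and recall@k. It uses only the Python standard library; the function names (`random_split`, `precision_at_k`, `recall_at_k`) and their signatures are hypothetical and are not the repository's own utilities.

```python
import random


def random_split(interactions, ratio=0.75, seed=42):
    """Shuffle (user, item) pairs and split them into train and test lists."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k slots filled with relevant items."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical data: eight (user, item) interactions.
pairs = [(user, item) for user in "abcd" for item in "xy"]
train, test = random_split(pairs, ratio=0.75, seed=0)
print(len(train), len(test))
```

Note that this `precision_at_k` divides by `k` even when fewer than `k` items are recommended, which is one common convention; some implementations divide by the number of items actually returned instead.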

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are provided for self-study and for customization in an organization’s or data scientist’s own applications.
In the image below, you’ll find a list of recommender algorithms available in the repository. We’re always adding more recommender algorithms, so go to the GitHub repository to see the most up-to-date list.

Let’s take a closer look at how the recommender repository addresses data scientists’ pain points.

It’s time-consuming to evaluate different options for recommender algorithms
- One of the key benefits of the recommender GitHub repository is that it provides a set of options and shows which algorithms are best for solving certain types of problems. It also provides a rough framework for how to switch between different algorithms. A data scientist may want to switch to a different algorithm if the model’s accuracy is insufficient, if an algorithm better suited to real-time results is needed, or if the originally chosen algorithm isn’t the best fit for the type of data being used.
Choosing, understanding, and implementing newer models for recommender systems can be costly
- Selecting the right recommender algorithm from scratch and implementing new models for recommender systems can be costly, as both require ample time for training and testing as well as large amounts of compute power. The recommender GitHub repository streamlines the selection process, saving data scientists the time they would otherwise spend testing many algorithms that are not a good fit for their projects or scenarios. This, coupled with Azure’s various pricing options, reduces data scientists’ testing costs and organizations’ deployment costs.
Implementing more state-of-the-art algorithms can appear daunting
- When asked to build a recommender system, data scientists will often turn to more commonly known algorithms to alleviate the time and costs needed to choose and test more state-of-the-art algorithms, even if these more advanced algorithms may be a better fit for the project/data set. The recommender GitHub repository provides a library of well-known and state-of-the-art recommender algorithms that best fit certain scenarios. It also provides best practices that, when followed, make implementing more state-of-the-art algorithms easier to approach.
Data scientists are unfamiliar with how to use Azure Machine Learning service to train, test, optimize, and deploy recommender algorithms
- Finally, the recommender GitHub repository provides best practices for how to train, test, optimize, and deploy recommender models on Azure and Azure Machine Learning (Azure ML) service. In fact, there are several notebooks available on how to run the recommender algorithms in the repository on Azure ML service. Data scientists can also take any notebook that has already been created and submit it to Azure with minimal or no changes.
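
The idea of switching between algorithms and comparing them on the same data can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the repository's framework: the two toy recommenders, their shared `fit`/`recommend` interface, the `hit_rate` metric, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


class PopularityRecommender:
    """Recommends the globally most frequent items the user has not seen."""

    def fit(self, interactions):
        self.counts = Counter(item for _, item in interactions)
        self.seen = defaultdict(set)
        for user, item in interactions:
            self.seen[user].add(item)
        return self

    def recommend(self, user, k):
        ranked = sorted(self.counts, key=lambda i: (-self.counts[i], i))
        return [i for i in ranked if i not in self.seen[user]][:k]


class CooccurrenceRecommender:
    """Recommends items that often co-occur with the user's own items."""

    def fit(self, interactions):
        self.seen = defaultdict(set)
        for user, item in interactions:
            self.seen[user].add(item)
        self.cooc = defaultdict(Counter)
        for items in self.seen.values():
            for a in items:
                for b in items:
                    if a != b:
                        self.cooc[a][b] += 1
        return self

    def recommend(self, user, k):
        scores = Counter()
        for item in self.seen[user]:
            scores.update(self.cooc[item])
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [i for i in ranked if i not in self.seen[user]][:k]


def hit_rate(model, test_pairs, k=1):
    """Share of held-out (user, item) pairs whose item is in the top-k list."""
    if not test_pairs:
        return 0.0
    hits = sum(1 for user, item in test_pairs if item in model.recommend(user, k))
    return hits / len(test_pairs)


def compare(models, train_pairs, test_pairs, k=1):
    """Fit every model on the same training data and score it the same way."""
    return {
        name: hit_rate(model.fit(train_pairs), test_pairs, k)
        for name, model in models.items()
    }


# Hypothetical interactions: (user, item) pairs.
train_pairs = [
    ("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
    ("u3", "C"), ("u3", "D"), ("u4", "C"), ("u4", "D"),
    ("u5", "C"), ("u5", "D"), ("u6", "A"), ("u7", "C"),
]
test_pairs = [("u6", "B"), ("u7", "D")]

models = {
    "popularity": PopularityRecommender(),
    "cooccurrence": CooccurrenceRecommender(),
}
print(compare(models, train_pairs, test_pairs, k=1))
```

Because both classes expose the same two methods, swapping one algorithm for another only changes the entry in `models`; the training data, the evaluation code, and the metric stay the same, which keeps the comparison fair.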