Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing论文翻译——中英文对照

声明:作者翻译论文仅为学习,如有侵权请联系作者删除博文,谢谢!

翻译论文汇总:https://github.com/SnailTyan/deep-learning-papers-translation

文章作者:Tyan
博客:noahsnail.com  |  CSDN  |  简书

Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing

Abstract

Marketing is an important mechanism to increase user engagement and improve platform revenue, and heterogeneous causal learning can help develop more effective strategies. Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades. Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operation research (OR): the first stage predicts the model parameters, which are then fed to the optimization in the second stage. However, the error of the predicted parameters in ML is not respected in OR, and the series of complex mathematical operations in OR leads to increased accumulative errors. Essentially, improved precision of the predicted parameters may not be positively correlated with the quality of the final solution, due to this side-effect of the decoupled design.

摘要

营销是增加用户参与度和提高平台收入的重要机制,异质因果学习可以帮助开发更有效的策略。营销中的大多数决策问题都可以表述为资源分配问题,并且已经研究了几十年。现有的工作通常将解决方案过程分为两个完全解耦的阶段,即机器学习(ML)和运筹学(OR)——第一阶段预测模型参数,第二阶段将其输入到优化中。然而,机器学习阶段预测参数的误差无法被充分考虑,而在运筹学阶段的一系列复杂数学操作会导致累积误差增加。本质上,由于解耦设计的副作用,预测参数精度的提高可能与最终解决方案没有正相关性。

In this paper, we propose a novel approach for solving resource allocation problems to mitigate the side-effects. Our key intuition is that we introduce the decision factor to establish a bridge between ML and OR such that the solution can be directly obtained in OR by only performing the sorting or comparison operations on the decision factor. Furthermore, we design a customized loss function that can conduct direct heterogeneous causal learning on the decision factor, an unbiased estimation of which can be guaranteed when the loss converges. As a case study, we apply our approach to two crucial problems in marketing: the binary treatment assignment problem and the budget allocation problem with multiple treatments. Both large-scale simulations and online A/B Tests demonstrate that our approach achieves significant improvement compared with state-of-the-art.

在本文中,我们提出了一种解决资源分配问题的新方法来减轻副作用。我们的关键直觉是引入决策因子来建立机器学习和运筹学之间的桥梁,从而使解决方案可以直接在运筹学中通过仅对决策因子进行排序或比较操作来获得。此外,我们设计了一个定制的损失函数,可以在决策因子上进行直接的异质因果学习,当损失收敛时,可以保证对决策因子的无偏估计。作为案例研究,我们将我们的方法应用于营销中的两个关键问题:二元干预分配问题和具有多种干预的预算分配问题。大规模仿真和在线A/B测试都表明,与最先进的方法相比,我们的方法取得了显著的改进。

Introduction

Marketing is one of the most effective mechanisms for the improvement of user engagement and platform revenue. Thus, various marketing campaigns have been widely deployed in many online Internet platforms. For example, markdowns of perishable products in Freshippo are used to boost sales (Hua et al. 2021), coupons in Taobao Deals can stimulate user activity (Zhang et al. 2021) and incentives in the Kuaishou video platform are offered to improve user retention (Ai et al. 2022).

引言

营销是提高用户参与度和平台收入的最有效机制之一。因此,各种营销活动在许多在线互联网平台上得到了广泛应用。例如,盒马鲜生利用易腐产品的降价来促进销售(Hua et al. 2021),淘宝特价版利用优惠券来刺激用户活跃度(Zhang et al. 2021),快手视频平台提供激励措施来提高用户留存率(Ai et al. 2022)。

Despite the incremental revenue, marketing actions can also incur significant consumption of marketing resources (e.g., budget). Hence, due to the limited volume of these resources, only part of the individuals (e.g., shops or goods) can be assigned the marketing treatments. In marketing, such decision-making problems can be formulated as resource allocation problems, and have been investigated for decades.

尽管收入增加,但营销活动也会导致营销资源(例如预算)的大量消耗。因此,由于数量有限,只有部分个人(例如商店或商品)可能被分配到营销策略中。在营销中,此类决策问题可以表述为资源分配问题,并且已经研究了几十年。

Most of existing studies usually use two-phase methods to solve these problems (Ai et al. 2022; Zhao et al. 2019; Du, Lee, and Ghaffarizadeh 2019). As is shown by Fig. 1(a), the first stage is ML, where the (incremental) response of individuals under different treatments is predicted by predictive/uplift models. The second stage is OR and the prediction results in ML are fed as the input to the combinatorial optimization algorithms. Hence, existing works mainly focus on the decoupled optimization for predictive/uplift modeling and combinatorial optimization.

Figure 1

现有的大多数研究通常使用两阶段方法来解决这些问题(Ai et al. 2022; Zhao et al. 2019; Du, Lee, and Ghaffarizadeh 2019)。如图1(a)所示,第一阶段是机器学习,其中通过预测/增益模型预测不同干预下个体的(增益)反应。第二阶段是运筹学,机器学习中的预测结果作为组合优化算法的输入。因此,现有的工作主要集中在预测/增益建模和组合优化的解耦优化上。

Figure 1

Despite the widespread application, there are two major defects in two-phase methods. The first one is that the solution is obtained only after conducting many intermediate computations on the prediction results of ML, e.g., the combination of multiple factors or complex mathematical operations in OR. Therefore, improved precision of the predicted parameters may not be positively correlated with the quality of the final solution. The second is that the errors of model prediction are not respected, and the complex operations performed on the prediction results in OR lead to increased accumulative errors. Due to these accumulative errors, the theoretically optimal algorithm in OR cannot always achieve the practical optimum and is even inferior to heuristic strategies in some scenarios. Therefore, the decoupled optimization of ML and OR does not induce a global optimization of the original problem.

尽管两阶段方法应用广泛,但其存在两个主要缺陷。第一,在机器学习中,解是在对预测结果进行许多中间计算之后获得的,例如运筹学中多个因素的组合或复杂的数学运算。因此,预测参数精度的提高可能与最终解决方案不具有正相关性。第二,没有重视模型预测的误差,并且在运筹学中对预测结果执行的复杂运算导致累积误差增加。由于累积误差,运筹学中理论上最优的算法并不总是能达到实际最优,在某些情况下甚至不如启发式策略。因此,机器学习和运筹学的解耦优化无法对原始问题产生全局优化。

Instead of two-phase methods, we propose a novel approach for solving resource allocation problems to mitigate the above defects. First of all, we define the decision factor of an algorithm as a factor on which the solution can be directly obtained by performing only the sorting or comparison operations. As is shown by Fig. 1(b), the decision factor derived from OR is taken as the learning objective, and direct heterogeneous causal learning in ML is conducted in our approach. Based on this definition, no additional mathematical operations are performed on the prediction results in OR. Therefore, the ranking performance of a model on the decision factor directly determines the quality of the solution, and improving the model guarantees a better solution. Specifically, the model error now directly measures the ranking performance, and it is respected and not enlarged in OR. Hence, the new challenges are how to identify such a decision factor in OR and how to make a direct prediction for it in ML.

为了缓解上述缺陷,我们提出了一种解决资源分配问题的新方法,而不是两阶段方法。首先,我们将算法的决策因子定义为只需执行排序或比较操作即可直接获得解决方案的因子。如图1(b)所示,从运筹学得出的决策因子被视为学习目标,并在我们的方法中进行机器学习中的直接异质因果学习。根据定义,在运筹学中对预测结果没有其他的数学操作。因此,一个模型在决策因子上的排序能力直接决定了解决方案的质量,改进模型可以保证获得更好的解决方案。具体来说,模型误差可以用来衡量排序能力来代替,这在运筹学中受到重视而不是放大模型误差。因此,新的挑战是如何在运筹学中识别这样的决策因子以及如何在机器学习中对其进行直接预测。

Following this idea, we investigate two crucial problems in marketing. The first one is the binary treatment assignment problem. When the cost incurred by the treatment is ignored (the cost-unaware version), the conditional average treatment effect (CATE) can be regarded as the decision factor. The common uplift models to predict CATE include meta-learners (Kunzel et al. 2019; Nie and Wager 2021) and causal forests (Wager and Athey 2018; Athey, Tibshirani, and Wager 2019). The former are composed of multiple base models, and the latter usually combine generalized random forests (GRF) with double machine learning (DML) methods. Different from them, we propose a novel uplift model based on neural networks that makes a direct prediction and achieves good performance in both theory and practice. Despite the incremental revenue, the treatment can also incur different costs. In this cost-aware version, the ROI (Return on Investment) of an individual can be regarded as the decision factor, which is calculated as the ratio of the incremental revenue to the incremental cost. However, most existing works in causal inference do not involve the treatment cost and cannot be applied to such a direct prediction. Although some works (Du, Lee, and Ghaffarizadeh 2019) investigated a similar problem, their loss function cannot converge to a stable extreme point in theory. In this paper, we design a convex loss function to guarantee an unbiased estimation of the individual-level ROI when the loss converges.

根据这一思路,我们研究了营销中的两个关键问题。第一个是二值干预分配问题。当忽略干预成本(不考虑成本版本)时,条件平均因果效应(CATE)可视为决策因子。预测CATE的常见增益模型包括元学习器(Kunzel et al. 2019; Nie and Wager 2021)和因果森林(Wager and Athey 2018; Athey, Tibshirani, and Wager 2019)。前者由多个基础模型组成,后者通常会结合广义随机森林(GRF)与双重机器学习(DML)方法。不同于这些方法,我们提出了一种新的增益模型,基于神经网络进行直接预测,在理论和实践上都取得了良好的效果。尽管干预可以带来增量收入,但它也可能产生不同的成本。在考虑成本的版本,单个个体的ROI(投资回报率)可视为决策因子,它由增量收益除以增量成本计算得出。然而,现有因果推断的大部分工作都没有涉及干预成本,不能适用于这种直接预测。虽然一些工作(Du, Lee, and Ghaffarizadeh 2019)研究了类似的问题,但它们的损失函数不能收敛到理论上的极值点。在本文中,我们设计了一个凸损失函数,以保证在损失收敛时对个体投资回报率的无偏估计。

As the second case study, we apply our approach to the budget allocation problem with multiple treatments and propose a novel evaluation metric for this problem in this paper. The Lagrange duality is an effective algorithm to solve the budget allocation problem. However, the decision factor in this algorithm contains the Lagrange multiplier that is uncertain and varies much with different budgets. The direct prediction for such a decision factor with all possible Lagrange multipliers is difficult and unrealistic. In this paper, we propose an equivalent algorithm to the Lagrange dual method while the decision factor in this algorithm is determined and irrelevant to the Lagrange multiplier. In addition, the corresponding causal learning model is developed and a direct prediction for the decision factor can be obtained when the customized loss function converges. Finally, we also propose a novel evaluation metric named MT-AUCC to estimate the prediction result, which is similar to Area Under Uplift Curve (AUUC) (Rzepakowski and Jaroszewicz 2010) but involves both multiple treatments and incremental cost.

作为第二个案例研究,我们将我们的方法应用于具有多种干预的预算分配问题,并在本文中为该问题提出了一种新的评估指标。拉格朗日对偶是解决预算分配问题的有效算法。然而,该算法中的决策因子包含不确定且随不同预算变化很大的拉格朗日乘子。直接预测具有所有可能拉格朗日乘子的此类决策因子是困难且不现实的。在本文中,我们提出了一种与拉格朗日对偶方法等价的算法,而该算法中的决策因子是确定的并且与拉格朗日乘子无关。此外,我们还开发了相应的因果学习模型,当定制的损失函数收敛时,可以获得决策因子的直接预测。最后,我们还提出了一种名为MT-AUCC的新型评估指标来估计预测结果,该指标类似于增益曲线下面积(AUUC)(Rzepakowski and Jaroszewicz 2010),但涉及多种干预和增量成本。

Large-scale simulations and online A/B Tests validate the effectiveness of our approaches. In the offline simulations, we use two real-world datasets collected from random control trials (RCT) in the online advertising/food delivery platforms. Multiple evaluation metrics and online AB tests show that our models and algorithms achieve significant improvement and increase the target reward by over 10% on average compared with state-of-the-art.

大规模仿真和在线A/B测试验证了我们方法的有效性。在离线模拟中,我们使用了从在线广告/食品配送平台的随机对照试验(RCT)中收集的两个真实数据集。多个评估指标和在线AB测试表明,与最先进的方法相比,我们的模型和算法实现了显著的改进,将目标奖励平均提高了10%以上。

Related Work

Two-phase methods. The composition of machine learning (ML) and operation research (OR) is one of the most common approaches to solve the resource allocation problem, which is called two-phase methods in this paper. In the first stage, uplift models are designed to predict the incremental
response of individuals under different treatments. Besides meta-learners (Kunzel et al. 2019; Nie and Wager 2021) and causal forests (Wager and Athey 2018; Athey, Tibshirani, and Wager 2019; Zhao, Fang, and Simchi-Levi 2017; Ai et al. 2022), representation learning (Johansson, Shalit, and Sontag 2016; Shalit, Johansson, and Sontag 2017; Yao et al. 2018) was also developed for uplift modeling. Instead of deriving an unbiased estimator, some works (Betlei, Diemert, and Amini 2021; Kuusisto et al. 2014) proposed a unified framework for learning to rank CATE. As one of the most effective algorithms, the Lagrangian duality is frequently used to solve many decision-making problems of different areas in the second stage. For example, it was developed to solve the budget allocation problem in marketing (Du, Lee, and Ghaffarizadeh 2019; Ai et al. 2022; Zhao et al. 2019) and compute the optimal bidding policy in online advertising (Hao et al. 2020).

相关工作

两阶段方法。 机器学习(ML)和运筹学(OR)的组合是解决资源分配问题的最常用方法之一,本文将其称为两阶段方法。在第一阶段,增益模型旨在预测不同干预下个体的增量响应。除了元学习器(Kunzel et al. 2019; Nie and Wager 2021)和因果森林(Wager and Athey 2018; Athey, Tibshirani, and Wager 2019; Zhao, Fang, and Simchi-Levi 2017; Ai et al. 2022)之外,还开发了表示学习(Johansson, Shalit, and Sontag 2016; Shalit, Johansson, and Sontag 2017; Yao et al. 2018)用于增益建模。一些研究(Betlei, Diemert, and Amini 2021; Kuusisto et al. 2014)提出了一个统一框架来学习对CATE进行排序,而不是推导出无偏估计量。作为最有效的算法之一,拉格朗日对偶经常用于解决第二阶段不同领域的许多决策问题。例如,它被开发用于解决营销中的预算分配问题(Du, Lee, and Ghaffarizadeh 2019; Ai et al. 2022; Zhao et al. 2019) 和计算在线广告中的最佳竞价策略(Hao et al. 2020)。

Direct learning methods. Policy learning and reinforcement learning are two important methods to directly learn a treatment assignment policy rather than the treatment effect, which avoids the combination of ML and OR. A general framework for policy learning with observational data was proposed based on the doubly robust estimator (Athey and Wager 2021) and their work was extended to multiaction policy learning (Zhou, Athey, and Wager 2022). As a real-world application, the works (Xiao et al. 2019; Zhang et al. 2021) formulated the coupon allocation problem in sequential incentive marketing as a constrained Markov decision process and proposed reinforcement learning to solve it. However, all of the above methods moved the resource constraints into the reward function by using the Lagrangian multiplier. Hence, the model may need to be changed constantly with the variation of the Lagrangian multiplier.

直接学习方法。 策略学习和强化学习是直接学习干预分配策略而非干预效果的两种重要方法,从而避免了机器学习和运筹学的结合。基于双重鲁棒估计量(Athey and Wager 2021),提出了一种使用观测数据进行策略学习的通用框架,并将其工作扩展到多动作策略学习(Zhou, Athey, and Wager 2022)。作为现实应用,这些作品(Xiao et al. 2019; Zhang et al. 2021)将序列激励营销中的优惠券分配问题表述为受约束的马尔可夫决策过程,并提出了强化学习来解决它。然而,上述所有方法都通过使用拉格朗日乘子将资源约束变为奖励函数。因此,模型可能需要随着拉格朗日乘子的变化而不断变化。

Decision-focused learning (DFL). Similar to our motivation, DFL devotes itself to learning the model parameters based on the downstream optimization task rather than the prediction accuracy. Nevertheless, many existing works in DFL required that the feasible region of the decision variables is fixed and known with certainty (Wilder, Dilkina, and Tambe 2019; Elmachtoub and Grigas 2022; Shah et al. 2022; Mandi et al. 2022). The most related work to ours is perhaps by Donti, Amos, and Kolter 2017, which addressed a stochastic optimization problem that contains both probabilistic and deterministic constraints. However, this work supposed that the decision variables are continuous and dealt with the probabilistic constraints by the Lagrangian duality, which is markedly distinct from ours.

以决策为中心的学习(DFL)。 与我们的动机类似,DFL致力于根据下游优化任务而不是预测精度来学习模型参数。尽管如此,DFL中的许多现有工作都要求决策变量的可行域是固定的并且是确定的(Wilder, Dilkina, and Tambe 2019; Elmachtoub and Grigas 2022; Shah et al. 2022; Mandi et al. 2022)。与我们最相关的工作可能是Donti、Amos和Kolter 2017的工作,他们解决了一个包含概率和确定性约束的随机优化问题。然而,这项工作假设决策变量是连续的,并通过拉格朗日对偶处理概率约束,这与我们的工作截然不同。

Binary Treatment Assignment Problem

We begin with a common marketing scenario, where part of $M$ individuals are selected to receive the marketing action. We adopt the potential outcome framework (Sekhon 2008) to formulate this problem. Let $X \in \mathbb{R}^d$ denote the feature vector and $x$ its realization. Despite the incremental revenue, marketing actions can also incur significant costs. Let $Y^r, Y^c$ denote the revenue and the cost respectively, and $y^r, y^c$ their realizations. Denote the treatment by $T \in \{0, 1\}$ and its realization by $t_i$. Let $(Y^r(1), Y^r(0))$ and $(Y^c(1), Y^c(0))$ be the corresponding potential outcomes when the individual receives the treatment or not. Define $\tau^r(x_i), \tau^c(x_i)$ as the conditional average treatment effects, which can be calculated by

$$\tau^{*}(x_i) = E[Y^{*}(1) - Y^{*}(0) \mid X = x_i], \quad * \in \{r, c\}$$

二值干预分配问题

我们从一个常见的营销场景开始,其中 M M M个个体中的一部分被选中接受营销动作。我们采用潜在结果框架(Sekhon 2008)来阐述这个问题。 X ∈ R d X \in \mathbb{R}^d XRd表示特征向量, x x x表示其实现。尽管营销动作会带来增量收益,但它也会产生巨大的成本。 Y r , Y c Y^r, Y^c Yr,Yc分别表示收益和成本, y r , y c y^r, y^c yr,yc表示其实现。干预表示为 T ∈ 0 , 1 T \in {0, 1} T0,1 t i t_i ti是其实现。 ( Y r ( 1 ) , Y r ( 0 ) ) (Y^r(1), Y^r(0)) (Yr(1),Yr(0)) ( Y c ( 1 ) , Y c ( 0 ) ) (Y^c(1), Y^c(0)) (Yc(1),Yc(0))是个体接受或不接受干预时对应的潜在结果。 τ r ( x i ) , τ c ( x i ) \tau^r(x_i), \tau^c(x_i) τr(xi),τc(xi)定义为条件平均因果效应,可以通过下面的公式计算:

$$\tau^{*}(x_i) = E[Y^{*}(1) - Y^{*}(0) \mid X = x_i], \quad * \in \{r, c\}$$

Since most marketing actions have a positive effect on the response of an individual, we have $\tau^r(x_i) > 0$ and $\tau^c(x_i) > 0$. The binary treatment assignment problem (BTAP) is to assign the treatment to part of the individuals to maximize the overall revenue on the platform, while requiring that the incremental cost does not exceed a limited budget $B$. Let $z_i \in \{0, 1\}$ be the decision variables. Therefore, BTAP can be formulated as the integer programming problem (1).

$$\begin{aligned} \max \ & \sum_{i} z_i \tau^r (x_i) \\ \text{s.t.} \ & \sum_{i} z_i \tau^c(x_i) \leq B \\ & z_i \in \{0,1\}, \ \forall i \end{aligned} \tag{1}$$

由于大多数营销活动都会对个体的响应产生积极影响,因此有 τ r ( x i ) > 0 \tau^r(x_i)>0 τr(xi)>0 τ c ( x i ) > 0 \tau^c(x_i)>0 τc(xi)>0。二元干预分配问题(BTAP)是将干预分配给部分个体,以最大化平台的整体收入,但要求增量成本不超过有限的预算 B B B z i ∈ 0 , 1 z_i \in {0, 1} zi0,1为决策变量。因此,BTAP可以表述为整数规划问题(1)。

$$\begin{aligned} \max \ & \sum_{i} z_i \tau^r (x_i) \\ \text{s.t.} \ & \sum_{i} z_i \tau^c(x_i) \leq B \\ & z_i \in \{0,1\}, \ \forall i \end{aligned} \tag{1}$$

This problem is equivalent to a $0/1$ knapsack problem, which is NP-Hard. Fortunately, the simple greedy algorithm (Algorithm 1) can achieve excellent performance, with an approximation ratio satisfying $\rho \geq 1 - \frac{\max_{i}\tau(x_i)}{OPT}$, where $OPT$ is the optimal value of problem (1).

这个问题等价于一个 $0/1$ 背包问题,是NP难问题。幸运的是,简单的贪心算法(Algorithm 1)可以取得很好的性能,其近似比满足 $\rho \geq 1 - \frac{\max_{i}\tau(x_i)}{OPT}$,其中 $OPT$ 是问题(1)的最优解。

Definition 1. The decision factor of a combinatorial optimization algorithm is defined as a factor on which the final solution can be obtained by performing only the sorting or comparison operations.

定义1组合优化算法的决策因子定义为一个可以通过仅执行排序或比较操作来获得最终解决方案的因子。

The decision factor is directly related to the final solution of an algorithm and will be regarded as the learning objective in this paper. As is shown by Algorithm 1, the factor $\tau^r(x_i)/\tau^c(x_i)$ can be taken as the decision factor, which is called the ROI (Return on Investment) of individual $i$.

Algorithm 1

决策因子直接关系到一个算法的最终解,本文将其作为学习目标。如算法1所示,因子 τ r ( x i ) / τ c ( x i ) \tau^r(x_i)/\tau^c(x_i) τr(xi)/τc(xi)可以作为决策因子,称为个体 i i i的ROI(投资回报率)。

Algorithm 1
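
Algorithm 1 itself is only included as an image above, so the following is a minimal sketch of the greedy procedure it refers to, assuming arrays of predicted incremental revenue `tau_r` and incremental cost `tau_c` per individual (the array and function names are illustrative):

```python
import numpy as np

def greedy_btap(tau_r, tau_c, budget):
    """Greedy heuristic for problem (1): rank individuals by ROI = tau_r / tau_c
    and assign the treatment while the cumulative incremental cost fits the budget."""
    roi = tau_r / tau_c
    z = np.zeros(len(tau_r), dtype=int)
    spent = 0.0
    for i in np.argsort(-roi):           # descending ROI
        if spent + tau_c[i] <= budget:   # take individual i only if it still fits
            z[i] = 1
            spent += tau_c[i]
    return z
```

Only the ordering of the decision factor matters here, which is why the rank of $\tau^r(x_i)/\tau^c(x_i)$ is taken as the learning objective in the following subsections.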

Cost-unaware Treatment Assignment Problem

When the treatment cost is nonexistent or the same for all the individuals (e.g., a push message), the prediction of the ROI of individuals reduces to the estimation of $\tau^r(x_i)$. The latter is an important problem in causal inference, and existing studies mainly involve meta-learners and causal forests. In this paper, we propose a novel uplift model to make a direct prediction of $\tau^r(x_i)$ or the rank of $\tau^r(x_i)$, which achieves good performance in both theory and practice.

不考虑干预成本问题

当干预成本不存在或对所有个体都相同(例如,推送消息)时,对个体ROI的预测会减少到对 τ r ( x i ) \tau^r(x_i) τr(xi)的估计。后者是因果推断中的一个重要问题,现有的研究主要涉及元学习器和因果森林。在本文中,我们提出了一种新颖的增益模型来直接预测 τ r ( x i ) \tau^r(x_i) τr(xi) τ r ( x i ) \tau^r(x_i) τr(xi)的排序,这在理论和实践上都能取得良好的表现。

Following the above notations, suppose that there is a data set of size $N$ collected from random control trials (RCT) and denote the $i$-th sample by $(x_i, t_i, y^r_i)$. Denote the numbers of samples that receive the treatment or not by $N_1$ and $N_0$, respectively. Let $s_i = \hbar(x_i)$ represent a score to rank $\tau^r(x_i)$, where $\hbar(x_i)$ can be any machine learning model (e.g., linear regression or neural networks). By minimizing the loss function (2), we can obtain an unbiased estimation of $\tau^r(x_i)$. Due to the space limit, the detailed analysis is presented in Theorem 1 and its proof in Appendix A (Zhou et al. 2022).

$$\begin{aligned} \min \ L(s) = & -\left(\frac{1}{N_1} \sum_{i|t_i=1} y^r_i \ln q_i - \frac{1}{N_0} \sum_{i|t_i=0} y^r_i \ln q_i\right) \\ \text{s.t.} \quad & q_i = \frac{e^{s_i}}{\sum_{i} e^{s_i}}, \ \forall i \\ & s_i = \hbar(x_i) \in \mathbb{R}, \ \forall i \end{aligned} \tag{2}$$

按照上述符号,假设有一个大小为 N N N的数据集,它是从随机对照试验(RCT)中收集的,并将第i个样本表示为 ( x i , t i , y i r ) (x_i, t_i, y^r_i) (xi,ti,yir)。分别用 N 1 N_1 N1 N 0 N_0 N0表示接受干预或未接受干预的样本数量。令 s i = ℏ ( x i ) s_i = \hbar(x_i) si=(xi)表示对 τ r ( x i ) \tau^r(x_i) τr(xi)进行排序的分数,其中 ℏ ( x i ) \hbar(x_i) (xi)可以是任何机器学习模型(例如线性回归或神经网络)。最小化损失函数(2),我们可以得到 τ r ( x i ) \tau^r(x_i) τr(xi)的无偏估计。由于篇幅限制,定理1详细分析及其证明在附录A中(Zhou et al. 2022)提供。

$$\begin{aligned} \min \ L(s) = & -\left(\frac{1}{N_1} \sum_{i|t_i=1} y^r_i \ln q_i - \frac{1}{N_0} \sum_{i|t_i=0} y^r_i \ln q_i\right) \\ \text{s.t.} \quad & q_i = \frac{e^{s_i}}{\sum_{i} e^{s_i}}, \ \forall i \\ & s_i = \hbar(x_i) \in \mathbb{R}, \ \forall i \end{aligned} \tag{2}$$

Theorem 1. When the loss function (2) converges, $s_i$ can be used to rank $\tau^r(x_i)$, and $q_i = \frac{\tau^r(x_i)}{\sum_{i} \tau^r(x_i)}$ can be used to obtain an unbiased estimation of $\tau^r(x_i)$.

定理1. 当损失函数(2)收敛时,$s_i$ 可用来对 $\tau^r(x_i)$ 进行排序,而 $q_i = \frac{\tau^r(x_i)}{\sum_{i} \tau^r(x_i)}$ 可用来获得 $\tau^r(x_i)$ 的无偏估计。
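
As a concrete illustration, a minimal PyTorch-style implementation of the loss function (2) could look as follows; the tensor names are assumptions made for this sketch, and `t` is a 0/1 float tensor:

```python
import torch

def loss_eq2(s, t, y_r):
    """Loss (2): s are the scores hbar(x_i), t the binary treatment indicator, y_r the observed revenue."""
    log_q = torch.log_softmax(s, dim=0)              # ln q_i with q_i = e^{s_i} / sum_i e^{s_i}
    n1 = t.sum().clamp(min=1.0)
    n0 = (1.0 - t).sum().clamp(min=1.0)
    treated = (t * y_r * log_q).sum() / n1           # (1/N1) sum_{t_i=1} y_i^r ln q_i
    control = ((1.0 - t) * y_r * log_q).sum() / n0   # (1/N0) sum_{t_i=0} y_i^r ln q_i
    return -(treated - control)
```

Note that $q_i$ is normalized over all samples, so in mini-batch training the softmax would be taken within the batch, which is an approximation.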

Cost-aware Treatment Assignment Problem

ROI of individuals is a composite object, and most existing works predict it by combining multiple models. The latter may enlarge model errors due to the mathematical operations performed during combination. Therefore, we propose a novel learning model for direct ROI prediction.

$$\begin{aligned} \min \ L(s) = & -\left[\frac{1}{N_1} \sum_{i|t_i=1} \left(y^r_i \ln \frac{q_i}{1-q_i} + y^c_i \ln (1-q_i)\right) - \frac{1}{N_0} \sum_{i|t_i=0} \left(y^r_i \frac{q_i}{1-q_i} + y^c_i \ln(1-q_i)\right)\right] \\ \text{s.t.} \quad & q_i = \sigma(s_i), \ \forall i \\ & s_i = \hbar(x_i) \in \mathbb{R}, \ \forall i \end{aligned} \tag{3}$$

考虑成本的干预分配问题

个体ROI是一个复合对象,现有的大多数工作都是通过多个模型的组合来预测ROI,而后者由于组合过程中的数学运算可能会导致模型误差的扩大。因此,我们提出了一种新的直接ROI预测学习模型。

$$\begin{aligned} \min \ L(s) = & -\left[\frac{1}{N_1} \sum_{i|t_i=1} \left(y^r_i \ln \frac{q_i}{1-q_i} + y^c_i \ln (1-q_i)\right) - \frac{1}{N_0} \sum_{i|t_i=0} \left(y^r_i \frac{q_i}{1-q_i} + y^c_i \ln(1-q_i)\right)\right] \\ \text{s.t.} \quad & q_i = \sigma(s_i), \ \forall i \\ & s_i = \hbar(x_i) \in \mathbb{R}, \ \forall i \end{aligned} \tag{3}$$

Similarly, suppose that a data set $(x_i, t_i, y^r_i, y^c_i)$ of size $N$ is collected from RCT, and that the numbers of samples receiving the treatment or not are $N_1$ and $N_0$, respectively. The division operation may result in a high variance of ROI, especially when $\tau^c(x_i)$ is small. Therefore, the range of ROI is limited to $(0, 1)$ by scaling and truncating $Y^r$ or $Y^c$, which decreases the risk of overfitting. Let $s_i = \hbar(x_i)$ represent a score to rank ROI, where $\hbar(x_i)$ can be any machine learning model. Define $\sigma(\cdot)$ as the sigmoid function. The loss function (3) is designed to obtain an unbiased estimation of the ROI or the rank of the ROI for each individual. The detailed proof can be found in Theorem 2 and Appendix B (Zhou et al. 2022).

类似地,假设从RCT中收集了大小为 N N N的数据集 ( x i , t i , y i r , y i c ) (x_i, t_i, y^r_i, y^c_i) (xi,ti,yir,yic),其中接受干预和未接受干预的样本数量分别为 N 1 N_1 N1 N 0 N_0 N0。除法运算可能导致ROI的方差过大,尤其是在 τ c ( x i ) \tau^c(x_i) τc(xi)较小的情况下。因此,通过对 Y r Y^r Yr Y c Y^c Yc进行缩放和截断操作将ROI的范围限制在 ( 0 , 1 ) (0, 1) (0,1)以降低过拟合的风险。令 s i = ℏ ( x i ) s_i = \hbar(x_i) si=(xi)表示对ROI进行排序的分数,其中 ℏ ( x i ) \hbar(x_i) (xi)可以是任何机器学习模型。将 σ ( ⋅ ) \sigma(\cdot) σ()定义为sigmoid函数。损失函数(3)旨在获得每个个体ROI的无偏估计或ROI排序。详细证明见定理2和附录B(Zhou et al. 2022)。

Theorem 2. The loss function (3) is convex; $s_i$ can be used to rank ROI, and $q_i = \frac{\tau^r(x_i)}{\tau^c(x_i)}$ is an unbiased estimation of the ROI of individual $i$ when the loss converges.

定理2. 损失函数(3)是凸函数,$s_i$ 可用来对ROI进行排序;当损失收敛时,$q_i = \frac{\tau^r(x_i)}{\tau^c(x_i)}$ 是个体 $i$ 的ROI的无偏估计。
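
For completeness, here is a sketch of loss (3) in the same style (names are illustrative); it uses the identities $\ln\frac{q_i}{1-q_i} = s_i$ and $\frac{q_i}{1-q_i} = e^{s_i}$ when $q_i = \sigma(s_i)$:

```python
import torch
import torch.nn.functional as F

def loss_eq3(s, t, y_r, y_c):
    """Loss (3): s = hbar(x_i), t in {0,1}, y_r / y_c are the observed revenue / cost
    (assumed rescaled so that the ROI lies in (0, 1))."""
    log_1mq = F.logsigmoid(-s)        # ln(1 - q_i), numerically stable
    odds = torch.exp(s)               # q_i / (1 - q_i)
    n1 = t.sum().clamp(min=1.0)
    n0 = (1.0 - t).sum().clamp(min=1.0)
    treated = (t * (y_r * s + y_c * log_1mq)).sum() / n1          # y^r ln(q/(1-q)) + y^c ln(1-q)
    control = ((1.0 - t) * (y_r * odds + y_c * log_1mq)).sum() / n0
    return -(treated - control)
```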

Budget Allocation Problem with Multiple Treatments

In this section, we will discuss a more general marketing scenario, which was also introduced in many existing works. Most of them solve this problem by two-phase methods. Different from them, the machine learning model will be designed based on the decision factor to guarantee the consistency between the predictive objective and the optimization objective in this paper.

多种干预情况下的预算分配问题

在本节中,我们将讨论一个更通用的营销场景,该场景也在许多现有工作中介绍过。它们中的大多数通过两阶段方法解决这个问题。与它们不同,本文将基于决策因子设计机器学习模型,以保证预测目标和优化目标的一致性。

Suppose that there are multiple treatments, denoted by $T \in \{1, 2, ..., L\}$. Let $r_{ij}$ and $c_{ij}$ be the revenue and the cost of individual $i$ receiving treatment $j$, respectively. In marketing campaigns, multiple treatments usually refer to different levels of a treatment, e.g., different discounts on some products [1], different amounts of gold pieces on an online video platform [2], different cinema ticket prices in different regions [3], and so on. Suppose that $j \in T$ represents the level of the marketing treatment and that the treatment effect is larger if the level is higher. Therefore, we have $r_{ij} < r_{ik}$ and $c_{ij} < c_{ik}$ if $j < k$ holds. Given a limited budget $B$, the budget allocation problem with multiple treatments (MTBAP) is to assign a certain treatment to each individual with the objective of maximizing the overall revenue on the platform. Let $z_{ij} \in \{0, 1\}$ be the decision variable denoting whether to assign treatment $j$ to individual $i$. Therefore, MTBAP can be formulated as the integer programming problem (4).

$$\begin{aligned} \max \ & \sum_{ij} z_{ij}r_{ij} \\ \text{s.t.} \ & \sum_{ij} z_{ij}c_{ij} \leq B \\ & \sum_{j} z_{ij}=1, \ \forall i \\ & z_{ij} \in \{0, 1\}, \ \forall i,j \end{aligned} \tag{4}$$

假设有多种干预,记为 T ∈ 1 , 2 , . . . , L T \in {1, 2, ..., L} T1,2,...,L。令 r i j r_{ij} rij c i j c_{ij} cij分别为个体 i i i接受干预 j j j的收入和成本。在营销活动中,多种干预通常指处理的不同级别,例如某些产品的不同折扣[1]、在线视频平台上金币的不同数量[2]、不同地区的电影票价格不同[3]等等。假设 j ∈ T j \in T jT代表营销干预的级别,级别越高,干预效果越大。因此,如果 j < k j < k j<k成立,则有 r i j < r i k r_{ij} < r_{ik} rij<rik c i j < c i k c_{ij} < c_{ik} cij<cik。给定有限的预算 B B B,具有多种干预的预算分配问题(MTBAP)是向每个个体分配特定的干预,目标是优化平台的整体收益。令 z i j ∈ 0 , 1 z_{ij} \in {0, 1} zij0,1为决策变量,表示是否将干预 j j j分配给个体 i i i。因此,MTBAP可以表述为整数规划(4)。

$$\begin{aligned} \max \ & \sum_{ij} z_{ij}r_{ij} \\ \text{s.t.} \ & \sum_{ij} z_{ij}c_{ij} \leq B \\ & \sum_{j} z_{ij}=1, \ \forall i \\ & z_{ij} \in \{0, 1\}, \ \forall i,j \end{aligned} \tag{4}$$

Combinatorial Optimization Algorithm

This problem is a classical multiple choice knapsack problem and remains NP-Hard. Existing studies usually solve this problem by using Lagrangian duality theory. Specifically, the upper bound of the original problem (4) can be obtained by solving the following dual problem (5).

$$\min_{\alpha \geq 0} \begin{pmatrix} \max \ \alpha B + \sum_{ij} z_{ij}(r_{ij}-\alpha c_{ij}) \\ \text{s.t.} \ \sum_{j} z_{ij}=1, \ \forall i \\ z_{ij} \in \{0,1\}, \ \forall i,j \end{pmatrix} = \min_{\alpha \geq 0} \left(\alpha B + \sum_i \max_{j}(r_{ij}-\alpha c_{ij})\right) \tag{5}$$

组合优化算法

该问题是经典的多选择背包问题,仍然是NP-Hard问题。现有研究通常利用拉格朗日对偶理论来解决该问题。具体而言,原问题(4)的上界可以通过求解下面的对偶问题(5)得到。

$$\min_{\alpha \geq 0} \begin{pmatrix} \max \ \alpha B + \sum_{ij} z_{ij}(r_{ij}-\alpha c_{ij}) \\ \text{s.t.} \ \sum_{j} z_{ij}=1, \ \forall i \\ z_{ij} \in \{0,1\}, \ \forall i,j \end{pmatrix} = \min_{\alpha \geq 0} \left(\alpha B + \sum_i \max_{j}(r_{ij}-\alpha c_{ij})\right) \tag{5}$$

The optimal $\alpha^*$ for the dual problem can be derived by gradient descent or by a binary search for $\alpha$ with the terminal condition $B - \sum_{ij}z_{ij} c_{ij} \leq \epsilon$. Given the optimal $\alpha^*$, an approximate solution for the original problem (4) is

$$\forall i,j, \ z_{ij}=1 \Longleftrightarrow j = \arg\max_{j} \ r_{ij} - \alpha^*c_{ij} \tag{6}$$

According to Definition 1, the factor $r_{ij} - \alpha^* c_{ij}$ is a decision factor for the Lagrangian duality algorithm and can be taken as the learning objective. However, the value of $\alpha^*$ depends on the budget $B$, which is undetermined and varies much with the marketing environment. Hence, there is only a limited number of $\alpha^*$ in the training dataset, and it is possible that the optimal $\alpha^*$ used in future campaigns has never been seen by the predictive model before, which dramatically decreases the precision of the model. Therefore, the direct prediction of the decision factor for all possible $\alpha^*$ is difficult and unrealistic, and will not be considered in this paper.

对偶问题的最优 α ∗ \alpha^* α,可以使用梯度下降算法或二分搜索求得,终止条件为 B − ∑ i j z i j c i j ≤ ϵ B − \sum_{ij}z_{ij} c_{ij} \leq \epsilon Bijzijcijϵ。给定最优 α ∗ \alpha^* α,原始问题(4)的近似解为

$$\forall i,j, \ z_{ij}=1 \Longleftrightarrow j = \arg\max_{j} \ r_{ij} - \alpha^*c_{ij} \tag{6}$$

根据定义1,因子 $r_{ij} - \alpha^* c_{ij}$ 是拉格朗日对偶算法的决策因子,可以作为学习目标。然而,$\alpha^*$ 的值取决于预算 $B$,而预算 $B$ 是不确定的,并且随营销环境变化很大。因此,训练数据集中的 $\alpha^*$ 数量有限,并且将来活动中使用的最优 $\alpha^*$ 可能从未被预测模型见过,这会大大降低模型的精度。因此,直接预测所有可能的 $\alpha^*$ 所对应的决策因子是困难且不现实的,本文将不予考虑。
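
A compact sketch of this dual procedure follows, assuming predicted matrices `r` and `c` of shape (number of individuals, L); the bisection bounds and all names are illustrative:

```python
import numpy as np

def lagrangian_dual_assignment(r, c, budget, eps=1e-3, max_iter=60):
    """Binary search for alpha in dual problem (5); each individual then picks
    j* = argmax_j r_ij - alpha * c_ij as in Eq. (6)."""
    lo, hi = 0.0, float(r.max() / max(c[c > 0].min(), 1e-9))  # crude bracket for alpha (an assumption)
    rows = np.arange(r.shape[0])
    best = None
    for _ in range(max_iter):
        alpha = 0.5 * (lo + hi)
        j_star = np.argmax(r - alpha * c, axis=1)
        cost = c[rows, j_star].sum()
        if cost <= budget:             # feasible: keep it and try a smaller alpha (spend more)
            best, hi = j_star, alpha
            if budget - cost <= eps:   # terminal condition B - sum_ij z_ij c_ij <= eps
                break
        else:                          # over budget: increase the penalty on cost
            lo = alpha
    return best
```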

However, another equivalent optimization algorithm can be derived from the above Lagrangian duality method. Before the details, we present an important assumption in economics, which is called as the Law of Diminishing Marginal Utility (Polleit 2011).

然而,从上述拉格朗日对偶方法可以推导出另一个等价的优化算法。在详细介绍之前,我们先提出经济学中的一个重要假设,即边际效应递减定律(Polleit 2011)。

Assumption 1 (The Law of Diminishing Marginal Utility). The marginal utility of an individual decreases with the increasing investment of the marketing cost. Specifically, denote the marginal utility by $l_{ij}=\frac{r_{i,j+1} - r_{ij}}{c_{i,j+1} - c_{ij}}$; then we have $l_{ij}\leq l_{i,j-1}, \ \forall i,j$.

假设1 边际效应递减定律 个体的边际效应随着营销成本的投入增加而减少。具体来说,将边际效应表示为 l i j = r i j + 1 − r i j c i j + 1 − c i j l_{ij}=\frac {r_{ij+1} - r_ij} {c_{ij+1} - c_{ij}} lij=cij+1cijrij+1rij我们有 l i j ≤ l i j − 1 , ∀ i , j l_{ij}\leq l_{ij-1}, \forall i,j lijlij1,i,j

Let $(c_{ij}, r_{ij})$ be a point in rectangular coordinates. The value of $r_{ij} - \alpha c_{ij}$ can be regarded as the projection of $(c_{ij}, r_{ij})$ on $(-\alpha, 1)$. Let $j^*$ be the solution of the Lagrangian duality method in Eq. (6), i.e., $j^* = \arg\max_{j} \ r_{ij} - \alpha^* c_{ij}$. Based on Assumption 1, we can prove that it satisfies $l_{ij^*} \leq \alpha^* \leq l_{i,j^*-1}$. Therefore, an equivalent optimization algorithm is obtained in Algorithm 2. Due to the space limit, the formal proof of Theorem 3 can be found in Appendix C (Zhou et al. 2022).

Algorithm 2

令 $(c_{ij}, r_{ij})$ 为直角坐标系中一点,$r_{ij} - \alpha c_{ij}$ 的值可看作 $(c_{ij}, r_{ij})$ 在 $(-\alpha, 1)$ 上的投影。令 $j^*$ 为式(6)中拉格朗日对偶方法的解,即 $j^* = \arg\max_{j} \ r_{ij} - \alpha^* c_{ij}$。基于假设1,可证明其满足 $l_{ij^*} \leq \alpha^* \leq l_{i,j^*-1}$。由此,在算法2中可得到等价的优化算法。限于篇幅,定理3的正式证明可参见附录C (Zhou et al. 2022)。

Algorithm 2

Theorem 3. Algorithm 2 is equivalent to the Lagrangian duality method.

定理3 算法2等价于拉格朗日对偶法。

Notice that the factor $l_{ij}$ in Algorithm 2 can be taken as a decision factor, which is irrelevant to the Lagrange multiplier $\alpha$. Therefore, the value of $l_{ij}$ (or, implicitly, the rank of $l_{ij}$ over all $i,j$) can be taken as the learning objective, which avoids the difficulty of directly predicting $r_{ij} - \alpha c_{ij}$ for all possible $\alpha$ in the Lagrangian duality method.

注意,算法2中的因子 $l_{ij}$ 可以作为决策因子,与拉格朗日乘子 $\alpha$ 无关。因此,$l_{ij}$ 的值(或隐含地,$l_{ij}$ 对所有 $i,j$ 的排序)可以作为学习目标,从而避免了在拉格朗日对偶方法中对所有可能的 $\alpha$ 直接预测 $r_{ij} - \alpha c_{ij}$ 的困难。
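
Algorithm 2 is only referenced above, so the following is a plausible reconstruction of the allocation rule implied by Theorem 3 and Assumption 1: start every individual at the lowest treatment and greedily apply the upgrade with the largest marginal utility $l_{ij}$ while the budget allows (treatments are 0-indexed here, and all names are illustrative):

```python
import numpy as np

def marginal_utility_allocation(r, c, budget):
    """r, c: reward / cost matrices of shape (M, L), with treatment levels ordered from low to high."""
    M, L = r.shape
    assign = np.zeros(M, dtype=int)
    spent = c[:, 0].sum()                        # everyone starts at the lowest level
    steps = []                                   # all upgrade steps (i, j -> j+1)
    for i in range(M):
        for j in range(L - 1):
            l_ij = (r[i, j + 1] - r[i, j]) / (c[i, j + 1] - c[i, j])
            steps.append((l_ij, c[i, j + 1] - c[i, j], i, j))
    steps.sort(reverse=True)                     # sort by the decision factor l_ij
    for l_ij, dc, i, j in steps:
        # under Assumption 1 the steps of each individual arrive in order, so this check suffices
        if assign[i] == j and spent + dc <= budget:
            assign[i] = j + 1
            spent += dc
    return assign
```

Only the ordering of $l_{ij}$ is used here, which is exactly why it can serve as the decision factor.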

Direct Prediction Model

Based on the above analysis, we propose a novel method for a direct prediction of $l_{ij}$ in this subsection. Suppose that there is a data set $(x_i, t_i, y^c_i, y^r_i)$ of size $N$ collected from RCT, and denote by $N_j$ the number of samples receiving treatment $j$. Similarly, the range of $l_{ij}$ is limited to $(0, 1)$ by scaling and truncating $Y^r$ or $Y^c$ to reduce the risk of overfitting. Let $s_{ij} = \hbar(x_i, j)$ be the prediction of the rank of $l_{ij}$, where $\hbar(\cdot)$ is any machine learning model. Hence, by minimizing the loss function (7) we can obtain an unbiased estimation of $l_{ij}$. The detailed analysis is shown in Theorem 4 and its proof in Appendix D (Zhou et al. 2022).

$$\begin{aligned} \min \ L(s) = & -\left[\sum_{j>1} \sum_{i|t_i=j} \frac{1}{N_j} \left(q_{i,j-1}y^r_i-q^2_{i,j-1}y^c_i\right) - \sum_{j<L} \sum_{i|t_i=j} \frac{1}{N_j} \left(q_{ij}y^r_i-q^2_{ij}y^c_i\right)\right] \\ \text{s.t.} \quad & q_{ij}=\sigma(s_{ij}), \ \forall i,j \\ & s_{ij}=\hbar(x_i,j) \in \mathbb{R}, \ \forall i,j \end{aligned} \tag{7}$$

直接预测模型

基于以上分析,本节我们提出一种直接预测 l i j l_{ij} lij的新方法。假设有一个从RCT中收集的大小为 N N N的数据集 ( x i , t i , y i c , y i r ) (x_i, t_i, y^c_i, y^r_i) (xi,ti,yic,yir),其中 N j N_j Nj表示接受干预 j j j的样本数量。类似地,通过对 Y r Y^r Yr Y c Y^c Yc进行缩放和截断操作将 l i j l_{ij} lij的范围限制为 ( 0 , 1 ) (0, 1) (0,1),以降低过拟合的风险。令 s i j = ℏ ( x i , j ) s_{ij} = \hbar(x_i, j) sij=(xi,j)表示对 l i j l_{ij} lij排序的预测,其中 ℏ ( ⋅ ) \hbar(\cdot) ()是任意机器学习模型。因此,最小化损失函数(7),我们就可以得到 l i j l_{ij} lij的无偏估计。详细分析如定理4所示,其证明在附录D中(Zhou et al. 2022)。

$$\begin{aligned} \min \ L(s) = & -\left[\sum_{j>1} \sum_{i|t_i=j} \frac{1}{N_j} \left(q_{i,j-1}y^r_i-q^2_{i,j-1}y^c_i\right) - \sum_{j<L} \sum_{i|t_i=j} \frac{1}{N_j} \left(q_{ij}y^r_i-q^2_{ij}y^c_i\right)\right] \\ \text{s.t.} \quad & q_{ij}=\sigma(s_{ij}), \ \forall i,j \\ & s_{ij}=\hbar(x_i,j) \in \mathbb{R}, \ \forall i,j \end{aligned} \tag{7}$$

Theorem 4. When the loss function (7) converges, $s_{ij}$ can be used to rank $l_{ij}$, and $q_{ij}$ can be used to obtain an unbiased estimation of $l_{ij}$.

定理4. 当损失函数(7)收敛时,$s_{ij}$ 可以用来对 $l_{ij}$ 进行排序,而 $q_{ij}$ 可以用来获得 $l_{ij}$ 的无偏估计。
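
A sketch of loss (7) in the same PyTorch style is given below. Here `s[:, j-1]` plays the role of $s_{ij}$ for the step from level $j$ to $j+1$, treatments are coded $1,\dots,L$, and `n_j` holds the per-treatment sample counts; this indexing is an assumption of the sketch:

```python
import torch

def loss_eq7(s, t, y_r, y_c, n_j):
    """s: (N, L-1) scores with s[:, j-1] ~ s_{ij}; t: treatment level in {1,...,L} (long tensor);
    y_r, y_c: observed reward / cost; n_j: float tensor of length L with the sample counts N_j."""
    q = torch.sigmoid(s)                              # q_{ij}
    N, Lm1 = q.shape
    L = Lm1 + 1
    idx = torch.arange(N)
    w = 1.0 / n_j[t - 1]                              # 1 / N_{t_i} per sample
    m1 = (t > 1).float()                              # samples contributing to the first sum
    q_prev = q[idx, (t - 2).clamp(min=0)]             # q_{i, t_i - 1}
    term1 = (w * m1 * (q_prev * y_r - q_prev ** 2 * y_c)).sum()
    m2 = (t < L).float()                              # samples contributing to the second sum
    q_cur = q[idx, (t - 1).clamp(max=Lm1 - 1)]        # q_{i, t_i}
    term2 = (w * m2 * (q_cur * y_r - q_cur ** 2 * y_c)).sum()
    return -(term1 - term2)
```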

An Evaluation Metric

Although AUUC (Area Under Uplift Curve) and AUCC (Area Under Cost Curve) (Du, Lee, and Ghaffarizadeh 2019) have been developed to evaluate the ranking performance of uplift models without/with treatment cost respectively, there is no evaluation metric for the estimation of the marginal utilities $l_{ij}$ under different treatments. The latter is directly related to the business objective of MTBAP. Therefore, we propose a novel evaluation metric for this purpose, which is called MT-AUCC (Area Under Cost Curve for Multiple Treatments) in this paper.

评估指标

尽管已经开发了AUUC(增益曲线下面积)和AUCC(成本曲线下面积)(Du, Lee, and Ghaffarizadeh 2019)来分别评估无/有干预成本的增益模型的排序性能,但没有用于不同干预下边际效应($l_{ij})的评估指标。后者与MTBAP的业务目标直接相关。因此,我们为此目的提出了一种新的评估指标,本文将其称为MT-AUCC(多种干预下的成本曲线下面积)。

Cost Curve for Multiple Treatments. Suppose that there is a model $M = f(x_i, j)$ to predict the value (or the rank) of $l_{ij}$. Firstly, we obtain two new quintuple sets based on $M$.
• For each sample $(x_i, t_i, y^c_i, y^r_i)$ with $t_i < L$, use model $M$ to obtain a score $S_i = f(x_i, t_i)$ and get a new quintuple set $\tilde{T} = \{(x_i, t_i, \alpha_{i}y^c_{i}, \alpha_{i}y^r_{i}, S_i) \mid t_i < L, \alpha_{i} =\frac{N}{N_{t_{i}}}\}$;
• for each sample $(x_i, t_i, y^c_i, y^r_i)$ with $t_i > 1$, use this model again to obtain $S_i = f(x_i, t_i - 1)$ and get a quintuple set $\tilde{C}=\{(x_i, t_i, \alpha_{i}y^c_{i}, \alpha_{i}y^r_{i}, S_i) \mid t_i > 1, \alpha_i=\frac{N}{N_{t_i}}\}$.

多种干预的成本曲线。假设有一个模型 M = f ( x i , j ) M = f(x_i, j) M=f(xi,j)来预测 l i j l_{ij} lij的值(或排名)。首先,我们基于 M M M获得两个新的五元组集。

  • 对于每个样本 $(x_i, t_i, y^c_i, y^r_i)$ 且 $t_i < L$,使用模型 $M$ 获得一个得分 $S_i = f(x_i, t_i)$,并得到一个新的五元组集 $\tilde{T} = \{(x_i, t_i, \alpha_{i}y^c_{i}, \alpha_{i}y^r_{i}, S_i) \mid t_i < L, \alpha_{i} =\frac{N}{N_{t_{i}}}\}$;
  • 对于每个样本 $(x_i, t_i, y^c_i, y^r_i)$ 且 $t_i > 1$,再次使用该模型获得 $S_i = f(x_i, t_i - 1)$,并得到五元组集 $\tilde{C}=\{(x_i, t_i, \alpha_{i}y^c_{i}, \alpha_{i}y^r_{i}, S_i) \mid t_i > 1, \alpha_i=\frac{N}{N_{t_i}}\}$。

Notice that the weight $\alpha_{i}$ is used to balance the count of samples under different treatments. Next, regard $\tilde{T}$ and $\tilde{C}$ as the new treatment group and control group, respectively. Sort all the quintuples in $\tilde{T}$ and $\tilde{C}$ in descending order of $S_i$. For the top $k$ quintuples in this sorted list, denote by $\tilde{T}(k)$ (or $\tilde{C}(k)$) the quintuples that belong to $\tilde{T}$ (or $\tilde{C}$). Therefore, the incremental cost $\triangle Y^c(k)$ and reward $\triangle Y^r(k)$ can be calculated for each point $1 \leq k \leq |\tilde{T}| + |\tilde{C}|$ by Eq. (8).
$$\triangle Y^*(k)=\frac{k}{|\tilde{T}|+|\tilde{C}|} \left( \frac{\sum_{i \in \tilde{T}(k)} \alpha_i y^*_i}{|\tilde{T}(k)|} - \frac{\sum_{i \in \tilde{C}(k)} \alpha_i y^*_i}{|\tilde{C}(k)|} \right), \quad \text{where} \ * \in \{r, c\} \tag{8}$$
As is shown in Fig. 2, take the tuple $(\triangle Y^c(k), \triangle Y^r(k))$ as the coordinates and we can get a cost curve.

Figure 2

注意权重 α i \alpha_{i} αi用于平衡不同干预下的样本数量。接下来将 T ~ \tilde{T} T~ C ~ \tilde{C} C~分别视为新的干预组和对照组。将 T ~ \tilde{T} T~ C ~ \tilde{C} C~中的所有五元组按 S i S_i Si的降序排列。对于排序列表中的前 k k k个五元组,用 T ~ ( k ) \tilde{T}(k) T~(k)(或 C ~ ( k ) \tilde{C}(k) C~(k))表示属于 T ~ \tilde{T} T~(或 C ~ \tilde{C} C~)的五元组。因此,可以通过公式(8)为每个点 1 ≤ k ≤ ∣ T ~ ∣ + ∣ C ~ ∣ 1 \leq k \leq |\tilde{T}| + |\tilde{C}| 1kT~+C~计算增量成本 △ Y c ( k ) \triangle Y^c(k) Yc(k)和奖励 △ Y r ( k ) \triangle Y^r(k) Yr(k)
$$\triangle Y^*(k)=\frac{k}{|\tilde{T}|+|\tilde{C}|} \left( \frac{\sum_{i \in \tilde{T}(k)} \alpha_i y^*_i}{|\tilde{T}(k)|} - \frac{\sum_{i \in \tilde{C}(k)} \alpha_i y^*_i}{|\tilde{C}(k)|} \right), \quad \text{where} \ * \in \{r, c\} \tag{8}$$
如图2所示,以元组 ( △ Y c ( k ) , △ Y r ( k ) ) (\triangle Y^c(k), \triangle Y^r(k)) (Yc(k),Yr(k))为坐标,可以得到一条成本曲线。

Figure 2
Denote by $\triangle Y_c$ and $\triangle Y_r$ the average incremental cost and reward over all the samples in $\tilde{T}$ and $\tilde{C}$, which satisfy $\triangle Y_{*} = \triangle Y^*(|\tilde{T}| + |\tilde{C}|)$. For convenience of calculation, we can also use only some points at every $p$ percent of $\triangle Y_c$ to draw the cost curve. In addition, the $X$ and $Y$ axes of this curve can be normalized by dividing them by $\triangle Y_c$ and $\triangle Y_r$, respectively.

△ Y c \triangle Y_c Yc △ Y r \triangle Y_r Yr分别为 T ~ \tilde{T} T~ C ~ \tilde{C} C~中所有样本的平均增量成本和增量收益,满足 △ Y ∗ = △ Y ∗ ( ∣ T ~ ∣ + ∣ C ~ ∣ ) \triangle Y_{*} = \triangle Y^*(|\tilde{T}| + |\tilde{C}|) Y=Y(T~+C~)。为了计算方便,我们也可以用一些 △ Y c \triangle Y_c Yc p p p百分比的点来绘制成本曲线。另外,曲线中的 X X X Y Y Y轴可以分别除以 △ Y c \triangle Y_c Yc △ Y r \triangle Y_r Yr来归一化。

Area Under Cost Curve for Multiple Treatments (MT-AUCC). Similar to AUUC and AUCC, the area under this cost curve can be regarded as an evaluation metric. As is shown by Fig. 2, denote by $A_M$ and $A_R$ the areas under a model curve and a random benchmark curve, respectively. In order to bound the result within $(0, 1]$, the MT-AUCC of a model is defined as $A_M / (2A_R)$.

多种干预下的成本曲线下面积(MT-AUCC)。 与AUUC和AUCC类似,此成本曲线下面积可视为评估指标。如图2所示,分别用 $A_M$ 和 $A_R$ 表示模型曲线下面积和随机基准曲线下面积。为了将结果限制在 $(0, 1]$ 内,此模型的MT-AUCC定义为 $A_M / (2A_R)$。
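
To make the metric concrete, here is a minimal sketch of the cost curve of Eq. (8) and the resulting MT-AUCC, assuming the two re-weighted groups have already been flattened into arrays and that the weights $\alpha_i$ are pre-multiplied into the responses (all names are illustrative):

```python
import numpy as np

def mt_aucc(score, y_r, y_c, is_treat):
    """score: S_i for every quintuple in T~ and C~; y_r, y_c: alpha_i-weighted responses;
    is_treat: 1 for members of T~, 0 for members of C~."""
    order = np.argsort(-score)
    y_r, y_c, is_t = y_r[order], y_c[order], is_treat[order].astype(float)
    n = len(score)
    k = np.arange(1, n + 1)
    cum_t, cum_c = np.cumsum(is_t), np.cumsum(1.0 - is_t)
    with np.errstate(divide="ignore", invalid="ignore"):
        d_r = k / n * (np.cumsum(y_r * is_t) / cum_t - np.cumsum(y_r * (1 - is_t)) / cum_c)  # Eq. (8), * = r
        d_c = k / n * (np.cumsum(y_c * is_t) / cum_t - np.cumsum(y_c * (1 - is_t)) / cum_c)  # Eq. (8), * = c
    ok = (cum_t > 0) & (cum_c > 0)
    d_r, d_c = d_r[ok], d_c[ok]
    a_model = np.trapz(d_r, d_c)                # area under the model cost curve
    a_random = 0.5 * d_c[-1] * d_r[-1]          # area under the straight random benchmark
    return a_model / (2.0 * a_random)
```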

Evaluation

In this section, we will conduct large-scale offline and online numerical experiments to validate the performance of our models and algorithms.

评估

在本节中,我们将进行大规模离线和在线数值实验,以验证我们的模型和算法的性能。

Offline Simulation

Dataset. Two types of datasets are provided in this paper: an open real-world dataset and a marketing dataset collected from an online food delivery platform.

  • CRITEO-UPLIFT v2. This dataset is provided by the AdTech company Criteo in the AdKDD'18 workshop (Diemert Eustache, Betlei Artem, Renaudin, and Massih-Reza 2018). The data is collected from a random control trial (RCT) that prevents a random part of users from being targeted by advertising. It contains 12 features, 1 binary treatment indicator and 2 response labels (visit/conversion). The (incremental) visit is regarded as the predictive objective for (cost-unaware) uplift modeling. In order to compare the performance of different models in predicting the ROI of individuals, we take the visit label as the cost and the conversion label as the reward. The whole dataset contains 13.9 million samples, and is randomly partitioned into two parts for training (70%) and testing (30%), respectively.
  • Marketing data. Money Off is a common marketing campaign in Meituan, an online food delivery platform. We conduct a four-week RCT on this platform where online shops offer a random discount every day. Notice that the discount of a shop is the same for all the users to prevent price discrimination, but it may change randomly across days, and different shops may offer different discounts. The data in the first two weeks is used for training and the rest for testing. The discount $T \in \{0, 1, ..., 6\}$ is taken as the treatment, where $T = t$ means \$t cash off for each order whose price meets a given threshold. This dataset contains 75 features, 1 treatment label and 2 response labels (daily cost/orders). For the binary treatment assignment problem, we take the samples with $T = 0$ as the control group and the samples with $T > 0$ as the treatment group. For the budget allocation problem with multiple treatments, the budget refers to the whole cost of all the shops and different discounts represent different treatments. This dataset contains 4.1 million samples.

离线仿真

数据集。 本文提供了两种类型的数据集:开放的真实世界数据集和从在线食品配送平台收集的营销数据集。

  • CRITEO-UPLIFT v2。该数据集由广告技术公司Criteo在AdKDD'18研讨会(Diemert Eustache, Betlei Artem, Renaudin, and Massih-Reza 2018)上提供。数据来自一项随机对照试验(RCT),该试验防止随机部分用户成为广告目标。它包含12个特征、1个二元干预指标和2个响应标签(访问/转化)。(增量)访问被视为(不考虑成本的)增益建模的预测目标。为了比较不同模型预测个体投资回报率的性能,我们将访问标签作为成本,将转化标签作为收益。整个数据集包含1390万个样本,并随机分为两部分,分别用于训练(70%)和测试(30%)。
  • 营销数据。满减是美团(一家在线食品配送平台)的常见营销活动。我们在这个平台上进行了为期四周的RCT,在线商店每天都会提供随机折扣。注意,商店的折扣对所有用户都相同,以防止价格歧视,但可能会在不同的日子随机变化,不同的商店可能会提供不同的折扣。前两周的数据用于训练,其余数据用于测试。折扣 $T \in \{0, 1, ..., 6\}$ 被当作干预,其中 $T = t$ 表示每笔价格达到给定阈值的订单可获得 t 元现金减免。该数据集包含75个特征、1个干预标签和2个响应标签(日常成本/订单)。对于二元干预分配问题,我们将 $T=0$ 的样本作为对照组,将 $T>0$ 的样本作为干预组。对于具有多种干预的预算分配问题,预算是指所有商店的总成本,不同的折扣代表不同的干预。该数据集包含410万个样本。

Evaluation Metric. Multiple evaluation metrics are provided for offline evaluation in this experiment.

  • AUUC (Area under Uplift Curve). A common metric to evaluate uplift models (Rzepakowski and Jaroszewicz 2010). In this experiment, the auuc score is computed by using CausalML packages (Chen et al. 2020).
  • AUCC (Area under Cost Curve). A similar metric to AUUC, but designed for evaluating the performance to rank ROI of individuals (Du, Lee, and Ghaffarizadeh 2019).
  • MT-AUCC. It is proposed in this paper and used to evaluate the performance of models to rank marginal utilities of different individuals under different treatments.
  • EOM (Expected Outcome Metric). Based on RCT data, the expected outcome (reward/cost) can be obtained for arbitrary policy by using the computing methods in (Ai et al. 2022; Zhao, Fang, and Simchi-Levi 2017).

评估指标。 本实验提供了多种评估指标用于离线评估。

  • AUUC(增益曲线下面积)。评估增益模型的常用指标(Rzepakowski and Jaroszewicz 2010)。在本实验中,auuc分数是使用CausalML包计算的(Chen et al. 2020)。
  • AUCC(成本曲线下面积)。与AUUC类似的指标,但用于评估对个体ROI进行排序的性能(Du, Lee, and Ghaffarizadeh 2019)。
  • MT-AUCC。本文提出了该方法,用于评估模型对不同干预下不同个体的边际效用进行排序的性能。
  • EOM(预期结果指标)。基于RCT数据,可以使用(Ai et al. 2022; Zhao, Fang, and Simchi-Levi 2017)中的计算方法获得任意策略的预期结果(收益/成本)。

Benchmark. For each problem considered in this paper, multiple different models/algorithms are implemented and taken as the benchmarks.

  • Cost-unaware binary treatment assignment problem
    – S-Learner. A single model predicting the response of individuals with/without the treatment. The CATE is computed by $E(Y|X, T=1) - E(Y|X, T=0)$ (a brief sketch is given after the benchmark lists below).
    – X-Learner. A meta-learner approach proposed in (Kunzel et al. 2019).
    – Causal Forest. An uplift model proposed in (Athey, Tibshirani, and Wager 2019). It is implemented here
    based on EconML packages (Keith Battocchi 2019).
    – DUM. The direct uplift modeling method in this paper.
  • Cost-aware binary treatment assignment problem
    – TPM-SL. The two-phase method which uses two S-Learner models to predict the incremental revenue and cost, respectively. The ROI of individuals is predicted by computing the ratio of these two predictions.
    – Direct Rank. Similar to our work, a loss function is designed for ranking ROI of individuals in this model (Du, Lee, and Ghaffarizadeh 2019). However, we prove that it cannot achieve the correct rank when the loss converges in Appendix E (Zhou et al. 2022).
    – DRP. The direct ROI prediction model in this paper.
  • Budget allocation problem with multiple treatments
    – TPM-SL. The two-phase method mentioned in many existing works (Ai et al. 2022; Zhao et al. 2019). In the first stage, we use a S-Learner model to predict the response (reward/cost) of individuals under different treatments. In the second stage, the Lagrangian duality algorithm is developed to compute the approximately optimal solution.
    – TPM-CF. Instead of S-Learner, we use Causal Forests to predict the incremental response. It is implemented based on generalized random forests (GRF) in EconML packages (Keith Battocchi 2019), which can also support multiple treatments.
    – DPM. The approach in this paper that combines the direct prediction of marginal utilities and Algorithm 2.

**基准。**对于本文考虑的每个问题,都实现了多种不同的模型/算法并将其作为比较基准。

  • 不考虑成本的二元干预分配问题
    – S-Learner。一个预测接受/未接受干预的个体反应的单一模型。CATE由 E ( Y ∣ X , T = 1 ) − E ( Y ∣ X , T = 0 ) E(Y|X, T=1) − E(Y|X, T=0) E(YX,T=1)E(YX,T=0)计算。
    – X-Learner。(Kunzel et al. 2019)中提出的一种元学习器方法。
    – 因果森林。(Athey, Tibshirani, and Wager 2019)中提出的一种增益模型。它在此基于EconML包(Keith Battocchi 2019)实现。
    – DUM。本文中的直接增益建模方法。
  • 考虑成本的二元干预分配问题
    – TPM-SL。两阶段方法,分别使用两个SLearner模型来预测增量收益和成本。通过计算这两个模型之间的比值来预测个体的投资回报率。
    – 直接排序。与我们的工作类似,在这个模型中设计了一个损失函数来对个体的投资回报率进行排序(Du, Lee, and Ghaffarizadeh 2019)。然而,我们在附录E中证明,当损失收敛时,它无法达到正确的排序(Zhou et al. 2022)。
    – DRP。本文中的直接投资回报率预测模型。
  • 具有多种干预的预算分配问题
    – TPM-SL。许多现有作品中提到的两阶段方法(Ai et al. 2022; Zhao et al. 2019)。在第一阶段,我们使用S-Learner模型来预测不同干预下个体的反应(收益/成本)。在第二阶段,开发了拉格朗日对偶算法来计算近似最优解。
    – TPM-CF。我们使用因果森林来预测增量响应,而不是S-Learner。它是基于EconML包(Keith Battocchi 2019)中的广义随机森林(GRF)实现的,它还可以支持多种干预。
    – DPM。本文的方法结合了边际效应的直接预测和算法2。

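As referenced in the S-Learner bullet above, a minimal sketch of that baseline is given here; the regressor choice is illustrative and not the configuration used in the paper:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def s_learner_cate(X, t, y, X_new):
    """Single model over (features, treatment); CATE = E(Y|X, T=1) - E(Y|X, T=0)."""
    model = GradientBoostingRegressor()
    model.fit(np.column_stack([X, t]), y)                  # treatment enters as one extra feature
    y1 = model.predict(np.column_stack([X_new, np.ones(len(X_new))]))
    y0 = model.predict(np.column_stack([X_new, np.zeros(len(X_new))]))
    return y1 - y0
```
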
The hyperparameters in these algorithms are obtained based on grid search and each data point in the experimental results is computed by running the programs for 20 times.

这些算法中的超参数都是基于网格搜索获得的,实验结果中的每个数据点都是通过运行程序20次来计算的。

Experimental Results. For the cost-unaware binary treatment assignment problem, Fig. 3(a)-3(b) presents the comparison of four uplift models. First of all, our model DUM performs best on both CRITEO-UPLIFT v2 and the Marketing data. As the common baseline, the result of S-Learner is not too bad: it is close to our algorithm DUM on both datasets. For comparison, X-Learner and Causal Forest are not always superior to S-Learner; the former is worse on CRITEO-UPLIFT v2 and the latter is inferior to S-Learner on the Marketing data. The detailed results can be found in Table 1 in Appendix F (Zhou et al. 2022).

Figure 3

实验结果。对于不考虑成本的二元干预分配问题,图3(a)-3(b)展示了四种增益模型的比较。首先,我们的模型DUM在CRITEO-UPLIFT v2和Marketing数据中表现最佳。作为常见的基线,S-Learner 的结果还不错。它在两个数据集中都接近我们的算法DUM。相比之下,X-Learner和因果森林并不总是优于S-Learner。前者在CRITEO-UPLIFT v2中表现较差,后者在Marketing数据中不如S-Learner。详细结果可在附录F中的表1中找到(Zhou et al. 2022)。

Figure 3

Due to the robustness of S-Learner, it is still taken as the base model to predict the ROI of individuals. As is shown by Fig. 3(c)-3(d), TPM-SL does not perform well in this setting, especially on the Marketing data. The incorrect loss function of Direct Rank prevents it from converging to a stable extreme point, and it is inferior to our model DRP. In Appendix E (Zhou et al. 2022), we present a detailed analysis of the convergence of Direct Rank. Compared with TPM-SL and Direct Rank, our model DRP always performs best and achieves significant improvement.

由于S-Learner的鲁棒性,它仍然被用作预测个体投资回报率的基础模型。如图 3 ( c ) − 3 ( d ) 3(c)-3(d) 3(c)3(d)所示,TPM-SL目前在Marketing数据中表现不佳。Direct Rank的损失函数不正确,导致它无法收敛到稳定的极值点,并且不如我们的模型DRP。在附录E(Zhou et al. 2022)中,我们将对Direct Rank的收敛进行详细分析。与TPM-SL和Direct Rank相比,我们的模型DRP始终表现最佳并取得显著改进。

Fig. 3(e)-3(f) shows the results of different models and algorithms to solve the budget allocation problem with multiple treatments. Since the tree-based uplift models were often used in many existing works (Ai et al. 2022; Zhao, Fang, and Simchi-Levi 2017) to deal with this problem, we also take TPM-CF as the baseline. Our approach DPM significantly outperforms TPM-SL and TPM-CF in MT-AUCC, which indicates that DPM is better at ranking marginal utilities. We also use EOM to test the incremental reward of different approaches when given different budget in Fig. 3(f). In order to protect the data privacy of this platform, the budget and reward have been normalized. In spite of this, it is still clear that our approach DPM can always help the platform to obtain much more reward under different budget.

图3(e)-3(f)展示了不同模型和算法解决具有多种干预的预算分配问题的结果。由于基于树的增益模型在许多现有工作(Ai et al. 2022; Zhao, Fang, and Simchi-Levi 2017)中经常用于处理此问题,因此我们也将TPM-CF作为基线。我们的方法DPM在MT-AUCC中明显优于TPM-SL和TPM-CF,这表明DPM在对边际效应进行排序方面表现更好。我们还使用EOM来测试图3(f)中给定不同预算时不同方法的增量收益。为了保护该平台的数据隐私,预算和收益已标准化。尽管如此,我们的方法DPM仍然显然可以帮助平台在不同的预算下获得更多的收益。

As is shown by Table 1 in Appendix F (Zhou et al. 2022), all the models proposed in this paper are more stable and have lower variance than other existing works. This is because our models can make a direct prediction for the final objective, and always converge to a stable extreme point.

Table 1
如附录F中的表1所示(Zhou et al. 2022),本文提出的所有模型都比其它现有模型更稳定,方差更低。这是因为我们的模型可以直接预测最终目标,并且始终收敛到稳定的极值点。

Table 1

Online A/B Test

Setups. We deploy our algorithm (DPM) to support the Money Off campaign in Meituan (a food delivery platform), and conduct an online AB test for four weeks. There are 310k shops in this experiment and they are randomly partitioned into three groups, named G-DPM, G-TPM and G-Control respectively. The discount $T \in \{0, 1, ..., 6\}$ is taken as the treatment, and assigning a shop the treatment $T=t$ means \$t cash off for each order whose price meets a given threshold. Given a limited budget, the objective is to decide the discount of each shop every day so as to maximize the total number of orders and the GMV (Gross Merchandise Value) on this platform. Algorithms DPM and TPM-CF are deployed in the experiment groups G-DPM and G-TPM, respectively. The group G-Control is taken as the control group and does not offer any discount. These groups are randomly re-partitioned every week. Therefore, the AB experiment is repeated four times, and the period of each repetition is one week.

在线A/B测试

设置。 我们部署了我们的算法(DPM)来支持美团(一个食品配送平台)的满减活动,并进行了为期四周的在线A/B测试。本次实验共有310k家商铺,它们被随机分成三组,分别命名为G-DPM、G-TPM和G-Control。折扣 $T \in \{0, 1, ..., 6\}$ 作为干预,为商铺分配干预 $T=t$ 意味着每笔价格达到给定阈值的订单可获得 \$t 现金折扣。在有限的预算下,目标是每天为每个商铺决定折扣,以最大化该平台的总订单数量和GMV(商品交易总额)。算法DPM和TPM-CF分别部署在名为G-DPM和G-TPM的实验组中。G-Control组作为对照组,不提供任何折扣。这些组每周重新随机划分。因此,A/B实验重复四次,每次周期为一周。
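To make the decision step concrete, here is a hedged sketch of one way a budget-constrained discount assignment could be made once marginal utilities are predicted: candidate discount upgrades are ranked by predicted marginal ROI and applied greedily until the budget is exhausted. This is a generic greedy heuristic under an assumed diminishing-returns structure, not the exact DPM procedure; all names, shapes, and the toy data are illustrative.

```python
import numpy as np

def greedy_discount_allocation(marginal_reward, marginal_cost, budget):
    """Each shop can be upgraded one discount level at a time (0->1, 1->2, ...).
    Candidate upgrades are visited from the highest to the lowest predicted
    marginal ROI and applied while the budget allows."""
    num_shops, num_levels = marginal_reward.shape
    roi = marginal_reward / np.maximum(marginal_cost, 1e-9)
    level = np.zeros(num_shops, dtype=int)   # assigned discount level per shop
    spent = 0.0
    for idx in np.argsort(-roi, axis=None):  # flattened indices, best ROI first
        shop, upgrade = divmod(int(idx), num_levels)
        if level[shop] != upgrade:           # only the next contiguous upgrade is valid
            continue
        cost = marginal_cost[shop, upgrade]
        if spent + cost > budget:
            continue
        level[shop] += 1
        spent += cost
    return level, spent

# Toy usage: 5 shops, 6 possible discount upgrades each, diminishing rewards.
rng = np.random.default_rng(1)
mr = np.sort(rng.uniform(0.5, 5.0, size=(5, 6)), axis=1)[:, ::-1]
mc = rng.uniform(1.0, 3.0, size=(5, 6))
print(greedy_discount_allocation(mr, mc, budget=20.0))
```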

Results. Fig. 3(e)-3(f) shows the incremental orders and GMV relative to G-Control in each week. To protect data privacy, all data points have been normalized by dividing them by the incremental orders or GMV of TPM-SL in the first week. The shaded area in Fig. 3(e)-3(f) represents the confidence interval at a confidence level of 0.95, calculated by Student's t-test. Compared with TPM-CF, our approach DPM always performs better in incremental orders and is not inferior in incremental GMV in each week. To sum up, DPM achieves significant growth of 14.3% in incremental orders and 13.6% in GMV on average.

结果。 图3(e)-3(f)显示了每周相对于G-Control的增量订单和增量GMV。为了保护数据隐私,所有数据点都已归一化,并除以第一周TPM-SL的增量订单或增量GMV。图3(e)-3(f)中的阴影区域表示置信区间,置信度为0.95,由学生t检验计算得出。与TPM-CF相比,我们的方法DPM在增量订单方面始终表现更好,并且在每周的增量GMV方面也不逊色于它。总而言之,DPM在增量订单方面实现了平均14.3%的显著增长,在GMV方面实现了13.6%的显著增长。
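The shaded confidence bands described above can be produced, for example, with a two-sample t-interval on the difference of group means. The sketch below uses the Welch variant, which is an assumption on my part; the text only states that Student's t-test was used, and the group data here are synthetic.

```python
import numpy as np
from scipy import stats

def diff_mean_ci(treated, control, alpha=0.05):
    """Welch-style confidence interval for the difference in group means."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    v_t = treated.var(ddof=1) / len(treated)
    v_c = control.var(ddof=1) / len(control)
    diff = treated.mean() - control.mean()
    se = np.sqrt(v_t + v_c)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v_t + v_c) ** 2 / (v_t ** 2 / (len(treated) - 1) + v_c ** 2 / (len(control) - 1))
    half_width = stats.t.ppf(1 - alpha / 2, df) * se
    return diff - half_width, diff + half_width

# Toy usage: weekly orders per shop in a treatment group vs. the control group.
rng = np.random.default_rng(2)
print(diff_mean_ci(rng.poisson(105, size=500), rng.poisson(100, size=500)))
```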

Conclusion

In this paper, we proposed a novel approach for solving resource allocation problems based on the decision factor. Taking it as the learning objective avoids additional mathematical operations on the prediction results. This idea was applied to two crucial problems in marketing and showed great advantages both theoretically and practically. Large-scale offline simulations and online A/B tests validated the effectiveness of our approach.

结论

在本文中,我们提出了一种基于决策因子解决资源分配问题的新方法。以此为学习目标可以避免对预测结果进行其它数学运算。这一想法被应用于解决营销中的两个关键问题,在理论和实践上都表现出了巨大的优势。大规模离线模拟和在线AB测试验证了我们方法的有效性。

Our future work will focus on the application of this approach in more complex marketing scenarios. For example, multiple marketing campaigns may be conducted at the same time and interact with each other. Therefore, deriving the decision factor and conducting direct heterogeneous causal learning in this situation are more challenging.

我们未来的工作将专注于将这种方法应用于更复杂的营销场景。例如,多个营销活动可能同时进行并相互影响。因此,在这种情况下推导决策因子并进行直接的异质因果学习更具挑战性。

References

Ai, M.; Li, B.; Gong, H.; Yu, Q.; Xue, S.; Zhang, Y.; Zhang, Y.; and Jiang, P. 2022. LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm. In The ACM Web Conference (WWW), 2310–2319.

Athey, S.; Tibshirani, J.; and Wager, S. 2019. Generalized Random Forests. The Annals of Statistics, 47(2): 1148–1178.

Athey, S.; and Wager, S. 2021. Policy Learning with Observational Data. Econometrica, 89(1): 133–161.

Betlei, A.; Diemert, E.; and Amini, M.-R. 2021. Uplift Modeling with Generalization Guarantees. In the ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD), 55–65.

Chen, H.; Harinen, T.; Lee, J.-Y.; Yung, M.; and Zhao, Z. 2020. CausalML: Python Package for Causal Machine Learning. arXiv:2002.11631.

Diemert, E.; Betlei, A.; Renaudin, C.; and Amini, M.-R. 2018. A Large Scale Benchmark for Uplift Modeling. In The ACM AdKDD and TargetAd Workshop. ACM.

Donti, P.; Amos, B.; and Kolter, J. Z. 2017. Task-Based End-to-End Model Learning in Stochastic Optimization. Advances in Neural Information Processing Systems (NIPS), 30.

Du, S.; Lee, J.; and Ghaffarizadeh, F. 2019. Improve User Retention with Causal Learning. In the ACM SIGKDD Workshop on Causal Discovery, volume 104, 34–49. PMLR.

Elmachtoub, A. N.; and Grigas, P. 2022. Smart "Predict, then Optimize". Management Science, 68(1): 9–26.

Hao, X.; Peng, Z.; Ma, Y.; Wang, G.; Jin, J.; Hao, J.; Chen, S.; Bai, R.; Xie, M.; Xu, M.; et al. 2020. Dynamic Knapsack Optimization towards Efficient Multi-Channel Sequential Advertising. In International Conference on Machine Learning (ICML), 4060–4070. PMLR.

Hua, J.; Yan, L.; Xu, H.; and Yang, C. 2021. Markdowns in E-Commerce Fresh Retail: A Counterfactual Prediction and Multi-Period Optimization Approach. In The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 3022–3031.

Johansson, F.; Shalit, U.; and Sontag, D. 2016. Learning Representations for Counterfactual Inference. In International Conference on Machine Learning (ICML), 3020–3029. PMLR.

Battocchi, K.; Dillon, E.; Hei, M.; Lewis, G.; Oka, P.; Oprescu, M.; and Syrgkanis, V. 2019. EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation. https://github.com/microsoft/EconML. Version 0.x.

Künzel, S. R.; Sekhon, J. S.; Bickel, P. J.; and Yu, B. 2019. Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning. Proceedings of the National Academy of Sciences, 116(10): 4156–4165.

Kuusisto, F.; Costa, V. S.; Nassif, H.; Burnside, E.; Page, D.; and Shavlik, J. 2014. Support Vector Machines for Differential Prediction. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 50–65. Springer.

Mandi, J.; Bucarey, V.; Tchomba, M. M. K.; and Guns, T. 2022. Decision-Focused Learning: Through the Lens of Learning to Rank. In International Conference on Machine Learning (ICML), 14935–14947. PMLR.

Nie, X.; and Wager, S. 2021. Quasi-oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2): 299–319.

Polleit, T. 2011. What Can the Law of Diminishing Marginal Utility Teach Us. Mises Institute.

Rzepakowski, P.; and Jaroszewicz, S. 2010. Decision Trees for Uplift Modeling. In The IEEE International Conference on Data Mining (ICDM), 441–450. IEEE.

Sekhon, J. S. 2008. The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods. The Oxford Handbook of Political Methodology, 2: 1–32.

Shah, S.; Wang, K.; Wilder, B.; Perrault, A.; and Tambe, M. 2022. Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses. In Advances in Neural Information Processing Systems (NIPS).

Shalit, U.; Johansson, F. D.; and Sontag, D. 2017. Estimating Individual Treatment Effect: Generalization Bounds and Algorithms. In International Conference on Machine Learning (ICML), 3076–3085. PMLR.

Wager, S.; and Athey, S. 2018. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523): 1228–1242.

Wilder, B.; Dilkina, B.; and Tambe, M. 2019. Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization. In The AAAI Conference on Artificial Intelligence (AAAI), 1658–1665. AAAI Press.

Xiao, S.; Guo, L.; Jiang, Z.; Lv, L.; Chen, Y.; Zhu, J.; and Yang, S. 2019. Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing. In the ACM International Conference on Information and Knowledge Management (CIKM), 971–980.

Yao, L.; Li, S.; Li, Y.; Huai, M.; Gao, J.; and Zhang, A. 2018. Representation Learning for Treatment Effect Estimation from Observational Data. Advances in Neural Information Processing Systems (NIPS), 31.

Zhang, Y.; Tang, B.; Yang, Q.; An, D.; Tang, H.; Xi, C.; Li, X.; and Xiong, F. 2021. BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market. In Annual Conference on Neural Information Processing Systems (NIPS), 20410–20422.

Zhao, K.; Hua, J.; Yan, L.; Zhang, Q.; Xu, H.; and Yang, C. 2019. A Unified Framework for Marketing Budget Allocation. In the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 1820–1830.

Zhao, Y.; Fang, X.; and Simchi-Levi, D. 2017. Uplift Modeling with Multiple Treatments and General Response Types. In International Conference on Data Mining (ICDM), 588–596. SIAM.

Zhou, H.; Li, S.; Jiang, G.; Zheng, J.; and Wang, D. 2022. Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing. arXiv preprint, arXiv:2211.15728.

Zhou, Z.; Athey, S.; and Wager, S. 2022. Offline Multi-Action Policy Learning: Generalization and Optimization. Operations Research.
