Purpose
**To formalize the impact of a training point on a
prediction, we ask the counterfactual: what would happen
if we did not have this training point, or if the values of this
training point were changed slightly?**
Main method
Answering this question by perturbing the data and retraining the model can be prohibitively expensive. To overcome this problem, we use influence functions, a classic technique from robust statistics (Cook & Weisberg, 1980) that tells us how the model parameters change as we upweight a training point by an infinitesimal amount.
Approach
1. Upweighting a training point
How would the model’s predictions change if we did not have this training point?
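1.1 Change in model parameters after removing one sample (how much the parameters shift)
After removing a sample $z$ from the training set, the new model parameters differ from the original parameters by:
$$\hat\theta_{-z} - \hat\theta$$
where:
$$\hat\theta_{-z} = \arg\min_{\theta \in \Theta} \sum_{z_i \neq z} L(z_i, \theta), \qquad \hat\theta = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)$$
Removing each sample in turn, retraining a new model each time, and then computing the parameter change would clearly require a huge amount of computation. Influence functions can help us solve this problem.
First, consider increasing the weight of a single sample $z$ by a small amount $\epsilon$ and computing the parameters of the new model:
$$\hat\theta_{\epsilon, z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$$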
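To make the comparison concrete, here is a minimal sketch that contrasts brute-force retraining with the influence-function estimate. It relies on the standard influence-function result, which is not derived in the text above: the effect of upweighting $z$ on the parameters is $-H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)$, where $H_{\hat\theta}$ is the Hessian of the average training loss, and removing $z$ corresponds to upweighting it by $\epsilon = -1/n$. The ridge-regression loss, the synthetic data, and the helper names `fit`, `loo_change`, and `influence_change` are all assumptions chosen for illustration, not part of the original post.

```python
import numpy as np

def fit(X, y, lam):
    """Minimize (1/n) * sum_i [0.5*(x_i @ theta - y_i)**2 + 0.5*lam*||theta||^2]."""
    n, d = X.shape
    # Setting the gradient to zero gives (X^T X + n*lam*I) theta = X^T y.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def loo_change(X, y, lam, j):
    """Exact parameter change from deleting sample j and retraining."""
    keep = np.arange(len(y)) != j
    return fit(X[keep], y[keep], lam) - fit(X, y, lam)

def influence_change(X, y, lam, j):
    """First-order estimate of the same change, with no retraining."""
    n, d = X.shape
    theta = fit(X, y, lam)
    # Hessian of the average training loss at theta.
    H = X.T @ X / n + lam * np.eye(d)
    # Gradient of the per-sample loss L(z_j, theta) at theta.
    grad_j = X[j] * (X[j] @ theta - y[j]) + lam * theta
    # Influence of upweighting z_j on the parameters: -H^{-1} grad.
    i_up_params = -np.linalg.solve(H, grad_j)
    # Removing z_j is the same as upweighting it by epsilon = -1/n.
    return -i_up_params / n

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, d, lam = 500, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

j = 7
print("retrained :", loo_change(X, y, lam, j))
print("influence :", influence_change(X, y, lam, j))
```

In this sketch the brute-force route needs one full refit per removed sample, while the influence route reuses a single fit and a single Hessian for every sample. The two vectors should agree closely when $n$ is large, since the estimate is only first-order in $\epsilon$.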

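This post explores how influence functions can be used to analyze the effect of training points on model predictions. The main approach is to upweight or perturb a training point and observe how the model parameters and the prediction error change, which helps with understanding model behavior, constructing adversarial training examples, checking for domain mismatch, and fixing mislabeled examples. The analysis of what drives influence shows that samples with large training error, and samples to which the parameters are sensitive, have greater influence.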





