【Chapter 10】Zero-Code Development of the Gradient Boosting Decision Tree Algorithm_Setosa_DSML

1、Algorithm concept

What is a gradient boosting decision tree?
  Gradient Boosting Decision Tree (GBDT) is a member of the Boosting family of ensemble learning methods.

  Ensemble learning improves overall predictive performance by combining multiple base learners (models) into a single strong learner, thereby improving the model's generalization ability and accuracy. Its core idea is to use a combination of different models to compensate for the weaknesses of any single model. Ensemble methods fall into two categories: serial methods, in which the individual learners have strong dependencies on one another and must be generated sequentially, such as Boosting; and parallel methods, in which the individual learners have no strong dependencies and can be generated simultaneously, such as Bagging (also known as bootstrap aggregating).
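  As a minimal illustrative sketch (using scikit-learn and a synthetic dataset, neither of which appears in the article), the code below trains one ensemble from each family; both use decision trees as their default base learners.

```python
# A minimal sketch (synthetic data; not the article's zero-code workflow):
# Bagging trains its base trees independently, Boosting trains them sequentially.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Parallel ensemble: each tree is fitted on an independent bootstrap sample.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Serial ensemble: each tree depends on the errors of the previous rounds.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging accuracy :", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
```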
  The best-known Boosting algorithm is AdaBoost. AdaBoost updates the weights of the training samples according to the error rate of the previous round's weak learner and iterates continuously to improve model performance.
  GBDT differs significantly from the traditional AdaBoost algorithm. GBDT also improves the model through iteration, but it uses the Forward Stagewise Algorithm, and its weak learners are restricted to CART regression trees. In addition, GBDT's iteration scheme differs from AdaBoost's: instead of reweighting samples, each round fits a new tree to the negative gradient of the loss. The GBDT algorithm process is as follows:
[Figure: GBDT algorithm flow]
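  Although the workflow in this chapter is zero-code on the Setosa_DSML platform, a short code counterpart may make the idea concrete. The sketch below is an assumption-based illustration (scikit-learn's GradientBoostingRegressor on synthetic data, not the platform's procedure) of training a GBDT model whose weak learners are CART regression trees.

```python
# A minimal sketch (illustration only; the article itself uses the zero-code
# Setosa_DSML platform): gradient boosting with regression trees in scikit-learn.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of boosting rounds; each round adds one regression tree
# fitted to the negative gradient of the loss (forward stagewise additive modeling).
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=0)
model.fit(X_train, y_train)

print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```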

2、Algorithm principle

(1) GBDT and negative gradient fitting principle

  GBDT (Gradient Boosting Decision Tree) is an ensemble learning algorithm that uses multiple decision trees to solve classification and regression problems. The core idea is to construct each new decision tree on the residuals of the previous round's model. To improve the fitting effect, Friedman proposed using the negative gradient of the loss function to approximate the residuals and fitting a new CART regression tree to it. The negative gradient is:
$$r_{t,i} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$
   Here $r_{t,i}$ denotes the negative gradient of the loss function for the i-th sample in round t, where $L$ is the loss function and $f(x)$ is the model's predicted value.
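   As a quick numerical check (a sketch assuming squared-error loss $L(y, f(x)) = \frac{1}{2}(y - f(x))^2$, which the article does not fix), the negative gradient reduces to the ordinary residual $y_i - f_{t-1}(x_i)$:

```python
# A minimal sketch, assuming squared-error loss L = 0.5 * (y - f)^2.
# Its negative gradient with respect to f(x_i) is the residual y_i - f_{t-1}(x_i).
import numpy as np

y = np.array([3.0, -1.0, 2.0, 0.5])          # true targets
f_prev = np.array([2.5, 0.0, 2.2, 1.0])      # predictions f_{t-1}(x_i) from the previous round

r = y - f_prev                                # negative gradient r_{t,i} for squared-error loss
print(r)                                      # [ 0.5 -1.  -0.2 -0.5]
```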
   In each iteration, we first fit a CART regression tree to the samples $(x_i, r_{t,i})$, where $r_{t,i}$ is the negative gradient of round t and serves as an approximation of the sample's error. Each leaf node of the regression tree covers a region of the input space, called the leaf node region $R_{t,j}$, and the number of leaf nodes is denoted by $J$.
   Each leaf node outputs a constant value $c_{t,j}$, obtained by minimizing the loss function: the goal is to find the constant $c$ that minimizes the total loss over all samples falling in that node, as follows:
$$c_{t,j} = \arg\min_{c} \sum_{x_i \in R_{t,j}} L(y_i, f_{t-1}(x_i) + c)$$
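   For squared-error loss (again an assumption, not fixed by the article), the minimizer of this sum in each leaf is simply the mean of the residuals that fall into that leaf, so the leaf constants can be computed as in the sketch below.

```python
# A minimal sketch, assuming squared-error loss: the optimal leaf constant
# c_{t,j} is the mean of the residuals (negative gradients) in leaf region R_{t,j}.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

f_prev = np.full(200, y.mean())               # previous-round predictions (here: round 0 constant)
r = y - f_prev                                # negative gradients for squared-error loss

tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # CART tree fitted to the negative gradients
leaf_id = tree.apply(X)                                # which leaf region R_{t,j} each sample falls into

# c_{t,j}: average residual per leaf (equals the tree's own leaf predictions for squared loss)
c = {j: r[leaf_id == j].mean() for j in np.unique(leaf_id)}
print(c)
```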
   Next, $h_t(x)$ is expressed as the sum of the leaf output values $c_{t,j}$, each weighted by an indicator of whether the sample falls in that leaf, which gives this round's decision tree fitting function:
$$h_t(x) = \sum_{j=1}^{J} c_{t,j}\, I(x \in R_{t,j})$$
   Here $I(x \in R_{t,j})$ is an indicator function, equal to 1 if sample $x$ belongs to the leaf node region $R_{t,j}$ and 0 otherwise.
   In each round, the strong learner is updated by adding the output of the current round's decision tree to the previous model, so the model is optimized step by step. The strong learner obtained in this round is:
$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{t,j}\, I(x \in R_{t,j})$$
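   Putting the steps together, the sketch below implements this update from scratch for regression (assuming squared-error loss, depth-2 trees, a 0.1 learning rate, and 100 rounds, none of which are specified in the article).

```python
# A minimal from-scratch GBDT sketch for regression with squared-error loss.
# Assumptions not stated in the article: squared-error loss, depth-2 trees,
# a small learning rate (shrinkage), and 100 boosting rounds.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

n_rounds, lr = 100, 0.1
f = np.full(len(y), y.mean())                 # f_0(x): initial constant prediction
trees = []

for t in range(n_rounds):
    r = y - f                                 # negative gradient r_{t,i} (residual for squared loss)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # fit a CART tree to the residuals
    trees.append(tree)
    # For squared-error loss the tree's leaf values already equal the optimal c_{t,j},
    # so f_t = f_{t-1} + sum_j c_{t,j} I(x in R_{t,j}) is just adding the tree's predictions.
    f = f + lr * tree.predict(X)

print("training MSE:", np.mean((y - f) ** 2))
```

   For squared-error loss the tree's own leaf predictions coincide with the optimal constants $c_{t,j}$; for other losses a separate per-leaf line search over $c$ would be needed.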
