【Chapter 10】Zero-Code Development of the Gradient Boosting Decision Tree Algorithm_Setosa_DSML

1、Algorithm concept

What is a gradient boosting decision tree?
  Gradient Boosting Decision Tree (GBDT) is a member of the Boosting family of ensemble learning methods.

  Ensemble learning improves overall predictive performance by combining multiple base learners (models) into a single strong learner, thereby improving the model's generalization ability and accuracy. Its core idea is to use a combination of different models to compensate for the weaknesses of any single model. Ensemble methods fall into two categories: serial methods, in which the individual learners have strong dependencies on one another and must be generated sequentially, such as Boosting; and parallel methods, in which the individual learners have no strong dependencies and can be generated simultaneously, such as Bagging (also known as bootstrap aggregating).
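  As a minimal illustrative sketch (using scikit-learn and a synthetic dataset, neither of which appears in the article), the code below trains one ensemble from each family; both use decision trees as their default base learners.

```python
# A minimal sketch (synthetic data; not the article's zero-code workflow):
# Bagging trains its base trees independently, Boosting trains them sequentially.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Parallel ensemble: each tree is fitted on an independent bootstrap sample.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Serial ensemble: each tree depends on the errors of the previous rounds.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging accuracy :", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
```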
  The best-known Boosting algorithm is AdaBoost. AdaBoost updates the weights of the training samples according to the error rate of the previous round's weak learner and iterates continuously to improve model performance.
  GBDT differs significantly from the traditional AdaBoost algorithm. GBDT also improves the model through iteration, but it uses the Forward Stagewise Algorithm, and its weak learners are restricted to CART regression trees. In addition, GBDT's iteration scheme differs from AdaBoost's: instead of reweighting samples, each round fits a new tree to the negative gradient of the loss. The GBDT algorithm process is as follows:
[Figure: GBDT algorithm flow]
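  Although the workflow in this chapter is zero-code on the Setosa_DSML platform, a short code counterpart may make the idea concrete. The sketch below is an assumption-based illustration (scikit-learn's GradientBoostingRegressor on synthetic data, not the platform's procedure) of training a GBDT model whose weak learners are CART regression trees.

```python
# A minimal sketch (illustration only; the article itself uses the zero-code
# Setosa_DSML platform): gradient boosting with regression trees in scikit-learn.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of boosting rounds; each round adds one regression tree
# fitted to the negative gradient of the loss (forward stagewise additive modeling).
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=0)
model.fit(X_train, y_train)

print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```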

2、Algorithm principle

(1) GBDT and negative gradient fitting principle

  GBDT (Gradient Boosting Decision Tree) is an ensemble learning algorithm that uses multiple decision trees to solve classification and regression problems. The core idea is to construct each new decision tree on the residuals of the previous round's model. To improve the fitting effect, Friedman proposed using the negative gradient of the loss function to approximate the residuals and fitting a new CART regression tree to it. The negative gradient is:
$$r_{t,i} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$
   Here $r_{t,i}$ denotes the negative gradient of the loss function for the i-th sample in round t, where $L$ is the loss function and $f(x)$ is the model's predicted value.
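   As a quick numerical check (a sketch assuming squared-error loss $L(y, f(x)) = \frac{1}{2}(y - f(x))^2$, which the article does not fix), the negative gradient reduces to the ordinary residual $y_i - f_{t-1}(x_i)$:

```python
# A minimal sketch, assuming squared-error loss L = 0.5 * (y - f)^2.
# Its negative gradient with respect to f(x_i) is the residual y_i - f_{t-1}(x_i).
import numpy as np

y = np.array([3.0, -1.0, 2.0, 0.5])          # true targets
f_prev = np.array([2.5, 0.0, 2.2, 1.0])      # predictions f_{t-1}(x_i) from the previous round

r = y - f_prev                                # negative gradient r_{t,i} for squared-error loss
print(r)                                      # [ 0.5 -1.  -0.2 -0.5]
```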
   In each iteration, we first fit a CART regression tree to the samples $(x_i, r_{t,i})$, where $r_{t,i}$ is the negative gradient of round t and serves as an approximation of the sample's error. Each leaf node of the regression tree covers a region of the input space, called the leaf node region $R_{t,j}$, and the number of leaf nodes is denoted by $J$.
   Each leaf node outputs a constant value $c_{t,j}$, obtained by minimizing the loss function: the goal is to find the constant $c$ that minimizes the total loss over all samples falling in that node, as follows:
$$c_{t,j} = \arg\min_{c} \sum_{x_i \in R_{t,j}} L(y_i, f_{t-1}(x_i) + c)$$
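   For squared-error loss (again an assumption, not fixed by the article), the minimizer of this sum in each leaf is simply the mean of the residuals that fall into that leaf, so the leaf constants can be computed as in the sketch below.

```python
# A minimal sketch, assuming squared-error loss: the optimal leaf constant
# c_{t,j} is the mean of the residuals (negative gradients) in leaf region R_{t,j}.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

f_prev = np.full(200, y.mean())               # previous-round predictions (here: round 0 constant)
r = y - f_prev                                # negative gradients for squared-error loss

tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # CART tree fitted to the negative gradients
leaf_id = tree.apply(X)                                # which leaf region R_{t,j} each sample falls into

# c_{t,j}: average residual per leaf (equals the tree's own leaf predictions for squared loss)
c = {j: r[leaf_id == j].mean() for j in np.unique(leaf_id)}
print(c)
```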
   Next, $h_t(x)$ is expressed as the sum of the leaf output values $c_{t,j}$, each weighted by an indicator of whether the sample falls in that leaf, which gives this round's decision tree fitting function:
$$h_t(x) = \sum_{j=1}^{J} c_{t,j}\, I(x \in R_{t,j})$$
   Here $I(x \in R_{t,j})$ is an indicator function, equal to 1 if sample $x$ belongs to the leaf node region $R_{t,j}$ and 0 otherwise.
   In each round, the strong learner is updated by adding the output of the current round's decision tree to the previous model, so the model is optimized step by step. The strong learner obtained in this round is:
$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{t,j}\, I(x \in R_{t,j})$$
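   Putting the steps together, the sketch below implements this update from scratch for regression (assuming squared-error loss, depth-2 trees, a 0.1 learning rate, and 100 rounds, none of which are specified in the article).

```python
# A minimal from-scratch GBDT sketch for regression with squared-error loss.
# Assumptions not stated in the article: squared-error loss, depth-2 trees,
# a small learning rate (shrinkage), and 100 boosting rounds.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

n_rounds, lr = 100, 0.1
f = np.full(len(y), y.mean())                 # f_0(x): initial constant prediction
trees = []

for t in range(n_rounds):
    r = y - f                                 # negative gradient r_{t,i} (residual for squared loss)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # fit a CART tree to the residuals
    trees.append(tree)
    # For squared-error loss the tree's leaf values already equal the optimal c_{t,j},
    # so f_t = f_{t-1} + sum_j c_{t,j} I(x in R_{t,j}) is just adding the tree's predictions.
    f = f + lr * tree.predict(X)

print("training MSE:", np.mean((y - f) ** 2))
```

   For squared-error loss the tree's own leaf predictions coincide with the optimal constants $c_{t,j}$; for other losses a separate per-leaf line search over $c$ would be needed.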
